This is an archive of the discontinued LLVM Phabricator instance.

Differential D53236

[SelectionDAG] swap select_cc operands to enable folding
ClosedPublic

Authored by labrinea on Oct 12 2018, 6:39 PM.

Details

Reviewers

llvm-commits
bogner
javed.absar
spatel

Commits

rGe15c982f6d6a: [SelectionDAG] swap select_cc operands to enable folding
rL346484: [SelectionDAG] swap select_cc operands to enable folding

Summary

The DAGCombiner tries to SimplifySelectCC as follows:
select_cc(x, y, 16, 0, cc) -> shl(zext(set_cc(x, y, cc)), 4)
It can't cope with the situation of reordered operands:
select_cc(x, y, 0, 16, cc)
In that case we just need to swap the operands and invert the Condition Code:
select_cc(x, y, 16, 0, ~cc)
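
The equivalence claimed above can be checked with a small standalone model. This is only a sketch: `CondCode`, `evalCC`, `invertCC`, `selectCC` and `foldedShl` are hypothetical stand-ins written for illustration, not LLVM's ISD enum or the DAGCombiner API, and only a few integer condition codes are modelled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for a few integer condition codes (not LLVM's ISD::CondCode).
enum class CondCode { EQ, NE, LT, GE };

inline bool evalCC(CondCode cc, int x, int y) {
  switch (cc) {
  case CondCode::EQ: return x == y;
  case CondCode::NE: return x != y;
  case CondCode::LT: return x < y;
  case CondCode::GE: return x >= y;
  }
  return false;
}

// Logical inverse of a condition code: invertCC(cc) holds exactly when cc does not.
inline CondCode invertCC(CondCode cc) {
  switch (cc) {
  case CondCode::EQ: return CondCode::NE;
  case CondCode::NE: return CondCode::EQ;
  case CondCode::LT: return CondCode::GE;
  case CondCode::GE: return CondCode::LT;
  }
  return cc;
}

// Reference semantics of select_cc(x, y, t, f, cc).
inline uint32_t selectCC(int x, int y, uint32_t t, uint32_t f, CondCode cc) {
  return evalCC(cc, x, y) ? t : f;
}

// Folded form: shl(zext(set_cc(x, y, cc)), 4), assuming set_cc yields 0 or 1.
inline uint32_t foldedShl(int x, int y, CondCode cc) {
  return static_cast<uint32_t>(evalCC(cc, x, y)) << 4;
}

// Compares both operand orders against the folded form on a small input grid.
inline void checkSwappedFold() {
  const CondCode codes[] = {CondCode::EQ, CondCode::NE, CondCode::LT, CondCode::GE};
  for (CondCode cc : codes)
    for (int x = -2; x <= 2; ++x)
      for (int y = -2; y <= 2; ++y) {
        // select_cc(x, y, 16, 0, cc) folds directly.
        assert(selectCC(x, y, 16, 0, cc) == foldedShl(x, y, cc));
        // select_cc(x, y, 0, 16, cc) folds once the condition code is inverted.
        assert(selectCC(x, y, 0, 16, cc) == foldedShl(x, y, invertCC(cc)));
      }
}
```

Calling `checkSwappedFold()` exercises both operand orders; the model relies on the comparison result being 0 or 1, which corresponds to the `ZeroOrOneBooleanContent` condition in the patch.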

Event Timeline

labrinea created this revision. Oct 12 2018, 6:39 PM

Herald added a reviewer: javed.absar. Oct 12 2018, 6:39 PM

dmgreen added a subscriber: dmgreen. Oct 14 2018, 1:47 AM

labrinea added a reviewer: spatel. Oct 15 2018, 2:49 AM

Is the motivating case integer or FP?
I'm asking because we have a canonicalization for integer cmp+sel for the IR in these tests, but we're missing the corresponding FP transform.
If we add the FP canonicalization in IR, would there still be a need for this backend patch? I.e., is something generating this select code in the DAG itself?

In reply to @spatel (D53236#1265594), labrinea wrote:

Yes, the motivating example was a floating-point comparison. Indeed, I added the unordered predicates and modified InstCombine to handle fcmp. It then swapped the operands and inverted the predicates for me. But what does "canonical" actually mean? The backend still won't be able to perform the transformation from the summary for select_cc(x, y, 0, 16, cc) if cc is already canonical, assuming this is a reachable state in the DAG.
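
For floating-point predicates, inverting the condition also flips ordered to unordered: the inverse of ogt is ule, not ole, because of NaN operands. A minimal sketch of that point in plain C++ follows; the helper names `fcmpOGT`, `fcmpOLE` and `fcmpULE` are hypothetical and only mirror the semantics of the IR predicates, assuming IEEE comparisons (no fast-math).

```cpp
#include <cmath>

// Ordered greater-than: false whenever either operand is NaN.
inline bool fcmpOGT(float a, float b) { return a > b; }

// Ordered less-or-equal: also false whenever either operand is NaN.
inline bool fcmpOLE(float a, float b) { return a <= b; }

// Unordered-or-less-or-equal: true whenever either operand is NaN.
inline bool fcmpULE(float a, float b) {
  return std::isnan(a) || std::isnan(b) || a <= b;
}

// True if ule behaves as the exact logical inverse of ogt for this pair.
inline bool uleInvertsOgt(float a, float b) {
  return fcmpULE(a, b) == !fcmpOGT(a, b);
}
```

With a NaN operand both `fcmpOGT` and `fcmpOLE` are false, so ole cannot serve as the inverse of ogt, whereas `uleInvertsOgt` holds for NaN and non-NaN inputs alike.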

In reply to @labrinea (D53236#1275285), spatel wrote:

Yes, we can't guarantee that this pattern won't appear in the DAG if the cc was already the canonical predicate. We could try harder in instcombine to make the larger constant appear as the 'true value' for the select, but we can't guarantee it (if there are extra uses of the compare, then we won't do that transform in IR and increase the instruction count).

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18287	I don't know how to expose this as a bug, but this scares me. We are changing the value of CC without swapping the N2/N3 values, and it's possible to fall-through to the code underneath this block. At that point, we have the wrong CC value. Please move this block of code into a helper function as a preliminary step, so we don't have this risk.
test/CodeGen/AArch64/select_cc.ll
1–4	Please add this test file to trunk as a preliminary step and use utils/update_llc_test_checks.py to auto-generate the check lines.

I've autogenerated the FileCheck lines to show the diff against the trunk codegen. To make sure we never fall through to the next block having changed the CC without swapping (N2, N3), I've moved all the preconditions to the beginning of the block (instead of moving the block into a helper function).
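
The "preconditions first" idea can be illustrated outside of LLVM. The sketch below is a hypothetical model, not the DAGCombiner code: `SelectCCNode`, `ShlOfSetCC` and `tryFoldToShl` are invented names, and only two condition codes are modelled. Every check runs before any state changes, so a failed fold leaves the condition code and operands untouched, and on success the condition code and both values are swapped together.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-ins; not LLVM types.
enum class CondCode { EQ, NE };

inline CondCode invertCC(CondCode cc) {
  return cc == CondCode::EQ ? CondCode::NE : CondCode::EQ;
}

inline bool isPowerOf2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

struct SelectCCNode {
  CondCode cc;
  uint64_t trueVal;
  uint64_t falseVal;
};

struct ShlOfSetCC {
  CondCode cc;
  unsigned shiftAmt;
};

// Returns true and fills `out` if the node folds to shl(setcc, log2(value)).
// All preconditions are tested up front; `node` is modified only on success.
inline bool tryFoldToShl(SelectCCNode &node, bool zeroOrOneBooleans,
                         ShlOfSetCC &out) {
  const bool fold = isPowerOf2(node.trueVal) && node.falseVal == 0;
  const bool swap = isPowerOf2(node.falseVal) && node.trueVal == 0;
  if (!(fold || swap) || !zeroOrOneBooleans)
    return false; // nothing has been mutated at this point

  if (swap) {
    // Invert the condition and swap the values in one step.
    node.cc = invertCC(node.cc);
    std::swap(node.trueVal, node.falseVal);
  }

  unsigned shift = 0;
  while ((uint64_t{1} << shift) != node.trueVal)
    ++shift;
  out = ShlOfSetCC{node.cc, shift};
  return true;
}
```

For example, a node `{NE, 0, 4}` folds to a shift by 2 of the EQ comparison, while the same node with `zeroOrOneBooleans == false` is rejected with its condition code still NE.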

labrinea added inline comments. Oct 29 2018, 5:35 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18282	Shouldn't `TLI.getBooleanContents(N0.getValueType())` be `TLI.getBooleanContents(getSetCCResultType(N0.getValueType()))` instead, or it doesn't matter?

spatel added inline comments. Oct 29 2018, 2:03 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18282	Given that we're using getSetCCResultType(N0.getValueType()) when we create the SCC value below, that seems right. But that should be a separate clean-up step either before or after this patch. I'm not sure if there's a way to expose that bug in a test though.
18282–18284	clang-format?
18329–18331	clang-format?

Rebased and clang-formatted.

LGTM

(For reference: I was wondering why x86 doesn't show any diffs for this change; it looks like there's custom code in X86ISelLowering that already does the same thing.)

This revision is now accepted and ready to land. Nov 6 2018, 12:13 PM

In reply to @spatel (D53236#1289222), labrinea wrote:

I see. Thanks for the review!

Closed by commit rL346484: [SelectionDAG] swap select_cc operands to enable folding (authored by alelab01). Nov 9 2018, 3:12 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	14 lines
test/CodeGen/AArch64/select_cc.ll	45 lines
test/CodeGen/Thumb/branchless-cmp.ll	14 lines

Diff 169531

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

(18,270 lines not shown; enclosing context: if (ConstAndRHS && ConstAndRHS->getAPIntValue().countPopulation() == 1) {)
                                getShiftAmountTy(Shl.getValueType()));
       SDValue Shr = DAG.getNode(ISD::SRA, SDLoc(N0), VT, Shl, ShrAmt);
 
       return DAG.getNode(ISD::AND, DL, VT, Shr, N3);
     }
   }
 
   // fold select C, 16, 0 -> shl C, 4
-  if (N2C && isNullConstant(N3) && N2C->getAPIntValue().isPowerOf2() &&
-      TLI.getBooleanContents(N0.getValueType()) ==
+  ConstantSDNode *N3C = dyn_cast<ConstantSDNode>(N3.getNode());
+  bool Fold = N2C && isNullConstant(N3) && N2C->getAPIntValue().isPowerOf2();
+  bool Swap = N3C && isNullConstant(N2) && N3C->getAPIntValue().isPowerOf2();
+
+  if ((Fold || Swap) && TLI.getBooleanContents(N0.getValueType()) ==
           TargetLowering::ZeroOrOneBooleanContent) {
+
+    if (Swap) {
+      CC = ISD::getSetCCInverse(CC, N0.getValueType().isInteger());
+      std::swap(N2C, N3C);
+    }
 
     // If the caller doesn't want us to simplify this into a zext of a compare,
     // don't do it.
     if (NotExtCompare && N2C->isOne())
       return SDValue();
 
     // Get a SetCC of the condition
     // NOTE: Don't create a SETCC if it's not legal on this target.
     if (!LegalOperations ||
(22 lines not shown)
       return Temp;
 
     // shl setcc result by log2 n2c
     return DAG.getNode(
         ISD::SHL, DL, N2.getValueType(), Temp,
         DAG.getConstant(N2C->getAPIntValue().logBase2(), SDLoc(Temp),
                         getShiftAmountTy(Temp.getValueType())));
   }
 }
 
   // Check to see if this is an integer abs.
   // select_cc setg[te] X, 0, X, -X ->
   // select_cc setgt X, -1, X, -X ->
   // select_cc setl[te] X, 0, -X, X ->
   // select_cc setlt X, 1, -X, X ->
   // Y = sra (X, size(X)-1); xor (add (X, Y), Y)
   if (N1C) {
     ConstantSDNode *SubC = nullptr;
     if (((N1C->isNullValue() && (CC == ISD::SETGT || CC == ISD::SETGE)) ||
(690 lines not shown)

test/CodeGen/AArch64/select_cc.ll

This file was added.

; RUN: llc < %s -mtriple=arm64 | FileCheck %s

; CHECK_LABEL: select_ogt_float
; CHECK: fcmp s0, s1
; CHECK_NEXT: cset w8, gt
; CHECK_NEXT: lsl x0, x8, #2
define i64 @select_ogt_float(float %a, float %b) {
entry:
  %cc = fcmp ogt float %a, %b
  %sel = select i1 %cc, i64 4, i64 0
  ret i64 %sel
}

; CHECK_LABEL: select_ule_float_inverse
; CHECK: fcmp s0, s1
; CHECK_NEXT: cset w8, gt
; CHECK_NEXT: lsl x0, x8, #2
define i64 @select_ule_float_inverse(float %a, float %b) {
entry:
  %cc = fcmp ule float %a, %b
  %sel = select i1 %cc, i64 0, i64 4
  ret i64 %sel
}

; CHECK_LABEL: select_eq_i32
; CHECK: cmp w0, w1
; CHECK_NEXT: cset w8, eq
; CHECK_NEXT: lsl x0, x8, #2
define i64 @select_eq_i32(i32 %a, i32 %b) {
entry:
  %cc = icmp eq i32 %a, %b
  %sel = select i1 %cc, i64 4, i64 0
  ret i64 %sel
}

; CHECK_LABEL: select_ne_i32_inverse
; CHECK: cmp w0, w1
; CHECK_NEXT: cset w8, eq
; CHECK_NEXT: lsl x0, x8, #2
define i64 @select_ne_i32_inverse(i32 %a, i32 %b) {
entry:
  %cc = icmp ne i32 %a, %b
  %sel = select i1 %cc, i64 0, i64 4
  ret i64 %sel
}

test/CodeGen/Thumb/branchless-cmp.ll

(71 lines not shown)
 ; CHECK-NOT: b{{(ne)|(eq)}}
 ; CHECK: subs r0, r0, r1
 ; CHECK-NEXT: movs r1, #0
 ; CHECK-NEXT: subs r1, r1, r0
 ; CHECK-NEXT: adcs r1, r0
 ; CHECK-NEXT: lsls r0, r1, #2
 }
 
-; FIXME: This one hasn't changed actually
-; but could look like test3b
 define i32 @test4a(i32 %a, i32 %b) {
 entry:
   %cmp = icmp ne i32 %a, %b
   %cond = select i1 %cmp, i32 0, i32 4
   ret i32 %cond
 ; CHECK-LABEL: test4a:
 ; CHECK-NOT: b{{(ne)|(eq)}}
-; CHECK: mov r2, r0
-; CHECK-NEXT: movs r0, #0
-; CHECK-NEXT: movs r3, #4
-; CHECK-NEXT: cmp r2, r1
-; CHECK-NEXT: bne .[[BRANCH:[A-Z0-9_]+]]
-; CHECK: mov r0, r3
-; CHECK: .[[BRANCH]]:
+; CHECK: subs r0, r0, r1
+; CHECK-NEXT: movs r1, #0
+; CHECK-NEXT: subs r1, r1, r0
+; CHECK-NEXT: adcs r1, r0
+; CHECK-NEXT: lsls r0, r1, #2
 }
 
 define i32 @test4b(i32 %a, i32 %b) {
 entry:
   %cmp = icmp ne i32 %a, %b
   %cond = select i1 %cmp, i32 4, i32 0
   ret i32 %cond
 ; CHECK-LABEL: test4b:
 ; CHECK-NOT: b{{(ne)|(eq)}}
 ; CHECK: subs r0, r0, r1
 ; CHECK-NEXT: subs r1, r0, #1
 ; CHECK-NEXT: sbcs r0, r1
 ; CHECK-NEXT: lsls r0, r0, #2
 }