This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] add (sext i1 X), 1 --> zext (not i1 X)
ClosedPublic

Authored by spatel on Apr 11 2017, 9:03 AM.

Details

Summary

Besides better codegen, the motivation is to be able to canonicalize this pattern in IR (currently we don't) knowing that the backend is prepared for that.

This may also allow removing code for special constant cases in DAGCombiner::foldSelectOfConstants() that was added in D30180.
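For reference, the arithmetic behind the fold: sign-extending an i1 gives 0 or -1, so adding 1 yields 1 or 0, which is exactly the zero-extension of the inverted bit. A scalar model in Python (illustrative only, not LLVM code):

```python
def sext_i1(x):
    """Sign-extend an i1 (0 or 1) to a wider integer: 0 -> 0, 1 -> -1."""
    return -x

def zext_i1(x):
    """Zero-extend an i1: 0 -> 0, 1 -> 1."""
    return x

# add (sext i1 X), 1  ==  zext (not i1 X), for both i1 values
for x in (0, 1):
    assert sext_i1(x) + 1 == zext_i1(1 - x)  # 1 - x is 'not' on i1
```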

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Apr 11 2017, 9:03 AM
arsenm added a subscriber: arsenm.Apr 11 2017, 9:44 AM
arsenm added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1885 (On Diff #94837)

!LegalOperations check first? Also why not check if the xor/zext are legal?

spatel added inline comments.Apr 11 2017, 4:26 PM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
1885 (On Diff #94837)

No good reason - just lazy.

Although if we add the TLI checks, I don't think we can also hoist the !LegalOperations check. I'll upload a new patch with the extra checks.

spatel updated this revision to Diff 94907.Apr 11 2017, 4:27 PM

Patch updated:
Check legality of the new ops, so the transform can fire post-legalization too.

efriedma edited edge metadata.Apr 18 2017, 12:21 PM

I'm not sure this is consistently beneficial; particularly for vectors, if the operand is a comparison (or something derived from a comparison), sign-extending it could be free.

> I'm not sure this is consistently beneficial; particularly for vectors, if the operand is a comparison (or something derived from a comparison), sign-extending it could be free.

Hmm...any ideas how to limit in that case? I was assuming that since we have:

// select Cond, 0, 1 --> zext (!Cond)

...this also makes sense.
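The quoted fold rests on the same observation: selecting between the constants 0 and 1 is just the zero-extension of the inverted condition. A minimal scalar sketch in Python (the `select` helper is hypothetical):

```python
def select(cond, t, f):
    """Scalar model of ISD::SELECT: pick t if cond is true, else f."""
    return t if cond else f

for cond in (0, 1):
    # select Cond, 0, 1 --> zext (!Cond)
    assert select(cond, 0, 1) == 1 - cond  # 1 - cond is zext of the inverted i1
```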

If we have a compare op, we should be able to fold the 'not' op introduced here directly into the compare predicate. Or in the case of a crippled ISA like SSE that lacks inverted predicates, we might be able to fold the 'not' into the zext/mask...but as I think we can see in the SSE test already included, we're missing that fold. I.e., instead of:

movaps {{.*#+}} xmm1 = [1,1,1,1]
xorps %xmm1, %xmm0
andps %xmm1, %xmm0

We should have done:

pandn	[1,1,1,1], %xmm0
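The two sequences are lane-wise equivalent, since only the low bit of each lane survives the final mask: `(x ^ 1) & 1` equals `~x & 1`. A per-lane model in Python (the helper names are made up for illustration):

```python
MASK32 = 0xFFFFFFFF  # one 32-bit vector lane

def xor_then_and(lane):
    """Current codegen: xorps with [1,1,1,1], then andps with [1,1,1,1]."""
    return (lane ^ 1) & 1

def pandn(lane):
    """pandn computes ~src1 & src2 per lane; here src2 is the constant 1."""
    return (~lane & MASK32) & 1

for lane in (0x0, 0x1, 0xFFFFFFFF, 0xDEADBEEF):
    assert xor_then_and(lane) == pandn(lane)
```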

We can clean up after legalization either way, I guess. I'm more concerned about the test coverage; passing i1 vectors as arguments to a function isn't really representative of how i1 vectors are used in practice.

spatel updated this revision to Diff 96580.Apr 25 2017, 9:27 AM

Patch updated:
Added more vector tests and rebased test diffs after:
https://reviews.llvm.org/rL300725
https://reviews.llvm.org/rL300763
https://reviews.llvm.org/rL300772

I made changes to the DAG's simplifyDemandedBits and then got distracted with related IR transforms...
The demanded-bits changes helped x86 vector codegen in the way I expected, but I think the ARM regressions show that we're missing some generic folds.

Let me know if there are other tests we should have to expose these folds.

The getNOT call on ARM returns:
v4i1 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
(the 'v4i1' is backed by 32-bit constants, and they are 1, not -1)...which then does not match the definition for TLI.isConstTrueVal(). Something similar happens with x86 too, but we catch the not(setcc) pattern post-legalization. That fails on ARM because there are size-changing ops obfuscating the pattern:

      t50: v4i32 = setcc t16, t19, seteq:ch
    t51: v4i16 = truncate t50
    t49: v4i16 = BUILD_VECTOR Constant:i32<1>, Constant:i32<1>, Constant:i32<1>, Constant:i32<1>
  t52: v4i16 = xor t51, t49
t53: v4i32 = any_extend t52

I'm not sure your description is the full story about the ARM code for cmpgt_sext_inc_vec; it looks like the following gets simplified for AVX2 (might want to include this as a testcase):

define <4 x i64> @cmpgt_sext_inc_vec(<4 x i64> %x, <4 x i64> %y) {
  %cmp = icmp sgt <4 x i64> %x, %y
  %ext = sext <4 x i1> %cmp to <4 x i64>
  %add = add <4 x i64> %ext, <i64 1, i64 1, i64 1, i64 1>
  ret <4 x i64> %add
}

A testcase for something like "((a != b) & (c != d)) + 1" might also be interesting.
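Per lane, the suggested pattern still reduces to the same fold: the AND of two all-ones/zero compare masks is itself such a mask, so adding 1 gives the zero-extension of the inverted combined condition. A scalar sketch in Python (illustrative, not LLVM code):

```python
def sext_cmp(ne):
    """Vector compares produce an all-ones mask for true: i1 1 -> -1."""
    return -ne

for a, b, c, d in [(1, 1, 2, 3), (1, 2, 2, 2), (1, 2, 3, 4), (5, 5, 6, 6)]:
    m = sext_cmp(a != b) & sext_cmp(c != d)  # still an all-ones/zero mask
    # ((a != b) & (c != d)) + 1 --> zext (not ((a != b) & (c != d)))
    assert m + 1 == (0 if (a != b and c != d) else 1)
```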

spatel updated this revision to Diff 96742.Apr 26 2017, 8:12 AM

Patch updated:
No code changes again, but added tests with:
rL301362 / rL301412
And improved 'true' detection with:
rL301408

I think all tests are improvements or neutral now. Let me know if I missed the intent on the new tests or if there's any other interesting pattern to look at. I was expecting ARM to match to 'vbic', but I'm not sure if that's better in practice than what we see here?

efriedma accepted this revision.Apr 26 2017, 10:49 AM

I think you've covered the interesting cases; LGTM.

> I was expecting ARM to match to 'vbic'

For sext_inc_vec? It's doing the xor in the wrong width for it to match. Try changing the return type of sext_inc_vec to <4 x i64>, and you'll see essentially the same thing on AVX2.

This revision is now accepted and ready to land.Apr 26 2017, 10:49 AM

>> I was expecting ARM to match to 'vbic'
>
> For sext_inc_vec? It's doing the xor in the wrong width for it to match. Try changing the return type of sext_inc_vec to <4 x i64>, and you'll see essentially the same thing on AVX2.

Ah, my NEON literacy is even worse than my SSE. I overlooked the differing widths.

This revision was automatically updated to reflect the committed changes.