This is an archive of the discontinued LLVM Phabricator instance.

Differential D55870

[X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post processing
ClosedPublic

Authored by craig.topper on Dec 18 2018, 5:15 PM.

Details

Reviewers

RKSimon
spatel
andreadb

Commits

rG84a00bd98a11: [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post…
rL349661: [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post…

Summary

The (cmp (and X, Y), 0) pattern is greedy and ends up forming a TESTrr and consuming the and when it might be better to use one of the BMI/TBM instructions like BLSR or BLSI.

This patch removes the pattern from isel and adds a post processing check to combine TESTrr+ANDrr into just a TESTrr. With this patch we are able to select the BMI/TBM instructions, but we'll also emit a TESTrr when the result is compared to 0. In many cases the peephole pass will be able to use optimizeCompareInstr to remove the TEST, but it's probably not perfect.
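
The post-processing combine described above can be sketched outside of LLVM. The following is a minimal, self-contained model and not the SelectionDAG API: `Op`, `Node`, the `uses` counter, and `foldTestOfAnd` are hypothetical names chosen for illustration. It rewrites a test whose two operands are the same AND node, with no other users, into a test of the AND's inputs.

```cpp
#include <cassert>

// Hypothetical toy node kinds; these are NOT LLVM machine opcodes.
enum class Op { Reg, And, Test };

// Hypothetical toy node. `uses` counts operand slots that reference this node.
struct Node {
  Op op;
  const Node *lhs;
  const Node *rhs;
  int uses;
};

// If `test` is TEST(t, t) where t = AND(x, y) and the test is the only user
// of t, rewrite it in place to TEST(x, y). Returns true when a rewrite happened.
bool foldTestOfAnd(Node &test) {
  if (test.op != Op::Test || test.lhs == nullptr || test.lhs != test.rhs)
    return false;
  const Node *andNode = test.lhs;
  // Both operand slots of the test reference the AND, so exactly two uses
  // means the test is its only user.
  if (andNode->op != Op::And || andNode->uses != 2)
    return false;
  test.lhs = andNode->lhs;
  test.rhs = andNode->rhs;
  return true;
}
```

The single-user check matters: if anything else reads the AND result, the AND has to stay, and folding it into the test would not remove an instruction.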

Diff Detail

Repository: rL LLVM

Event Timeline

craig.topper created this revision.Dec 18 2018, 5:15 PM

andreadb added inline comments.Dec 19 2018, 5:42 AM

test/CodeGen/X86/bmi.ll
531–539 ↗	(On Diff #178817)	This is a strange/interesting test. If %a is zero, then %t1 is also zero. If %a is not zero, then %t1 has exactly one bit set. --> Testing if %t1 is equal to 0 is equivalent to testing if %a is 0. The only case where %t2 is TRUE is if %a is 0. This whole logic could be folded into an icmp + select. So we don't even need to select a BLSI. This sequence should be optimized at IR level. I didn't test if that is what happens. That being said, I take it that the purpose of this test was different. Probably, this test should be rewritten in a way that doesn't expose that simplification?
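
The equivalence argued in the inline comment above can be checked directly. This is a small sketch with illustrative names (`isolateLowestSetBit`, `checkBlsiZeroEquivalence`) that do not come from the patch; it models the `sub 0, %a` / `and` sequence from the test on 32-bit unsigned values.

```cpp
#include <cassert>
#include <cstdint>

// Models %t1 = and (sub 0, %a), %a: keeps only the lowest set bit of a.
constexpr std::uint32_t isolateLowestSetBit(std::uint32_t a) {
  return a & (0u - a);
}

// Checks, for a few sample inputs, that the result is zero exactly when the
// input is zero, which is the simplification the comment points out.
inline void checkBlsiZeroEquivalence() {
  const std::uint32_t samples[] = {0u, 1u, 2u, 12u, 0x80000000u, 0xFFFFFFFFu};
  for (std::uint32_t a : samples) {
    assert((isolateLowestSetBit(a) == 0u) == (a == 0u));
  }
}
```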

The change LGTM.

However, tests look a bit too fragile. If the goal is to verify the selection of BMI/TBM instructions, then tests should be made a bit more robust.
Future improvements may break some of those patterns; the code from some of those tests can be aggressively simplified...
I suggest improving the tests first, and then committing this patch.
I think that we should probably raise a couple of bugs for missing simplifications. Most of those missing simplifications can probably be caught at IR level.

test/CodeGen/X86/bmi.ll
624–635 ↗	(On Diff #178817)	Again. Here we may prefer POPCNT to BLSI. It tends to have better latency/throughput overall. I think it is worthy to raise a bug for this. Speaking about these tests in general: I think that we should make these more robust (maybe in a separate patch). We can probably make this test more robust by changing how we check the result. For example, rather than comparing against zero, we can compare against a specific power-of-2. That would force the selection of BLSI, since we would need to know the position of that bit. We can probably do something similar to improve the other test.
880–882 ↗	(On Diff #178817)	Same. Could be a simple `test+cmov`. But - again - I take that the purpose of this test is not to check if we are smart enough to fold away that sequence...

This revision is now accepted and ready to land.Dec 19 2018, 6:06 AM

craig.topper marked 3 inline comments as done.Dec 19 2018, 8:24 AM

craig.topper added inline comments.

test/CodeGen/X86/bmi.ll
531–539 ↗	(On Diff #178817)	The tests were intended to test use of the Z flag from the BMI instructions.
624–635 ↗	(On Diff #178817)	I thought we just established that BLSI could be replaced with a compare of the input with 0. Why would we replace it with POPCNT?

craig.topper marked 2 inline comments as done.Dec 19 2018, 8:31 AM

craig.topper added inline comments.

test/CodeGen/X86/bmi.ll
624–635 ↗	(On Diff #178817)	Doesn't BLSI have better throughput than POPCNT on Intel CPUs? BLSI is on 2 ports. POPCNT is only one port.
880–882 ↗	(On Diff #178817)	How could this be a TEST+CMOV? ZF will be set if the input is zero or has exactly one bit set.

andreadb added inline comments.Dec 19 2018, 8:39 AM

test/CodeGen/X86/bmi.ll
624–635 ↗	(On Diff #178817)	Right, forget about that comment. A simple compare of the input is better for this case.

andreadb added inline comments.Dec 19 2018, 8:51 AM

test/CodeGen/X86/bmi.ll
531–539 ↗	(On Diff #178817)	I see. In that case, then don't worry about changing those tests. I think it is still worthy to raise a bug about teaching how to fold that computation at IR level. We keep these tests to verify that the Z flag is actually used.

andreadb added inline comments.Dec 19 2018, 8:55 AM

test/CodeGen/X86/bmi.ll
880–882 ↗	(On Diff #178817)	Ouch.. right. We are not extracting a bit here. We are resetting the lowest set bit here. So TEST+CMOV is not fine. I misread the code.
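
The point settled in this exchange is that resetting the lowest set bit yields zero for two kinds of input, not just for zero. A small sketch, using illustrative names (`resetLowestSetBit`, `blsrResultIsZero`, `checkBlsrZeroCases`) that are not part of the patch, and modeling the `sub %a, 1` / `and` sequence from the test:

```cpp
#include <cassert>
#include <cstdint>

// Models %t1 = and (sub %a, 1), %a: clears the lowest set bit of a.
constexpr std::uint32_t resetLowestSetBit(std::uint32_t a) {
  return a & (a - 1u);
}

// True when the result is zero, i.e. when the zero flag would be set.
constexpr bool blsrResultIsZero(std::uint32_t a) {
  return resetLowestSetBit(a) == 0u;
}

// The result is zero for a == 0 and for any power of two, so a plain
// "is the input zero" test is not equivalent.
inline void checkBlsrZeroCases() {
  assert(blsrResultIsZero(0u));
  assert(blsrResultIsZero(8u));   // one bit set, input is non-zero
  assert(!blsrResultIsZero(12u)); // two bits set
}
```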

andreadb added inline comments.Dec 19 2018, 9:22 AM

test/CodeGen/X86/bmi.ll
624–635 ↗	(On Diff #178817)	Interesting. On AMD family 16h and 17h, POPCNT and BLSI use the same ALU pipes. However BLSI is 2 uops, versus 1 uop for the POPCNT. So, the throughput of BLSI is half the throughput of POPCNT. That being said, we already agreed that ideally, a simple compare of the input would have been better.

Closed by commit rL349661: [X86] Don't match TESTrr from (cmp (and X, Y), 0) during isel. Defer to post… (authored by ctopper).Dec 19 2018, 10:52 AM

This revision was automatically updated to reflect the committed changes.

craig.topper mentioned this in D55813: [X86] Add isel patterns to match BMI/TBMI instructions when lowering has turned the root nodes into one of the flag producing binops..Dec 21 2018, 11:02 AM

Revision Contents

Path                                              Size
llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp     25 lines
llvm/trunk/lib/Target/X86/X86InstrArithmetic.td   11 lines
llvm/trunk/test/CodeGen/X86/bmi.ll                24 lines
llvm/trunk/test/CodeGen/X86/tbm_patterns.ll       27 lines

Diff 178918

llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp

[892 lines not shown; context: while (Position != CurDAG->allnodes_begin()) {]
     if (N->use_empty() || !N->isMachineOpcode())
       continue;

     if (tryOptimizeRem8Extend(N)) {
       MadeChange = true;
       continue;
     }

+    // Look for a TESTrr+ANDrr pattern where both operands of the test are
+    // the same. Rewrite to remove the AND.
+    unsigned Opc = N->getMachineOpcode();
+    if ((Opc == X86::TEST8rr || Opc == X86::TEST16rr ||
+         Opc == X86::TEST32rr || Opc == X86::TEST64rr) &&
+        N->getOperand(0) == N->getOperand(1) &&
+        N->isOnlyUserOf(N->getOperand(0).getNode()) &&
+        N->getOperand(0).isMachineOpcode()) {
+      SDValue And = N->getOperand(0);
+      unsigned N0Opc = And.getMachineOpcode();
+      if (N0Opc == X86::AND8rr || N0Opc == X86::AND16rr ||
+          N0Opc == X86::AND32rr || N0Opc == X86::AND64rr) {
+        MachineSDNode *Test = CurDAG->getMachineNode(Opc, SDLoc(N),
+                                                     MVT::i32,
+                                                     And.getOperand(0),
+                                                     And.getOperand(1));
+        ReplaceUses(N, Test);
+        MadeChange = true;
+        continue;
+      }
+    }
+
     // Attempt to remove vectors moves that were inserted to zero upper bits.
-    if (N->getMachineOpcode() != TargetOpcode::SUBREG_TO_REG)
+    if (Opc != TargetOpcode::SUBREG_TO_REG)
       continue;

     unsigned SubRegIdx = N->getConstantOperandVal(2);
     if (SubRegIdx != X86::sub_xmm && SubRegIdx != X86::sub_ymm)
       continue;

     SDValue Move = N->getOperand(1);
     if (!Move.isMachineOpcode())
[2,987 lines not shown]

llvm/trunk/lib/Target/X86/X86InstrArithmetic.td

[1,206 lines not shown]
 // they don't have all the usual imm8 and REV forms, and are encoded into a
 // different space.
 def X86testpat : PatFrag<(ops node:$lhs, node:$rhs),
                          (X86cmp (and_su node:$lhs, node:$rhs), 0)>;

 let isCompare = 1 in {
   let Defs = [EFLAGS] in {
     let isCommutable = 1 in {
-      def TEST8rr : BinOpRR_F<0x84, "test", Xi8 , X86testpat>;
-      def TEST16rr : BinOpRR_F<0x84, "test", Xi16, X86testpat>;
-      def TEST32rr : BinOpRR_F<0x84, "test", Xi32, X86testpat>;
-      def TEST64rr : BinOpRR_F<0x84, "test", Xi64, X86testpat>;
+      // Avoid selecting these and instead use a test+and. Post processing will
+      // combine them. This gives bunch of other patterns that start with
+      // and a chance to match.
+      def TEST8rr : BinOpRR_F<0x84, "test", Xi8 , null_frag>;
+      def TEST16rr : BinOpRR_F<0x84, "test", Xi16, null_frag>;
+      def TEST32rr : BinOpRR_F<0x84, "test", Xi32, null_frag>;
+      def TEST64rr : BinOpRR_F<0x84, "test", Xi64, null_frag>;
     } // isCommutable

     def TEST8mr : BinOpMR_F<0x84, "test", Xi8 , X86testpat>;
     def TEST16mr : BinOpMR_F<0x84, "test", Xi16, X86testpat>;
     def TEST32mr : BinOpMR_F<0x84, "test", Xi32, X86testpat>;
     def TEST64mr : BinOpMR_F<0x84, "test", Xi64, X86testpat>;

     def TEST8ri : BinOpRI_F<0xF6, "test", Xi8 , X86testpat, MRM0r>;
[122 lines not shown]

llvm/trunk/test/CodeGen/X86/bmi.ll

[513 lines not shown; context: ; X64-NEXT: retq]
   %t2 = icmp eq i32 %t1, 0
   %t3 = select i1 %t2, i32 %b, i32 %t1
   ret i32 %t3
 }

 define i32 @blsi32_z2(i32 %a, i32 %b, i32 %c) nounwind {
 ; X86-LABEL: blsi32_z2:
 ; X86: # %bb.0:
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: movl %eax, %ecx
-; X86-NEXT: negl %ecx
-; X86-NEXT: testl %eax, %ecx
+; X86-NEXT: blsil {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: leal {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: leal {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT: cmovel %eax, %ecx
 ; X86-NEXT: movl (%ecx), %eax
 ; X86-NEXT: retl
 ;
 ; X64-LABEL: blsi32_z2:
 ; X64: # %bb.0:
 ; X64-NEXT: movl %esi, %eax
-; X64-NEXT: movl %edi, %ecx
-; X64-NEXT: negl %ecx
-; X64-NEXT: testl %edi, %ecx
+; X64-NEXT: blsil %edi, %ecx
 ; X64-NEXT: cmovnel %edx, %eax
 ; X64-NEXT: retq
   %t0 = sub i32 0, %a
   %t1 = and i32 %t0, %a
   %t2 = icmp eq i32 %t1, 0
   %t3 = select i1 %t2, i32 %b, i32 %c
   ret i32 %t3
 }
[78 lines not shown]
 ; X86-NEXT: movl (%ecx), %eax
 ; X86-NEXT: movl 4(%ecx), %edx
 ; X86-NEXT: popl %esi
 ; X86-NEXT: retl
 ;
 ; X64-LABEL: blsi64_z2:
 ; X64: # %bb.0:
 ; X64-NEXT: movq %rsi, %rax
-; X64-NEXT: movq %rdi, %rcx
-; X64-NEXT: negq %rcx
-; X64-NEXT: testq %rdi, %rcx
+; X64-NEXT: blsiq %rdi, %rcx
 ; X64-NEXT: cmovneq %rdx, %rax
 ; X64-NEXT: retq
   %t0 = sub i64 0, %a
   %t1 = and i64 %t0, %a
   %t2 = icmp eq i64 %t1, 0
   %t3 = select i1 %t2, i64 %b, i64 %c
   ret i64 %t3
 }
[227 lines not shown; context: ; X64-NEXT: retq]
   %t2 = icmp eq i32 %t1, 0
   %t3 = select i1 %t2, i32 %b, i32 %t1
   ret i32 %t3
 }

 define i32 @blsr32_z2(i32 %a, i32 %b, i32 %c) nounwind {
 ; X86-LABEL: blsr32_z2:
 ; X86: # %bb.0:
-; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT: leal -1(%eax), %ecx
-; X86-NEXT: testl %eax, %ecx
+; X86-NEXT: blsrl {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: leal {{[0-9]+}}(%esp), %eax
 ; X86-NEXT: leal {{[0-9]+}}(%esp), %ecx
 ; X86-NEXT: cmovel %eax, %ecx
 ; X86-NEXT: movl (%ecx), %eax
 ; X86-NEXT: retl
 ;
 ; X64-LABEL: blsr32_z2:
 ; X64: # %bb.0:
 ; X64-NEXT: movl %esi, %eax
-; X64-NEXT: # kill: def $edi killed $edi def $rdi
-; X64-NEXT: leal -1(%rdi), %ecx
-; X64-NEXT: testl %edi, %ecx
+; X64-NEXT: blsrl %edi, %ecx
 ; X64-NEXT: cmovnel %edx, %eax
 ; X64-NEXT: retq
   %t0 = sub i32 %a, 1
   %t1 = and i32 %t0, %a
   %t2 = icmp eq i32 %t1, 0
   %t3 = select i1 %t2, i32 %b, i32 %c
   ret i32 %t3
 }
[78 lines not shown]
 ; X86-NEXT: movl (%ecx), %eax
 ; X86-NEXT: movl 4(%ecx), %edx
 ; X86-NEXT: popl %esi
 ; X86-NEXT: retl
 ;
 ; X64-LABEL: blsr64_z2:
 ; X64: # %bb.0:
 ; X64-NEXT: movq %rsi, %rax
-; X64-NEXT: leaq -1(%rdi), %rcx
-; X64-NEXT: testq %rdi, %rcx
+; X64-NEXT: blsrq %rdi, %rcx
 ; X64-NEXT: cmovneq %rdx, %rax
 ; X64-NEXT: retq
   %t0 = sub i64 %a, 1
   %t1 = and i64 %t0, %a
   %t2 = icmp eq i64 %t1, 0
   %t3 = select i1 %t2, i64 %b, i64 %c
   ret i64 %t3
 }
[75 lines not shown]

llvm/trunk/test/CodeGen/X86/tbm_patterns.ll

[144 lines not shown; context: ; CHECK-NEXT: retq]
   %t3 = select i1 %t2, i32 %b, i32 %t1
   ret i32 %t3
 }

 define i32 @test_x86_tbm_blcfill_u32_z2(i32 %a, i32 %b, i32 %c) nounwind {
 ; CHECK-LABEL: test_x86_tbm_blcfill_u32_z2:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: movl %esi, %eax
-; CHECK-NEXT: # kill: def $edi killed $edi def $rdi
-; CHECK-NEXT: leal 1(%rdi), %ecx
-; CHECK-NEXT: testl %edi, %ecx
+; CHECK-NEXT: blcfilll %edi, %ecx
 ; CHECK-NEXT: cmovnel %edx, %eax
 ; CHECK-NEXT: retq
   %t0 = add i32 %a, 1
   %t1 = and i32 %t0, %a
   %t2 = icmp eq i32 %t1, 0
   %t3 = select i1 %t2, i32 %b, i32 %c
   ret i32 %t3
 }
[20 lines not shown; context: ; CHECK-NEXT: retq]
   %t3 = select i1 %t2, i64 %b, i64 %t1
   ret i64 %t3
 }

 define i64 @test_x86_tbm_blcfill_u64_z2(i64 %a, i64 %b, i64 %c) nounwind {
 ; CHECK-LABEL: test_x86_tbm_blcfill_u64_z2:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: movq %rsi, %rax
-; CHECK-NEXT: leaq 1(%rdi), %rcx
-; CHECK-NEXT: testq %rdi, %rcx
+; CHECK-NEXT: blcfillq %rdi, %rcx
 ; CHECK-NEXT: cmovneq %rdx, %rax
 ; CHECK-NEXT: retq
   %t0 = add i64 %a, 1
   %t1 = and i64 %t0, %a
   %t2 = icmp eq i64 %t1, 0
   %t3 = select i1 %t2, i64 %b, i64 %c
   ret i64 %t3
 }
[127 lines not shown; context: ; CHECK-NEXT: retq]
   %t4 = select i1 %t3, i32 %b, i32 %t2
   ret i32 %t4
 }

 define i32 @test_x86_tbm_blcic_u32_z2(i32 %a, i32 %b, i32 %c) nounwind {
 ; CHECK-LABEL: test_x86_tbm_blcic_u32_z2:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: movl %esi, %eax
-; CHECK-NEXT: movl %edi, %ecx
-; CHECK-NEXT: notl %ecx
-; CHECK-NEXT: incl %edi
-; CHECK-NEXT: testl %ecx, %edi
+; CHECK-NEXT: blcicl %edi, %ecx
 ; CHECK-NEXT: cmovnel %edx, %eax
 ; CHECK-NEXT: retq
   %t0 = xor i32 %a, -1
   %t1 = add i32 %a, 1
   %t2 = and i32 %t1, %t0
   %t3 = icmp eq i32 %t2, 0
   %t4 = select i1 %t3, i32 %b, i32 %c
   ret i32 %t4
[23 lines not shown; context: ; CHECK-NEXT: retq]
   %t4 = select i1 %t3, i64 %b, i64 %t2
   ret i64 %t4
 }

 define i64 @test_x86_tbm_blcic_u64_z2(i64 %a, i64 %b, i64 %c) nounwind {
 ; CHECK-LABEL: test_x86_tbm_blcic_u64_z2:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: movq %rsi, %rax
-; CHECK-NEXT: movq %rdi, %rcx
-; CHECK-NEXT: notq %rcx
-; CHECK-NEXT: incq %rdi
-; CHECK-NEXT: testq %rcx, %rdi
+; CHECK-NEXT: blcicq %rdi, %rcx
 ; CHECK-NEXT: cmovneq %rdx, %rax
 ; CHECK-NEXT: retq
   %t0 = xor i64 %a, -1
   %t1 = add i64 %a, 1
   %t2 = and i64 %t1, %t0
   %t3 = icmp eq i64 %t2, 0
   %t4 = select i1 %t3, i64 %b, i64 %c
   ret i64 %t4
[426 lines not shown; context: ; CHECK-NEXT: retq]
   %t4 = select i1 %t3, i32 %b, i32 %t2
   ret i32 %t4
 }

 define i32 @test_x86_tbm_tzmsk_u32_z2(i32 %a, i32 %b, i32 %c) nounwind {
 ; CHECK-LABEL: test_x86_tbm_tzmsk_u32_z2:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: movl %esi, %eax
-; CHECK-NEXT: movl %edi, %ecx
-; CHECK-NEXT: notl %ecx
-; CHECK-NEXT: decl %edi
-; CHECK-NEXT: testl %edi, %ecx
+; CHECK-NEXT: tzmskl %edi, %ecx
 ; CHECK-NEXT: cmovnel %edx, %eax
 ; CHECK-NEXT: retq
   %t0 = xor i32 %a, -1
   %t1 = add i32 %a, -1
   %t2 = and i32 %t0, %t1
   %t3 = icmp eq i32 %t2, 0
   %t4 = select i1 %t3, i32 %b, i32 %c
   ret i32 %t4
[23 lines not shown; context: ; CHECK-NEXT: retq]
   %t4 = select i1 %t3, i64 %b, i64 %t2
   ret i64 %t4
 }

 define i64 @test_x86_tbm_tzmsk_u64_z2(i64 %a, i64 %b, i64 %c) nounwind {
 ; CHECK-LABEL: test_x86_tbm_tzmsk_u64_z2:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: movq %rsi, %rax
-; CHECK-NEXT: movq %rdi, %rcx
-; CHECK-NEXT: notq %rcx
-; CHECK-NEXT: decq %rdi
-; CHECK-NEXT: testq %rdi, %rcx
+; CHECK-NEXT: tzmskq %rdi, %rcx
 ; CHECK-NEXT: cmovneq %rdx, %rax
 ; CHECK-NEXT: retq
   %t0 = xor i64 %a, -1
   %t1 = add i64 %a, -1
   %t2 = and i64 %t0, %t1
   %t3 = icmp eq i64 %t2, 0
   %t4 = select i1 %t3, i64 %b, i64 %c
   ret i64 %t4
[36 lines not shown]
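
For reference, the IR in the `_z2` TBM tests above computes three bit patterns before comparing against zero. The sketch below restates them on 32-bit unsigned values; the function names (`blcfillModel`, `blcicModel`, `tzmskModel`, `checkTbmModels`) are illustrative only and are not taken from the patch or from any intrinsic header.

```cpp
#include <cassert>
#include <cstdint>

// Models %t1 = and (add %a, 1), %a from the blcfill tests.
constexpr std::uint32_t blcfillModel(std::uint32_t a) { return (a + 1u) & a; }

// Models %t2 = and (add %a, 1), (xor %a, -1) from the blcic tests.
constexpr std::uint32_t blcicModel(std::uint32_t a) { return (a + 1u) & ~a; }

// Models %t2 = and (xor %a, -1), (add %a, -1) from the tzmsk tests.
constexpr std::uint32_t tzmskModel(std::uint32_t a) { return ~a & (a - 1u); }

// Spot checks on a few hand-computed inputs.
inline void checkTbmModels() {
  assert(blcfillModel(0x7u) == 0x0u); // 0111 + 1 = 1000, and-ed with 0111
  assert(blcfillModel(0xBu) == 0x8u); // 1011 + 1 = 1100, and-ed with 1011
  assert(blcicModel(0x5u) == 0x2u);   // lowest clear bit of 0101 is bit 1
  assert(tzmskModel(0x8u) == 0x7u);   // mask of the trailing zeros of 1000
}
```

Each of these can be zero for some non-zero inputs (for example `blcfillModel(0x7u)`), so the flag result in these tests cannot be replaced by a plain compare of the input against zero.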