This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Canonicalise adds/subs of i1 vectors using XOR
Closed, Public

Authored by david-arm on Feb 23 2021, 5:45 AM.

Details

Reviewers

sdesmalen
kmclaughlin
craig.topper
paulwalker-arm
RKSimon

Commits

rG87dbcd88651a: [CodeGen] Canonicalise adds/subs of i1 vectors using XOR

Summary

When calling SelectionDAG::getNode() to create an ADD or SUB
of two vectors with i1 element types we can canonicalise this
to use XOR instead, where 1+1 is treated as wrapping around
to 0 and 0-1 wraps to 1.
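
As an illustration only (this is not code from the patch), the equivalence can be checked exhaustively over the four input combinations. The helper names addI1, subI1 and xorI1 are hypothetical, and bool stands in for a single i1 lane; the sketch assumes C++14 or later for the loop inside a constexpr function.

```cpp
#include <cassert> // only needed for runtime checks such as the one in the note below

// Hypothetical helpers: bool models one i1 lane, arithmetic is reduced modulo 2.
constexpr bool addI1(bool a, bool b) {
  return (((a ? 1u : 0u) + (b ? 1u : 0u)) & 1u) != 0; // 1 + 1 wraps to 0
}
constexpr bool subI1(bool a, bool b) {
  return (((a ? 1u : 0u) - (b ? 1u : 0u)) & 1u) != 0; // 0 - 1 wraps to 1
}
constexpr bool xorI1(bool a, bool b) { return a != b; }

// Compare add and sub against xor for all four (a, b) pairs.
constexpr bool addSubMatchXor() {
  for (int a = 0; a < 2; ++a)
    for (int b = 0; b < 2; ++b)
      if (addI1(a != 0, b != 0) != xorI1(a != 0, b != 0) ||
          subI1(a != 0, b != 0) != xorI1(a != 0, b != 0))
        return false;
  return true;
}

static_assert(addSubMatchXor(), "i1 add and sub should both equal xor");
```

The static_assert makes the compiler do the checking; a runtime check such as assert(subI1(false, true) == xorI1(false, true)) holds for the same reason.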

I've added the following tests for SVE targets:

CodeGen/AArch64/sve-pred-arith.ll

and modified some X86 tests to reflect the much simpler codegen
required.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. Feb 23 2021, 5:45 AM

Herald added subscribers: pengfei, hiraditya, kristof.beyls. Feb 23 2021, 5:45 AM

david-arm requested review of this revision. Feb 23 2021, 5:45 AM

Herald added a project: Restricted Project. Feb 23 2021, 5:45 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B90379: Diff 325754. Feb 23 2021, 6:22 AM

david-arm added a child revision: D97299: [IR][SVE] Add new llvm.experimental.stepvector intrinsic. Feb 23 2021, 7:51 AM

craig.topper added inline comments. Feb 23 2021, 9:45 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5318	Isn't this also possible for sub?

Also canonicalised SUB of two predicate vectors to use XOR.

david-arm marked an inline comment as done. Feb 24 2021, 1:28 AM

david-arm added inline comments.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5318	Yes it is - this is what the X86 target already seems to do. Thanks for the suggestion!

Harbormaster completed remote builds in B90559: Diff 326014. Feb 24 2021, 2:05 AM

Not sure of the value of the "ILLEGAL" tests given they're actually testing we can type legalise ISD::XOR, which I'm presuming is already well tested. That said it's nothing I cannot live with.
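
To picture what the "ILLEGAL" cases exercise, here is a rough sketch, not LLVM's legaliser. It assumes that a vector too wide to be legal is handled as several legal-width parts, with the XOR applied to each part, which is what the four eor instructions in the nxv64i1 tests suggest. LegalPred, WidePred and xorWide are hypothetical names, and the fixed 16 lanes per part ignore the vscale multiplier of real SVE predicates.

```cpp
#include <array>
#include <cassert> // only needed for runtime checks in calling code
#include <cstddef>
#include <cstdint>

// Hypothetical model: one "legal" predicate part holds 16 i1 lanes.
using LegalPred = std::uint16_t;
// A 64-lane predicate is modelled as four legal parts.
using WidePred = std::array<LegalPred, 4>;

// XOR on the wide type, expressed as one XOR per legal-width part.
WidePred xorWide(const WidePred &a, const WidePred &b) {
  WidePred out{};
  for (std::size_t part = 0; part < out.size(); ++part)
    out[part] = static_cast<LegalPred>(a[part] ^ b[part]);
  return out;
}
```

Once ADD and SUB have been rewritten to XOR, nothing in this splitting step is specific to the original opcode, which is the point the comment above makes.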

This revision is now accepted and ready to land. Feb 24 2021, 2:53 AM

Why is this in getNode() rather than DAGCombine? https://github.com/llvm/llvm-project/commit/6f5a805bbbed5d0cdaaf67846dffa7f044afb407 for example does something very similar in DAGCombine. What's the guideline for correct placement here?

In D97276#2584555, @nikic wrote:

Why is this in getNode() rather than DAGCombine? https://github.com/llvm/llvm-project/commit/6f5a805bbbed5d0cdaaf67846dffa7f044afb407 for example does something very similar in DAGCombine. What's the guideline for correct placement here?

We've followed the same logic that is used for i1-based min/max and reductions. My thinking is that the earlier you can canonicalise expressions the better, and you cannot get earlier than never creating them in the first place. Ultimately the goal here is that for some operations there is never a need to consider i1-based vectors. This is partially true today because i1 vectors are rarely a legal type. However, i1 vectors are legal for SVE, and thus we're hitting corner cases that have not been hit before (for AArch64 at least).

In D97276#2584555, @nikic wrote:

Why is this in getNode() rather than DAGCombine? https://github.com/llvm/llvm-project/commit/6f5a805bbbed5d0cdaaf67846dffa7f044afb407 for example does something very similar in DAGCombine. What's the guideline for correct placement here?

Something that purely depends on the VT should be handled in getNode() - I can't recall the reasons I didn't put it there at the time.

Thanks for the clarification!
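
The placement argument above can be sketched with a toy node factory. This is a deliberately simplified model, not the SelectionDAG API: ToyDAG, Node, Op and Type are hypothetical stand-ins. The point it illustrates is that when the rewrite depends only on the opcode and the value type, the factory can apply it before the node exists, so no later pass ever sees an i1 vector ADD or SUB.

```cpp
#include <cassert> // only needed for runtime checks in calling code
#include <memory>
#include <vector>

// Hypothetical, heavily simplified stand-ins for ISD opcodes and EVT.
enum class Op { Leaf, Add, Sub, Xor };
struct Type {
  unsigned ElemBits;
  bool IsVector;
};

struct Node {
  Op Opcode;
  Type Ty;
  const Node *LHS;
  const Node *RHS;
};

class ToyDAG {
  std::vector<std::unique_ptr<Node>> Nodes; // owns every node ever created

public:
  const Node *getLeaf(Type Ty) {
    Nodes.push_back(std::make_unique<Node>(Node{Op::Leaf, Ty, nullptr, nullptr}));
    return Nodes.back().get();
  }

  // Canonicalise at creation time: the decision needs only Opcode and Ty.
  const Node *getNode(Op Opcode, Type Ty, const Node *L, const Node *R) {
    if ((Opcode == Op::Add || Opcode == Op::Sub) && Ty.IsVector &&
        Ty.ElemBits == 1)
      Opcode = Op::Xor;
    Nodes.push_back(std::make_unique<Node>(Node{Opcode, Ty, L, R}));
    return Nodes.back().get();
  }
};
```

With this shape, asking for an Add on a vector type with 1-bit elements yields a node whose opcode is already Xor, while wider element types are left alone. A combine-style pass would instead have to find and rewrite such nodes after they were built.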

Closed by commit rG87dbcd88651a: [CodeGen] Canonicalise adds/subs of i1 vectors using XOR (authored by david-arm). Feb 25 2021, 2:31 AM

This revision was automatically updated to reflect the committed changes.

david-arm marked an inline comment as done.

david-arm added a commit: rG87dbcd88651a: [CodeGen] Canonicalise adds/subs of i1 vectors using XOR.

RKSimon mentioned this in rG9490b9f14b89: [DAG] Move simplification of SADDSAT/SSUBSAT/UADDSAT/USUBSAT of vXi1 to getNode…. Feb 25 2021, 9:49 AM

Revision Contents

Path                                              Size
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp    3 lines
llvm/test/CodeGen/AArch64/sve-pred-arith.ll       164 lines
llvm/test/CodeGen/X86/avx512-mask-op.ll           342 lines
llvm/test/CodeGen/X86/avx512bw-mask-op.ll         24 lines

Diff 326331

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

(5,309 lines not shown)
@@ SDValue SelectionDAG::getNode(unsigned Opcode, const SDLoc &DL, EVT VT,
   case ISD::SUB:
     assert(VT.isInteger() && "This operator does not apply to FP types!");
     assert(N1.getValueType() == N2.getValueType() &&
            N1.getValueType() == VT && "Binary operator types must match!");
     // (X ^|+- 0) -> X. This commonly occurs when legalizing i64 values, so
     // it's worth handling here.
     if (N2C && N2C->isNullValue())
       return N1;
+    if ((Opcode == ISD::ADD || Opcode == ISD::SUB) && VT.isVector() &&
+        VT.getVectorElementType() == MVT::i1)
+      return getNode(ISD::XOR, DL, VT, N1, N2);
     break;
   case ISD::MUL:
     assert(VT.isInteger() && "This operator does not apply to FP types!");
     assert(N1.getValueType() == N2.getValueType() &&
            N1.getValueType() == VT && "Binary operator types must match!");
     if (N2C && (N1.getOpcode() == ISD::VSCALE) && Flags.hasNoSignedWrap()) {
       const APInt &MulImm = N1->getConstantOperandAPInt(0);
       const APInt &N2CImm = N2C->getAPIntValue();
(4,909 lines not shown)

llvm/test/CodeGen/AArch64/sve-pred-arith.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple=aarch64-linux-gnu -mattr=+sve < %s 2>%t | FileCheck %s
				; RUN: FileCheck --check-prefix=WARN --allow-empty %s <%t

				; If this check fails please read test/CodeGen/AArch64/README for instructions on how to resolve it.
				; WARN-NOT: warning

				; LEGAL ADDS

				define <vscale x 16 x i1> @add_nxv16i1(<vscale x 16 x i1> %a, <vscale x 16 x i1> %b) {
				; CHECK-LABEL: add_nxv16i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p2.b
				; CHECK-NEXT: eor p0.b, p2/z, p0.b, p1.b
				; CHECK-NEXT: ret
				%res = add <vscale x 16 x i1> %a, %b
				ret <vscale x 16 x i1> %res;
				}

				define <vscale x 8 x i1> @add_nxv8i1(<vscale x 8 x i1> %a, <vscale x 8 x i1> %b) {
				; CHECK-LABEL: add_nxv8i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p2.h
				; CHECK-NEXT: eor p0.b, p2/z, p0.b, p1.b
				; CHECK-NEXT: ret
				%res = add <vscale x 8 x i1> %a, %b
				ret <vscale x 8 x i1> %res;
				}

				define <vscale x 4 x i1> @add_nxv4i1(<vscale x 4 x i1> %a, <vscale x 4 x i1> %b) {
				; CHECK-LABEL: add_nxv4i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p2.s
				; CHECK-NEXT: eor p0.b, p2/z, p0.b, p1.b
				; CHECK-NEXT: ret
				%res = add <vscale x 4 x i1> %a, %b
				ret <vscale x 4 x i1> %res;
				}

				define <vscale x 2 x i1> @add_nxv2i1(<vscale x 2 x i1> %a, <vscale x 2 x i1> %b) {
				; CHECK-LABEL: add_nxv2i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p2.d
				; CHECK-NEXT: eor p0.b, p2/z, p0.b, p1.b
				; CHECK-NEXT: ret
				%res = add <vscale x 2 x i1> %a, %b
				ret <vscale x 2 x i1> %res;
				}


				; ILLEGAL ADDS

				define aarch64_sve_vector_pcs <vscale x 64 x i1> @add_nxv64i1(<vscale x 64 x i1> %a, <vscale x 64 x i1> %b) {
				; CHECK-LABEL: add_nxv64i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: str p8, [sp, #3, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: str p7, [sp, #4, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: str p6, [sp, #5, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: str p5, [sp, #6, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: str p4, [sp, #7, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ldr p4, [x3]
				; CHECK-NEXT: ldr p5, [x0]
				; CHECK-NEXT: ldr p6, [x1]
				; CHECK-NEXT: ldr p7, [x2]
				; CHECK-NEXT: ptrue p8.b
				; CHECK-NEXT: eor p0.b, p8/z, p0.b, p5.b
				; CHECK-NEXT: eor p1.b, p8/z, p1.b, p6.b
				; CHECK-NEXT: eor p2.b, p8/z, p2.b, p7.b
				; CHECK-NEXT: eor p3.b, p8/z, p3.b, p4.b
				; CHECK-NEXT: ldr p8, [sp, #3, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: ldr p7, [sp, #4, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: ldr p6, [sp, #5, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: ldr p5, [sp, #6, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: ldr p4, [sp, #7, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = add <vscale x 64 x i1> %a, %b
				ret <vscale x 64 x i1> %res;
				}


				; LEGAL SUBS

				define <vscale x 16 x i1> @sub_xv16i1(<vscale x 16 x i1> %a, <vscale x 16 x i1> %b) {
				; CHECK-LABEL: sub_xv16i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p2.b
				; CHECK-NEXT: eor p0.b, p2/z, p0.b, p1.b
				; CHECK-NEXT: ret
				%res = sub <vscale x 16 x i1> %a, %b
				ret <vscale x 16 x i1> %res;
				}

				define <vscale x 8 x i1> @sub_xv8i1(<vscale x 8 x i1> %a, <vscale x 8 x i1> %b) {
				; CHECK-LABEL: sub_xv8i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p2.h
				; CHECK-NEXT: eor p0.b, p2/z, p0.b, p1.b
				; CHECK-NEXT: ret
				%res = sub <vscale x 8 x i1> %a, %b
				ret <vscale x 8 x i1> %res;
				}

				define <vscale x 4 x i1> @sub_xv4i1(<vscale x 4 x i1> %a, <vscale x 4 x i1> %b) {
				; CHECK-LABEL: sub_xv4i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p2.s
				; CHECK-NEXT: eor p0.b, p2/z, p0.b, p1.b
				; CHECK-NEXT: ret
				%res = sub <vscale x 4 x i1> %a, %b
				ret <vscale x 4 x i1> %res;
				}

				define <vscale x 2 x i1> @sub_xv2i1(<vscale x 2 x i1> %a, <vscale x 2 x i1> %b) {
				; CHECK-LABEL: sub_xv2i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: ptrue p2.d
				; CHECK-NEXT: eor p0.b, p2/z, p0.b, p1.b
				; CHECK-NEXT: ret
				%res = sub <vscale x 2 x i1> %a, %b
				ret <vscale x 2 x i1> %res;
				}


				; ILLEGAL SUBGS


				define aarch64_sve_vector_pcs <vscale x 64 x i1> @sub_nxv64i1(<vscale x 64 x i1> %a, <vscale x 64 x i1> %b) {
				; CHECK-LABEL: sub_nxv64i1:
				; CHECK: // %bb.0:
				; CHECK-NEXT: str x29, [sp, #-16]! // 8-byte Folded Spill
				; CHECK-NEXT: addvl sp, sp, #-1
				; CHECK-NEXT: str p8, [sp, #3, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: str p7, [sp, #4, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: str p6, [sp, #5, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: str p5, [sp, #6, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: str p4, [sp, #7, mul vl] // 2-byte Folded Spill
				; CHECK-NEXT: .cfi_escape 0x0f, 0x0c, 0x8f, 0x00, 0x11, 0x10, 0x22, 0x11, 0x08, 0x92, 0x2e, 0x00, 0x1e, 0x22 // sp + 16 + 8 * VG
				; CHECK-NEXT: .cfi_offset w29, -16
				; CHECK-NEXT: ldr p4, [x3]
				; CHECK-NEXT: ldr p5, [x0]
				; CHECK-NEXT: ldr p6, [x1]
				; CHECK-NEXT: ldr p7, [x2]
				; CHECK-NEXT: ptrue p8.b
				; CHECK-NEXT: eor p0.b, p8/z, p0.b, p5.b
				; CHECK-NEXT: eor p1.b, p8/z, p1.b, p6.b
				; CHECK-NEXT: eor p2.b, p8/z, p2.b, p7.b
				; CHECK-NEXT: eor p3.b, p8/z, p3.b, p4.b
				; CHECK-NEXT: ldr p8, [sp, #3, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: ldr p7, [sp, #4, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: ldr p6, [sp, #5, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: ldr p5, [sp, #6, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: ldr p4, [sp, #7, mul vl] // 2-byte Folded Reload
				; CHECK-NEXT: addvl sp, sp, #1
				; CHECK-NEXT: ldr x29, [sp], #16 // 8-byte Folded Reload
				; CHECK-NEXT: ret
				%res = sub <vscale x 64 x i1> %a, %b
				ret <vscale x 64 x i1> %res;
				}

llvm/test/CodeGen/X86/avx512-mask-op.ll

(3,798 lines not shown)
; X86-NEXT: retl
%v1 = icmp eq <16 x i32> %a, zeroinitializer		%v1 = icmp eq <16 x i32> %a, zeroinitializer
%mask1 = bitcast <16 x i1> %v1 to i16		%mask1 = bitcast <16 x i1> %v1 to i16
%val = zext i16 %mask1 to i32		%val = zext i16 %mask1 to i32
%val1 = add i32 %val, %val		%val1 = add i32 %val, %val
ret i32 %val1		ret i32 %val1
}		}

define i16 @test_v16i1_add(i16 %x, i16 %y) {		define i16 @test_v16i1_add(i16 %x, i16 %y) {
; KNL-LABEL: test_v16i1_add:		; CHECK-LABEL: test_v16i1_add:
; KNL: ## %bb.0:		; CHECK: ## %bb.0:
; KNL-NEXT: kmovw %edi, %k0		; CHECK-NEXT: movl %edi, %eax
; KNL-NEXT: kmovw %esi, %k1		; CHECK-NEXT: xorl %esi, %eax
; KNL-NEXT: kxorw %k1, %k0, %k0		; CHECK-NEXT: ## kill: def $ax killed $ax killed $eax
; KNL-NEXT: kmovw %k0, %eax		; CHECK-NEXT: retq
; KNL-NEXT: ## kill: def $ax killed $ax killed $eax
; KNL-NEXT: retq
;
; SKX-LABEL: test_v16i1_add:
; SKX: ## %bb.0:
; SKX-NEXT: kmovd %edi, %k0
; SKX-NEXT: kmovd %esi, %k1
; SKX-NEXT: kxorw %k1, %k0, %k0
; SKX-NEXT: kmovd %k0, %eax
; SKX-NEXT: ## kill: def $ax killed $ax killed $eax
; SKX-NEXT: retq
;
; AVX512BW-LABEL: test_v16i1_add:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: kmovd %edi, %k0
; AVX512BW-NEXT: kmovd %esi, %k1
; AVX512BW-NEXT: kxorw %k1, %k0, %k0
; AVX512BW-NEXT: kmovd %k0, %eax
; AVX512BW-NEXT: ## kill: def $ax killed $ax killed $eax
; AVX512BW-NEXT: retq
;
; AVX512DQ-LABEL: test_v16i1_add:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: kmovw %edi, %k0
; AVX512DQ-NEXT: kmovw %esi, %k1
; AVX512DQ-NEXT: kxorw %k1, %k0, %k0
; AVX512DQ-NEXT: kmovw %k0, %eax
; AVX512DQ-NEXT: ## kill: def $ax killed $ax killed $eax
; AVX512DQ-NEXT: retq
;		;
; X86-LABEL: test_v16i1_add:		; X86-LABEL: test_v16i1_add:
; X86: ## %bb.0:		; X86: ## %bb.0:
; X86-NEXT: kmovw {{[0-9]+}}(%esp), %k0		; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; X86-NEXT: kmovw {{[0-9]+}}(%esp), %k1		; X86-NEXT: xorw {{[0-9]+}}(%esp), %ax
; X86-NEXT: kxorw %k1, %k0, %k0
; X86-NEXT: kmovd %k0, %eax
; X86-NEXT: ## kill: def $ax killed $ax killed $eax
; X86-NEXT: retl		; X86-NEXT: retl
%m0 = bitcast i16 %x to <16 x i1>		%m0 = bitcast i16 %x to <16 x i1>
%m1 = bitcast i16 %y to <16 x i1>		%m1 = bitcast i16 %y to <16 x i1>
%m2 = add <16 x i1> %m0, %m1		%m2 = add <16 x i1> %m0, %m1
%ret = bitcast <16 x i1> %m2 to i16		%ret = bitcast <16 x i1> %m2 to i16
ret i16 %ret		ret i16 %ret
}		}

define i16 @test_v16i1_sub(i16 %x, i16 %y) {		define i16 @test_v16i1_sub(i16 %x, i16 %y) {
; KNL-LABEL: test_v16i1_sub:		; CHECK-LABEL: test_v16i1_sub:
; KNL: ## %bb.0:		; CHECK: ## %bb.0:
; KNL-NEXT: kmovw %edi, %k0		; CHECK-NEXT: movl %edi, %eax
; KNL-NEXT: kmovw %esi, %k1		; CHECK-NEXT: xorl %esi, %eax
; KNL-NEXT: kxorw %k1, %k0, %k0		; CHECK-NEXT: ## kill: def $ax killed $ax killed $eax
; KNL-NEXT: kmovw %k0, %eax		; CHECK-NEXT: retq
; KNL-NEXT: ## kill: def $ax killed $ax killed $eax
; KNL-NEXT: retq
;
; SKX-LABEL: test_v16i1_sub:
; SKX: ## %bb.0:
; SKX-NEXT: kmovd %edi, %k0
; SKX-NEXT: kmovd %esi, %k1
; SKX-NEXT: kxorw %k1, %k0, %k0
; SKX-NEXT: kmovd %k0, %eax
; SKX-NEXT: ## kill: def $ax killed $ax killed $eax
; SKX-NEXT: retq
;
; AVX512BW-LABEL: test_v16i1_sub:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: kmovd %edi, %k0
; AVX512BW-NEXT: kmovd %esi, %k1
; AVX512BW-NEXT: kxorw %k1, %k0, %k0
; AVX512BW-NEXT: kmovd %k0, %eax
; AVX512BW-NEXT: ## kill: def $ax killed $ax killed $eax
; AVX512BW-NEXT: retq
;
; AVX512DQ-LABEL: test_v16i1_sub:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: kmovw %edi, %k0
; AVX512DQ-NEXT: kmovw %esi, %k1
; AVX512DQ-NEXT: kxorw %k1, %k0, %k0
; AVX512DQ-NEXT: kmovw %k0, %eax
; AVX512DQ-NEXT: ## kill: def $ax killed $ax killed $eax
; AVX512DQ-NEXT: retq
;		;
; X86-LABEL: test_v16i1_sub:		; X86-LABEL: test_v16i1_sub:
; X86: ## %bb.0:		; X86: ## %bb.0:
; X86-NEXT: kmovw {{[0-9]+}}(%esp), %k0		; X86-NEXT: movzwl {{[0-9]+}}(%esp), %eax
; X86-NEXT: kmovw {{[0-9]+}}(%esp), %k1		; X86-NEXT: xorw {{[0-9]+}}(%esp), %ax
; X86-NEXT: kxorw %k1, %k0, %k0
; X86-NEXT: kmovd %k0, %eax
; X86-NEXT: ## kill: def $ax killed $ax killed $eax
; X86-NEXT: retl		; X86-NEXT: retl
%m0 = bitcast i16 %x to <16 x i1>		%m0 = bitcast i16 %x to <16 x i1>
%m1 = bitcast i16 %y to <16 x i1>		%m1 = bitcast i16 %y to <16 x i1>
%m2 = sub <16 x i1> %m0, %m1		%m2 = sub <16 x i1> %m0, %m1
%ret = bitcast <16 x i1> %m2 to i16		%ret = bitcast <16 x i1> %m2 to i16
ret i16 %ret		ret i16 %ret
}		}

(45 lines not shown)
; X86-NEXT: retl
%m0 = bitcast i16 %x to <16 x i1>		%m0 = bitcast i16 %x to <16 x i1>
%m1 = bitcast i16 %y to <16 x i1>		%m1 = bitcast i16 %y to <16 x i1>
%m2 = mul <16 x i1> %m0, %m1		%m2 = mul <16 x i1> %m0, %m1
%ret = bitcast <16 x i1> %m2 to i16		%ret = bitcast <16 x i1> %m2 to i16
ret i16 %ret		ret i16 %ret
}		}

define i8 @test_v8i1_add(i8 %x, i8 %y) {		define i8 @test_v8i1_add(i8 %x, i8 %y) {
; KNL-LABEL: test_v8i1_add:		; CHECK-LABEL: test_v8i1_add:
; KNL: ## %bb.0:		; CHECK: ## %bb.0:
; KNL-NEXT: kmovw %edi, %k0		; CHECK-NEXT: movl %edi, %eax
; KNL-NEXT: kmovw %esi, %k1		; CHECK-NEXT: xorl %esi, %eax
; KNL-NEXT: kxorw %k1, %k0, %k0		; CHECK-NEXT: ## kill: def $al killed $al killed $eax
; KNL-NEXT: kmovw %k0, %eax		; CHECK-NEXT: retq
; KNL-NEXT: ## kill: def $al killed $al killed $eax
; KNL-NEXT: retq
;
; SKX-LABEL: test_v8i1_add:
; SKX: ## %bb.0:
; SKX-NEXT: kmovd %edi, %k0
; SKX-NEXT: kmovd %esi, %k1
; SKX-NEXT: kxorb %k1, %k0, %k0
; SKX-NEXT: kmovd %k0, %eax
; SKX-NEXT: ## kill: def $al killed $al killed $eax
; SKX-NEXT: retq
;
; AVX512BW-LABEL: test_v8i1_add:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: kmovd %edi, %k0
; AVX512BW-NEXT: kmovd %esi, %k1
; AVX512BW-NEXT: kxorw %k1, %k0, %k0
; AVX512BW-NEXT: kmovd %k0, %eax
; AVX512BW-NEXT: ## kill: def $al killed $al killed $eax
; AVX512BW-NEXT: retq
;
; AVX512DQ-LABEL: test_v8i1_add:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: kmovw %edi, %k0
; AVX512DQ-NEXT: kmovw %esi, %k1
; AVX512DQ-NEXT: kxorb %k1, %k0, %k0
; AVX512DQ-NEXT: kmovw %k0, %eax
; AVX512DQ-NEXT: ## kill: def $al killed $al killed $eax
; AVX512DQ-NEXT: retq
;		;
; X86-LABEL: test_v8i1_add:		; X86-LABEL: test_v8i1_add:
; X86: ## %bb.0:		; X86: ## %bb.0:
; X86-NEXT: kmovb {{[0-9]+}}(%esp), %k0		; X86-NEXT: movb {{[0-9]+}}(%esp), %al
; X86-NEXT: kmovb {{[0-9]+}}(%esp), %k1		; X86-NEXT: xorb {{[0-9]+}}(%esp), %al
; X86-NEXT: kxorb %k1, %k0, %k0
; X86-NEXT: kmovd %k0, %eax
; X86-NEXT: ## kill: def $al killed $al killed $eax
; X86-NEXT: retl		; X86-NEXT: retl
%m0 = bitcast i8 %x to <8 x i1>		%m0 = bitcast i8 %x to <8 x i1>
%m1 = bitcast i8 %y to <8 x i1>		%m1 = bitcast i8 %y to <8 x i1>
%m2 = add <8 x i1> %m0, %m1		%m2 = add <8 x i1> %m0, %m1
%ret = bitcast <8 x i1> %m2 to i8		%ret = bitcast <8 x i1> %m2 to i8
ret i8 %ret		ret i8 %ret
}		}

define i8 @test_v8i1_sub(i8 %x, i8 %y) {		define i8 @test_v8i1_sub(i8 %x, i8 %y) {
; KNL-LABEL: test_v8i1_sub:		; CHECK-LABEL: test_v8i1_sub:
; KNL: ## %bb.0:		; CHECK: ## %bb.0:
; KNL-NEXT: kmovw %edi, %k0		; CHECK-NEXT: movl %edi, %eax
; KNL-NEXT: kmovw %esi, %k1		; CHECK-NEXT: xorl %esi, %eax
; KNL-NEXT: kxorw %k1, %k0, %k0		; CHECK-NEXT: ## kill: def $al killed $al killed $eax
; KNL-NEXT: kmovw %k0, %eax		; CHECK-NEXT: retq
; KNL-NEXT: ## kill: def $al killed $al killed $eax
; KNL-NEXT: retq
;
; SKX-LABEL: test_v8i1_sub:
; SKX: ## %bb.0:
; SKX-NEXT: kmovd %edi, %k0
; SKX-NEXT: kmovd %esi, %k1
; SKX-NEXT: kxorb %k1, %k0, %k0
; SKX-NEXT: kmovd %k0, %eax
; SKX-NEXT: ## kill: def $al killed $al killed $eax
; SKX-NEXT: retq
;
; AVX512BW-LABEL: test_v8i1_sub:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: kmovd %edi, %k0
; AVX512BW-NEXT: kmovd %esi, %k1
; AVX512BW-NEXT: kxorw %k1, %k0, %k0
; AVX512BW-NEXT: kmovd %k0, %eax
; AVX512BW-NEXT: ## kill: def $al killed $al killed $eax
; AVX512BW-NEXT: retq
;
; AVX512DQ-LABEL: test_v8i1_sub:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: kmovw %edi, %k0
; AVX512DQ-NEXT: kmovw %esi, %k1
; AVX512DQ-NEXT: kxorb %k1, %k0, %k0
; AVX512DQ-NEXT: kmovw %k0, %eax
; AVX512DQ-NEXT: ## kill: def $al killed $al killed $eax
; AVX512DQ-NEXT: retq
;		;
; X86-LABEL: test_v8i1_sub:		; X86-LABEL: test_v8i1_sub:
; X86: ## %bb.0:		; X86: ## %bb.0:
; X86-NEXT: kmovb {{[0-9]+}}(%esp), %k0		; X86-NEXT: movb {{[0-9]+}}(%esp), %al
; X86-NEXT: kmovb {{[0-9]+}}(%esp), %k1		; X86-NEXT: xorb {{[0-9]+}}(%esp), %al
; X86-NEXT: kxorb %k1, %k0, %k0
; X86-NEXT: kmovd %k0, %eax
; X86-NEXT: ## kill: def $al killed $al killed $eax
; X86-NEXT: retl		; X86-NEXT: retl
%m0 = bitcast i8 %x to <8 x i1>		%m0 = bitcast i8 %x to <8 x i1>
%m1 = bitcast i8 %y to <8 x i1>		%m1 = bitcast i8 %y to <8 x i1>
%m2 = sub <8 x i1> %m0, %m1		%m2 = sub <8 x i1> %m0, %m1
%ret = bitcast <8 x i1> %m2 to i8		%ret = bitcast <8 x i1> %m2 to i8
ret i8 %ret		ret i8 %ret
}		}

(1,158 lines not shown)
; X86-NEXT: vpmovm2b %k0, %zmm0		; X86-NEXT: vpmovm2b %k0, %zmm0
; X86-NEXT: retl		; X86-NEXT: retl
%a_i = trunc i32 %a to i1		%a_i = trunc i32 %a to i1
%maskv = insertelement <64 x i1> <i1 true, i1 false, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, i1 %a_i, i32 0		%maskv = insertelement <64 x i1> <i1 true, i1 false, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, i1 %a_i, i32 0
ret <64 x i1> %maskv		ret <64 x i1> %maskv
}		}

define i1 @test_v1i1_add(i1 %x, i1 %y) {		define i1 @test_v1i1_add(i1 %x, i1 %y) {
; KNL-LABEL: test_v1i1_add:		; CHECK-LABEL: test_v1i1_add:
; KNL: ## %bb.0:		; CHECK: ## %bb.0:
; KNL-NEXT: kmovw %edi, %k0		; CHECK-NEXT: movl %edi, %eax
; KNL-NEXT: kmovw %esi, %k1		; CHECK-NEXT: xorl %esi, %eax
; KNL-NEXT: kxorw %k1, %k0, %k0		; CHECK-NEXT: ## kill: def $al killed $al killed $eax
; KNL-NEXT: kshiftlw $15, %k0, %k0		; CHECK-NEXT: retq
; KNL-NEXT: kshiftrw $15, %k0, %k0
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: movb %al, -{{[0-9]+}}(%rsp)
; KNL-NEXT: movb -{{[0-9]+}}(%rsp), %al
; KNL-NEXT: retq
;
; SKX-LABEL: test_v1i1_add:
; SKX: ## %bb.0:
; SKX-NEXT: andl $1, %edi
; SKX-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; SKX-NEXT: andl $1, %esi
; SKX-NEXT: movb %sil, -{{[0-9]+}}(%rsp)
; SKX-NEXT: kmovb -{{[0-9]+}}(%rsp), %k0
; SKX-NEXT: kmovb -{{[0-9]+}}(%rsp), %k1
; SKX-NEXT: kxorw %k1, %k0, %k0
; SKX-NEXT: kshiftlb $7, %k0, %k0
; SKX-NEXT: kshiftrb $7, %k0, %k0
; SKX-NEXT: kmovb %k0, -{{[0-9]+}}(%rsp)
; SKX-NEXT: movb -{{[0-9]+}}(%rsp), %al
; SKX-NEXT: retq
;
; AVX512BW-LABEL: test_v1i1_add:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: kmovd %edi, %k0
; AVX512BW-NEXT: kmovd %esi, %k1
; AVX512BW-NEXT: kxorw %k1, %k0, %k0
; AVX512BW-NEXT: kshiftlw $15, %k0, %k0
; AVX512BW-NEXT: kshiftrw $15, %k0, %k0
; AVX512BW-NEXT: kmovd %k0, %eax
; AVX512BW-NEXT: movb %al, -{{[0-9]+}}(%rsp)
; AVX512BW-NEXT: movb -{{[0-9]+}}(%rsp), %al
; AVX512BW-NEXT: retq
;
; AVX512DQ-LABEL: test_v1i1_add:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: andl $1, %edi
; AVX512DQ-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; AVX512DQ-NEXT: andl $1, %esi
; AVX512DQ-NEXT: movb %sil, -{{[0-9]+}}(%rsp)
; AVX512DQ-NEXT: kmovb -{{[0-9]+}}(%rsp), %k0
; AVX512DQ-NEXT: kmovb -{{[0-9]+}}(%rsp), %k1
; AVX512DQ-NEXT: kxorw %k1, %k0, %k0
; AVX512DQ-NEXT: kshiftlb $7, %k0, %k0
; AVX512DQ-NEXT: kshiftrb $7, %k0, %k0
; AVX512DQ-NEXT: kmovb %k0, -{{[0-9]+}}(%rsp)
; AVX512DQ-NEXT: movb -{{[0-9]+}}(%rsp), %al
; AVX512DQ-NEXT: retq
;		;
; X86-LABEL: test_v1i1_add:		; X86-LABEL: test_v1i1_add:
; X86: ## %bb.0:		; X86: ## %bb.0:
; X86-NEXT: pushl %eax
; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: movb {{[0-9]+}}(%esp), %al
; X86-NEXT: andb $1, %al
; X86-NEXT: movb %al, {{[0-9]+}}(%esp)
; X86-NEXT: movb {{[0-9]+}}(%esp), %al
; X86-NEXT: andb $1, %al
; X86-NEXT: movb %al, {{[0-9]+}}(%esp)
; X86-NEXT: kmovb {{[0-9]+}}(%esp), %k0
; X86-NEXT: kmovb {{[0-9]+}}(%esp), %k1
; X86-NEXT: kxorw %k1, %k0, %k0
; X86-NEXT: kshiftlb $7, %k0, %k0
; X86-NEXT: kshiftrb $7, %k0, %k0
; X86-NEXT: kmovb %k0, {{[0-9]+}}(%esp)
; X86-NEXT: movb {{[0-9]+}}(%esp), %al		; X86-NEXT: movb {{[0-9]+}}(%esp), %al
; X86-NEXT: popl %ecx		; X86-NEXT: xorb {{[0-9]+}}(%esp), %al
; X86-NEXT: retl		; X86-NEXT: retl
%m0 = bitcast i1 %x to <1 x i1>		%m0 = bitcast i1 %x to <1 x i1>
%m1 = bitcast i1 %y to <1 x i1>		%m1 = bitcast i1 %y to <1 x i1>
%m2 = add <1 x i1> %m0, %m1		%m2 = add <1 x i1> %m0, %m1
%ret = bitcast <1 x i1> %m2 to i1		%ret = bitcast <1 x i1> %m2 to i1
ret i1 %ret		ret i1 %ret
}		}

define i1 @test_v1i1_sub(i1 %x, i1 %y) {		define i1 @test_v1i1_sub(i1 %x, i1 %y) {
; KNL-LABEL: test_v1i1_sub:		; CHECK-LABEL: test_v1i1_sub:
; KNL: ## %bb.0:		; CHECK: ## %bb.0:
; KNL-NEXT: kmovw %edi, %k0		; CHECK-NEXT: movl %edi, %eax
; KNL-NEXT: kmovw %esi, %k1		; CHECK-NEXT: xorl %esi, %eax
; KNL-NEXT: kxorw %k1, %k0, %k0		; CHECK-NEXT: ## kill: def $al killed $al killed $eax
; KNL-NEXT: kshiftlw $15, %k0, %k0		; CHECK-NEXT: retq
; KNL-NEXT: kshiftrw $15, %k0, %k0
; KNL-NEXT: kmovw %k0, %eax
; KNL-NEXT: movb %al, -{{[0-9]+}}(%rsp)
; KNL-NEXT: movb -{{[0-9]+}}(%rsp), %al
; KNL-NEXT: retq
;
; SKX-LABEL: test_v1i1_sub:
; SKX: ## %bb.0:
; SKX-NEXT: andl $1, %edi
; SKX-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; SKX-NEXT: andl $1, %esi
; SKX-NEXT: movb %sil, -{{[0-9]+}}(%rsp)
; SKX-NEXT: kmovb -{{[0-9]+}}(%rsp), %k0
; SKX-NEXT: kmovb -{{[0-9]+}}(%rsp), %k1
; SKX-NEXT: kxorw %k1, %k0, %k0
; SKX-NEXT: kshiftlb $7, %k0, %k0
; SKX-NEXT: kshiftrb $7, %k0, %k0
; SKX-NEXT: kmovb %k0, -{{[0-9]+}}(%rsp)
; SKX-NEXT: movb -{{[0-9]+}}(%rsp), %al
; SKX-NEXT: retq
;
; AVX512BW-LABEL: test_v1i1_sub:
; AVX512BW: ## %bb.0:
; AVX512BW-NEXT: kmovd %edi, %k0
; AVX512BW-NEXT: kmovd %esi, %k1
; AVX512BW-NEXT: kxorw %k1, %k0, %k0
; AVX512BW-NEXT: kshiftlw $15, %k0, %k0
; AVX512BW-NEXT: kshiftrw $15, %k0, %k0
; AVX512BW-NEXT: kmovd %k0, %eax
; AVX512BW-NEXT: movb %al, -{{[0-9]+}}(%rsp)
; AVX512BW-NEXT: movb -{{[0-9]+}}(%rsp), %al
; AVX512BW-NEXT: retq
;
; AVX512DQ-LABEL: test_v1i1_sub:
; AVX512DQ: ## %bb.0:
; AVX512DQ-NEXT: andl $1, %edi
; AVX512DQ-NEXT: movb %dil, -{{[0-9]+}}(%rsp)
; AVX512DQ-NEXT: andl $1, %esi
; AVX512DQ-NEXT: movb %sil, -{{[0-9]+}}(%rsp)
; AVX512DQ-NEXT: kmovb -{{[0-9]+}}(%rsp), %k0
; AVX512DQ-NEXT: kmovb -{{[0-9]+}}(%rsp), %k1
; AVX512DQ-NEXT: kxorw %k1, %k0, %k0
; AVX512DQ-NEXT: kshiftlb $7, %k0, %k0
; AVX512DQ-NEXT: kshiftrb $7, %k0, %k0
; AVX512DQ-NEXT: kmovb %k0, -{{[0-9]+}}(%rsp)
; AVX512DQ-NEXT: movb -{{[0-9]+}}(%rsp), %al
; AVX512DQ-NEXT: retq
;		;
; X86-LABEL: test_v1i1_sub:		; X86-LABEL: test_v1i1_sub:
; X86: ## %bb.0:		; X86: ## %bb.0:
; X86-NEXT: pushl %eax
; X86-NEXT: .cfi_def_cfa_offset 8
; X86-NEXT: movb {{[0-9]+}}(%esp), %al
; X86-NEXT: andb $1, %al
; X86-NEXT: movb %al, {{[0-9]+}}(%esp)
; X86-NEXT: movb {{[0-9]+}}(%esp), %al
; X86-NEXT: andb $1, %al
; X86-NEXT: movb %al, {{[0-9]+}}(%esp)
; X86-NEXT: kmovb {{[0-9]+}}(%esp), %k0
; X86-NEXT: kmovb {{[0-9]+}}(%esp), %k1
; X86-NEXT: kxorw %k1, %k0, %k0
; X86-NEXT: kshiftlb $7, %k0, %k0
; X86-NEXT: kshiftrb $7, %k0, %k0
; X86-NEXT: kmovb %k0, {{[0-9]+}}(%esp)
; X86-NEXT: movb {{[0-9]+}}(%esp), %al		; X86-NEXT: movb {{[0-9]+}}(%esp), %al
; X86-NEXT: popl %ecx		; X86-NEXT: xorb {{[0-9]+}}(%esp), %al
; X86-NEXT: retl		; X86-NEXT: retl
%m0 = bitcast i1 %x to <1 x i1>		%m0 = bitcast i1 %x to <1 x i1>
%m1 = bitcast i1 %y to <1 x i1>		%m1 = bitcast i1 %y to <1 x i1>
%m2 = sub <1 x i1> %m0, %m1		%m2 = sub <1 x i1> %m0, %m1
%ret = bitcast <1 x i1> %m2 to i1		%ret = bitcast <1 x i1> %m2 to i1
ret i1 %ret		ret i1 %ret
}		}

(235 lines not shown)

llvm/test/CodeGen/X86/avx512bw-mask-op.ll

(146 lines not shown)
; CHECK-NEXT: retq
%me = or <64 x i1> %mc, %md		%me = or <64 x i1> %mc, %md
%ret = bitcast <64 x i1> %me to i64		%ret = bitcast <64 x i1> %me to i64
ret i64 %ret		ret i64 %ret
}		}

define i32 @test_v32i1_add(i32 %x, i32 %y) {		define i32 @test_v32i1_add(i32 %x, i32 %y) {
; CHECK-LABEL: test_v32i1_add:		; CHECK-LABEL: test_v32i1_add:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: kmovd %edi, %k0		; CHECK-NEXT: movl %edi, %eax
; CHECK-NEXT: kmovd %esi, %k1		; CHECK-NEXT: xorl %esi, %eax
; CHECK-NEXT: kxord %k1, %k0, %k0
; CHECK-NEXT: kmovd %k0, %eax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%m0 = bitcast i32 %x to <32 x i1>		%m0 = bitcast i32 %x to <32 x i1>
%m1 = bitcast i32 %y to <32 x i1>		%m1 = bitcast i32 %y to <32 x i1>
%m2 = add <32 x i1> %m0, %m1		%m2 = add <32 x i1> %m0, %m1
%ret = bitcast <32 x i1> %m2 to i32		%ret = bitcast <32 x i1> %m2 to i32
ret i32 %ret		ret i32 %ret
}		}

define i32 @test_v32i1_sub(i32 %x, i32 %y) {		define i32 @test_v32i1_sub(i32 %x, i32 %y) {
; CHECK-LABEL: test_v32i1_sub:		; CHECK-LABEL: test_v32i1_sub:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: kmovd %edi, %k0		; CHECK-NEXT: movl %edi, %eax
; CHECK-NEXT: kmovd %esi, %k1		; CHECK-NEXT: xorl %esi, %eax
; CHECK-NEXT: kxord %k1, %k0, %k0
; CHECK-NEXT: kmovd %k0, %eax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%m0 = bitcast i32 %x to <32 x i1>		%m0 = bitcast i32 %x to <32 x i1>
%m1 = bitcast i32 %y to <32 x i1>		%m1 = bitcast i32 %y to <32 x i1>
%m2 = sub <32 x i1> %m0, %m1		%m2 = sub <32 x i1> %m0, %m1
%ret = bitcast <32 x i1> %m2 to i32		%ret = bitcast <32 x i1> %m2 to i32
ret i32 %ret		ret i32 %ret
}		}

(10 lines not shown)
; CHECK-NEXT: retq
%m2 = mul <32 x i1> %m0, %m1		%m2 = mul <32 x i1> %m0, %m1
%ret = bitcast <32 x i1> %m2 to i32		%ret = bitcast <32 x i1> %m2 to i32
ret i32 %ret		ret i32 %ret
}		}

define i64 @test_v64i1_add(i64 %x, i64 %y) {		define i64 @test_v64i1_add(i64 %x, i64 %y) {
; CHECK-LABEL: test_v64i1_add:		; CHECK-LABEL: test_v64i1_add:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: kmovq %rdi, %k0		; CHECK-NEXT: movq %rdi, %rax
; CHECK-NEXT: kmovq %rsi, %k1		; CHECK-NEXT: xorq %rsi, %rax
; CHECK-NEXT: kxorq %k1, %k0, %k0
; CHECK-NEXT: kmovq %k0, %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%m0 = bitcast i64 %x to <64 x i1>		%m0 = bitcast i64 %x to <64 x i1>
%m1 = bitcast i64 %y to <64 x i1>		%m1 = bitcast i64 %y to <64 x i1>
%m2 = add <64 x i1> %m0, %m1		%m2 = add <64 x i1> %m0, %m1
%ret = bitcast <64 x i1> %m2 to i64		%ret = bitcast <64 x i1> %m2 to i64
ret i64 %ret		ret i64 %ret
}		}

define i64 @test_v64i1_sub(i64 %x, i64 %y) {		define i64 @test_v64i1_sub(i64 %x, i64 %y) {
; CHECK-LABEL: test_v64i1_sub:		; CHECK-LABEL: test_v64i1_sub:
; CHECK: ## %bb.0:		; CHECK: ## %bb.0:
; CHECK-NEXT: kmovq %rdi, %k0		; CHECK-NEXT: movq %rdi, %rax
; CHECK-NEXT: kmovq %rsi, %k1		; CHECK-NEXT: xorq %rsi, %rax
; CHECK-NEXT: kxorq %k1, %k0, %k0
; CHECK-NEXT: kmovq %k0, %rax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%m0 = bitcast i64 %x to <64 x i1>		%m0 = bitcast i64 %x to <64 x i1>
%m1 = bitcast i64 %y to <64 x i1>		%m1 = bitcast i64 %y to <64 x i1>
%m2 = sub <64 x i1> %m0, %m1		%m2 = sub <64 x i1> %m0, %m1
%ret = bitcast <64 x i1> %m2 to i64		%ret = bitcast <64 x i1> %m2 to i64
ret i64 %ret		ret i64 %ret
}		}
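
The updated X86 checks above collapse to a single scalar xor because, once each lane is an XOR, the whole mask is just the bitwise XOR of the two integers it was bitcast from. A small sketch of that reasoning follows, for a 16-lane mask packed into a 16-bit integer. addLanesI1 and subLanesI1 are hypothetical helpers written for this note, and the sketch assumes C++14 or later for loops in constexpr functions.

```cpp
#include <cassert> // only needed for runtime checks in calling code
#include <cstdint>

// Lane-wise i1 addition over 16 lanes packed into a uint16_t.
constexpr std::uint16_t addLanesI1(std::uint16_t x, std::uint16_t y) {
  std::uint16_t result = 0;
  for (unsigned lane = 0; lane < 16; ++lane) {
    unsigned a = (x >> lane) & 1u;
    unsigned b = (y >> lane) & 1u;
    unsigned sum = (a + b) & 1u; // per-lane add, wrapping modulo 2
    result = static_cast<std::uint16_t>(result | (sum << lane));
  }
  return result;
}

// Lane-wise i1 subtraction over the same packed representation.
constexpr std::uint16_t subLanesI1(std::uint16_t x, std::uint16_t y) {
  std::uint16_t result = 0;
  for (unsigned lane = 0; lane < 16; ++lane) {
    unsigned a = (x >> lane) & 1u;
    unsigned b = (y >> lane) & 1u;
    unsigned diff = (a - b) & 1u; // unsigned wraparound: 0 - 1 gives 1
    result = static_cast<std::uint16_t>(result | (diff << lane));
  }
  return result;
}

static_assert(addLanesI1(0xF0F0, 0xFF00) == (0xF0F0 ^ 0xFF00),
              "lane-wise i1 add equals integer xor");
static_assert(subLanesI1(0xF0F0, 0xFF00) == (0xF0F0 ^ 0xFF00),
              "lane-wise i1 sub equals integer xor");
```

No carries cross lane boundaries in either helper, which is why the loop can be replaced by one xor on the packed value.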

(62 lines not shown)