This is an archive of the discontinued LLVM Phabricator instance.

[x86] favor vector constant load to avoid GPR to XMM transfer, part 2
ClosedPublic

Authored by spatel on May 18 2020, 8:49 AM.

Details

Reviewers

craig.topper
RKSimon

Commits

rGfa038e03504c: [x86] favor vector constant load to avoid GPR to XMM transfer, part 2

Summary

This replaces the build_vector lowering code that was just added in D80013 and instead matches the pattern later, starting from the x86-specific "vzext_movl" node. That seems to give the same or better improvements, and it gets rid of the 'TODO' items from that patch.

AFAICT, we always shrink wider constant vectors to 128-bit on these patterns, so we still get the implicit zero-extension to ymm/zmm without wasting space on larger vector constants. There's a trade-off there because that means we miss potential load-folding.

Similarly, we could load scalar constants here with implicit zero-extension even to 128-bit. That saves constant space, but it means we forego load-folding, and so it increases register pressure. This seems like a good middle-ground between those 2 options?
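The space side of this trade-off is simple arithmetic. A minimal sketch, assuming hypothetical names (`Strategy`, `constantPoolBytes`) that are not part of LLVM, and ignoring alignment padding, constant merging, and any widening of i8/i16 scalars:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model, not LLVM code: bytes of constant-pool data needed to
// materialize [C,0,...,0] under the three options discussed in the summary.
enum class Strategy {
  ScalarZextLoad, // store only C; the load itself zero-extends
  Vec128,         // store a 128-bit vector; wider registers are implicitly zeroed
  FullWidth       // store a constant as wide as the destination vector
};

std::size_t constantPoolBytes(Strategy S, std::size_t ScalarBits,
                              std::size_t VectorBits) {
  assert(ScalarBits % 8 == 0 && VectorBits % 8 == 0 && "whole bytes only");
  switch (S) {
  case Strategy::ScalarZextLoad:
    return ScalarBits / 8;
  case Strategy::Vec128:
    return 128 / 8;
  case Strategy::FullWidth:
    return VectorBits / 8;
  }
  return 0; // unreachable for valid enum values
}
```

For an i32 constant feeding a 256-bit operation this gives 4, 16, and 32 bytes respectively. Which of those forms can be folded into the consuming instruction is the other half of the trade-off and is not modeled here.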

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. May 18 2020, 8:49 AM

Herald added a project: Restricted Project. May 18 2020, 8:49 AM

Herald added subscribers: hiraditya, mcrosier.

spatel retitled this revision from [x86] favor vector constant load to avoid GPR to XMM transert, part 2 to [x86] favor vector constant load to avoid GPR to XMM transfer, part 2. May 18 2020, 8:49 AM

RKSimon added inline comments. May 18 2020, 11:18 AM

llvm/lib/Target/X86/X86ISelLowering.cpp

35802

Would creating X86ISD::VZEXT_LOAD be better? We'd need to zext i8/i16 cases to i32 constant pool entries but we'd avoid the upper zero constants.

spatel marked an inline comment as done. May 18 2020, 1:20 PM

spatel added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp

35802

My 1st draft of this patch used VZEXT_LOAD, and I think it worked without having to explicitly zext here. Does this match what you're thinking of? If so, this leads to the trade-off mentioned in the description - we save some constant space, but lose some load folds.

diff --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
index 655147076a4..dc61867b418 100644
--- a/llvm/lib/Target/X86/X86ISelLowering.cpp
+++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
@@ -35788,6 +35788,26 @@ static SDValue combineTargetShuffle(SDValue N, SelectionDAG &DAG,
       }
     }

+    // Load a scalar integer constant directly to XMM instead of transferring an
+    // immediate value from GPR.
+    // TODO: Would it be better to load a 128-bit vector constant instead?
+    // vzext_movl (scalar_to_vector C) --> vzext_load &C
+    if (N0.getOpcode() == ISD::SCALAR_TO_VECTOR) {
+      if (auto *C = dyn_cast<ConstantSDNode>(N0.getOperand(0))) {
+        MVT PVT = DAG.getTargetLoweringInfo().getPointerTy(DAG.getDataLayout());
+        SDValue CP = DAG.getConstantPool(C->getConstantIntValue(), PVT);
+        Align Alignment = cast<ConstantPoolSDNode>(CP)->getAlign();
+        EVT ScalarVT = N0.getOperand(0).getValueType();
+        SDVTList MemTys = DAG.getVTList(VT, MVT::Other);
+        SDValue MemOps[] = {DAG.getEntryNode(), CP};
+        MachinePointerInfo MPI =
+            MachinePointerInfo::getConstantPool(DAG.getMachineFunction());
+        return DAG.getMemIntrinsicNode(X86ISD::VZEXT_LOAD, DL, MemTys, MemOps,
+                                       ScalarVT, MPI, Alignment,
+                                       MachineMemOperand::MOLoad);
+      }
+    }
+
     return SDValue();
   }
   case X86ISD::BLENDI: {
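
The VZEXT_LOAD draft above and a plain load of a [C,0,...] vector constant are meant to leave the same value in the register. A portable sketch of the semantics, using plain arrays in place of XMM registers; `scalarToVector`, `vzextMovl`, `vzextLoad`, and `loadVector` are illustrative stand-ins for the DAG nodes, not LLVM APIs:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4i32 = std::array<std::uint32_t, 4>;

// scalar_to_vector: element 0 holds the scalar; the other elements are
// undefined in the DAG. This model fills them with junk to make that visible.
V4i32 scalarToVector(std::uint32_t C) {
  return {C, 0xDEADBEEFu, 0xDEADBEEFu, 0xDEADBEEFu};
}

// vzext_movl: keep element 0 and zero every other element.
V4i32 vzextMovl(const V4i32 &V) { return {V[0], 0u, 0u, 0u}; }

// vzext_load: read one scalar from memory and zero the upper elements.
V4i32 vzextLoad(const std::uint32_t *Ptr) {
  assert(Ptr != nullptr);
  return {*Ptr, 0u, 0u, 0u};
}

// Plain vector load of a 128-bit constant-pool entry such as [C,0,0,0].
V4i32 loadVector(const V4i32 *Ptr) {
  assert(Ptr != nullptr);
  return *Ptr;
}
```

In this model, `vzextMovl(scalarToVector(C))`, `vzextLoad(&C)`, and `loadVector` of `[C,0,0,0]` all compare equal; the variants differ only in how much constant data is stored and in what the consuming instruction can fold.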

spatel mentioned this in D80223: [x86] favor vector constant load to avoid GPR to XMM transfer, part 1.5. May 19 2020, 10:33 AM

Patch updated:
This is rebased on top of D80223, so we can see incremental diffs like enabling more load folding.

spatel added a parent revision: D80223: [x86] favor vector constant load to avoid GPR to XMM transfer, part 1.5. May 19 2020, 10:40 AM

Ideally we'd have a way to fold vzext_load ops inside X86InstrInfo::foldMemoryOperandCustom (by zero padding the constant pool entry where necessary) but I'm not certain how easy that is.

So we probably want to go with this variant (sorry for the wild goose chase).

@craig.topper any thoughts?
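
The zero-padding idea can be pictured outside of LLVM. A hedged sketch in which constant-pool entries are modeled as little-endian byte vectors and `padEntry` is a hypothetical helper; it does not reflect how `X86InstrInfo::foldMemoryOperandCustom` behaves today:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: widen a constant-pool entry to FullBytes by appending
// zero bytes, so an instruction that reads the full width from memory sees the
// same value that a zero-extending scalar load would have produced.
std::vector<std::uint8_t> padEntry(std::vector<std::uint8_t> Entry,
                                   std::size_t FullBytes) {
  assert(!Entry.empty() && "nothing to pad");
  if (Entry.size() < FullBytes)
    Entry.resize(FullBytes, 0); // the new tail is filled with zero bytes
  return Entry;
}
```

Padding a 4-byte entry to 16 bytes leaves the low 4 bytes untouched and makes the remaining 12 bytes zero, which is the property a folded 128-bit memory operand would need.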

In D80131#2049497, @RKSimon wrote:

Ideally we'd have a way to fold vzext_load ops inside X86InstrInfo::foldMemoryOperandCustom (by zero padding the constant pool entry where necessary) but I'm not certain how easy that is.

So we probably want to go with this variant (sorry for the wild goose chase).

@craig.topper any thoughts?

I think the caller of foldMemoryOperandImpl is responsible for copying the memoperand over to the new instruction. So changing the memory reference out from under it will break that at the very least. We'd also be deferring our usual load folding to the peephole pass, which I think isn't quite as strong as SelectionDAG.

If the load is in a loop, we potentially unfold it in MachineLICM and hoist it out of the loop. So maybe what we really want is a later constant pool shrinking pass that runs after MachineLICM. We have a similar issue with broadcasts from the constant pool, don't we? Lowering of build_vector favors forming broadcasts of constants without knowing if we can fold.
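
One way to picture such a shrinking pass: if nothing folds the load, a constant whose upper bytes are all zero could be stored as just its low part and reloaded with a zero-extending load. A rough sketch under that assumption; `shrunkBytes` is a hypothetical helper, and limiting the shrunk sizes to 4 or 8 bytes (movd/movq-style loads) is a simplification made for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: given the little-endian bytes of a vector constant, return the
// smallest entry size in bytes (4, 8, or the full size) that still reproduces
// the constant when the remaining bytes are implicitly zeroed by the load.
std::size_t shrunkBytes(const std::vector<std::uint8_t> &Bytes) {
  assert(!Bytes.empty());
  std::size_t Used = Bytes.size();
  while (Used > 0 && Bytes[Used - 1] == 0)
    --Used; // drop trailing zero bytes
  if (Used <= 4 && Bytes.size() >= 4)
    return 4;
  if (Used <= 8 && Bytes.size() >= 8)
    return 8;
  return Bytes.size();
}
```

A 16-byte constant such as [2,0,0,0] (as i32 elements) would shrink to 4 bytes here, while one with a nonzero top byte stays at 16.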

LGTM - let's go with this approach. I've raised PR46048 to hopefully help us with reducing the size of constant pool loads when useful.

This revision is now accepted and ready to land. May 23 2020, 7:53 AM

Closed by commit rGfa038e03504c: [x86] favor vector constant load to avoid GPR to XMM transfer, part 2 (authored by spatel). May 25 2020, 5:20 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D79886: [DAGCombiner] try to move splat after binop with splat constant. May 25 2020, 5:26 AM

Revision Contents

Path                                      Size
llvm/lib/Target/X86/
  X86ISelLowering.cpp                     33 lines
llvm/test/CodeGen/X86/
  (name missing)                          10 lines
  (name missing)                          6 lines
  (name missing)                          40 lines
  (name missing)                          3 lines
  insert-into-constant-vector.ll          48 lines
  packss.ll                               7 lines
  pshufb-mask-comments.ll                 2 lines
  ret-mmx.ll                              2 lines
  sad.ll                                  65 lines
  srem-seteq-vec-nonsplat.ll              6 lines
  vec_set-A.ll                            2 lines
  vec_shift2.ll                           4 lines
  vector-lzcnt-128.ll                     12 lines
  vector-shuffle-256-v16.ll               3 lines
  vector-shuffle-256-v32.ll               16 lines
  vector-shuffle-256-v8.ll                10 lines
  vector-shuffle-512-v32.ll               10 lines
  vector-shuffle-512-v64.ll               10 lines
  vector-shuffle-512-v8.ll                10 lines
  vector-shuffle-combining-avx512f.ll     10 lines
  vector-shuffle-combining-xop.ll         29 lines
  vector-shuffle-v1.ll                    12 lines
  vector-tzcnt-128.ll                     24 lines

Diff 266008

llvm/lib/Target/X86/X86ISelLowering.cpp

[10,203 lines hidden] if (NumNonZero == 1) {
     // If we have a constant or non-constant insertion into the low element of
     // a vector, we can do this with SCALAR_TO_VECTOR + shuffle of zero into
     // the rest of the elements. This will be matched as movd/movq/movss/movsd
     // depending on what the source datatype is.
     if (Idx == 0) {
       if (NumZero == 0)
         return DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, VT, Item);

-      // Just load a vector integer constant. Loading is better for code size,
-      // avoids move GPR immediate --> XMM, and reduces register pressure.
-      if (IsAllConstants && VT.isInteger()) {
-        // TODO: Remove -1 restriction with demanded elements improvement?
-        // TODO: Insert 128-bit load into wider undef vector?
-        if (VT.is128BitVector() && !isAllOnesConstant(Item))
-          return SDValue();
-      }
-
       if (EltVT == MVT::i32 || EltVT == MVT::f32 || EltVT == MVT::f64 ||
           (EltVT == MVT::i64 && Subtarget.is64Bit())) {
         assert((VT.is128BitVector() || VT.is256BitVector() ||
                 VT.is512BitVector()) &&
                "Expected an SSE value type!");
         Item = DAG.getNode(ISD::SCALAR_TO_VECTOR, dl, VT, Item);
         // Turn it into a MOVL (i.e. movss, movsd, or movd) to a zero vector.
         return getShuffleVectorZeroOrUndef(Item, 0, true, Subtarget, DAG);
[25,574 lines hidden] case X86ISD::VBROADCAST: {
     return SDValue();
   }
   case X86ISD::VZEXT_MOVL: {
     SDValue N0 = N.getOperand(0);

     // If this a vzmovl of a full vector load, replace it with a vzload, unless
     // the load is volatile.
     if (N0.hasOneUse() && ISD::isNormalLoad(N0.getNode())) {
       auto *LN = cast<LoadSDNode>(N0);
       if (LN->isSimple()) {
         SDVTList Tys = DAG.getVTList(VT, MVT::Other);
         SDValue Ops[] = {LN->getChain(), LN->getBasePtr()};
         SDValue VZLoad = DAG.getMemIntrinsicNode(
             X86ISD::VZEXT_LOAD, DL, Tys, Ops, VT.getVectorElementType(),
             LN->getPointerInfo(), LN->getAlign(),
             LN->getMemOperand()->getFlags());
         DCI.CombineTo(N.getNode(), VZLoad);
[33 lines hidden] if (N0.hasOneUse() && N0.getOpcode() == ISD::SCALAR_TO_VECTOR &&
         SDValue Trunc = DAG.getNode(ISD::TRUNCATE, DL, MVT::i32, In);
         MVT VecVT = MVT::getVectorVT(MVT::i32, VT.getVectorNumElements() * 2);
         SDValue SclVec = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, VecVT, Trunc);
         SDValue Movl = DAG.getNode(X86ISD::VZEXT_MOVL, DL, VecVT, SclVec);
         return DAG.getBitcast(VT, Movl);
       }
     }

+    // Load a scalar integer constant directly to XMM instead of transferring an
+    // immediate value from GPR.
+    // vzext_movl (scalar_to_vector C) --> load [C,0...]
+    if (N0.getOpcode() == ISD::SCALAR_TO_VECTOR) {
+      if (auto *C = dyn_cast<ConstantSDNode>(N0.getOperand(0))) {
+        // Create a vector constant - scalar constant followed by zeros.
+        EVT ScalarVT = N0.getOperand(0).getValueType();
+        Type *ScalarTy = ScalarVT.getTypeForEVT(*DAG.getContext());
+        unsigned NumElts = VT.getVectorNumElements();
+        Constant *Zero = ConstantInt::getNullValue(ScalarTy);
+        SmallVector<Constant *, 32> ConstantVec(NumElts, Zero);
+        ConstantVec[0] = const_cast<ConstantInt *>(C->getConstantIntValue());
+
+        // Load the vector constant from constant pool.
+        MVT PVT = DAG.getTargetLoweringInfo().getPointerTy(DAG.getDataLayout());
+        SDValue CP = DAG.getConstantPool(ConstantVector::get(ConstantVec), PVT);
+        MachinePointerInfo MPI =
+            MachinePointerInfo::getConstantPool(DAG.getMachineFunction());
+        Align Alignment = cast<ConstantPoolSDNode>(CP)->getAlign();
+        return DAG.getLoad(VT, DL, DAG.getEntryNode(), CP, MPI, Alignment,
+                           MachineMemOperand::MOLoad);
+      }
+    }
+
     return SDValue();
   }
   case X86ISD::BLENDI: {
     SDValue N0 = N.getOperand(0);
     SDValue N1 = N.getOperand(1);

     // blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) to narrower types.
     // TODO: Handle MVT::v16i16 repeated blend mask.
[13,215 lines hidden]

llvm/test/CodeGen/X86/avx-load-store.ll

[214 lines hidden]
 ; CHECK-NEXT: testb %al, %al
 ; CHECK-NEXT: jne .LBB9_2
 ; CHECK-NEXT: # %bb.1: # %cif_mask_all
 ; CHECK-NEXT: .LBB9_2: # %cif_mask_mixed
 ; CHECK-NEXT: xorl %eax, %eax
 ; CHECK-NEXT: testb %al, %al
 ; CHECK-NEXT: jne .LBB9_4
 ; CHECK-NEXT: # %bb.3: # %cif_mixed_test_all
-; CHECK-NEXT: movl $-1, %eax
-; CHECK-NEXT: vmovd %eax, %xmm0
+; CHECK-NEXT: vmovaps {{.*#+}} xmm0 = [4294967295,0,0,0]
 ; CHECK-NEXT: vmaskmovps %ymm0, %ymm0, (%rax)
 ; CHECK-NEXT: .LBB9_4: # %cif_mixed_test_any_check
 ;
 ; CHECK_O0-LABEL: f_f:
 ; CHECK_O0: # %bb.0: # %allocas
 ; CHECK_O0-NEXT: # implicit-def: $al
 ; CHECK_O0-NEXT: testb $1, %al
 ; CHECK_O0-NEXT: jne .LBB9_1
 ; CHECK_O0-NEXT: jmp .LBB9_2
 ; CHECK_O0-NEXT: .LBB9_1: # %cif_mask_all
 ; CHECK_O0-NEXT: .LBB9_2: # %cif_mask_mixed
 ; CHECK_O0-NEXT: # implicit-def: $al
 ; CHECK_O0-NEXT: testb $1, %al
 ; CHECK_O0-NEXT: jne .LBB9_3
 ; CHECK_O0-NEXT: jmp .LBB9_4
 ; CHECK_O0-NEXT: .LBB9_3: # %cif_mixed_test_all
-; CHECK_O0-NEXT: movl $-1, %eax
-; CHECK_O0-NEXT: vmovd %eax, %xmm0
+; CHECK_O0-NEXT: vmovdqa {{.*#+}} xmm0 = [4294967295,0,0,0]
 ; CHECK_O0-NEXT: vmovdqa %xmm0, %xmm0
 ; CHECK_O0-NEXT: vmovaps %xmm0, %xmm1
-; CHECK_O0-NEXT: # implicit-def: $rcx
+; CHECK_O0-NEXT: # implicit-def: $rax
 ; CHECK_O0-NEXT: # implicit-def: $ymm2
-; CHECK_O0-NEXT: vmaskmovps %ymm2, %ymm1, (%rcx)
+; CHECK_O0-NEXT: vmaskmovps %ymm2, %ymm1, (%rax)
 ; CHECK_O0-NEXT: .LBB9_4: # %cif_mixed_test_any_check
 allocas:
   br i1 undef, label %cif_mask_all, label %cif_mask_mixed

 cif_mask_all:
   unreachable

 cif_mask_mixed:
[103 lines hidden]

llvm/test/CodeGen/X86/avx2-arith.ll

[341 lines hidden]
 ; X64-NEXT: retq
   %y = mul <8 x i16> %x, <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>
   ret <8 x i16> %y
 }

 define <8 x i32> @mul_const9(<8 x i32> %x) {
 ; X32-LABEL: mul_const9:
 ; X32: # %bb.0:
-; X32-NEXT: movl $2, %eax
-; X32-NEXT: vmovd %eax, %xmm1
+; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [2,0,0,0]
 ; X32-NEXT: vpmulld %ymm1, %ymm0, %ymm0
 ; X32-NEXT: retl
 ;
 ; X64-LABEL: mul_const9:
 ; X64: # %bb.0:
-; X64-NEXT: movl $2, %eax
-; X64-NEXT: vmovd %eax, %xmm1
+; X64-NEXT: vmovdqa {{.*#+}} xmm1 = [2,0,0,0]
 ; X64-NEXT: vpmulld %ymm1, %ymm0, %ymm0
 ; X64-NEXT: retq
   %y = mul <8 x i32> %x, <i32 2, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
   ret <8 x i32> %y
 }

 ; %x * 0x01010101
 define <4 x i32> @mul_const10(<4 x i32> %x) {
[31 lines hidden]

llvm/test/CodeGen/X86/combine-udiv.ll

[584 lines hidden]
 ; AVX-NEXT: vpsrlw $1, %xmm0, %xmm1
 ; AVX-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3,4,5,6,7]
 ; AVX-NEXT: vpmulhuw {{.*}}(%rip), %xmm0, %xmm0
 ; AVX-NEXT: vpmulhuw {{.*}}(%rip), %xmm0, %xmm0
 ; AVX-NEXT: retq
 ;
 ; XOP-LABEL: combine_vec_udiv_nonuniform2:
 ; XOP: # %bb.0:
-; XOP-NEXT: movl $65535, %eax # imm = 0xFFFF
-; XOP-NEXT: vmovd %eax, %xmm1
-; XOP-NEXT: vpshlw %xmm1, %xmm0, %xmm0
+; XOP-NEXT: vpshlw {{.*}}(%rip), %xmm0, %xmm0
 ; XOP-NEXT: vpmulhuw {{.*}}(%rip), %xmm0, %xmm0
 ; XOP-NEXT: vpshlw {{.*}}(%rip), %xmm0, %xmm0
 ; XOP-NEXT: retq
   %1 = udiv <8 x i16> %x, <i16 -34, i16 35, i16 36, i16 -37, i16 38, i16 -39, i16 40, i16 -41>
   ret <8 x i16> %1
 }

 define <8 x i16> @combine_vec_udiv_nonuniform3(<8 x i16> %x) {
[55 lines hidden]
 ; SSE41-NEXT: packuswb %xmm2, %xmm2
 ; SSE41-NEXT: psrlw $7, %xmm2
 ; SSE41-NEXT: pand {{.*}}(%rip), %xmm2
 ; SSE41-NEXT: movaps {{.*#+}} xmm0 = [0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
 ; SSE41-NEXT: pblendvb %xmm0, %xmm1, %xmm2
 ; SSE41-NEXT: movdqa %xmm2, %xmm0
 ; SSE41-NEXT: retq
 ;
-; AVX1-LABEL: combine_vec_udiv_nonuniform4:
-; AVX1: # %bb.0:
-; AVX1-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
-; AVX1-NEXT: vpmullw {{.*}}(%rip), %xmm1, %xmm1
-; AVX1-NEXT: vpsrlw $8, %xmm1, %xmm1
-; AVX1-NEXT: vpackuswb %xmm1, %xmm1, %xmm1
-; AVX1-NEXT: vpsrlw $7, %xmm1, %xmm1
-; AVX1-NEXT: vpand {{.*}}(%rip), %xmm1, %xmm1
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
-; AVX1-NEXT: vpblendvb %xmm2, %xmm0, %xmm1, %xmm0
-; AVX1-NEXT: retq
-;
-; AVX2-LABEL: combine_vec_udiv_nonuniform4:
-; AVX2: # %bb.0:
-; AVX2-NEXT: movl $171, %eax
-; AVX2-NEXT: vmovd %eax, %xmm1
-; AVX2-NEXT: vpmovzxbw {{.*#+}} xmm2 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
-; AVX2-NEXT: vpmullw %xmm1, %xmm2, %xmm1
-; AVX2-NEXT: vpsrlw $8, %xmm1, %xmm1
-; AVX2-NEXT: vpackuswb %xmm1, %xmm1, %xmm1
-; AVX2-NEXT: vpsrlw $7, %xmm1, %xmm1
-; AVX2-NEXT: vpand {{.*}}(%rip), %xmm1, %xmm1
-; AVX2-NEXT: vmovdqa {{.*#+}} xmm2 = [0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
-; AVX2-NEXT: vpblendvb %xmm2, %xmm0, %xmm1, %xmm0
-; AVX2-NEXT: retq
+; AVX-LABEL: combine_vec_udiv_nonuniform4:
+; AVX: # %bb.0:
+; AVX-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
+; AVX-NEXT: vpmullw {{.*}}(%rip), %xmm1, %xmm1
+; AVX-NEXT: vpsrlw $8, %xmm1, %xmm1
+; AVX-NEXT: vpackuswb %xmm1, %xmm1, %xmm1
+; AVX-NEXT: vpsrlw $7, %xmm1, %xmm1
+; AVX-NEXT: vpand {{.*}}(%rip), %xmm1, %xmm1
+; AVX-NEXT: vmovdqa {{.*#+}} xmm2 = [0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
+; AVX-NEXT: vpblendvb %xmm2, %xmm0, %xmm1, %xmm0
+; AVX-NEXT: retq
 ;
 ; XOP-LABEL: combine_vec_udiv_nonuniform4:
 ; XOP: # %bb.0:
 ; XOP-NEXT: vpmovzxbw {{.*#+}} xmm1 = xmm0[0],zero,xmm0[1],zero,xmm0[2],zero,xmm0[3],zero,xmm0[4],zero,xmm0[5],zero,xmm0[6],zero,xmm0[7],zero
 ; XOP-NEXT: vpmullw {{.*}}(%rip), %xmm1, %xmm1
 ; XOP-NEXT: vpxor %xmm2, %xmm2, %xmm2
 ; XOP-NEXT: vpperm {{.*#+}} xmm1 = xmm1[1,3,5,7,9,11,13,15],xmm2[1,3,5,7,9,11,13,15]
 ; XOP-NEXT: movl $249, %eax
[87 lines hidden]

llvm/test/CodeGen/X86/fcmp-constant.ll

[86 lines hidden] ; CHECK-NEXT: retq
   %1 = fcmp ueq <2 x double> <double 0x3FF0000000000000, double 0xFFEFFFFFFFFFFFFF>, undef
   %2 = sext <2 x i1> %1 to <2 x i64>
   ret <2 x i64> %2
 }

 define <2 x i64> @fcmp_ueq_v2f64_undef_elt() {
 ; CHECK-LABEL: fcmp_ueq_v2f64_undef_elt:
 ; CHECK: # %bb.0:
-; CHECK-NEXT: movq $-1, %rax
-; CHECK-NEXT: movq %rax, %xmm0
+; CHECK-NEXT: movaps {{.*#+}} xmm0 = [18446744073709551615,0]
 ; CHECK-NEXT: retq
   %1 = fcmp ueq <2 x double> <double 0x3FF0000000000000, double 0xFFEFFFFFFFFFFFFF>, <double undef, double 0x3FF0000000000000>
   %2 = sext <2 x i1> %1 to <2 x i64>
   ret <2 x i64> %2
 }

 define <4 x i32> @fcmp_ueq_v4f32() {
 ; CHECK-LABEL: fcmp_ueq_v4f32:
[155 lines hidden]

llvm/test/CodeGen/X86/insert-into-constant-vector.ll

	Show First 20 Lines • Show All 123 Lines • ▼ Show 20 Lines
	; X64AVX-NEXT: retq			; X64AVX-NEXT: retq
	%ins = insertelement <4 x i32> <i32 42, i32 1, i32 2, i32 3>, i32 %x, i32 3			%ins = insertelement <4 x i32> <i32 42, i32 1, i32 2, i32 3>, i32 %x, i32 3
	ret <4 x i32> %ins			ret <4 x i32> %ins
	}			}

	define <2 x i64> @elt0_v2i64(i64 %x) {			define <2 x i64> @elt0_v2i64(i64 %x) {
	; X32SSE-LABEL: elt0_v2i64:			; X32SSE-LABEL: elt0_v2i64:
	; X32SSE: # %bb.0:			; X32SSE: # %bb.0:
	; X32SSE-NEXT: movl $1, %eax			; X32SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X32SSE-NEXT: movd %eax, %xmm1			; X32SSE-NEXT: unpcklpd {{.*#+}} xmm0 = xmm0[0],mem[0]
	; X32SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
	; X32SSE-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; X32SSE-NEXT: retl			; X32SSE-NEXT: retl
	;			;
	; X64SSE2-LABEL: elt0_v2i64:			; X64SSE2-LABEL: elt0_v2i64:
	; X64SSE2: # %bb.0:			; X64SSE2: # %bb.0:
	; X64SSE2-NEXT: movq %rdi, %xmm1			; X64SSE2-NEXT: movq %rdi, %xmm1
	; X64SSE2-NEXT: movapd {{.*#+}} xmm0 = <u,1>			; X64SSE2-NEXT: movapd {{.*#+}} xmm0 = <u,1>
	; X64SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]			; X64SSE2-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
	; X64SSE2-NEXT: retq			; X64SSE2-NEXT: retq
	;			;
	; X64SSE4-LABEL: elt0_v2i64:			; X64SSE4-LABEL: elt0_v2i64:
	; X64SSE4: # %bb.0:			; X64SSE4: # %bb.0:
	; X64SSE4-NEXT: movdqa {{.*#+}} xmm0 = <u,1>			; X64SSE4-NEXT: movdqa {{.*#+}} xmm0 = <u,1>
	; X64SSE4-NEXT: pinsrq $0, %rdi, %xmm0			; X64SSE4-NEXT: pinsrq $0, %rdi, %xmm0
	; X64SSE4-NEXT: retq			; X64SSE4-NEXT: retq
	;			;
	; X32AVX-LABEL: elt0_v2i64:			; X32AVX-LABEL: elt0_v2i64:
	; X32AVX: # %bb.0:			; X32AVX: # %bb.0:
	; X32AVX-NEXT: movl $1, %eax			; X32AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; X32AVX-NEXT: vmovd %eax, %xmm0			; X32AVX-NEXT: vunpcklpd {{.*#+}} xmm0 = xmm0[0],mem[0]
	; X32AVX-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
	; X32AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
	; X32AVX-NEXT: retl			; X32AVX-NEXT: retl
	;			;
	; X64AVX-LABEL: elt0_v2i64:			; X64AVX-LABEL: elt0_v2i64:
	; X64AVX: # %bb.0:			; X64AVX: # %bb.0:
	; X64AVX-NEXT: vmovdqa {{.*#+}} xmm0 = <u,1>			; X64AVX-NEXT: vmovdqa {{.*#+}} xmm0 = <u,1>
	; X64AVX-NEXT: vpinsrq $0, %rdi, %xmm0, %xmm0			; X64AVX-NEXT: vpinsrq $0, %rdi, %xmm0, %xmm0
	; X64AVX-NEXT: retq			; X64AVX-NEXT: retq
	%ins = insertelement <2 x i64> <i64 42, i64 1>, i64 %x, i32 0			%ins = insertelement <2 x i64> <i64 42, i64 1>, i64 %x, i32 0
	▲ Show 20 Lines • Show All 195 Lines • ▼ Show 20 Lines
	; X64AVX512F-NEXT: retq			; X64AVX512F-NEXT: retq
	%ins = insertelement <8 x float> <float 42.0, float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0>, float %x, i32 6			%ins = insertelement <8 x float> <float 42.0, float 1.0, float 2.0, float 3.0, float 4.0, float 5.0, float 6.0, float 7.0>, float %x, i32 6
	ret <8 x float> %ins			ret <8 x float> %ins
	}			}

	define <8 x i64> @elt5_v8i64(i64 %x) {			define <8 x i64> @elt5_v8i64(i64 %x) {
	; X32SSE-LABEL: elt5_v8i64:			; X32SSE-LABEL: elt5_v8i64:
	; X32SSE: # %bb.0:			; X32SSE: # %bb.0:
	; X32SSE-NEXT: movl $4, %eax			; X32SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; X32SSE-NEXT: movd %eax, %xmm2			; X32SSE-NEXT: movaps {{.*#+}} xmm2 = [4,0,0,0]
	; X32SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero			; X32SSE-NEXT: movlhps {{.*#+}} xmm2 = xmm2[0],xmm0[0]
	; X32SSE-NEXT: punpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm0[0]
	; X32SSE-NEXT: movaps {{.*#+}} xmm0 = [42,0,1,0]			; X32SSE-NEXT: movaps {{.*#+}} xmm0 = [42,0,1,0]
	; X32SSE-NEXT: movaps {{.*#+}} xmm1 = [2,0,3,0]			; X32SSE-NEXT: movaps {{.*#+}} xmm1 = [2,0,3,0]
	; X32SSE-NEXT: movaps {{.*#+}} xmm3 = [6,0,7,0]			; X32SSE-NEXT: movaps {{.*#+}} xmm3 = [6,0,7,0]
	; X32SSE-NEXT: retl			; X32SSE-NEXT: retl
	;			;
	; X64SSE2-LABEL: elt5_v8i64:			; X64SSE2-LABEL: elt5_v8i64:
	; X64SSE2: # %bb.0:			; X64SSE2: # %bb.0:
	; X64SSE2-NEXT: movq %rdi, %xmm0			; X64SSE2-NEXT: movq %rdi, %xmm0
	Show All 10 Lines
	; X64SSE4-NEXT: pinsrq $1, %rdi, %xmm2			; X64SSE4-NEXT: pinsrq $1, %rdi, %xmm2
	; X64SSE4-NEXT: movaps {{.*#+}} xmm0 = [42,1]			; X64SSE4-NEXT: movaps {{.*#+}} xmm0 = [42,1]
	; X64SSE4-NEXT: movaps {{.*#+}} xmm1 = [2,3]			; X64SSE4-NEXT: movaps {{.*#+}} xmm1 = [2,3]
	; X64SSE4-NEXT: movaps {{.*#+}} xmm3 = [6,7]			; X64SSE4-NEXT: movaps {{.*#+}} xmm3 = [6,7]
	; X64SSE4-NEXT: retq			; X64SSE4-NEXT: retq
	;			;
	; X32AVX1-LABEL: elt5_v8i64:			; X32AVX1-LABEL: elt5_v8i64:
	; X32AVX1: # %bb.0:			; X32AVX1: # %bb.0:
	; X32AVX1-NEXT: movl $4, %eax			; X32AVX1-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; X32AVX1-NEXT: vmovd %eax, %xmm0			; X32AVX1-NEXT: vmovaps {{.*#+}} xmm1 = [4,0,0,0]
	; X32AVX1-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero			; X32AVX1-NEXT: vmovlhps {{.*#+}} xmm0 = xmm1[0],xmm0[0]
	; X32AVX1-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
	; X32AVX1-NEXT: vinsertf128 $1, {{\.LCPI.*}}, %ymm0, %ymm1			; X32AVX1-NEXT: vinsertf128 $1, {{\.LCPI.*}}, %ymm0, %ymm1
	; X32AVX1-NEXT: vmovaps {{.*#+}} ymm0 = [42,0,1,0,2,0,3,0]			; X32AVX1-NEXT: vmovaps {{.*#+}} ymm0 = [42,0,1,0,2,0,3,0]
	; X32AVX1-NEXT: retl			; X32AVX1-NEXT: retl
	;			;
	; X64AVX1-LABEL: elt5_v8i64:			; X64AVX1-LABEL: elt5_v8i64:
	; X64AVX1: # %bb.0:			; X64AVX1: # %bb.0:
	; X64AVX1-NEXT: vmovdqa {{.*#+}} xmm0 = <4,u,6,7>			; X64AVX1-NEXT: vmovdqa {{.*#+}} xmm0 = <4,u,6,7>
	; X64AVX1-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm0			; X64AVX1-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm0
	; X64AVX1-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],mem[4,5,6,7]			; X64AVX1-NEXT: vblendps {{.*#+}} ymm1 = ymm0[0,1,2,3],mem[4,5,6,7]
	; X64AVX1-NEXT: vmovaps {{.*#+}} ymm0 = [42,1,2,3]			; X64AVX1-NEXT: vmovaps {{.*#+}} ymm0 = [42,1,2,3]
	; X64AVX1-NEXT: retq			; X64AVX1-NEXT: retq
	;			;
	; X32AVX2-LABEL: elt5_v8i64:			; X32AVX2-LABEL: elt5_v8i64:
	; X32AVX2: # %bb.0:			; X32AVX2: # %bb.0:
	; X32AVX2-NEXT: movl $4, %eax			; X32AVX2-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; X32AVX2-NEXT: vmovd %eax, %xmm0			; X32AVX2-NEXT: vmovaps {{.*#+}} xmm1 = [4,0,0,0]
	; X32AVX2-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero			; X32AVX2-NEXT: vmovlhps {{.*#+}} xmm0 = xmm1[0],xmm0[0]
	; X32AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]			; X32AVX2-NEXT: vinsertf128 $1, {{\.LCPI.*}}, %ymm0, %ymm1
	; X32AVX2-NEXT: vinserti128 $1, {{\.LCPI.*}}, %ymm0, %ymm1
	; X32AVX2-NEXT: vmovaps {{.*#+}} ymm0 = [42,0,1,0,2,0,3,0]			; X32AVX2-NEXT: vmovaps {{.*#+}} ymm0 = [42,0,1,0,2,0,3,0]
	; X32AVX2-NEXT: retl			; X32AVX2-NEXT: retl
	;			;
	; X64AVX2-LABEL: elt5_v8i64:			; X64AVX2-LABEL: elt5_v8i64:
	; X64AVX2: # %bb.0:			; X64AVX2: # %bb.0:
	; X64AVX2-NEXT: vmovdqa {{.*#+}} xmm0 = <4,u,6,7>			; X64AVX2-NEXT: vmovdqa {{.*#+}} xmm0 = <4,u,6,7>
	; X64AVX2-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm0			; X64AVX2-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm0
	; X64AVX2-NEXT: vpblendd {{.*#+}} ymm1 = ymm0[0,1,2,3],mem[4,5,6,7]			; X64AVX2-NEXT: vpblendd {{.*#+}} ymm1 = ymm0[0,1,2,3],mem[4,5,6,7]
	; X64AVX2-NEXT: vmovaps {{.*#+}} ymm0 = [42,1,2,3]			; X64AVX2-NEXT: vmovaps {{.*#+}} ymm0 = [42,1,2,3]
	; X64AVX2-NEXT: retq			; X64AVX2-NEXT: retq
	;			;
	; X32AVX512F-LABEL: elt5_v8i64:			; X32AVX512F-LABEL: elt5_v8i64:
	; X32AVX512F: # %bb.0:			; X32AVX512F: # %bb.0:
	; X32AVX512F-NEXT: vmovdqa {{.*#+}} ymm0 = [42,0,1,0,2,0,3,0]			; X32AVX512F-NEXT: vmovaps {{.*#+}} ymm0 = [42,0,1,0,2,0,3,0]
	; X32AVX512F-NEXT: movl $4, %eax			; X32AVX512F-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; X32AVX512F-NEXT: vmovd %eax, %xmm1			; X32AVX512F-NEXT: vmovaps {{.*#+}} xmm2 = [4,0,0,0]
	; X32AVX512F-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero			; X32AVX512F-NEXT: vmovlhps {{.*#+}} xmm1 = xmm2[0],xmm1[0]
	; X32AVX512F-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]			; X32AVX512F-NEXT: vinsertf128 $1, {{\.LCPI.*}}, %ymm1, %ymm1
	; X32AVX512F-NEXT: vinserti128 $1, {{\.LCPI.*}}, %ymm1, %ymm1			; X32AVX512F-NEXT: vinsertf64x4 $1, %ymm1, %zmm0, %zmm0
	; X32AVX512F-NEXT: vinserti64x4 $1, %ymm1, %zmm0, %zmm0
	; X32AVX512F-NEXT: retl			; X32AVX512F-NEXT: retl
	;			;
	; X64AVX512F-LABEL: elt5_v8i64:			; X64AVX512F-LABEL: elt5_v8i64:
	; X64AVX512F: # %bb.0:			; X64AVX512F: # %bb.0:
	; X64AVX512F-NEXT: vmovq %rdi, %xmm1			; X64AVX512F-NEXT: vmovq %rdi, %xmm1
	; X64AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,2,3,4,8,6,7]			; X64AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm2 = [0,1,2,3,4,8,6,7]
	; X64AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm0 = <42,1,2,3,4,u,6,7>			; X64AVX512F-NEXT: vmovdqa64 {{.*#+}} zmm0 = <42,1,2,3,4,u,6,7>
	; X64AVX512F-NEXT: vpermt2q %zmm1, %zmm2, %zmm0			; X64AVX512F-NEXT: vpermt2q %zmm1, %zmm2, %zmm0
	[...]

llvm/test/CodeGen/X86/packss.ll

	[...]
	}			}

	define <8 x i16> @trunc_ashr_v4i64_demandedelts(<4 x i64> %a0) {			define <8 x i16> @trunc_ashr_v4i64_demandedelts(<4 x i64> %a0) {
	; X86-SSE-LABEL: trunc_ashr_v4i64_demandedelts:			; X86-SSE-LABEL: trunc_ashr_v4i64_demandedelts:
	; X86-SSE: # %bb.0:			; X86-SSE: # %bb.0:
	; X86-SSE-NEXT: psllq $63, %xmm1			; X86-SSE-NEXT: psllq $63, %xmm1
	; X86-SSE-NEXT: psllq $63, %xmm0			; X86-SSE-NEXT: psllq $63, %xmm0
	; X86-SSE-NEXT: psrlq $63, %xmm0			; X86-SSE-NEXT: psrlq $63, %xmm0
	; X86-SSE-NEXT: movdqa {{.*#+}} xmm2 = <1,0,u,u>			; X86-SSE-NEXT: movdqa {{.*#+}} xmm2 = [1,0,0,0]
	; X86-SSE-NEXT: pxor %xmm2, %xmm0			; X86-SSE-NEXT: pxor %xmm2, %xmm0
	; X86-SSE-NEXT: pcmpeqd %xmm3, %xmm3			; X86-SSE-NEXT: psubq %xmm2, %xmm0
	; X86-SSE-NEXT: paddq %xmm3, %xmm0
	; X86-SSE-NEXT: psrlq $63, %xmm1			; X86-SSE-NEXT: psrlq $63, %xmm1
	; X86-SSE-NEXT: pxor %xmm2, %xmm1			; X86-SSE-NEXT: pxor %xmm2, %xmm1
	; X86-SSE-NEXT: paddq %xmm3, %xmm1			; X86-SSE-NEXT: psubq %xmm2, %xmm1
	; X86-SSE-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,0,0,0]			; X86-SSE-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,0,0,0]
	; X86-SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]			; X86-SSE-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,0,0]
	; X86-SSE-NEXT: packssdw %xmm1, %xmm0			; X86-SSE-NEXT: packssdw %xmm1, %xmm0
	; X86-SSE-NEXT: retl			; X86-SSE-NEXT: retl
	;			;
	; X86-AVX1-LABEL: trunc_ashr_v4i64_demandedelts:			; X86-AVX1-LABEL: trunc_ashr_v4i64_demandedelts:
	; X86-AVX1: # %bb.0:			; X86-AVX1: # %bb.0:
	; X86-AVX1-NEXT: vpsllq $63, %xmm0, %xmm1			; X86-AVX1-NEXT: vpsllq $63, %xmm0, %xmm1
	[...]

llvm/test/CodeGen/X86/pshufb-mask-comments.ll

[...]	; CHECK-NEXT: retq
%3 = bitcast <2 x i64> %2 to <16 x i8>		%3 = bitcast <2 x i64> %2 to <16 x i8>
%4 = tail call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %V, <16 x i8> %3)		%4 = tail call <16 x i8> @llvm.x86.ssse3.pshuf.b.128(<16 x i8> %V, <16 x i8> %3)
ret <16 x i8> %4		ret <16 x i8> %4
}		}

define <16 x i8> @test5(<16 x i8> %V) {		define <16 x i8> @test5(<16 x i8> %V) {
; CHECK-LABEL: test5:		; CHECK-LABEL: test5:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: movaps {{.*#+}} xmm1 = [1,0]		; CHECK-NEXT: movaps {{.*#+}} xmm1 = [1,0,0,0]
; CHECK-NEXT: movaps %xmm1, (%rax)		; CHECK-NEXT: movaps %xmm1, (%rax)
; CHECK-NEXT: movaps {{.*#+}} xmm1 = [1,1]		; CHECK-NEXT: movaps {{.*#+}} xmm1 = [1,1]
; CHECK-NEXT: movaps %xmm1, (%rax)		; CHECK-NEXT: movaps %xmm1, (%rax)
; CHECK-NEXT: pshufb (%rax), %xmm0		; CHECK-NEXT: pshufb (%rax), %xmm0
; CHECK-NEXT: retq		; CHECK-NEXT: retq
store <2 x i64> <i64 1, i64 0>, <2 x i64>* undef, align 16		store <2 x i64> <i64 1, i64 0>, <2 x i64>* undef, align 16
%l = load <2 x i64>, <2 x i64>* undef, align 16		%l = load <2 x i64>, <2 x i64>* undef, align 16
%shuffle = shufflevector <2 x i64> %l, <2 x i64> undef, <2 x i32> zeroinitializer		%shuffle = shufflevector <2 x i64> %l, <2 x i64> undef, <2 x i32> zeroinitializer
[...]

llvm/test/CodeGen/X86/ret-mmx.ll

	[...]
	; CHECK-NEXT: movl $1, %eax			; CHECK-NEXT: movl $1, %eax
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	ret <1 x i64> <i64 1>			ret <1 x i64> <i64 1>
	}			}

	define <2 x i32> @t3() nounwind {			define <2 x i32> @t3() nounwind {
	; CHECK-LABEL: t3:			; CHECK-LABEL: t3:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	; CHECK-NEXT: movaps {{.*#+}} xmm0 = <1,0,u,u>			; CHECK-NEXT: movaps {{.*#+}} xmm0 = [1,0,0,0]
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	ret <2 x i32> <i32 1, i32 0>			ret <2 x i32> <i32 1, i32 0>
	}			}

	define double @t4() nounwind {			define double @t4() nounwind {
	; CHECK-LABEL: t4:			; CHECK-LABEL: t4:
	; CHECK: ## %bb.0:			; CHECK: ## %bb.0:
	; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; CHECK-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	ret double bitcast (<2 x i32> <i32 1, i32 0> to double)			ret double bitcast (<2 x i32> <i32 1, i32 0> to double)
	}			}

llvm/test/CodeGen/X86/sad.ll

[...]	middle.block:
ret i32 %12		ret i32 %12
}		}

define i32 @sad_2i8() nounwind {		define i32 @sad_2i8() nounwind {
; SSE2-LABEL: sad_2i8:		; SSE2-LABEL: sad_2i8:
; SSE2: # %bb.0: # %entry		; SSE2: # %bb.0: # %entry
; SSE2-NEXT: pxor %xmm0, %xmm0		; SSE2-NEXT: pxor %xmm0, %xmm0
; SSE2-NEXT: movq $-1024, %rax # imm = 0xFC00		; SSE2-NEXT: movq $-1024, %rax # imm = 0xFC00
; SSE2-NEXT: movl $65535, %ecx # imm = 0xFFFF		; SSE2-NEXT: movdqa {{.*#+}} xmm1 = [65535,0,0,0]
; SSE2-NEXT: movd %ecx, %xmm1
; SSE2-NEXT: .p2align 4, 0x90		; SSE2-NEXT: .p2align 4, 0x90
; SSE2-NEXT: .LBB3_1: # %vector.body		; SSE2-NEXT: .LBB3_1: # %vector.body
; SSE2-NEXT: # =>This Inner Loop Header: Depth=1		; SSE2-NEXT: # =>This Inner Loop Header: Depth=1
; SSE2-NEXT: movd {{.*#+}} xmm2 = mem[0],zero,zero,zero		; SSE2-NEXT: movd {{.*#+}} xmm2 = mem[0],zero,zero,zero
; SSE2-NEXT: movd {{.*#+}} xmm3 = mem[0],zero,zero,zero		; SSE2-NEXT: movd {{.*#+}} xmm3 = mem[0],zero,zero,zero
; SSE2-NEXT: pand %xmm1, %xmm2		; SSE2-NEXT: pand %xmm1, %xmm2
; SSE2-NEXT: pand %xmm1, %xmm3		; SSE2-NEXT: pand %xmm1, %xmm3
; SSE2-NEXT: psadbw %xmm2, %xmm3		; SSE2-NEXT: psadbw %xmm2, %xmm3
[...]
; SSE2-NEXT: paddd {{.*}}(%rip), %xmm2		; SSE2-NEXT: paddd {{.*}}(%rip), %xmm2
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm2[2,3,0,1]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm2[2,3,0,1]
; SSE2-NEXT: paddd %xmm2, %xmm0		; SSE2-NEXT: paddd %xmm2, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
; SSE2-NEXT: paddd %xmm0, %xmm1		; SSE2-NEXT: paddd %xmm0, %xmm1
; SSE2-NEXT: movd %xmm1, %eax		; SSE2-NEXT: movd %xmm1, %eax
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; AVX1-LABEL: sad_unroll_nonzero_initial:		; AVX-LABEL: sad_unroll_nonzero_initial:
; AVX1: # %bb.0: # %bb		; AVX: # %bb.0: # %bb
; AVX1-NEXT: vmovdqu (%rdi), %xmm0		; AVX-NEXT: vmovdqu (%rdi), %xmm0
; AVX1-NEXT: vpsadbw (%rsi), %xmm0, %xmm0		; AVX-NEXT: vpsadbw (%rsi), %xmm0, %xmm0
; AVX1-NEXT: vmovdqu (%rdx), %xmm1		; AVX-NEXT: vmovdqu (%rdx), %xmm1
; AVX1-NEXT: vpsadbw (%rcx), %xmm1, %xmm1		; AVX-NEXT: vpsadbw (%rcx), %xmm1, %xmm1
; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm0		; AVX-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpaddd {{.*}}(%rip), %xmm0, %xmm0		; AVX-NEXT: vpaddd {{.*}}(%rip), %xmm0, %xmm0
; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]		; AVX-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm0		; AVX-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]		; AVX-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm0		; AVX-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vmovd %xmm0, %eax		; AVX-NEXT: vmovd %xmm0, %eax
; AVX1-NEXT: retq		; AVX-NEXT: retq
;
; AVX2-LABEL: sad_unroll_nonzero_initial:
; AVX2: # %bb.0: # %bb
; AVX2-NEXT: vmovdqu (%rdi), %xmm0
; AVX2-NEXT: vpsadbw (%rsi), %xmm0, %xmm0
; AVX2-NEXT: vmovdqu (%rdx), %xmm1
; AVX2-NEXT: vpsadbw (%rcx), %xmm1, %xmm1
; AVX2-NEXT: movl $1, %eax
; AVX2-NEXT: vmovd %eax, %xmm2
; AVX2-NEXT: vpaddd %xmm2, %xmm1, %xmm1
; AVX2-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX2-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
; AVX2-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX2-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
; AVX2-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX2-NEXT: vmovd %xmm0, %eax
; AVX2-NEXT: retq
;
; AVX512-LABEL: sad_unroll_nonzero_initial:
; AVX512: # %bb.0: # %bb
; AVX512-NEXT: vmovdqu (%rdi), %xmm0
; AVX512-NEXT: vpsadbw (%rsi), %xmm0, %xmm0
; AVX512-NEXT: vmovdqu (%rdx), %xmm1
; AVX512-NEXT: vpsadbw (%rcx), %xmm1, %xmm1
; AVX512-NEXT: movl $1, %eax
; AVX512-NEXT: vmovd %eax, %xmm2
; AVX512-NEXT: vpaddd %xmm2, %xmm1, %xmm1
; AVX512-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX512-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
; AVX512-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX512-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
; AVX512-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX512-NEXT: vmovd %xmm0, %eax
; AVX512-NEXT: retq
bb:		bb:
%tmp = load <16 x i8>, <16 x i8>* %arg, align 1		%tmp = load <16 x i8>, <16 x i8>* %arg, align 1
%tmp4 = load <16 x i8>, <16 x i8>* %arg1, align 1		%tmp4 = load <16 x i8>, <16 x i8>* %arg1, align 1
%tmp5 = zext <16 x i8> %tmp to <16 x i32>		%tmp5 = zext <16 x i8> %tmp to <16 x i32>
%tmp6 = zext <16 x i8> %tmp4 to <16 x i32>		%tmp6 = zext <16 x i8> %tmp4 to <16 x i32>
%tmp7 = sub nsw <16 x i32> %tmp5, %tmp6		%tmp7 = sub nsw <16 x i32> %tmp5, %tmp6
%tmp8 = icmp slt <16 x i32> %tmp7, zeroinitializer		%tmp8 = icmp slt <16 x i32> %tmp7, zeroinitializer
%tmp9 = sub nsw <16 x i32> zeroinitializer, %tmp7		%tmp9 = sub nsw <16 x i32> zeroinitializer, %tmp7
[...]

llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll

	[...]
	; CHECK-SSE2-NEXT: pcmpeqd %xmm1, %xmm0			; CHECK-SSE2-NEXT: pcmpeqd %xmm1, %xmm0
	; CHECK-SSE2-NEXT: psrld $31, %xmm0			; CHECK-SSE2-NEXT: psrld $31, %xmm0
	; CHECK-SSE2-NEXT: retq			; CHECK-SSE2-NEXT: retq
	;			;
	; CHECK-SSE41-LABEL: test_srem_even_allones_eq:			; CHECK-SSE41-LABEL: test_srem_even_allones_eq:
	; CHECK-SSE41: # %bb.0:			; CHECK-SSE41: # %bb.0:
	; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,3,3]			; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,3,3]
	; CHECK-SSE41-NEXT: pmuldq {{.*}}(%rip), %xmm1			; CHECK-SSE41-NEXT: pmuldq {{.*}}(%rip), %xmm1
	; CHECK-SSE41-NEXT: movdqa {{.*#+}} xmm2 = <2454267027,u,0,u>			; CHECK-SSE41-NEXT: movdqa {{.*#+}} xmm2 = [2454267027,0,0,0]
	; CHECK-SSE41-NEXT: pmuldq %xmm0, %xmm2			; CHECK-SSE41-NEXT: pmuldq %xmm0, %xmm2
	; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]			; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
	; CHECK-SSE41-NEXT: pblendw {{.*#+}} xmm2 = xmm2[0,1],xmm1[2,3],xmm2[4,5],xmm1[6,7]			; CHECK-SSE41-NEXT: pblendw {{.*#+}} xmm2 = xmm2[0,1],xmm1[2,3],xmm2[4,5],xmm1[6,7]
	; CHECK-SSE41-NEXT: movdqa {{.*#+}} xmm1 = [1,1,4294967295,1]			; CHECK-SSE41-NEXT: movdqa {{.*#+}} xmm1 = [1,1,4294967295,1]
	; CHECK-SSE41-NEXT: pmulld %xmm0, %xmm1			; CHECK-SSE41-NEXT: pmulld %xmm0, %xmm1
	; CHECK-SSE41-NEXT: paddd %xmm2, %xmm1			; CHECK-SSE41-NEXT: paddd %xmm2, %xmm1
	; CHECK-SSE41-NEXT: movdqa %xmm1, %xmm2			; CHECK-SSE41-NEXT: movdqa %xmm1, %xmm2
	; CHECK-SSE41-NEXT: psrad $3, %xmm2			; CHECK-SSE41-NEXT: psrad $3, %xmm2
	[...]
	; CHECK-SSE2-NEXT: pcmpeqd %xmm1, %xmm0			; CHECK-SSE2-NEXT: pcmpeqd %xmm1, %xmm0
	; CHECK-SSE2-NEXT: pandn %xmm3, %xmm0			; CHECK-SSE2-NEXT: pandn %xmm3, %xmm0
	; CHECK-SSE2-NEXT: retq			; CHECK-SSE2-NEXT: retq
	;			;
	; CHECK-SSE41-LABEL: test_srem_even_allones_ne:			; CHECK-SSE41-LABEL: test_srem_even_allones_ne:
	; CHECK-SSE41: # %bb.0:			; CHECK-SSE41: # %bb.0:
	; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,3,3]			; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,3,3]
	; CHECK-SSE41-NEXT: pmuldq {{.*}}(%rip), %xmm1			; CHECK-SSE41-NEXT: pmuldq {{.*}}(%rip), %xmm1
	; CHECK-SSE41-NEXT: movdqa {{.*#+}} xmm2 = <2454267027,u,0,u>			; CHECK-SSE41-NEXT: movdqa {{.*#+}} xmm2 = [2454267027,0,0,0]
	; CHECK-SSE41-NEXT: pmuldq %xmm0, %xmm2			; CHECK-SSE41-NEXT: pmuldq %xmm0, %xmm2
	; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]			; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
	; CHECK-SSE41-NEXT: pblendw {{.*#+}} xmm2 = xmm2[0,1],xmm1[2,3],xmm2[4,5],xmm1[6,7]			; CHECK-SSE41-NEXT: pblendw {{.*#+}} xmm2 = xmm2[0,1],xmm1[2,3],xmm2[4,5],xmm1[6,7]
	; CHECK-SSE41-NEXT: movdqa {{.*#+}} xmm1 = [1,1,4294967295,1]			; CHECK-SSE41-NEXT: movdqa {{.*#+}} xmm1 = [1,1,4294967295,1]
	; CHECK-SSE41-NEXT: pmulld %xmm0, %xmm1			; CHECK-SSE41-NEXT: pmulld %xmm0, %xmm1
	; CHECK-SSE41-NEXT: paddd %xmm2, %xmm1			; CHECK-SSE41-NEXT: paddd %xmm2, %xmm1
	; CHECK-SSE41-NEXT: movdqa %xmm1, %xmm2			; CHECK-SSE41-NEXT: movdqa %xmm1, %xmm2
	; CHECK-SSE41-NEXT: psrad $3, %xmm2			; CHECK-SSE41-NEXT: psrad $3, %xmm2
	[...]
	; CHECK-SSE2-NEXT: pcmpeqd %xmm1, %xmm0			; CHECK-SSE2-NEXT: pcmpeqd %xmm1, %xmm0
	; CHECK-SSE2-NEXT: psrld $31, %xmm0			; CHECK-SSE2-NEXT: psrld $31, %xmm0
	; CHECK-SSE2-NEXT: retq			; CHECK-SSE2-NEXT: retq
	;			;
	; CHECK-SSE41-LABEL: test_srem_even_one:			; CHECK-SSE41-LABEL: test_srem_even_one:
	; CHECK-SSE41: # %bb.0:			; CHECK-SSE41: # %bb.0:
	; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,3,3]			; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,3,3]
	; CHECK-SSE41-NEXT: pmuldq {{.*}}(%rip), %xmm1			; CHECK-SSE41-NEXT: pmuldq {{.*}}(%rip), %xmm1
	; CHECK-SSE41-NEXT: movdqa {{.*#+}} xmm2 = <2454267027,u,0,u>			; CHECK-SSE41-NEXT: movdqa {{.*#+}} xmm2 = [2454267027,0,0,0]
	; CHECK-SSE41-NEXT: pmuldq %xmm0, %xmm2			; CHECK-SSE41-NEXT: pmuldq %xmm0, %xmm2
	; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]			; CHECK-SSE41-NEXT: pshufd {{.*#+}} xmm2 = xmm2[1,1,3,3]
	; CHECK-SSE41-NEXT: pblendw {{.*#+}} xmm2 = xmm2[0,1],xmm1[2,3],xmm2[4,5],xmm1[6,7]			; CHECK-SSE41-NEXT: pblendw {{.*#+}} xmm2 = xmm2[0,1],xmm1[2,3],xmm2[4,5],xmm1[6,7]
	; CHECK-SSE41-NEXT: paddd %xmm0, %xmm2			; CHECK-SSE41-NEXT: paddd %xmm0, %xmm2
	; CHECK-SSE41-NEXT: movdqa %xmm2, %xmm1			; CHECK-SSE41-NEXT: movdqa %xmm2, %xmm1
	; CHECK-SSE41-NEXT: psrad $3, %xmm1			; CHECK-SSE41-NEXT: psrad $3, %xmm1
	; CHECK-SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm1[0,1,2,3],xmm2[4,5],xmm1[6,7]			; CHECK-SSE41-NEXT: pblendw {{.*#+}} xmm1 = xmm1[0,1,2,3],xmm2[4,5],xmm1[6,7]
	; CHECK-SSE41-NEXT: psrld $31, %xmm2			; CHECK-SSE41-NEXT: psrld $31, %xmm2
	[...]

llvm/test/CodeGen/X86/vec_set-A.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i386-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X86			; RUN: llc < %s -mtriple=i386-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X86
	; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64

	define <2 x i64> @test1() nounwind {			define <2 x i64> @test1() nounwind {
	; X86-LABEL: test1:			; X86-LABEL: test1:
	; X86: # %bb.0:			; X86: # %bb.0:
	; X86-NEXT: movaps {{.*#+}} xmm0 = [1,0,0,0]			; X86-NEXT: movaps {{.*#+}} xmm0 = [1,0,0,0]
	; X86-NEXT: retl			; X86-NEXT: retl
	;			;
	; X64-LABEL: test1:			; X64-LABEL: test1:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: movaps {{.*#+}} xmm0 = [1,0]			; X64-NEXT: movaps {{.*#+}} xmm0 = [1,0,0,0]
	; X64-NEXT: retq			; X64-NEXT: retq
	ret <2 x i64> < i64 1, i64 0 >			ret <2 x i64> < i64 1, i64 0 >
	}			}

llvm/test/CodeGen/X86/vec_shift2.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=i686-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X32			; RUN: llc < %s -mtriple=i686-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X32
	; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64			; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64

	define <2 x i64> @t1(<2 x i64> %b1, <2 x i64> %c) nounwind {			define <2 x i64> @t1(<2 x i64> %b1, <2 x i64> %c) nounwind {
	; X32-LABEL: t1:			; X32-LABEL: t1:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: psrlw {{\.LCPI.*}}, %xmm0			; X32-NEXT: psrlw $14, %xmm0
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: t1:			; X64-LABEL: t1:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: psrlw {{.*}}(%rip), %xmm0			; X64-NEXT: psrlw $14, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%tmp1 = bitcast <2 x i64> %b1 to <8 x i16>			%tmp1 = bitcast <2 x i64> %b1 to <8 x i16>
	%tmp2 = tail call <8 x i16> @llvm.x86.sse2.psrl.w( <8 x i16> %tmp1, <8 x i16> bitcast (<4 x i32> < i32 14, i32 undef, i32 undef, i32 undef > to <8 x i16>) ) nounwind readnone			%tmp2 = tail call <8 x i16> @llvm.x86.sse2.psrl.w( <8 x i16> %tmp1, <8 x i16> bitcast (<4 x i32> < i32 14, i32 undef, i32 undef, i32 undef > to <8 x i16>) ) nounwind readnone
	%tmp3 = bitcast <8 x i16> %tmp2 to <2 x i64>			%tmp3 = bitcast <8 x i16> %tmp2 to <2 x i64>
	ret <2 x i64> %tmp3			ret <2 x i64> %tmp3
	}			}

	define <4 x i32> @t2(<2 x i64> %b1, <2 x i64> %c) nounwind {			define <4 x i32> @t2(<2 x i64> %b1, <2 x i64> %c) nounwind {
	[...]

llvm/test/CodeGen/X86/vector-lzcnt-128.ll

	[...]
	; X32-SSE-NEXT: retl			; X32-SSE-NEXT: retl
	%out = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> %in, i1 -1)			%out = call <16 x i8> @llvm.ctlz.v16i8(<16 x i8> %in, i1 -1)
	ret <16 x i8> %out			ret <16 x i8> %out
	}			}

	define <2 x i64> @foldv2i64() nounwind {			define <2 x i64> @foldv2i64() nounwind {
	; SSE-LABEL: foldv2i64:			; SSE-LABEL: foldv2i64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movaps {{.*#+}} xmm0 = [55,0]			; SSE-NEXT: movaps {{.*#+}} xmm0 = [55,0,0,0]
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; NOBW-LABEL: foldv2i64:			; NOBW-LABEL: foldv2i64:
	; NOBW: # %bb.0:			; NOBW: # %bb.0:
	; NOBW-NEXT: vmovaps {{.*#+}} xmm0 = [55,0]			; NOBW-NEXT: vmovaps {{.*#+}} xmm0 = [55,0,0,0]
	; NOBW-NEXT: retq			; NOBW-NEXT: retq
	;			;
	; AVX512VLBWDQ-LABEL: foldv2i64:			; AVX512VLBWDQ-LABEL: foldv2i64:
	; AVX512VLBWDQ: # %bb.0:			; AVX512VLBWDQ: # %bb.0:
	; AVX512VLBWDQ-NEXT: vmovaps {{.*#+}} xmm0 = [55,0]			; AVX512VLBWDQ-NEXT: vmovaps {{.*#+}} xmm0 = [55,0,0,0]
	; AVX512VLBWDQ-NEXT: retq			; AVX512VLBWDQ-NEXT: retq
	;			;
	; X32-SSE-LABEL: foldv2i64:			; X32-SSE-LABEL: foldv2i64:
	; X32-SSE: # %bb.0:			; X32-SSE: # %bb.0:
	; X32-SSE-NEXT: movaps {{.*#+}} xmm0 = [55,0,0,0]			; X32-SSE-NEXT: movaps {{.*#+}} xmm0 = [55,0,0,0]
	; X32-SSE-NEXT: retl			; X32-SSE-NEXT: retl
	%out = call <2 x i64> @llvm.ctlz.v2i64(<2 x i64> <i64 256, i64 -1>, i1 0)			%out = call <2 x i64> @llvm.ctlz.v2i64(<2 x i64> <i64 256, i64 -1>, i1 0)
	ret <2 x i64> %out			ret <2 x i64> %out
	}			}

	define <2 x i64> @foldv2i64u() nounwind {			define <2 x i64> @foldv2i64u() nounwind {
	; SSE-LABEL: foldv2i64u:			; SSE-LABEL: foldv2i64u:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movaps {{.*#+}} xmm0 = [55,0]			; SSE-NEXT: movaps {{.*#+}} xmm0 = [55,0,0,0]
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; NOBW-LABEL: foldv2i64u:			; NOBW-LABEL: foldv2i64u:
	; NOBW: # %bb.0:			; NOBW: # %bb.0:
	; NOBW-NEXT: vmovaps {{.*#+}} xmm0 = [55,0]			; NOBW-NEXT: vmovaps {{.*#+}} xmm0 = [55,0,0,0]
	; NOBW-NEXT: retq			; NOBW-NEXT: retq
	;			;
	; AVX512VLBWDQ-LABEL: foldv2i64u:			; AVX512VLBWDQ-LABEL: foldv2i64u:
	; AVX512VLBWDQ: # %bb.0:			; AVX512VLBWDQ: # %bb.0:
	; AVX512VLBWDQ-NEXT: vmovaps {{.*#+}} xmm0 = [55,0]			; AVX512VLBWDQ-NEXT: vmovaps {{.*#+}} xmm0 = [55,0,0,0]
	; AVX512VLBWDQ-NEXT: retq			; AVX512VLBWDQ-NEXT: retq
	;			;
	; X32-SSE-LABEL: foldv2i64u:			; X32-SSE-LABEL: foldv2i64u:
	; X32-SSE: # %bb.0:			; X32-SSE: # %bb.0:
	; X32-SSE-NEXT: movaps {{.*#+}} xmm0 = [55,0,0,0]			; X32-SSE-NEXT: movaps {{.*#+}} xmm0 = [55,0,0,0]
	; X32-SSE-NEXT: retl			; X32-SSE-NEXT: retl
	%out = call <2 x i64> @llvm.ctlz.v2i64(<2 x i64> <i64 256, i64 -1>, i1 -1)			%out = call <2 x i64> @llvm.ctlz.v2i64(<2 x i64> <i64 256, i64 -1>, i1 -1)
	ret <2 x i64> %out			ret <2 x i64> %out
	[...]

llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll

	[...]
	; AVX2-LABEL: shuffle_v16i16_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:			; AVX2-LABEL: shuffle_v16i16_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]			; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
	; AVX2-NEXT: vpshufb {{.*#+}} ymm0 = ymm0[14,15,0,1,0,1,0,1,0,1,0,1,0,1,0,1,16,17,16,17,16,17,16,17,16,17,16,17,16,17,16,17]			; AVX2-NEXT: vpshufb {{.*#+}} ymm0 = ymm0[14,15,0,1,0,1,0,1,0,1,0,1,0,1,0,1,16,17,16,17,16,17,16,17,16,17,16,17,16,17,16,17]
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512VL-LABEL: shuffle_v16i16_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:			; AVX512VL-LABEL: shuffle_v16i16_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: movl $15, %eax			; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm1 = [15,0,0,0]
	; AVX512VL-NEXT: vmovd %eax, %xmm1
	; AVX512VL-NEXT: vpermw %ymm0, %ymm1, %ymm0			; AVX512VL-NEXT: vpermw %ymm0, %ymm1, %ymm0
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; XOPAVX1-LABEL: shuffle_v16i16_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:			; XOPAVX1-LABEL: shuffle_v16i16_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
	; XOPAVX1: # %bb.0:			; XOPAVX1: # %bb.0:
	; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm1			; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
	; XOPAVX1-NEXT: vpperm {{.*#+}} xmm1 = xmm1[14,15],xmm0[0,1,0,1,0,1,0,1,0,1,0,1,0,1]			; XOPAVX1-NEXT: vpperm {{.*#+}} xmm1 = xmm1[14,15],xmm0[0,1,0,1,0,1,0,1,0,1,0,1,0,1]
	; XOPAVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,2,3,4,5,6,7]			; XOPAVX1-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,0,2,3,4,5,6,7]
	[...]

llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll

[...]
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2		; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
; AVX1-NEXT: vpalignr {{.*#+}} xmm0 = xmm2[15],xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]		; AVX1-NEXT: vpalignr {{.*#+}} xmm0 = xmm2[15],xmm0[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14]
; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]		; AVX1-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2-LABEL: shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:		; AVX2-LABEL: shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
; AVX2: # %bb.0:		; AVX2: # %bb.0:
		; AVX2-NEXT: vmovdqa {{.*#+}} xmm1 = [15,0,0,0]
; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]		; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
; AVX2-NEXT: movl $15, %eax
; AVX2-NEXT: vmovd %eax, %xmm1
; AVX2-NEXT: vpshufb %ymm1, %ymm0, %ymm0		; AVX2-NEXT: vpshufb %ymm1, %ymm0, %ymm0
; AVX2-NEXT: retq		; AVX2-NEXT: retq
;		;
; AVX512VLBW-LABEL: shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:		; AVX512VLBW-LABEL: shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
; AVX512VLBW: # %bb.0:		; AVX512VLBW: # %bb.0:
		; AVX512VLBW-NEXT: vmovdqa {{.*#+}} xmm1 = [15,0,0,0]
; AVX512VLBW-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]		; AVX512VLBW-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
; AVX512VLBW-NEXT: movl $15, %eax
; AVX512VLBW-NEXT: vmovd %eax, %xmm1
; AVX512VLBW-NEXT: vpshufb %ymm1, %ymm0, %ymm0		; AVX512VLBW-NEXT: vpshufb %ymm1, %ymm0, %ymm0
; AVX512VLBW-NEXT: retq		; AVX512VLBW-NEXT: retq
;		;
; AVX512VLVBMI-LABEL: shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:		; AVX512VLVBMI-LABEL: shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
; AVX512VLVBMI: # %bb.0:		; AVX512VLVBMI: # %bb.0:
; AVX512VLVBMI-NEXT: movl $31, %eax		; AVX512VLVBMI-NEXT: vmovdqa {{.*#+}} xmm1 = [31,0,0,0]
; AVX512VLVBMI-NEXT: vmovd %eax, %xmm1
; AVX512VLVBMI-NEXT: vpermb %ymm0, %ymm1, %ymm0		; AVX512VLVBMI-NEXT: vpermb %ymm0, %ymm1, %ymm0
; AVX512VLVBMI-NEXT: retq		; AVX512VLVBMI-NEXT: retq
;		;
; XOPAVX1-LABEL: shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:		; XOPAVX1-LABEL: shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
; XOPAVX1: # %bb.0:		; XOPAVX1: # %bb.0:
; XOPAVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1		; XOPAVX1-NEXT: vpxor %xmm1, %xmm1, %xmm1
; XOPAVX1-NEXT: vpshufb %xmm1, %xmm0, %xmm1		; XOPAVX1-NEXT: vpshufb %xmm1, %xmm0, %xmm1
; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm2		; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
; XOPAVX1-NEXT: vpperm {{.*#+}} xmm0 = xmm2[15],xmm0[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]		; XOPAVX1-NEXT: vpperm {{.*#+}} xmm0 = xmm2[15],xmm0[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
; XOPAVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; XOPAVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; XOPAVX1-NEXT: retq		; XOPAVX1-NEXT: retq
;		;
; XOPAVX2-LABEL: shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:		; XOPAVX2-LABEL: shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
; XOPAVX2: # %bb.0:		; XOPAVX2: # %bb.0:
		; XOPAVX2-NEXT: vmovdqa {{.*#+}} xmm1 = [15,0,0,0]
; XOPAVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]		; XOPAVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
; XOPAVX2-NEXT: movl $15, %eax
; XOPAVX2-NEXT: vmovd %eax, %xmm1
; XOPAVX2-NEXT: vpshufb %ymm1, %ymm0, %ymm0		; XOPAVX2-NEXT: vpshufb %ymm1, %ymm0, %ymm0
; XOPAVX2-NEXT: retq		; XOPAVX2-NEXT: retq
%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 31, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>		%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 31, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
ret <32 x i8> %shuffle		ret <32 x i8> %shuffle
}		}

define <32 x i8> @shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16(<32 x i8> %a, <32 x i8> %b) {		define <32 x i8> @shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16(<32 x i8> %a, <32 x i8> %b) {
; AVX1-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16:		; AVX1-LABEL: shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16:
[...]	; XOPAVX2-NEXT: retq
%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 14, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 16, i32 30, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>		%shuffle = shufflevector <32 x i8> %a, <32 x i8> %b, <32 x i32> <i32 0, i32 14, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 16, i32 30, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
ret <32 x i8> %shuffle		ret <32 x i8> %shuffle
}		}

define <32 x i8> @shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_31_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16(<32 x i8> %a, <32 x i8> %b) {		define <32 x i8> @shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_31_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16(<32 x i8> %a, <32 x i8> %b) {
; AVX1-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_31_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16:		; AVX1-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_31_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16:
; AVX1: # %bb.0:		; AVX1: # %bb.0:
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1		; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]		; AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [15,0,0,0]
; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1		; AVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1
; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0		; AVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1-NEXT: retq		; AVX1-NEXT: retq
;		;
; AVX2OR512VL-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_31_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16:		; AVX2OR512VL-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_31_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16:
; AVX2OR512VL: # %bb.0:		; AVX2OR512VL: # %bb.0:
; AVX2OR512VL-NEXT: vpshufb {{.*#+}} ymm0 = ymm0[15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]		; AVX2OR512VL-NEXT: vpshufb {{.*#+}} ymm0 = ymm0[15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
; AVX2OR512VL-NEXT: retq		; AVX2OR512VL-NEXT: retq
;		;
; XOPAVX1-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_31_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16:		; XOPAVX1-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_31_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16:
; XOPAVX1: # %bb.0:		; XOPAVX1: # %bb.0:
; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm1		; XOPAVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
; XOPAVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]		; XOPAVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [15,0,0,0]
; XOPAVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1		; XOPAVX1-NEXT: vpshufb %xmm2, %xmm1, %xmm1
; XOPAVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0		; XOPAVX1-NEXT: vpshufb %xmm2, %xmm0, %xmm0
; XOPAVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0		; XOPAVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; XOPAVX1-NEXT: retq		; XOPAVX1-NEXT: retq
;		;
; XOPAVX2-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_31_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16:		; XOPAVX2-LABEL: shuffle_v32i8_15_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_31_16_16_16_16_16_16_16_16_16_16_16_16_16_16_16:
; XOPAVX2: # %bb.0:		; XOPAVX2: # %bb.0:
; XOPAVX2-NEXT: vpshufb {{.*#+}} ymm0 = ymm0[15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]		; XOPAVX2-NEXT: vpshufb {{.*#+}} ymm0 = ymm0[15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
[2,383 lines not shown]

llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll

	[181 lines not shown]
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vperm2f128 {{.*#+}} ymm1 = ymm0[2,3,0,1]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm1 = ymm0[2,3,0,1]
	; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5,6,7]			; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5,6,7]
	; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,0,0,0,4,4,4,4]			; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,0,0,0,4,4,4,4]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2OR512VL-LABEL: shuffle_v8f32_70000000:			; AVX2OR512VL-LABEL: shuffle_v8f32_70000000:
	; AVX2OR512VL: # %bb.0:			; AVX2OR512VL: # %bb.0:
	; AVX2OR512VL-NEXT: movl $7, %eax			; AVX2OR512VL-NEXT: vmovaps {{.*#+}} xmm1 = [7,0,0,0]
	; AVX2OR512VL-NEXT: vmovd %eax, %xmm1			; AVX2OR512VL-NEXT: vpermps %ymm0, %ymm1, %ymm0
	; AVX2OR512VL-NEXT: vpermd %ymm0, %ymm1, %ymm0
	; AVX2OR512VL-NEXT: retq			; AVX2OR512VL-NEXT: retq
	%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <8 x float> %a, <8 x float> %b, <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <8 x float> %shuffle			ret <8 x float> %shuffle
	}			}

	define <8 x float> @shuffle_v8f32_01014545(<8 x float> %a, <8 x float> %b) {			define <8 x float> @shuffle_v8f32_01014545(<8 x float> %a, <8 x float> %b) {
	; ALL-LABEL: shuffle_v8f32_01014545:			; ALL-LABEL: shuffle_v8f32_01014545:
	; ALL: # %bb.0:			; ALL: # %bb.0:
	[1,304 lines not shown]
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vperm2f128 {{.*#+}} ymm1 = ymm0[2,3,0,1]			; AVX1-NEXT: vperm2f128 {{.*#+}} ymm1 = ymm0[2,3,0,1]
	; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5,6,7]			; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm1[2,3,4,5,6,7]
	; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,0,0,0,4,4,4,4]			; AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[3,0,0,0,4,4,4,4]
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2OR512VL-LABEL: shuffle_v8i32_70000000:			; AVX2OR512VL-LABEL: shuffle_v8i32_70000000:
	; AVX2OR512VL: # %bb.0:			; AVX2OR512VL: # %bb.0:
	; AVX2OR512VL-NEXT: movl $7, %eax			; AVX2OR512VL-NEXT: vmovaps {{.*#+}} xmm1 = [7,0,0,0]
	; AVX2OR512VL-NEXT: vmovd %eax, %xmm1			; AVX2OR512VL-NEXT: vpermps %ymm0, %ymm1, %ymm0
	; AVX2OR512VL-NEXT: vpermd %ymm0, %ymm1, %ymm0
	; AVX2OR512VL-NEXT: retq			; AVX2OR512VL-NEXT: retq
	%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <8 x i32> %a, <8 x i32> %b, <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <8 x i32> %shuffle			ret <8 x i32> %shuffle
	}			}

	define <8 x i32> @shuffle_v8i32_01014545(<8 x i32> %a, <8 x i32> %b) {			define <8 x i32> @shuffle_v8i32_01014545(<8 x i32> %a, <8 x i32> %b) {
	; AVX1-LABEL: shuffle_v8i32_01014545:			; AVX1-LABEL: shuffle_v8i32_01014545:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	[1,870 lines not shown]
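As a reading aid for the `shuffle_v8f32_70000000` and `shuffle_v8i32_70000000` checks above: the new output loads the 128-bit constant `[7,0,0,0]` into `xmm1` and then uses `ymm1` as the index operand of `vpermps`. The Python sketch below models why that can work, assuming (in line with the summary's "implicit zero-extension to ymm/zmm") that the 128-bit load leaves the upper lanes of the wider register zeroed. The helper names `zero_extend_lanes` and `vpermps_model` are hypothetical and are not part of the patch.

```python
def zero_extend_lanes(xmm_lanes, total_lanes):
    """Model a 128-bit load whose upper register lanes are implicitly zero."""
    return list(xmm_lanes) + [0] * (total_lanes - len(xmm_lanes))


def vpermps_model(src, idx):
    """Model a variable 8 x 32-bit permute: result[i] = src[idx[i] & 7]."""
    return [src[i & 7] for i in idx]


# 128-bit constant-pool entry, viewed as 4 x i32.
xmm1 = [7, 0, 0, 0]
ymm1 = zero_extend_lanes(xmm1, 8)

src = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
result = vpermps_model(src, ymm1)

# The IR shuffle mask <7, 0, 0, 0, 0, 0, 0, 0> applied directly.
expected = [src[i] for i in [7, 0, 0, 0, 0, 0, 0, 0]]
assert result == expected
```

Under that assumption, the four-element constant is enough to express the full eight-element index vector, because every index beyond the first is zero.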

llvm/test/CodeGen/X86/vector-shuffle-512-v32.ll

	[195 lines not shown]
	; ALL-NEXT: retq			; ALL-NEXT: retq
	%shuffle = shufflevector <32 x i16> %a, <32 x i16> undef, <32 x i32> <i32 3, i32 0, i32 1, i32 2, i32 7, i32 4, i32 5, i32 6, i32 11, i32 8, i32 9, i32 10, i32 15, i32 12, i32 13, i32 14, i32 19, i32 16, i32 17, i32 18, i32 23, i32 20, i32 21, i32 22, i32 27, i32 24, i32 25, i32 26, i32 31, i32 28, i32 29, i32 30>			%shuffle = shufflevector <32 x i16> %a, <32 x i16> undef, <32 x i32> <i32 3, i32 0, i32 1, i32 2, i32 7, i32 4, i32 5, i32 6, i32 11, i32 8, i32 9, i32 10, i32 15, i32 12, i32 13, i32 14, i32 19, i32 16, i32 17, i32 18, i32 23, i32 20, i32 21, i32 22, i32 27, i32 24, i32 25, i32 26, i32 31, i32 28, i32 29, i32 30>
	ret <32 x i16> %shuffle			ret <32 x i16> %shuffle
	}			}

	define <32 x i16> @shuffle_v32i16_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz(<32 x i16> %a) {			define <32 x i16> @shuffle_v32i16_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz(<32 x i16> %a) {
	; KNL-LABEL: shuffle_v32i16_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:			; KNL-LABEL: shuffle_v32i16_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:
	; KNL: ## %bb.0:			; KNL: ## %bb.0:
	; KNL-NEXT: movl $65535, %eax ## imm = 0xFFFF			; KNL-NEXT: vmovaps {{.*#+}} xmm1 = [65535,0,0,0]
	; KNL-NEXT: vmovd %eax, %xmm1			; KNL-NEXT: vandps %ymm1, %ymm0, %ymm0
	; KNL-NEXT: vpand %ymm1, %ymm0, %ymm0
	; KNL-NEXT: retq			; KNL-NEXT: retq
	;			;
	; SKX-LABEL: shuffle_v32i16_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:			; SKX-LABEL: shuffle_v32i16_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:
	; SKX: ## %bb.0:			; SKX: ## %bb.0:
	; SKX-NEXT: movl $65535, %eax ## imm = 0xFFFF			; SKX-NEXT: vmovaps {{.*#+}} xmm1 = [65535,0,0,0]
	; SKX-NEXT: vmovd %eax, %xmm1			; SKX-NEXT: vandps %zmm1, %zmm0, %zmm0
	; SKX-NEXT: vpandq %zmm1, %zmm0, %zmm0
	; SKX-NEXT: retq			; SKX-NEXT: retq
	%shuffle = shufflevector <32 x i16> %a, <32 x i16> zeroinitializer, <32 x i32> <i32 0, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32>			%shuffle = shufflevector <32 x i16> %a, <32 x i16> zeroinitializer, <32 x i32> <i32 0, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32, i32 32>
	ret <32 x i16> %shuffle			ret <32 x i16> %shuffle
	}			}

	define <32 x i16> @shuffle_v32i16_ashr_00_02_04_06_32_34_36_38_08_10_12_14_40_42_44_46_16_18_20_22_48_50_52_54_24_26_28_30_56_58_60_62(<16 x i32> %a0, <16 x i32> %a1) nounwind {			define <32 x i16> @shuffle_v32i16_ashr_00_02_04_06_32_34_36_38_08_10_12_14_40_42_44_46_16_18_20_22_48_50_52_54_24_26_28_30_56_58_60_62(<16 x i32> %a0, <16 x i32> %a1) nounwind {
	; KNL-LABEL: shuffle_v32i16_ashr_00_02_04_06_32_34_36_38_08_10_12_14_40_42_44_46_16_18_20_22_48_50_52_54_24_26_28_30_56_58_60_62:			; KNL-LABEL: shuffle_v32i16_ashr_00_02_04_06_32_34_36_38_08_10_12_14_40_42_44_46_16_18_20_22_48_50_52_54_24_26_28_30_56_58_60_62:
	; KNL: ## %bb.0:			; KNL: ## %bb.0:
	[294 lines not shown]
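As a reading aid for the `shuffle_v32i16_0zzz...` checks above: both the old and new output implement the shuffle as a bitwise AND with a mask whose only set bits are the low 16. The change is only in how the mask reaches the vector register (a constant load of `[65535,0,0,0]` instead of `movl` plus `vmovd`). The Python sketch below models the 512-bit form and checks that the AND matches the IR shuffle. It assumes the upper bits of the mask register are zero after the 128-bit load, and the helpers `pack_lanes` and `unpack_lanes` are hypothetical names introduced for illustration.

```python
def pack_lanes(lanes, bits):
    """Pack little-endian lanes of the given width into one integer."""
    value = 0
    for i, lane in enumerate(lanes):
        value |= (lane & ((1 << bits) - 1)) << (i * bits)
    return value


def unpack_lanes(value, bits, count):
    """Split an integer into `count` little-endian lanes of the given width."""
    return [(value >> (i * bits)) & ((1 << bits) - 1) for i in range(count)]


# Arbitrary 32 x i16 input vector.
a = [0x1234 + i for i in range(32)]

# IR shuffle: element 0 comes from %a, all others from zeroinitializer.
shuffled = [a[0]] + [0] * 31

# Constant [65535,0,0,0] viewed as 4 x i32, upper bits assumed zero.
mask = pack_lanes([65535, 0, 0, 0], 32)
anded = unpack_lanes(pack_lanes(a, 16) & mask, 16, 32)

assert anded == shuffled
```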

llvm/test/CodeGen/X86/vector-shuffle-512-v64.ll

[103 lines not shown]
; AVX512VBMI-NEXT: retq
%shuffle = shufflevector <64 x i8> %a, <64 x i8> %b, <64 x i32> <i32 79, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 95, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 111, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 127, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62>		%shuffle = shufflevector <64 x i8> %a, <64 x i8> %b, <64 x i32> <i32 79, i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 95, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 111, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 127, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62>
ret <64 x i8> %shuffle		ret <64 x i8> %shuffle
}		}


define <64 x i8> @shuffle_v64i8_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz(<64 x i8> %a) {		define <64 x i8> @shuffle_v64i8_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz(<64 x i8> %a) {
; AVX512F-LABEL: shuffle_v64i8_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:		; AVX512F-LABEL: shuffle_v64i8_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:
; AVX512F: # %bb.0:		; AVX512F: # %bb.0:
; AVX512F-NEXT: movl $255, %eax		; AVX512F-NEXT: vmovaps {{.*#+}} xmm1 = [255,0,0,0]
; AVX512F-NEXT: vmovd %eax, %xmm1		; AVX512F-NEXT: vandps %ymm1, %ymm0, %ymm0
; AVX512F-NEXT: vpand %ymm1, %ymm0, %ymm0
; AVX512F-NEXT: retq		; AVX512F-NEXT: retq
;		;
; AVX512BW-LABEL: shuffle_v64i8_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:		; AVX512BW-LABEL: shuffle_v64i8_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:
; AVX512BW: # %bb.0:		; AVX512BW: # %bb.0:
; AVX512BW-NEXT: vpshufb {{.*#+}} zmm0 = zmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero		; AVX512BW-NEXT: vpshufb {{.*#+}} zmm0 = zmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
; AVX512BW-NEXT: retq		; AVX512BW-NEXT: retq
;		;
; AVX512DQ-LABEL: shuffle_v64i8_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:		; AVX512DQ-LABEL: shuffle_v64i8_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:
; AVX512DQ: # %bb.0:		; AVX512DQ: # %bb.0:
; AVX512DQ-NEXT: movl $255, %eax		; AVX512DQ-NEXT: vmovaps {{.*#+}} xmm1 = [255,0,0,0]
; AVX512DQ-NEXT: vmovd %eax, %xmm1		; AVX512DQ-NEXT: vandps %ymm1, %ymm0, %ymm0
; AVX512DQ-NEXT: vpand %ymm1, %ymm0, %ymm0
; AVX512DQ-NEXT: retq		; AVX512DQ-NEXT: retq
;		;
; AVX512VBMI-LABEL: shuffle_v64i8_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:		; AVX512VBMI-LABEL: shuffle_v64i8_0zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz:
; AVX512VBMI: # %bb.0:		; AVX512VBMI: # %bb.0:
; AVX512VBMI-NEXT: vpshufb {{.*#+}} zmm0 = zmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero		; AVX512VBMI-NEXT: vpshufb {{.*#+}} zmm0 = zmm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero
; AVX512VBMI-NEXT: retq		; AVX512VBMI-NEXT: retq
%shuffle = shufflevector <64 x i8> %a, <64 x i8> zeroinitializer, <64 x i32> <i32 0, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64>		%shuffle = shufflevector <64 x i8> %a, <64 x i8> zeroinitializer, <64 x i32> <i32 0, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64, i32 64>
ret <64 x i8> %shuffle		ret <64 x i8> %shuffle
[699 lines not shown]

llvm/test/CodeGen/X86/vector-shuffle-512-v8.ll

	[136 lines not shown]
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%shuffle = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 0, i32 6, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 0, i32 6, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <8 x double> %shuffle			ret <8 x double> %shuffle
	}			}

	define <8 x double> @shuffle_v8f64_70000000(<8 x double> %a, <8 x double> %b) {			define <8 x double> @shuffle_v8f64_70000000(<8 x double> %a, <8 x double> %b) {
	; ALL-LABEL: shuffle_v8f64_70000000:			; ALL-LABEL: shuffle_v8f64_70000000:
	; ALL: # %bb.0:			; ALL: # %bb.0:
	; ALL-NEXT: movl $7, %eax			; ALL-NEXT: vmovaps {{.*#+}} xmm1 = [7,0,0,0]
	; ALL-NEXT: vmovd %eax, %xmm1			; ALL-NEXT: vpermpd %zmm0, %zmm1, %zmm0
	; ALL-NEXT: vpermq %zmm0, %zmm1, %zmm0
	; ALL-NEXT: ret{{[l\|q]}}			; ALL-NEXT: ret{{[l\|q]}}
	%shuffle = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <8 x double> %a, <8 x double> %b, <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <8 x double> %shuffle			ret <8 x double> %shuffle
	}			}

	define <8 x double> @shuffle_v8f64_01014545(<8 x double> %a, <8 x double> %b) {			define <8 x double> @shuffle_v8f64_01014545(<8 x double> %a, <8 x double> %b) {
	; ALL-LABEL: shuffle_v8f64_01014545:			; ALL-LABEL: shuffle_v8f64_01014545:
	; ALL: # %bb.0:			; ALL: # %bb.0:
	[800 lines not shown]
	; AVX512F-32-NEXT: retl			; AVX512F-32-NEXT: retl
	%shuffle = shufflevector <8 x i64> %a, <8 x i64> %b, <8 x i32> <i32 0, i32 6, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <8 x i64> %a, <8 x i64> %b, <8 x i32> <i32 0, i32 6, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <8 x i64> %shuffle			ret <8 x i64> %shuffle
	}			}

	define <8 x i64> @shuffle_v8i64_70000000(<8 x i64> %a, <8 x i64> %b) {			define <8 x i64> @shuffle_v8i64_70000000(<8 x i64> %a, <8 x i64> %b) {
	; ALL-LABEL: shuffle_v8i64_70000000:			; ALL-LABEL: shuffle_v8i64_70000000:
	; ALL: # %bb.0:			; ALL: # %bb.0:
	; ALL-NEXT: movl $7, %eax			; ALL-NEXT: vmovaps {{.*#+}} xmm1 = [7,0,0,0]
	; ALL-NEXT: vmovd %eax, %xmm1			; ALL-NEXT: vpermpd %zmm0, %zmm1, %zmm0
	; ALL-NEXT: vpermq %zmm0, %zmm1, %zmm0
	; ALL-NEXT: ret{{[l\|q]}}			; ALL-NEXT: ret{{[l\|q]}}
	%shuffle = shufflevector <8 x i64> %a, <8 x i64> %b, <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			%shuffle = shufflevector <8 x i64> %a, <8 x i64> %b, <8 x i32> <i32 7, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	ret <8 x i64> %shuffle			ret <8 x i64> %shuffle
	}			}

	define <8 x i64> @shuffle_v8i64_01014545(<8 x i64> %a, <8 x i64> %b) {			define <8 x i64> @shuffle_v8i64_01014545(<8 x i64> %a, <8 x i64> %b) {
	; ALL-LABEL: shuffle_v8i64_01014545:			; ALL-LABEL: shuffle_v8i64_01014545:
	; ALL: # %bb.0:			; ALL: # %bb.0:
	[1,299 lines not shown]

llvm/test/CodeGen/X86/vector-shuffle-combining-avx512f.ll

[893 lines not shown]
; X64-NEXT: retq
%res0 = call <8 x double> @llvm.x86.avx512.mask.vpermi2var.pd.512(<8 x double> %x0, <8 x i64> <i64 15, i64 0, i64 8, i64 7, i64 12, i64 6, i64 11, i64 4>, <8 x double> %x1, i8 -1)		%res0 = call <8 x double> @llvm.x86.avx512.mask.vpermi2var.pd.512(<8 x double> %x0, <8 x i64> <i64 15, i64 0, i64 8, i64 7, i64 12, i64 6, i64 11, i64 4>, <8 x double> %x1, i8 -1)
%res1 = call <8 x double> @llvm.x86.avx512.maskz.vpermt2var.pd.512(<8 x i64> <i64 12, i64 5, i64 14, i64 7, i64 8, i64 1, i64 10, i64 3>, <8 x double> %res0, <8 x double> %res0, i8 -1)		%res1 = call <8 x double> @llvm.x86.avx512.maskz.vpermt2var.pd.512(<8 x i64> <i64 12, i64 5, i64 14, i64 7, i64 8, i64 1, i64 10, i64 3>, <8 x double> %res0, <8 x double> %res0, i8 -1)
ret <8 x double> %res1		ret <8 x double> %res1
}		}

define <8 x double> @combine_vpermi2var_8f64_as_permpd(<8 x double> %x0, <8 x double> %x1, i64 %a2) {		define <8 x double> @combine_vpermi2var_8f64_as_permpd(<8 x double> %x0, <8 x double> %x1, i64 %a2) {
; X86-LABEL: combine_vpermi2var_8f64_as_permpd:		; X86-LABEL: combine_vpermi2var_8f64_as_permpd:
; X86: # %bb.0:		; X86: # %bb.0:
; X86-NEXT: movl $2, %eax		; X86-NEXT: vmovsd {{.*#+}} xmm2 = mem[0],zero
; X86-NEXT: vmovd %eax, %xmm2		; X86-NEXT: vunpcklpd {{.*#+}} xmm2 = xmm2[0],mem[0]
; X86-NEXT: vmovq {{.*#+}} xmm3 = mem[0],zero		; X86-NEXT: vinsertf128 $1, {{\.LCPI.*}}, %ymm2, %ymm2
; X86-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]		; X86-NEXT: vinsertf64x4 $1, {{\.LCPI.*}}, %zmm2, %zmm2
; X86-NEXT: vinserti128 $1, {{\.LCPI.*}}, %ymm2, %ymm2
; X86-NEXT: vinserti64x4 $1, {{\.LCPI.*}}, %zmm2, %zmm2
; X86-NEXT: vpermi2pd %zmm1, %zmm0, %zmm2		; X86-NEXT: vpermi2pd %zmm1, %zmm0, %zmm2
; X86-NEXT: vpermpd {{.*#+}} zmm0 = zmm2[2,3,1,1,6,7,5,5]		; X86-NEXT: vpermpd {{.*#+}} zmm0 = zmm2[2,3,1,1,6,7,5,5]
; X86-NEXT: retl		; X86-NEXT: retl
;		;
; X64-LABEL: combine_vpermi2var_8f64_as_permpd:		; X64-LABEL: combine_vpermi2var_8f64_as_permpd:
; X64: # %bb.0:		; X64: # %bb.0:
; X64-NEXT: vpermpd {{.*#+}} zmm0 = zmm0[1,3,2,2,5,7,6,6]		; X64-NEXT: vpermpd {{.*#+}} zmm0 = zmm0[1,3,2,2,5,7,6,6]
; X64-NEXT: retq		; X64-NEXT: retq
[62 lines not shown]

llvm/test/CodeGen/X86/vector-shuffle-combining-xop.ll

	[125 lines not shown]
	; CHECK: # %bb.0:			; CHECK: # %bb.0:
	; CHECK-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[3],ymm1[3]			; CHECK-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[0],ymm1[0],ymm0[3],ymm1[3]
	; CHECK-NEXT: ret{{[l\|q]}}			; CHECK-NEXT: ret{{[l\|q]}}
	%res0 = call <4 x double> @llvm.x86.xop.vpermil2pd.256(<4 x double> %a0, <4 x double> %a1, <4 x i64> <i64 0, i64 4, i64 2, i64 7>, i8 0)			%res0 = call <4 x double> @llvm.x86.xop.vpermil2pd.256(<4 x double> %a0, <4 x double> %a1, <4 x i64> <i64 0, i64 4, i64 2, i64 7>, i8 0)
	ret <4 x double> %res0			ret <4 x double> %res0
	}			}

	define <4 x double> @demandedelts_vpermil2pd256_as_shufpd(<4 x double> %a0, <4 x double> %a1, i64 %a2) {			define <4 x double> @demandedelts_vpermil2pd256_as_shufpd(<4 x double> %a0, <4 x double> %a1, i64 %a2) {
	; X86-AVX-LABEL: demandedelts_vpermil2pd256_as_shufpd:			; X86-LABEL: demandedelts_vpermil2pd256_as_shufpd:
	; X86-AVX: # %bb.0:			; X86: # %bb.0:
	; X86-AVX-NEXT: movl $4, %eax			; X86-NEXT: vmovsd {{.*#+}} xmm2 = mem[0],zero
	; X86-AVX-NEXT: vmovd %eax, %xmm2			; X86-NEXT: vunpcklpd {{.*#+}} xmm2 = xmm2[0],mem[0]
	; X86-AVX-NEXT: vmovq {{.*#+}} xmm3 = mem[0],zero			; X86-NEXT: vinsertf128 $1, {{\.LCPI.*}}, %ymm2, %ymm2
	; X86-AVX-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]			; X86-NEXT: vpermil2pd $0, %ymm2, %ymm1, %ymm0, %ymm0
	; X86-AVX-NEXT: vinsertf128 $1, {{\.LCPI.*}}, %ymm2, %ymm2			; X86-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,1,2,3]
	; X86-AVX-NEXT: vpermil2pd $0, %ymm2, %ymm1, %ymm0, %ymm0			; X86-NEXT: retl
	; X86-AVX-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,1,2,3]
	; X86-AVX-NEXT: retl
	;
	; X86-AVX2-LABEL: demandedelts_vpermil2pd256_as_shufpd:
	; X86-AVX2: # %bb.0:
	; X86-AVX2-NEXT: movl $4, %eax
	; X86-AVX2-NEXT: vmovd %eax, %xmm2
	; X86-AVX2-NEXT: vmovq {{.*#+}} xmm3 = mem[0],zero
	; X86-AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm3[0],xmm2[0]
	; X86-AVX2-NEXT: vinserti128 $1, {{\.LCPI.*}}, %ymm2, %ymm2
	; X86-AVX2-NEXT: vpermil2pd $0, %ymm2, %ymm1, %ymm0, %ymm0
	; X86-AVX2-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,1,2,3]
	; X86-AVX2-NEXT: retl
	;			;
	; X64-LABEL: demandedelts_vpermil2pd256_as_shufpd:			; X64-LABEL: demandedelts_vpermil2pd256_as_shufpd:
	; X64: # %bb.0:			; X64: # %bb.0:
	; X64-NEXT: vpermil2pd {{.*#+}} ymm0 = ymm1[0,0],ymm0[3],ymm1[3]			; X64-NEXT: vpermil2pd {{.*#+}} ymm0 = ymm1[0,0],ymm0[3],ymm1[3]
	; X64-NEXT: retq			; X64-NEXT: retq
	%res0 = insertelement <4 x i64> <i64 0, i64 4, i64 2, i64 7>, i64 %a2, i32 0			%res0 = insertelement <4 x i64> <i64 0, i64 4, i64 2, i64 7>, i64 %a2, i32 0
	%res1 = call <4 x double> @llvm.x86.xop.vpermil2pd.256(<4 x double> %a0, <4 x double> %a1, <4 x i64> %res0, i8 0)			%res1 = call <4 x double> @llvm.x86.xop.vpermil2pd.256(<4 x double> %a0, <4 x double> %a1, <4 x i64> %res0, i8 0)
	%res2 = shufflevector <4 x double> %res1, <4 x double> undef, <4 x i32> <i32 1, i32 1, i32 2, i32 3>			%res2 = shufflevector <4 x double> %res1, <4 x double> undef, <4 x i32> <i32 1, i32 1, i32 2, i32 3>
	[245 lines not shown]

llvm/test/CodeGen/X86/vector-shuffle-v1.ll

	[40 lines not shown]
	}			}

	define <2 x i1> @shuf2i1_1_2(<2 x i1> %a) {			define <2 x i1> @shuf2i1_1_2(<2 x i1> %a) {
	; AVX512F-LABEL: shuf2i1_1_2:			; AVX512F-LABEL: shuf2i1_1_2:
	; AVX512F: # %bb.0:			; AVX512F: # %bb.0:
	; AVX512F-NEXT: vpsllq $63, %xmm0, %xmm0			; AVX512F-NEXT: vpsllq $63, %xmm0, %xmm0
	; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k1			; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k1
	; AVX512F-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}			; AVX512F-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
	; AVX512F-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1			; AVX512F-NEXT: vmovdqa {{.*#+}} xmm1 = [18446744073709551615,0]
	; AVX512F-NEXT: vpalignr {{.*#+}} xmm0 = xmm0[8,9,10,11,12,13,14,15],xmm1[0,1,2,3,4,5,6,7]			; AVX512F-NEXT: vpalignr {{.*#+}} xmm0 = xmm0[8,9,10,11,12,13,14,15],xmm1[0,1,2,3,4,5,6,7]
	; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k1			; AVX512F-NEXT: vptestmq %zmm0, %zmm0, %k1
	; AVX512F-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}			; AVX512F-NEXT: vpternlogq $255, %zmm0, %zmm0, %zmm0 {%k1} {z}
	; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0			; AVX512F-NEXT: # kill: def $xmm0 killed $xmm0 killed $zmm0
	; AVX512F-NEXT: vzeroupper			; AVX512F-NEXT: vzeroupper
	; AVX512F-NEXT: retq			; AVX512F-NEXT: retq
	;			;
	; AVX512VL-LABEL: shuf2i1_1_2:			; AVX512VL-LABEL: shuf2i1_1_2:
	; AVX512VL: # %bb.0:			; AVX512VL: # %bb.0:
	; AVX512VL-NEXT: vpsllq $63, %xmm0, %xmm0			; AVX512VL-NEXT: vpsllq $63, %xmm0, %xmm0
	; AVX512VL-NEXT: vptestmq %xmm0, %xmm0, %k1			; AVX512VL-NEXT: vptestmq %xmm0, %xmm0, %k1
	; AVX512VL-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX512VL-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX512VL-NEXT: vmovdqa64 %xmm0, %xmm1 {%k1} {z}			; AVX512VL-NEXT: vmovdqa64 %xmm0, %xmm1 {%k1} {z}
	; AVX512VL-NEXT: movq $-1, %rax			; AVX512VL-NEXT: vmovdqa {{.*#+}} xmm2 = [18446744073709551615,0]
	; AVX512VL-NEXT: vmovq %rax, %xmm2
	; AVX512VL-NEXT: vpalignr {{.*#+}} xmm1 = xmm1[8,9,10,11,12,13,14,15],xmm2[0,1,2,3,4,5,6,7]			; AVX512VL-NEXT: vpalignr {{.*#+}} xmm1 = xmm1[8,9,10,11,12,13,14,15],xmm2[0,1,2,3,4,5,6,7]
	; AVX512VL-NEXT: vptestmq %xmm1, %xmm1, %k1			; AVX512VL-NEXT: vptestmq %xmm1, %xmm1, %k1
	; AVX512VL-NEXT: vmovdqa64 %xmm0, %xmm0 {%k1} {z}			; AVX512VL-NEXT: vmovdqa64 %xmm0, %xmm0 {%k1} {z}
	; AVX512VL-NEXT: retq			; AVX512VL-NEXT: retq
	;			;
	; VL_BW_DQ-LABEL: shuf2i1_1_2:			; VL_BW_DQ-LABEL: shuf2i1_1_2:
	; VL_BW_DQ: # %bb.0:			; VL_BW_DQ: # %bb.0:
	; VL_BW_DQ-NEXT: vpsllq $63, %xmm0, %xmm0			; VL_BW_DQ-NEXT: vpsllq $63, %xmm0, %xmm0
	; VL_BW_DQ-NEXT: vpmovq2m %xmm0, %k0			; VL_BW_DQ-NEXT: vpmovq2m %xmm0, %k0
	; VL_BW_DQ-NEXT: movq $-1, %rax			; VL_BW_DQ-NEXT: vpmovm2q %k0, %xmm0
	; VL_BW_DQ-NEXT: vmovq %rax, %xmm0			; VL_BW_DQ-NEXT: vmovdqa {{.*#+}} xmm1 = [18446744073709551615,0]
	; VL_BW_DQ-NEXT: vpmovm2q %k0, %xmm1			; VL_BW_DQ-NEXT: vpalignr {{.*#+}} xmm0 = xmm0[8,9,10,11,12,13,14,15],xmm1[0,1,2,3,4,5,6,7]
	; VL_BW_DQ-NEXT: vpalignr {{.*#+}} xmm0 = xmm1[8,9,10,11,12,13,14,15],xmm0[0,1,2,3,4,5,6,7]
	; VL_BW_DQ-NEXT: vpmovq2m %xmm0, %k0			; VL_BW_DQ-NEXT: vpmovq2m %xmm0, %k0
	; VL_BW_DQ-NEXT: vpmovm2q %k0, %xmm0			; VL_BW_DQ-NEXT: vpmovm2q %k0, %xmm0
	; VL_BW_DQ-NEXT: retq			; VL_BW_DQ-NEXT: retq
	%b = shufflevector <2 x i1> %a, <2 x i1> <i1 1, i1 0>, <2 x i32> <i32 1, i32 2>			%b = shufflevector <2 x i1> %a, <2 x i1> <i1 1, i1 0>, <2 x i32> <i32 1, i32 2>
	ret <2 x i1> %b			ret <2 x i1> %b
	}			}


	[801 lines not shown]

llvm/test/CodeGen/X86/vector-tzcnt-128.ll

	[1,570 lines not shown]
	; X32-SSE-NEXT: retl			; X32-SSE-NEXT: retl
	%out = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> %in, i1 -1)			%out = call <16 x i8> @llvm.cttz.v16i8(<16 x i8> %in, i1 -1)
	ret <16 x i8> %out			ret <16 x i8> %out
	}			}

	define <2 x i64> @foldv2i64() nounwind {			define <2 x i64> @foldv2i64() nounwind {
	; SSE-LABEL: foldv2i64:			; SSE-LABEL: foldv2i64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movaps {{.*#+}} xmm0 = [8,0]			; SSE-NEXT: movaps {{.*#+}} xmm0 = [8,0,0,0]
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: foldv2i64:			; AVX-LABEL: foldv2i64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vmovaps {{.*#+}} xmm0 = [8,0]			; AVX-NEXT: vmovaps {{.*#+}} xmm0 = [8,0,0,0]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; AVX512VPOPCNTDQ-LABEL: foldv2i64:			; AVX512VPOPCNTDQ-LABEL: foldv2i64:
	; AVX512VPOPCNTDQ: # %bb.0:			; AVX512VPOPCNTDQ: # %bb.0:
	; AVX512VPOPCNTDQ-NEXT: vmovaps {{.*#+}} xmm0 = [8,0]			; AVX512VPOPCNTDQ-NEXT: vmovaps {{.*#+}} xmm0 = [8,0,0,0]
	; AVX512VPOPCNTDQ-NEXT: retq			; AVX512VPOPCNTDQ-NEXT: retq
	;			;
	; AVX512VPOPCNTDQVL-LABEL: foldv2i64:			; AVX512VPOPCNTDQVL-LABEL: foldv2i64:
	; AVX512VPOPCNTDQVL: # %bb.0:			; AVX512VPOPCNTDQVL: # %bb.0:
	; AVX512VPOPCNTDQVL-NEXT: vmovaps {{.*#+}} xmm0 = [8,0]			; AVX512VPOPCNTDQVL-NEXT: vmovaps {{.*#+}} xmm0 = [8,0,0,0]
	; AVX512VPOPCNTDQVL-NEXT: retq			; AVX512VPOPCNTDQVL-NEXT: retq
	;			;
	; BITALG_NOVLX-LABEL: foldv2i64:			; BITALG_NOVLX-LABEL: foldv2i64:
	; BITALG_NOVLX: # %bb.0:			; BITALG_NOVLX: # %bb.0:
	; BITALG_NOVLX-NEXT: vmovaps {{.*#+}} xmm0 = [8,0]			; BITALG_NOVLX-NEXT: vmovaps {{.*#+}} xmm0 = [8,0,0,0]
	; BITALG_NOVLX-NEXT: retq			; BITALG_NOVLX-NEXT: retq
	;			;
	; BITALG-LABEL: foldv2i64:			; BITALG-LABEL: foldv2i64:
	; BITALG: # %bb.0:			; BITALG: # %bb.0:
	; BITALG-NEXT: vmovaps {{.*#+}} xmm0 = [8,0]			; BITALG-NEXT: vmovaps {{.*#+}} xmm0 = [8,0,0,0]
	; BITALG-NEXT: retq			; BITALG-NEXT: retq
	;			;
	; X32-SSE-LABEL: foldv2i64:			; X32-SSE-LABEL: foldv2i64:
	; X32-SSE: # %bb.0:			; X32-SSE: # %bb.0:
	; X32-SSE-NEXT: movaps {{.*#+}} xmm0 = [8,0,0,0]			; X32-SSE-NEXT: movaps {{.*#+}} xmm0 = [8,0,0,0]
	; X32-SSE-NEXT: retl			; X32-SSE-NEXT: retl
	%out = call <2 x i64> @llvm.cttz.v2i64(<2 x i64> <i64 256, i64 -1>, i1 0)			%out = call <2 x i64> @llvm.cttz.v2i64(<2 x i64> <i64 256, i64 -1>, i1 0)
	ret <2 x i64> %out			ret <2 x i64> %out
	}			}

	define <2 x i64> @foldv2i64u() nounwind {			define <2 x i64> @foldv2i64u() nounwind {
	; SSE-LABEL: foldv2i64u:			; SSE-LABEL: foldv2i64u:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movaps {{.*#+}} xmm0 = [8,0]			; SSE-NEXT: movaps {{.*#+}} xmm0 = [8,0,0,0]
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: foldv2i64u:			; AVX-LABEL: foldv2i64u:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vmovaps {{.*#+}} xmm0 = [8,0]			; AVX-NEXT: vmovaps {{.*#+}} xmm0 = [8,0,0,0]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; AVX512VPOPCNTDQ-LABEL: foldv2i64u:			; AVX512VPOPCNTDQ-LABEL: foldv2i64u:
	; AVX512VPOPCNTDQ: # %bb.0:			; AVX512VPOPCNTDQ: # %bb.0:
	; AVX512VPOPCNTDQ-NEXT: vmovaps {{.*#+}} xmm0 = [8,0]			; AVX512VPOPCNTDQ-NEXT: vmovaps {{.*#+}} xmm0 = [8,0,0,0]
	; AVX512VPOPCNTDQ-NEXT: retq			; AVX512VPOPCNTDQ-NEXT: retq
	;			;
	; AVX512VPOPCNTDQVL-LABEL: foldv2i64u:			; AVX512VPOPCNTDQVL-LABEL: foldv2i64u:
	; AVX512VPOPCNTDQVL: # %bb.0:			; AVX512VPOPCNTDQVL: # %bb.0:
	; AVX512VPOPCNTDQVL-NEXT: vmovaps {{.*#+}} xmm0 = [8,0]			; AVX512VPOPCNTDQVL-NEXT: vmovaps {{.*#+}} xmm0 = [8,0,0,0]
	; AVX512VPOPCNTDQVL-NEXT: retq			; AVX512VPOPCNTDQVL-NEXT: retq
	;			;
	; BITALG_NOVLX-LABEL: foldv2i64u:			; BITALG_NOVLX-LABEL: foldv2i64u:
	; BITALG_NOVLX: # %bb.0:			; BITALG_NOVLX: # %bb.0:
	; BITALG_NOVLX-NEXT: vmovaps {{.*#+}} xmm0 = [8,0]			; BITALG_NOVLX-NEXT: vmovaps {{.*#+}} xmm0 = [8,0,0,0]
	; BITALG_NOVLX-NEXT: retq			; BITALG_NOVLX-NEXT: retq
	;			;
	; BITALG-LABEL: foldv2i64u:			; BITALG-LABEL: foldv2i64u:
	; BITALG: # %bb.0:			; BITALG: # %bb.0:
	; BITALG-NEXT: vmovaps {{.*#+}} xmm0 = [8,0]			; BITALG-NEXT: vmovaps {{.*#+}} xmm0 = [8,0,0,0]
	; BITALG-NEXT: retq			; BITALG-NEXT: retq
	;			;
	; X32-SSE-LABEL: foldv2i64u:			; X32-SSE-LABEL: foldv2i64u:
	; X32-SSE: # %bb.0:			; X32-SSE: # %bb.0:
	; X32-SSE-NEXT: movaps {{.*#+}} xmm0 = [8,0,0,0]			; X32-SSE-NEXT: movaps {{.*#+}} xmm0 = [8,0,0,0]
	; X32-SSE-NEXT: retl			; X32-SSE-NEXT: retl
	%out = call <2 x i64> @llvm.cttz.v2i64(<2 x i64> <i64 256, i64 -1>, i1 -1)			%out = call <2 x i64> @llvm.cttz.v2i64(<2 x i64> <i64 256, i64 -1>, i1 -1)
	ret <2 x i64> %out			ret <2 x i64> %out
	[240 lines not shown]
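As a reading aid for the `foldv2i64` and `foldv2i64u` checks above: the left column prints the folded constant as `[8,0]` and the right column as `[8,0,0,0]`. One plausible reading, offered here as an assumption rather than something the diff states, is that these are the same 16 bytes printed as 2 x i64 versus 4 x i32 (the X32-SSE checks already used the four-element form). The Python sketch below computes the constant-folded `cttz` of `<256, -1>` and shows both views. The `cttz` helper is a hypothetical model, not LLVM's implementation.

```python
import struct


def cttz(value, bits=64):
    """Count trailing zero bits of `value` treated as an unsigned `bits`-wide integer."""
    value &= (1 << bits) - 1
    if value == 0:
        return bits
    # `value & -value` isolates the lowest set bit.
    return (value & -value).bit_length() - 1


# Constant-fold cttz over the test's input <i64 256, i64 -1>.
folded = [cttz(256), cttz(-1)]

# Reinterpret the same 16 little-endian bytes under two element types.
raw = struct.pack("<2Q", *folded)
as_v2i64 = list(struct.unpack("<2Q", raw))
as_v4i32 = list(struct.unpack("<4I", raw))

assert as_v2i64 == [8, 0]
assert as_v4i32 == [8, 0, 0, 0]
```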

Revision Contents

Diff 266008

llvm/lib/Target/X86/X86ISelLowering.cpp

llvm/test/CodeGen/X86/avx-load-store.ll

llvm/test/CodeGen/X86/avx2-arith.ll

llvm/test/CodeGen/X86/combine-udiv.ll

llvm/test/CodeGen/X86/fcmp-constant.ll

llvm/test/CodeGen/X86/insert-into-constant-vector.ll

llvm/test/CodeGen/X86/packss.ll

llvm/test/CodeGen/X86/pshufb-mask-comments.ll

llvm/test/CodeGen/X86/ret-mmx.ll

llvm/test/CodeGen/X86/sad.ll

llvm/test/CodeGen/X86/srem-seteq-vec-nonsplat.ll

llvm/test/CodeGen/X86/vec_set-A.ll

llvm/test/CodeGen/X86/vec_shift2.ll

llvm/test/CodeGen/X86/vector-lzcnt-128.ll

llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll

llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll

llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll

llvm/test/CodeGen/X86/vector-shuffle-512-v32.ll

llvm/test/CodeGen/X86/vector-shuffle-512-v64.ll

llvm/test/CodeGen/X86/vector-shuffle-512-v8.ll

llvm/test/CodeGen/X86/vector-shuffle-combining-avx512f.ll

llvm/test/CodeGen/X86/vector-shuffle-combining-xop.ll

llvm/test/CodeGen/X86/vector-shuffle-v1.ll

llvm/test/CodeGen/X86/vector-tzcnt-128.ll
