This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef)
ClosedPublic

Authored by craig.topper on Aug 19 2019, 5:42 PM.

Details

Reviewers

RKSimon
spatel

Commits

rGba375263e868: [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors…
rL369459: [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors…

Summary

I also had to tweak one existing X86 combine to avoid a regression there. I don't think we need the IdxVal == 0 check on the outer insert_subvector. From the other index checks, we know that we had a subvector with some number of 0 elements above it. We can safely drop those 0 elements, inserting just the smaller subvector anywhere into the larger zero vector.
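The index argument above can be checked on a toy model. The following is only a sketch, not LLVM code: vectors are plain `std::vector<int>`, and the helper names (`insertSubvector`, `extractSubvector`, `beforeCombine`, `afterCombine`) are hypothetical stand-ins for the DAG nodes involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// insert_subvector: overwrite Base[Idx, Idx + Sub.size()) with Sub.
Vec insertSubvector(Vec Base, const Vec &Sub, std::size_t Idx) {
  assert(Idx + Sub.size() <= Base.size() && "insertion out of range");
  for (std::size_t I = 0; I < Sub.size(); ++I)
    Base[Idx + I] = Sub[I];
  return Base;
}

// extract_subvector: take NumElts lanes starting at Idx.
Vec extractSubvector(const Vec &Src, std::size_t Idx, std::size_t NumElts) {
  assert(Idx + NumElts <= Src.size() && "extraction out of range");
  return Vec(Src.begin() + Idx, Src.begin() + Idx + NumElts);
}

// Pattern before the combine:
//   insert(zero, extract(insert(zero, Sub, 0), 0), IdxVal)
Vec beforeCombine(const Vec &Sub, std::size_t InnerElts, std::size_t ExtElts,
                  std::size_t OutElts, std::size_t IdxVal) {
  Vec Inner = insertSubvector(Vec(InnerElts, 0), Sub, 0);
  Vec Extracted = extractSubvector(Inner, 0, ExtElts);
  return insertSubvector(Vec(OutElts, 0), Extracted, IdxVal);
}

// Pattern after the combine: insert(zero, Sub, IdxVal)
Vec afterCombine(const Vec &Sub, std::size_t OutElts, std::size_t IdxVal) {
  return insertSubvector(Vec(OutElts, 0), Sub, IdxVal);
}
```

As long as the extract is at least as wide as `Sub`, the lanes dropped by the rewrite are all zero, so both forms should produce the same vector for any in-range `IdxVal`.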

This helps our vXi1 code see the full concat operation and allows it to optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts, which uses an additional concat. These changes weren't my original motivation, though.

I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that.
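The concat_vectors fold named in the title can be modelled outside of LLVM. This is a minimal sketch under stated assumptions: a lane is a `std::optional<int>` with `std::nullopt` standing in for undef, and `concatVectors` and `flattenOperands` are hypothetical helpers, not SelectionDAG APIs. The operand count follows the patch (outer operand count times inner operand count).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model: std::nullopt stands for an undef lane.
using Lane = std::optional<int>;
using Vec = std::vector<Lane>;

// concat_vectors: all operands must have the same number of lanes.
Vec concatVectors(const std::vector<Vec> &Ops) {
  assert(!Ops.empty() && "concat needs at least one operand");
  Vec Result;
  for (const Vec &Op : Ops) {
    assert(Op.size() == Ops[0].size() && "operands must have equal width");
    Result.insert(Result.end(), Op.begin(), Op.end());
  }
  return Result;
}

// Build the operand list of the flattened concat: the inner operands
// followed by undefs of the inner operand width. OuterNumOps plays the
// role of N->getNumOperands() in the patch.
std::vector<Vec> flattenOperands(const std::vector<Vec> &Inner,
                                 std::size_t OuterNumOps) {
  assert(!Inner.empty() && OuterNumOps >= 1 && "nothing to flatten");
  std::vector<Vec> Ops(Inner);
  Ops.resize(OuterNumOps * Inner.size(), Vec(Inner[0].size(), Lane{}));
  return Ops;
}
```

For example, with two 2-lane operands X and Y and one 4-lane undef, `concatVectors(flattenOperands({X, Y}, 2))` should yield the same 8 lanes as the nested `concat(concat(X, Y), undef)`.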

Diff Detail

Event Timeline

craig.topper created this revision. Aug 19 2019, 5:42 PM

Herald added a project: Restricted Project. Aug 19 2019, 5:42 PM

Herald added a subscriber: hiraditya.

craig.topper marked an inline comment as done. Aug 19 2019, 5:45 PM

craig.topper added inline comments.

llvm/test/CodeGen/X86/vec_umulo.ll
2516	By instruction count this is a regression, but I'm not sure exactly what the difference is.

RKSimon added inline comments. Aug 20 2019, 3:06 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
44307	Are we sure that this works in the general IdxVal case?
llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
7522–7523	This looks like we're missing a computeKnownBitsForTargetNode handling for a X86ISD opcode?

craig.topper marked 2 inline comments as done. Aug 20 2019, 8:26 AM

craig.topper added inline comments.

llvm/lib/Target/X86/X86ISelLowering.cpp
44307	I think so. We know we were inserting a subvector and some zeroes into a zero vector. This just shrinks the subvector of the insertion so we don’t transfer the zeroes. Since everything but the new subvector is going to be zero that should be fine. The only thing that should matter is that the extract and inner insert use the same index so we know the extract covers the whole subvector and possibly some zeroes. Does that sound right?
llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
7522–7523	There aren’t any target nodes here. The kshifts should be coming from isel for an insert_subvector. I think we’re probably missing a combine for insert_subvector into zero followed by an insert into undef. Maybe with an extract between them.
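The kshift pair in these tests can be read as a mask operation. The sketch below models a 16-bit mask register as `std::uint16_t`; the function names mirror the instruction mnemonics but are hypothetical helpers, and only in-range shift amounts are modelled. It shows that shifting left and then right by 14 keeps only the low two lanes, which is why a trailing `andl $3` adds nothing.

```cpp
#include <cassert>
#include <cstdint>

// Model of kshiftlw: logical left shift of a 16-bit mask register.
std::uint16_t kshiftlw(std::uint16_t K, unsigned Amt) {
  assert(Amt < 16 && "sketch only models in-range shift amounts");
  return static_cast<std::uint16_t>(K << Amt);
}

// Model of kshiftrw: logical right shift of a 16-bit mask register.
std::uint16_t kshiftrw(std::uint16_t K, unsigned Amt) {
  assert(Amt < 16 && "sketch only models in-range shift amounts");
  return static_cast<std::uint16_t>(K >> Amt);
}

// kshiftlw $14 followed by kshiftrw $14, then kmovw into a 32-bit register.
std::uint32_t lowTwoLanes(std::uint16_t K) {
  return kshiftrw(kshiftlw(K, 14), 14);
}
```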

craig.topper marked an inline comment as done. Aug 20 2019, 10:06 AM

craig.topper added inline comments.

llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
7522–7523	For these cases we end up with a DAG like this:

    t33: v16i1 = BUILD_VECTOR Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, ....
    t6: v2i1 = setcc t2, t4, seteq:ch
    t34: v16i1 = insert_subvector t33, t6, Constant:i64<0>
    t35: v8i1 = extract_subvector t34, Constant:i64<0>
    t18: i8 = bitcast t35
    t36: i32 = zero_extend t18

We can't simplify the t33 input to the insert_subvector, since t6 is only 2 bits. We can probably add a combine to turn the bitcast into a v16i1->i16 bitcast from the insert_subvector to get rid of the extract. Then the zero_extend will be from i16 to i32, which we should be able to optimize out through an isel pattern that we have for that to use KMOVW.
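The two shapes of this DAG can be compared on a bit-level model. This is a sketch only: a v16i1 is assumed to be a 16-bit integer with lane 0 in bit 0, and the function names are hypothetical. It contrasts the current extract-then-bitcast path with a direct i16 bitcast.

```cpp
#include <cassert>
#include <cstdint>

// t34: insert a v2i1 compare result into an all-zero v16i1 at index 0.
std::uint16_t insertV2i1IntoZero(unsigned CmpBits) {
  assert(CmpBits < 4 && "a v2i1 value has only two lanes");
  return static_cast<std::uint16_t>(CmpBits);
}

// Current DAG: extract the low v8i1 (t35), bitcast it to i8 (t18),
// then zero_extend to i32 (t36).
std::uint32_t viaExtractAndI8(std::uint16_t V16) {
  std::uint8_t AsI8 = static_cast<std::uint8_t>(V16 & 0xFF);
  return AsI8;
}

// Suggested shape: bitcast the whole v16i1 to i16 and zero_extend that.
std::uint32_t viaI16(std::uint16_t V16) { return V16; }
```

In this model the two paths agree only when lanes 8 to 15 are zero, which holds here because the v2i1 is inserted into a zero vector. For arbitrary inputs a truncate from i16 to i8 would still be needed.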

Replace the combineInsertSubvector change with a new combine in combineExtractSubvector. This just narrows an insert into zero if we are extracting a smaller portion of it and the extract is the only user. This should be a more straightforward change. The combine I edited in combineInsertSubvector has no one-use checks, so doing an index other than zero may not be the best choice.
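The guards of the new extract combine can be sketched as a small pattern matcher. This is an illustrative model, not the LLVM implementation: `InsertNode`, `evaluate`, `extractOf`, and `narrowExtractOfInsert` are hypothetical names, lanes are plain `int`s, and the one-use and i1 element-type checks from the patch have no analogue here and are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Vec = std::vector<int>;

// Hypothetical stand-in for an INSERT_SUBVECTOR node.
struct InsertNode {
  Vec Base;        // vector being inserted into
  Vec Sub;         // subvector being inserted
  std::size_t Idx; // insertion index
};

// Compute the lanes an InsertNode produces.
Vec evaluate(const InsertNode &Ins) {
  assert(Ins.Idx + Ins.Sub.size() <= Ins.Base.size() && "insert out of range");
  Vec Out = Ins.Base;
  std::copy(Ins.Sub.begin(), Ins.Sub.end(), Out.begin() + Ins.Idx);
  return Out;
}

// Reference semantics of extract_subvector(Ins, ExtIdx) with ExtElts lanes.
Vec extractOf(const InsertNode &Ins, std::size_t ExtIdx, std::size_t ExtElts) {
  Vec Wide = evaluate(Ins);
  assert(ExtIdx + ExtElts <= Wide.size() && "extract out of range");
  return Vec(Wide.begin() + ExtIdx, Wide.begin() + ExtIdx + ExtElts);
}

// If the extract starts at 0, the insert is at 0 into an all-zero base, and
// the inserted subvector fits in the extracted width, return the narrower
// insert-into-zero that replaces the extract. Otherwise report no match.
std::optional<InsertNode> narrowExtractOfInsert(const InsertNode &Ins,
                                                std::size_t ExtIdx,
                                                std::size_t ExtElts) {
  bool BaseIsZero = std::all_of(Ins.Base.begin(), Ins.Base.end(),
                                [](int Lane) { return Lane == 0; });
  if (ExtIdx != 0 || Ins.Idx != 0 || !BaseIsZero || Ins.Sub.size() > ExtElts)
    return std::nullopt;
  return InsertNode{Vec(ExtElts, 0), Ins.Sub, 0};
}
```

When the matcher succeeds, evaluating the returned node should give the same lanes as extracting from the original wide insert.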

craig.topper added a child revision: D66489: [X86] Add a DAG combine to transform (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) -> (i8 (trunc (i16 (bitcast (v16i1 X))))) on KNL target. Aug 20 2019, 12:08 PM

Rebase after adding a bitcast optimization for v8i1.

LGTM - cheers

This revision is now accepted and ready to land. Aug 20 2019, 2:35 PM

Closed by commit rL369459: [DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors… (authored by ctopper). Aug 20 2019, 3:14 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                               Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp      9 lines
llvm/lib/Target/X86/X86ISelLowering.cpp            14 lines
llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll   30 lines

Sizes listed without a captured path: 128, 58, 2, 56, 10, 86, and 10 lines.

Diff 216229

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

@@ 17,681 lines hidden @@ SDValue DAGCombiner::visitCONCAT_VECTORS(SDNode *N) {

   // Optimize concat_vectors where all but the first of the vectors are undef.
   if (std::all_of(std::next(N->op_begin()), N->op_end(), [](const SDValue &Op) {
         return Op.isUndef();
       })) {
     SDValue In = N->getOperand(0);
     assert(In.getValueType().isVector() && "Must concat vectors");
 
+    // If the input is a concat_vectors, just make a larger concat by padding
+    // with smaller undefs.
+    if (In.getOpcode() == ISD::CONCAT_VECTORS && In.hasOneUse()) {
+      unsigned NumOps = N->getNumOperands() * In.getNumOperands();
+      SmallVector<SDValue, 4> Ops(In->op_begin(), In->op_end());
+      Ops.resize(NumOps, DAG.getUNDEF(Ops[0].getValueType()));
+      return DAG.getNode(ISD::CONCAT_VECTORS, SDLoc(N), VT, Ops);
+    }
+
     SDValue Scalar = peekThroughOneUseBitcasts(In);
 
     // concat_vectors(scalar_to_vector(scalar), undef) ->
     // scalar_to_vector(scalar)
     if (!LegalOperations && Scalar.getOpcode() == ISD::SCALAR_TO_VECTOR &&
         Scalar.hasOneUse()) {
       EVT SVT = Scalar.getValueType().getVectorElementType();
       if (SVT == Scalar.getOperand(0).getValueType())
@@ 2,982 lines hidden @@

llvm/lib/Target/X86/X86ISelLowering.cpp

@@ 44,298 lines hidden @@ if (SubVec.getOpcode() == ISD::INSERT_SUBVECTOR &&
                        SubVec.getOperand(1),
                        DAG.getIntPtrConstant(IdxVal + Idx2Val, dl));
   }
 
   // If we're inserting into a zero vector and our input was extracted from an
   // insert into a zero vector of the same type and the extraction was at
   // least as large as the original insertion. Just insert the original
   // subvector into a zero vector.
   if (SubVec.getOpcode() == ISD::EXTRACT_SUBVECTOR && IdxVal == 0 &&
       isNullConstant(SubVec.getOperand(1)) &&
       SubVec.getOperand(0).getOpcode() == ISD::INSERT_SUBVECTOR) {
     SDValue Ins = SubVec.getOperand(0);
     if (isNullConstant(Ins.getOperand(2)) &&
         ISD::isBuildVectorAllZeros(Ins.getOperand(0).getNode()) &&
         Ins.getOperand(1).getValueSizeInBits() <= SubVecVT.getSizeInBits())
       return DAG.getNode(ISD::INSERT_SUBVECTOR, dl, OpVT,
                          getZeroVector(OpVT, Subtarget, DAG, dl),
@@ 183 lines hidden @@ if ((DestNumElts % SrcNumElts) == 0) {
           SDValue NewExtract = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, NewExtVT,
                                            SrcOp, NewIndex);
           return DAG.getBitcast(VT, NewExtract);
         }
       }
     }
   }
 
+  // If we are extracting from an insert into a zero vector, replace with a
+  // smaller insert into zero if we don't access less than the original
+  // subvector. Don't do this for i1 vectors.
+  if (VT.getVectorElementType() != MVT::i1 &&
+      InVec.getOpcode() == ISD::INSERT_SUBVECTOR && IdxVal == 0 &&
+      InVec.hasOneUse() && isNullConstant(InVec.getOperand(2)) &&
+      ISD::isBuildVectorAllZeros(InVec.getOperand(0).getNode()) &&
+      InVec.getOperand(1).getValueSizeInBits() <= VT.getSizeInBits()) {
+    SDLoc DL(N);
+    return DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT,
+                       getZeroVector(VT, Subtarget, DAG, DL),
+                       InVec.getOperand(1), InVec.getOperand(2));
+  }
+
   // If we're extracting from a broadcast then we're better off just
   // broadcasting to the smaller type directly, assuming this is the only use.
   // As its a broadcast we don't care about the extraction index.
   if (InVec.getOpcode() == X86ISD::VBROADCAST && InVec.hasOneUse() &&
       InVec.getOperand(0).getValueSizeInBits() <= VT.getSizeInBits())
     return DAG.getNode(X86ISD::VBROADCAST, SDLoc(N), VT, InVec.getOperand(0));
 
   // If we're extracting the lowest subvector and we're the only user,
@@ 1,508 lines hidden @@

llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll

@@ 2,678 lines hidden @@
 ; NoVLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask:
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %1 = bitcast <2 x i64> %__b to <2 x i64>
   %2 = icmp eq <2 x i64> %0, %1
   %3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %4 = bitcast <4 x i1> %3 to i4
@@ 10 lines hidden @@
 ; NoVLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask_mem:
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vmovdqa (%rdi), %xmm1
 ; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %load = load <2 x i64>, <2 x i64>* %__b
   %1 = bitcast <2 x i64> %load to <2 x i64>
   %2 = icmp eq <2 x i64> %0, %1
   %3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
@@ 13 lines hidden @@
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: kmovw %edi, %k1
 ; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0 {%k1}
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %1 = bitcast <2 x i64> %__b to <2 x i64>
   %2 = icmp eq <2 x i64> %0, %1
   %3 = bitcast i8 %__u to <8 x i1>
   %extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
@@ 15 lines hidden @@
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vmovdqa (%rsi), %xmm1
 ; NoVLX-NEXT: kmovw %edi, %k1
 ; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0 {%k1}
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %load = load <2 x i64>, <2 x i64>* %__b
   %1 = bitcast <2 x i64> %load to <2 x i64>
   %2 = icmp eq <2 x i64> %0, %1
   %3 = bitcast i8 %__u to <8 x i1>
@@ 15 lines hidden @@
 ; NoVLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask_mem_b:
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vpbroadcastq (%rdi), %xmm1
 ; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %load = load i64, i64* %__b
   %vec = insertelement <2 x i64> undef, i64 %load, i32 0
   %1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
   %2 = icmp eq <2 x i64> %0, %1
@@ 14 lines hidden @@
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vpbroadcastq (%rsi), %xmm1
 ; NoVLX-NEXT: kmovw %edi, %k1
 ; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0 {%k1}
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %load = load i64, i64* %__b
   %vec = insertelement <2 x i64> undef, i64 %load, i32 0
   %1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
   %2 = icmp eq <2 x i64> %0, %1
@@ 4,672 lines hidden @@
 ;
 ; NoVLX-LABEL: test_vpcmpsgtq_v2i1_v4i1_mask:
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %1 = bitcast <2 x i64> %__b to <2 x i64>
   %2 = icmp sgt <2 x i64> %0, %1
   %3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %4 = bitcast <4 x i1> %3 to i4
   ret i4 %4
@@ 9 lines hidden @@
 ; NoVLX-LABEL: test_vpcmpsgtq_v2i1_v4i1_mask_mem:
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vmovdqa (%rdi), %xmm1
 ; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %load = load <2 x i64>, <2 x i64>* %__b
   %1 = bitcast <2 x i64> %load to <2 x i64>
   %2 = icmp sgt <2 x i64> %0, %1
   %3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
@@ 13 lines hidden @@
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: kmovw %edi, %k1
 ; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0 {%k1}
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %1 = bitcast <2 x i64> %__b to <2 x i64>
   %2 = icmp sgt <2 x i64> %0, %1
   %3 = bitcast i8 %__u to <8 x i1>
   %extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
@@ 15 lines hidden @@
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vmovdqa (%rsi), %xmm1
 ; NoVLX-NEXT: kmovw %edi, %k1
 ; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0 {%k1}
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %load = load <2 x i64>, <2 x i64>* %__b
   %1 = bitcast <2 x i64> %load to <2 x i64>
   %2 = icmp sgt <2 x i64> %0, %1
   %3 = bitcast i8 %__u to <8 x i1>
@@ 15 lines hidden @@
 ; NoVLX-LABEL: test_vpcmpsgtq_v2i1_v4i1_mask_mem_b:
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vpbroadcastq (%rdi), %xmm1
 ; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %load = load i64, i64* %__b
   %vec = insertelement <2 x i64> undef, i64 %load, i32 0
   %1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
   %2 = icmp sgt <2 x i64> %0, %1
@@ 14 lines hidden @@
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vpbroadcastq (%rsi), %xmm1
 ; NoVLX-NEXT: kmovw %edi, %k1
 ; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0 {%k1}
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %load = load i64, i64* %__b
   %vec = insertelement <2 x i64> undef, i64 %load, i32 0
   %1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
   %2 = icmp sgt <2 x i64> %0, %1
@@ 4,733 lines hidden @@
 ; NoVLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask:
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %1 = bitcast <2 x i64> %__b to <2 x i64>
   %2 = icmp sge <2 x i64> %0, %1
   %3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
   %4 = bitcast <4 x i1> %3 to i4
@@ 10 lines hidden @@
 ; NoVLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask_mem:
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: vmovdqa (%rdi), %xmm1
 ; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %load = load <2 x i64>, <2 x i64>* %__b
   %1 = bitcast <2 x i64> %load to <2 x i64>
   %2 = icmp sge <2 x i64> %0, %1
   %3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
@@ 13 lines hidden @@
 ; NoVLX: # %bb.0: # %entry
 ; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
 ; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
 ; NoVLX-NEXT: kmovw %edi, %k1
 ; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0 {%k1}
 ; NoVLX-NEXT: kshiftlw $14, %k0, %k0
 ; NoVLX-NEXT: kshiftrw $14, %k0, %k0
 ; NoVLX-NEXT: kmovw %k0, %eax
-; NoVLX-NEXT: andl $3, %eax
 ; NoVLX-NEXT: vzeroupper
 ; NoVLX-NEXT: retq
 entry:
   %0 = bitcast <2 x i64> %__a to <2 x i64>
   %1 = bitcast <2 x i64> %__b to <2 x i64>
   %2 = icmp sge <2 x i64> %0, %1
   %3 = bitcast i8 %__u to <8 x i1>
   %extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
	[... 15 lines not shown ...]
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovdqa (%rsi), %xmm1			; NoVLX-NEXT: vmovdqa (%rsi), %xmm1
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load <2 x i64>, <2 x i64>* %__b			%load = load <2 x i64>, <2 x i64>* %__b
	%1 = bitcast <2 x i64> %load to <2 x i64>			%1 = bitcast <2 x i64> %load to <2 x i64>
	%2 = icmp sge <2 x i64> %0, %1			%2 = icmp sge <2 x i64> %0, %1
	%3 = bitcast i8 %__u to <8 x i1>			%3 = bitcast i8 %__u to <8 x i1>
	[... 15 lines not shown ...]
	; NoVLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpbroadcastq (%rdi), %xmm1			; NoVLX-NEXT: vpbroadcastq (%rdi), %xmm1
	; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, i64* %__b			%load = load i64, i64* %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp sge <2 x i64> %0, %1			%2 = icmp sge <2 x i64> %0, %1
	[... 14 lines not shown ...]
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpbroadcastq (%rsi), %xmm1			; NoVLX-NEXT: vpbroadcastq (%rsi), %xmm1
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, i64* %__b			%load = load i64, i64* %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp sge <2 x i64> %0, %1			%2 = icmp sge <2 x i64> %0, %1
	[... 4,753 lines not shown ...]
	; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask:			; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%1 = bitcast <2 x i64> %__b to <2 x i64>			%1 = bitcast <2 x i64> %__b to <2 x i64>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%4 = bitcast <4 x i1> %3 to i4			%4 = bitcast <4 x i1> %3 to i4
	[... 10 lines not shown ...]
	; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovdqa (%rdi), %xmm1			; NoVLX-NEXT: vmovdqa (%rdi), %xmm1
	; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load <2 x i64>, <2 x i64>* %__b			%load = load <2 x i64>, <2 x i64>* %__b
	%1 = bitcast <2 x i64> %load to <2 x i64>			%1 = bitcast <2 x i64> %load to <2 x i64>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	[... 13 lines not shown ...]
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%1 = bitcast <2 x i64> %__b to <2 x i64>			%1 = bitcast <2 x i64> %__b to <2 x i64>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	%3 = bitcast i8 %__u to <8 x i1>			%3 = bitcast i8 %__u to <8 x i1>
	%extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>			%extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
	[... 15 lines not shown ...]
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovdqa (%rsi), %xmm1			; NoVLX-NEXT: vmovdqa (%rsi), %xmm1
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load <2 x i64>, <2 x i64>* %__b			%load = load <2 x i64>, <2 x i64>* %__b
	%1 = bitcast <2 x i64> %load to <2 x i64>			%1 = bitcast <2 x i64> %load to <2 x i64>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	%3 = bitcast i8 %__u to <8 x i1>			%3 = bitcast i8 %__u to <8 x i1>
	[... 15 lines not shown ...]
	; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpbroadcastq (%rdi), %xmm1			; NoVLX-NEXT: vpbroadcastq (%rdi), %xmm1
	; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, i64* %__b			%load = load i64, i64* %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	[... 14 lines not shown ...]
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpbroadcastq (%rsi), %xmm1			; NoVLX-NEXT: vpbroadcastq (%rsi), %xmm1
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, i64* %__b			%load = load i64, i64* %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	[... 3,700 lines not shown ...]
	; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask:			; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0			; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%1 = bitcast <2 x i64> %__b to <2 x double>			%1 = bitcast <2 x i64> %__b to <2 x double>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%4 = bitcast <4 x i1> %3 to i4			%4 = bitcast <4 x i1> %3 to i4
	[... 10 lines not shown ...]
	; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovapd (%rdi), %xmm1			; NoVLX-NEXT: vmovapd (%rdi), %xmm1
	; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0			; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%load = load <2 x i64>, <2 x i64>* %__b			%load = load <2 x i64>, <2 x i64>* %__b
	%1 = bitcast <2 x i64> %load to <2 x double>			%1 = bitcast <2 x i64> %load to <2 x double>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	[... 11 lines not shown ...]
	; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovddup {{.*#+}} xmm1 = mem[0,0]			; NoVLX-NEXT: vmovddup {{.*#+}} xmm1 = mem[0,0]
	; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0			; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%load = load double, double* %__b			%load = load double, double* %__b
	%vec = insertelement <2 x double> undef, double %load, i32 0			%vec = insertelement <2 x double> undef, double %load, i32 0
	%1 = shufflevector <2 x double> %vec, <2 x double> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x double> %vec, <2 x double> undef, <2 x i32> <i32 0, i32 0>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	[... 14 lines not shown ...]
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%1 = bitcast <2 x i64> %__b to <2 x double>			%1 = bitcast <2 x i64> %__b to <2 x double>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	%3 = bitcast i2 %__u to <2 x i1>			%3 = bitcast i2 %__u to <2 x i1>
	%4 = and <2 x i1> %2, %3			%4 = and <2 x i1> %2, %3
	[... 14 lines not shown ...]
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vmovapd (%rsi), %xmm1			; NoVLX-NEXT: vmovapd (%rsi), %xmm1
	; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%load = load <2 x i64>, <2 x i64>* %__b			%load = load <2 x i64>, <2 x i64>* %__b
	%1 = bitcast <2 x i64> %load to <2 x double>			%1 = bitcast <2 x i64> %load to <2 x double>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	%3 = bitcast i2 %__u to <2 x i1>			%3 = bitcast i2 %__u to <2 x i1>
	[... 15 lines not shown ...]
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vmovddup {{.*#+}} xmm1 = mem[0,0]			; NoVLX-NEXT: vmovddup {{.*#+}} xmm1 = mem[0,0]
	; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0			; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0			; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
	; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%load = load double, double* %__b			%load = load double, double* %__b
	%vec = insertelement <2 x double> undef, double %load, i32 0			%vec = insertelement <2 x double> undef, double %load, i32 0
	%1 = shufflevector <2 x double> %vec, <2 x double> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x double> %vec, <2 x double> undef, <2 x i32> <i32 0, i32 0>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	[... 2,190 lines not shown ...]
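
Note on the NoVLX hunks above: in each one the only change is that the trailing `andl $3, %eax` disappears from the right-hand (new) column. The value returned should be unaffected, because the preceding `kshiftlw $14` / `kshiftrw $14` pair already clears every mask bit except bits 0 and 1 before `kmovw` copies the 16-bit mask into `%eax`. The sketch below is only an illustration of that arithmetic, not of the DAG combine itself. The helper names are hypothetical, and it assumes the usual semantics of 16-bit mask shifts, where bits shifted out are discarded and zeros are shifted in.

```python
MASK16 = 0xFFFF  # assumed width of the mask value the *w instructions act on


def kshiftlw(k, imm):
    """Model a 16-bit left shift of a mask value; bits shifted out are lost."""
    return (k << imm) & MASK16


def kshiftrw(k, imm):
    """Model a 16-bit logical right shift of a mask value."""
    return (k & MASK16) >> imm


def low_two_bits_via_shifts(k):
    """The shift pair from the CHECK lines: keeps only mask bits 0 and 1."""
    return kshiftrw(kshiftlw(k, 14), 14)


def and_is_redundant():
    """Check every 16-bit mask: a trailing `& 3` never changes the result."""
    return all(
        low_two_bits_via_shifts(k) == (low_two_bits_via_shifts(k) & 3) == (k & 3)
        for k in range(1 << 16)
    )


print(and_is_redundant())
```

Under these assumptions the shift pair alone already produces `k & 3`, which is consistent with the new output dropping the extra AND.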

llvm/test/CodeGen/X86/oddshuffles.ll

	[... 1,507 lines not shown ...]
	; SSE42-NEXT: movdqu %xmm0, 64(%rdi)			; SSE42-NEXT: movdqu %xmm0, 64(%rdi)
	; SSE42-NEXT: movdqu %xmm7, 80(%rdi)			; SSE42-NEXT: movdqu %xmm7, 80(%rdi)
	; SSE42-NEXT: movdqu %xmm1, (%rdi)			; SSE42-NEXT: movdqu %xmm1, (%rdi)
	; SSE42-NEXT: retq			; SSE42-NEXT: retq
	;			;
	; AVX1-LABEL: interleave_24i32_in:			; AVX1-LABEL: interleave_24i32_in:
	; AVX1: # %bb.0:			; AVX1: # %bb.0:
	; AVX1-NEXT: vmovupd (%rsi), %ymm0			; AVX1-NEXT: vmovupd (%rsi), %ymm0
	; AVX1-NEXT: vmovups 16(%rcx), %xmm1			; AVX1-NEXT: vmovups (%rdx), %xmm1
	; AVX1-NEXT: vmovups (%rdx), %xmm2			; AVX1-NEXT: vmovups 16(%rdx), %xmm2
	; AVX1-NEXT: vmovups 16(%rdx), %xmm3
	; AVX1-NEXT: vshufps {{.*#+}} xmm4 = xmm3[3,0],xmm1[3,0]
	; AVX1-NEXT: vshufps {{.*#+}} xmm4 = xmm1[2,1],xmm4[0,2]
	; AVX1-NEXT: vshufps {{.*#+}} xmm1 = xmm1[1,0],xmm3[1,0]
	; AVX1-NEXT: vshufps {{.*#+}} xmm1 = xmm1[2,0],xmm3[2,2]
	; AVX1-NEXT: vinsertf128 $1, %xmm4, %ymm1, %ymm1
	; AVX1-NEXT: vpermilpd {{.*#+}} ymm3 = ymm0[1,1,3,3]
	; AVX1-NEXT: vperm2f128 {{.*#+}} ymm3 = ymm3[2,3,2,3]
	; AVX1-NEXT: vblendps {{.*#+}} ymm1 = ymm1[0,1],ymm3[2],ymm1[3,4],ymm3[5],ymm1[6,7]
	; AVX1-NEXT: vmovups (%rsi), %xmm3			; AVX1-NEXT: vmovups (%rsi), %xmm3
	; AVX1-NEXT: vshufps {{.*#+}} xmm4 = xmm3[2,0],xmm2[2,0]			; AVX1-NEXT: vshufps {{.*#+}} xmm4 = xmm3[2,0],xmm1[2,0]
	; AVX1-NEXT: vshufps {{.*#+}} xmm4 = xmm2[1,1],xmm4[0,2]			; AVX1-NEXT: vshufps {{.*#+}} xmm4 = xmm1[1,1],xmm4[0,2]
	; AVX1-NEXT: vshufps {{.*#+}} xmm2 = xmm2[0,0],xmm3[0,0]			; AVX1-NEXT: vshufps {{.*#+}} xmm1 = xmm1[0,0],xmm3[0,0]
	; AVX1-NEXT: vshufps {{.*#+}} xmm2 = xmm2[2,0],xmm3[2,1]			; AVX1-NEXT: vshufps {{.*#+}} xmm1 = xmm1[2,0],xmm3[2,1]
	; AVX1-NEXT: vinsertf128 $1, %xmm4, %ymm2, %ymm2			; AVX1-NEXT: vinsertf128 $1, %xmm4, %ymm1, %ymm1
	; AVX1-NEXT: vpermilps {{.*#+}} xmm3 = mem[0,1,0,1]			; AVX1-NEXT: vpermilps {{.*#+}} xmm3 = mem[0,1,0,1]
	; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm3, %ymm3			; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm3, %ymm3
				; AVX1-NEXT: vblendps {{.*#+}} ymm1 = ymm1[0,1],ymm3[2],ymm1[3,4],ymm3[5],ymm1[6,7]
				; AVX1-NEXT: vmovups 16(%rcx), %xmm3
				; AVX1-NEXT: vshufps {{.*#+}} xmm4 = xmm2[3,0],xmm3[3,0]
				; AVX1-NEXT: vshufps {{.*#+}} xmm4 = xmm3[2,1],xmm4[0,2]
				; AVX1-NEXT: vshufps {{.*#+}} xmm3 = xmm3[1,0],xmm2[1,0]
				; AVX1-NEXT: vshufps {{.*#+}} xmm2 = xmm3[2,0],xmm2[2,2]
				; AVX1-NEXT: vinsertf128 $1, %xmm4, %ymm2, %ymm2
				; AVX1-NEXT: vpermilpd {{.*#+}} ymm3 = ymm0[1,1,3,3]
				; AVX1-NEXT: vperm2f128 {{.*#+}} ymm3 = ymm3[2,3,2,3]
	; AVX1-NEXT: vblendps {{.*#+}} ymm2 = ymm2[0,1],ymm3[2],ymm2[3,4],ymm3[5],ymm2[6,7]			; AVX1-NEXT: vblendps {{.*#+}} ymm2 = ymm2[0,1],ymm3[2],ymm2[3,4],ymm3[5],ymm2[6,7]
	; AVX1-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,2]
	; AVX1-NEXT: vpermilpd {{.*#+}} ymm3 = mem[1,1,2,2]			; AVX1-NEXT: vpermilpd {{.*#+}} ymm3 = mem[1,1,2,2]
				; AVX1-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,2]
	; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm3[0],ymm0[1],ymm3[2,3],ymm0[4],ymm3[5,6],ymm0[7]			; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm3[0],ymm0[1],ymm3[2,3],ymm0[4],ymm3[5,6],ymm0[7]
	; AVX1-NEXT: vpermilps {{.*#+}} ymm3 = mem[0,0,3,3,4,4,7,7]			; AVX1-NEXT: vpermilps {{.*#+}} ymm3 = mem[0,0,3,3,4,4,7,7]
	; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm3[2],ymm0[3,4],ymm3[5],ymm0[6,7]			; AVX1-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm3[2],ymm0[3,4],ymm3[5],ymm0[6,7]
	; AVX1-NEXT: vmovups %ymm0, 32(%rdi)			; AVX1-NEXT: vmovups %ymm0, 32(%rdi)
	; AVX1-NEXT: vmovups %ymm2, (%rdi)			; AVX1-NEXT: vmovups %ymm2, 64(%rdi)
	; AVX1-NEXT: vmovups %ymm1, 64(%rdi)			; AVX1-NEXT: vmovups %ymm1, (%rdi)
	; AVX1-NEXT: vzeroupper			; AVX1-NEXT: vzeroupper
	; AVX1-NEXT: retq			; AVX1-NEXT: retq
	;			;
	; AVX2-SLOW-LABEL: interleave_24i32_in:			; AVX2-SLOW-LABEL: interleave_24i32_in:
	; AVX2-SLOW: # %bb.0:			; AVX2-SLOW: # %bb.0:
	; AVX2-SLOW-NEXT: vmovups (%rsi), %ymm0			; AVX2-SLOW-NEXT: vmovups (%rsi), %ymm0
	; AVX2-SLOW-NEXT: vmovups (%rdx), %ymm1			; AVX2-SLOW-NEXT: vmovups (%rdx), %ymm1
	; AVX2-SLOW-NEXT: vmovups (%rcx), %ymm2			; AVX2-SLOW-NEXT: vmovups (%rcx), %ymm2
	; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm3 = ymm2[2,1,3,3]			; AVX2-SLOW-NEXT: vpermilps {{.*#+}} xmm3 = mem[1,0,2,2]
	; AVX2-SLOW-NEXT: vpermilps {{.*#+}} ymm4 = ymm1[1,2,3,3,5,6,7,7]			; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm3 = ymm3[0,1,0,1]
	; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm4 = ymm4[2,2,2,3]			; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm4 = ymm0[0,0,2,1]
	; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm3 = ymm4[0],ymm3[1],ymm4[2,3],ymm3[4],ymm4[5,6],ymm3[7]			; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm3 = ymm4[0],ymm3[1],ymm4[2,3],ymm3[4],ymm4[5,6],ymm3[7]
	; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm4 = ymm0[0,3,3,3]			; AVX2-SLOW-NEXT: vbroadcastsd (%rcx), %ymm4
	; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm3 = ymm3[0,1],ymm4[2],ymm3[3,4],ymm4[5],ymm3[6,7]			; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm3 = ymm3[0,1],ymm4[2],ymm3[3,4],ymm4[5],ymm3[6,7]
	; AVX2-SLOW-NEXT: vpermilps {{.*#+}} xmm4 = mem[1,0,2,2]			; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm4 = ymm2[2,1,3,3]
	; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm4 = ymm4[0,1,0,1]			; AVX2-SLOW-NEXT: vpermilps {{.*#+}} ymm5 = ymm1[1,2,3,3,5,6,7,7]
	; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm5 = ymm0[0,0,2,1]			; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm5 = ymm5[2,2,2,3]
	; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm4 = ymm5[0],ymm4[1],ymm5[2,3],ymm4[4],ymm5[5,6],ymm4[7]			; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm4 = ymm5[0],ymm4[1],ymm5[2,3],ymm4[4],ymm5[5,6],ymm4[7]
	; AVX2-SLOW-NEXT: vbroadcastsd (%rcx), %ymm5			; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm5 = ymm0[0,3,3,3]
	; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm4 = ymm4[0,1],ymm5[2],ymm4[3,4],ymm5[5],ymm4[6,7]			; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm4 = ymm4[0,1],ymm5[2],ymm4[3,4],ymm5[5],ymm4[6,7]
	; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[1,1,2,2]			; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[1,1,2,2]
	; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm2 = ymm2[1,1,2,2]			; AVX2-SLOW-NEXT: vpermpd {{.*#+}} ymm2 = ymm2[1,1,2,2]
	; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm0 = ymm2[0],ymm0[1],ymm2[2,3],ymm0[4],ymm2[5,6],ymm0[7]			; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm0 = ymm2[0],ymm0[1],ymm2[2,3],ymm0[4],ymm2[5,6],ymm0[7]
	; AVX2-SLOW-NEXT: vpermilps {{.*#+}} ymm1 = ymm1[0,0,3,3,4,4,7,7]			; AVX2-SLOW-NEXT: vpermilps {{.*#+}} ymm1 = ymm1[0,0,3,3,4,4,7,7]
	; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm1[2],ymm0[3,4],ymm1[5],ymm0[6,7]			; AVX2-SLOW-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm1[2],ymm0[3,4],ymm1[5],ymm0[6,7]
	; AVX2-SLOW-NEXT: vmovups %ymm0, 32(%rdi)			; AVX2-SLOW-NEXT: vmovups %ymm0, 32(%rdi)
	; AVX2-SLOW-NEXT: vmovups %ymm4, (%rdi)			; AVX2-SLOW-NEXT: vmovups %ymm4, 64(%rdi)
	; AVX2-SLOW-NEXT: vmovups %ymm3, 64(%rdi)			; AVX2-SLOW-NEXT: vmovups %ymm3, (%rdi)
	; AVX2-SLOW-NEXT: vzeroupper			; AVX2-SLOW-NEXT: vzeroupper
	; AVX2-SLOW-NEXT: retq			; AVX2-SLOW-NEXT: retq
	;			;
	; AVX2-FAST-LABEL: interleave_24i32_in:			; AVX2-FAST-LABEL: interleave_24i32_in:
	; AVX2-FAST: # %bb.0:			; AVX2-FAST: # %bb.0:
	; AVX2-FAST-NEXT: vmovups (%rsi), %ymm0			; AVX2-FAST-NEXT: vmovups (%rsi), %ymm0
	; AVX2-FAST-NEXT: vmovups (%rdx), %ymm1			; AVX2-FAST-NEXT: vmovups (%rdx), %ymm1
	; AVX2-FAST-NEXT: vmovups (%rcx), %ymm2			; AVX2-FAST-NEXT: vmovups (%rcx), %ymm2
	; AVX2-FAST-NEXT: vmovaps {{.*#+}} ymm3 = [5,6,5,6,5,6,7,7]			; AVX2-FAST-NEXT: vbroadcastf128 {{.*#+}} ymm3 = [1,0,2,2,1,0,2,2]
				; AVX2-FAST-NEXT: # ymm3 = mem[0,1,0,1]
	; AVX2-FAST-NEXT: vpermps %ymm1, %ymm3, %ymm3			; AVX2-FAST-NEXT: vpermps %ymm1, %ymm3, %ymm3
	; AVX2-FAST-NEXT: vpermpd {{.*#+}} ymm4 = ymm2[2,1,3,3]			; AVX2-FAST-NEXT: vpermpd {{.*#+}} ymm4 = ymm0[0,0,2,1]
	; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm3 = ymm3[0],ymm4[1],ymm3[2,3],ymm4[4],ymm3[5,6],ymm4[7]			; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm3 = ymm4[0],ymm3[1],ymm4[2,3],ymm3[4],ymm4[5,6],ymm3[7]
	; AVX2-FAST-NEXT: vpermpd {{.*#+}} ymm4 = ymm0[0,3,3,3]			; AVX2-FAST-NEXT: vbroadcastsd (%rcx), %ymm4
	; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm3 = ymm3[0,1],ymm4[2],ymm3[3,4],ymm4[5],ymm3[6,7]			; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm3 = ymm3[0,1],ymm4[2],ymm3[3,4],ymm4[5],ymm3[6,7]
	; AVX2-FAST-NEXT: vpermpd {{.*#+}} ymm4 = ymm0[1,1,2,2]			; AVX2-FAST-NEXT: vmovaps {{.*#+}} ymm4 = [5,6,5,6,5,6,7,7]
				; AVX2-FAST-NEXT: vpermps %ymm1, %ymm4, %ymm4
				; AVX2-FAST-NEXT: vpermpd {{.*#+}} ymm5 = ymm2[2,1,3,3]
				; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm4 = ymm4[0],ymm5[1],ymm4[2,3],ymm5[4],ymm4[5,6],ymm5[7]
				; AVX2-FAST-NEXT: vpermpd {{.*#+}} ymm5 = ymm0[0,3,3,3]
				; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm4 = ymm4[0,1],ymm5[2],ymm4[3,4],ymm5[5],ymm4[6,7]
				; AVX2-FAST-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[1,1,2,2]
	; AVX2-FAST-NEXT: vpermpd {{.*#+}} ymm2 = ymm2[1,1,2,2]			; AVX2-FAST-NEXT: vpermpd {{.*#+}} ymm2 = ymm2[1,1,2,2]
	; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm2 = ymm2[0],ymm4[1],ymm2[2,3],ymm4[4],ymm2[5,6],ymm4[7]			; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm0 = ymm2[0],ymm0[1],ymm2[2,3],ymm0[4],ymm2[5,6],ymm0[7]
	; AVX2-FAST-NEXT: vpermilps {{.*#+}} ymm4 = ymm1[0,0,3,3,4,4,7,7]			; AVX2-FAST-NEXT: vpermilps {{.*#+}} ymm1 = ymm1[0,0,3,3,4,4,7,7]
	; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm2 = ymm2[0,1],ymm4[2],ymm2[3,4],ymm4[5],ymm2[6,7]
	; AVX2-FAST-NEXT: vbroadcastf128 {{.*#+}} ymm4 = [1,0,2,2,1,0,2,2]
	; AVX2-FAST-NEXT: # ymm4 = mem[0,1,0,1]
	; AVX2-FAST-NEXT: vpermps %ymm1, %ymm4, %ymm1
	; AVX2-FAST-NEXT: vpermpd {{.*#+}} ymm0 = ymm0[0,0,2,1]
	; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0],ymm1[1],ymm0[2,3],ymm1[4],ymm0[5,6],ymm1[7]
	; AVX2-FAST-NEXT: vbroadcastsd (%rcx), %ymm1
	; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm1[2],ymm0[3,4],ymm1[5],ymm0[6,7]			; AVX2-FAST-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm1[2],ymm0[3,4],ymm1[5],ymm0[6,7]
	; AVX2-FAST-NEXT: vmovups %ymm0, (%rdi)			; AVX2-FAST-NEXT: vmovups %ymm0, 32(%rdi)
	; AVX2-FAST-NEXT: vmovups %ymm2, 32(%rdi)			; AVX2-FAST-NEXT: vmovups %ymm4, 64(%rdi)
	; AVX2-FAST-NEXT: vmovups %ymm3, 64(%rdi)			; AVX2-FAST-NEXT: vmovups %ymm3, (%rdi)
	; AVX2-FAST-NEXT: vzeroupper			; AVX2-FAST-NEXT: vzeroupper
	; AVX2-FAST-NEXT: retq			; AVX2-FAST-NEXT: retq
	;			;
	; XOP-LABEL: interleave_24i32_in:			; XOP-LABEL: interleave_24i32_in:
	; XOP: # %bb.0:			; XOP: # %bb.0:
	; XOP-NEXT: vmovupd (%rsi), %ymm0			; XOP-NEXT: vmovupd (%rsi), %ymm0
	; XOP-NEXT: vmovups (%rcx), %ymm1			; XOP-NEXT: vmovups (%rcx), %ymm1
	; XOP-NEXT: vmovups 16(%rcx), %xmm2			; XOP-NEXT: vmovups (%rdx), %xmm2
	; XOP-NEXT: vmovups (%rdx), %xmm3			; XOP-NEXT: vmovups 16(%rdx), %xmm3
	; XOP-NEXT: vmovups 16(%rdx), %xmm4
	; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm4[3,0],xmm2[3,0]
	; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm2[2,1],xmm5[0,2]
	; XOP-NEXT: vshufps {{.*#+}} xmm2 = xmm2[1,0],xmm4[1,0]
	; XOP-NEXT: vshufps {{.*#+}} xmm2 = xmm2[2,0],xmm4[2,2]
	; XOP-NEXT: vinsertf128 $1, %xmm5, %ymm2, %ymm2
	; XOP-NEXT: vpermilpd {{.*#+}} ymm4 = ymm0[1,1,3,3]
	; XOP-NEXT: vperm2f128 {{.*#+}} ymm4 = ymm4[2,3,2,3]
	; XOP-NEXT: vblendps {{.*#+}} ymm2 = ymm2[0,1],ymm4[2],ymm2[3,4],ymm4[5],ymm2[6,7]
	; XOP-NEXT: vmovups (%rsi), %xmm4			; XOP-NEXT: vmovups (%rsi), %xmm4
	; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm4[2,0],xmm3[2,0]			; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm4[2,0],xmm2[2,0]
	; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm3[1,1],xmm5[0,2]			; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm2[1,1],xmm5[0,2]
	; XOP-NEXT: vshufps {{.*#+}} xmm3 = xmm3[0,0],xmm4[0,0]			; XOP-NEXT: vshufps {{.*#+}} xmm2 = xmm2[0,0],xmm4[0,0]
	; XOP-NEXT: vshufps {{.*#+}} xmm3 = xmm3[2,0],xmm4[2,1]			; XOP-NEXT: vshufps {{.*#+}} xmm2 = xmm2[2,0],xmm4[2,1]
	; XOP-NEXT: vinsertf128 $1, %xmm5, %ymm3, %ymm3			; XOP-NEXT: vinsertf128 $1, %xmm5, %ymm2, %ymm2
	; XOP-NEXT: vpermilps {{.*#+}} xmm4 = mem[0,1,0,1]			; XOP-NEXT: vpermilps {{.*#+}} xmm4 = mem[0,1,0,1]
	; XOP-NEXT: vinsertf128 $1, %xmm4, %ymm4, %ymm4			; XOP-NEXT: vinsertf128 $1, %xmm4, %ymm4, %ymm4
				; XOP-NEXT: vblendps {{.*#+}} ymm2 = ymm2[0,1],ymm4[2],ymm2[3,4],ymm4[5],ymm2[6,7]
				; XOP-NEXT: vmovups 16(%rcx), %xmm4
				; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm3[3,0],xmm4[3,0]
				; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm4[2,1],xmm5[0,2]
				; XOP-NEXT: vshufps {{.*#+}} xmm4 = xmm4[1,0],xmm3[1,0]
				; XOP-NEXT: vshufps {{.*#+}} xmm3 = xmm4[2,0],xmm3[2,2]
				; XOP-NEXT: vinsertf128 $1, %xmm5, %ymm3, %ymm3
				; XOP-NEXT: vpermilpd {{.*#+}} ymm4 = ymm0[1,1,3,3]
				; XOP-NEXT: vperm2f128 {{.*#+}} ymm4 = ymm4[2,3,2,3]
	; XOP-NEXT: vblendps {{.*#+}} ymm3 = ymm3[0,1],ymm4[2],ymm3[3,4],ymm4[5],ymm3[6,7]			; XOP-NEXT: vblendps {{.*#+}} ymm3 = ymm3[0,1],ymm4[2],ymm3[3,4],ymm4[5],ymm3[6,7]
	; XOP-NEXT: vpermil2ps {{.*#+}} ymm0 = ymm1[2],ymm0[3],ymm1[2,3],ymm0[4],ymm1[5,4],ymm0[5]			; XOP-NEXT: vpermil2ps {{.*#+}} ymm0 = ymm1[2],ymm0[3],ymm1[2,3],ymm0[4],ymm1[5,4],ymm0[5]
	; XOP-NEXT: vpermilps {{.*#+}} ymm1 = mem[0,0,3,3,4,4,7,7]			; XOP-NEXT: vpermilps {{.*#+}} ymm1 = mem[0,0,3,3,4,4,7,7]
	; XOP-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm1[2],ymm0[3,4],ymm1[5],ymm0[6,7]			; XOP-NEXT: vblendps {{.*#+}} ymm0 = ymm0[0,1],ymm1[2],ymm0[3,4],ymm1[5],ymm0[6,7]
	; XOP-NEXT: vmovups %ymm0, 32(%rdi)			; XOP-NEXT: vmovups %ymm0, 32(%rdi)
	; XOP-NEXT: vmovups %ymm3, (%rdi)			; XOP-NEXT: vmovups %ymm3, 64(%rdi)
	; XOP-NEXT: vmovups %ymm2, 64(%rdi)			; XOP-NEXT: vmovups %ymm2, (%rdi)
	; XOP-NEXT: vzeroupper			; XOP-NEXT: vzeroupper
	; XOP-NEXT: retq			; XOP-NEXT: retq
	%s1 = load <8 x i32>, <8 x i32>* %q1, align 4			%s1 = load <8 x i32>, <8 x i32>* %q1, align 4
	%s2 = load <8 x i32>, <8 x i32>* %q2, align 4			%s2 = load <8 x i32>, <8 x i32>* %q2, align 4
	%s3 = load <8 x i32>, <8 x i32>* %q3, align 4			%s3 = load <8 x i32>, <8 x i32>* %q3, align 4
	%t1 = shufflevector <8 x i32> %s1, <8 x i32> %s2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>			%t1 = shufflevector <8 x i32> %s1, <8 x i32> %s2, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
	%t2 = shufflevector <8 x i32> %s3, <8 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%t2 = shufflevector <8 x i32> %s3, <8 x i32> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%interleaved = shufflevector <16 x i32> %t1, <16 x i32> %t2, <24 x i32> <i32 0, i32 8, i32 16, i32 1, i32 9, i32 17, i32 2, i32 10, i32 18, i32 3, i32 11, i32 19, i32 4, i32 12, i32 20, i32 5, i32 13, i32 21, i32 6, i32 14, i32 22, i32 7, i32 15, i32 23>			%interleaved = shufflevector <16 x i32> %t1, <16 x i32> %t2, <24 x i32> <i32 0, i32 8, i32 16, i32 1, i32 9, i32 17, i32 2, i32 10, i32 18, i32 3, i32 11, i32 19, i32 4, i32 12, i32 20, i32 5, i32 13, i32 21, i32 6, i32 14, i32 22, i32 7, i32 15, i32 23>

llvm/test/CodeGen/X86/vec_saddo.ll

	; AVX2-NEXT: movq %rbp, 24(%r11)			; AVX2-NEXT: movq %rbp, 24(%r11)
	; AVX2-NEXT: movq %rsi, 8(%r11)			; AVX2-NEXT: movq %rsi, 8(%r11)
	; AVX2-NEXT: popq %rbx			; AVX2-NEXT: popq %rbx
	; AVX2-NEXT: popq %rbp			; AVX2-NEXT: popq %rbp
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: saddo_v2i128:			; AVX512-LABEL: saddo_v2i128:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: pushq %r14			; AVX512-NEXT: pushq %rbp
	; AVX512-NEXT: pushq %rbx			; AVX512-NEXT: pushq %rbx
	; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r10
	; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r11			; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r11
	; AVX512-NEXT: addq {{[0-9]+}}(%rsp), %rdx			; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r10
	; AVX512-NEXT: movq %rcx, %r14
	; AVX512-NEXT: adcq %r11, %r14
	; AVX512-NEXT: setns %bl
	; AVX512-NEXT: testq %rcx, %rcx
	; AVX512-NEXT: setns %cl
	; AVX512-NEXT: cmpb %bl, %cl
	; AVX512-NEXT: setne %bl
	; AVX512-NEXT: testq %r11, %r11
	; AVX512-NEXT: setns %al
	; AVX512-NEXT: cmpb %al, %cl
	; AVX512-NEXT: sete %al
	; AVX512-NEXT: andb %bl, %al
	; AVX512-NEXT: kmovd %eax, %k0
	; AVX512-NEXT: kshiftlw $1, %k0, %k0
	; AVX512-NEXT: testq %r9, %r9			; AVX512-NEXT: testq %r9, %r9
	; AVX512-NEXT: setns %al			; AVX512-NEXT: setns %al
	; AVX512-NEXT: testq %rsi, %rsi			; AVX512-NEXT: testq %rsi, %rsi
	; AVX512-NEXT: setns %cl			; AVX512-NEXT: setns %bl
	; AVX512-NEXT: cmpb %al, %cl			; AVX512-NEXT: cmpb %al, %bl
	; AVX512-NEXT: sete %al			; AVX512-NEXT: sete %bpl
	; AVX512-NEXT: addq %r8, %rdi			; AVX512-NEXT: addq %r8, %rdi
	; AVX512-NEXT: adcq %r9, %rsi			; AVX512-NEXT: adcq %r9, %rsi
				; AVX512-NEXT: setns %al
				; AVX512-NEXT: cmpb %al, %bl
				; AVX512-NEXT: setne %al
				; AVX512-NEXT: andb %bpl, %al
				; AVX512-NEXT: addq {{[0-9]+}}(%rsp), %rdx
				; AVX512-NEXT: movq %rcx, %rbp
				; AVX512-NEXT: adcq %r10, %rbp
				; AVX512-NEXT: setns %bl
				; AVX512-NEXT: testq %rcx, %rcx
				; AVX512-NEXT: setns %cl
				; AVX512-NEXT: cmpb %bl, %cl
				; AVX512-NEXT: setne %r8b
				; AVX512-NEXT: testq %r10, %r10
	; AVX512-NEXT: setns %bl			; AVX512-NEXT: setns %bl
	; AVX512-NEXT: cmpb %bl, %cl			; AVX512-NEXT: cmpb %bl, %cl
	; AVX512-NEXT: setne %cl			; AVX512-NEXT: sete %cl
	; AVX512-NEXT: andb %al, %cl			; AVX512-NEXT: andb %r8b, %cl
	; AVX512-NEXT: andl $1, %ecx			; AVX512-NEXT: kmovd %ecx, %k0
	; AVX512-NEXT: kmovw %ecx, %k1			; AVX512-NEXT: kshiftlw $1, %k0, %k0
				; AVX512-NEXT: andl $1, %eax
				; AVX512-NEXT: kmovw %eax, %k1
	; AVX512-NEXT: korw %k0, %k1, %k1			; AVX512-NEXT: korw %k0, %k1, %k1
	; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}			; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
	; AVX512-NEXT: movq %rdx, 16(%r10)			; AVX512-NEXT: movq %rdx, 16(%r11)
	; AVX512-NEXT: movq %rdi, (%r10)			; AVX512-NEXT: movq %rdi, (%r11)
	; AVX512-NEXT: movq %r14, 24(%r10)			; AVX512-NEXT: movq %rbp, 24(%r11)
	; AVX512-NEXT: movq %rsi, 8(%r10)			; AVX512-NEXT: movq %rsi, 8(%r11)
	; AVX512-NEXT: popq %rbx			; AVX512-NEXT: popq %rbx
	; AVX512-NEXT: popq %r14			; AVX512-NEXT: popq %rbp
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%t = call {<2 x i128>, <2 x i1>} @llvm.sadd.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)			%t = call {<2 x i128>, <2 x i1>} @llvm.sadd.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)
	%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0			%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0
	%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1			%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1
	%res = sext <2 x i1> %obit to <2 x i32>			%res = sext <2 x i1> %obit to <2 x i32>
	store <2 x i128> %val, <2 x i128>* %p2			store <2 x i128> %val, <2 x i128>* %p2
	ret <2 x i32> %res			ret <2 x i32> %res
	}			}

llvm/test/CodeGen/X86/vec_smulo.ll

	; AVX512-NEXT: movq %rbx, %rdi			; AVX512-NEXT: movq %rbx, %rdi
	; AVX512-NEXT: movq %r15, %rsi			; AVX512-NEXT: movq %r15, %rsi
	; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %rdx			; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %rdx
	; AVX512-NEXT: movq %r13, %rcx			; AVX512-NEXT: movq %r13, %rcx
	; AVX512-NEXT: callq __muloti4			; AVX512-NEXT: callq __muloti4
	; AVX512-NEXT: cmpq $0, {{[0-9]+}}(%rsp)			; AVX512-NEXT: cmpq $0, {{[0-9]+}}(%rsp)
	; AVX512-NEXT: setne %cl			; AVX512-NEXT: setne %cl
	; AVX512-NEXT: kmovd %ecx, %k0			; AVX512-NEXT: kmovd %ecx, %k0
	; AVX512-NEXT: kshiftlw $1, %k0, %k0
	; AVX512-NEXT: cmpq $0, {{[0-9]+}}(%rsp)			; AVX512-NEXT: cmpq $0, {{[0-9]+}}(%rsp)
	; AVX512-NEXT: setne %cl			; AVX512-NEXT: setne %cl
				; AVX512-NEXT: kshiftlw $1, %k0, %k0
	; AVX512-NEXT: andl $1, %ecx			; AVX512-NEXT: andl $1, %ecx
	; AVX512-NEXT: kmovw %ecx, %k1			; AVX512-NEXT: kmovw %ecx, %k1
	; AVX512-NEXT: korw %k0, %k1, %k1			; AVX512-NEXT: korw %k0, %k1, %k1
	; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}			; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
	; AVX512-NEXT: movq %rdx, 24(%r12)			; AVX512-NEXT: movq %rdx, 24(%r12)
	; AVX512-NEXT: movq %rax, 16(%r12)			; AVX512-NEXT: movq %rax, 16(%r12)
	; AVX512-NEXT: movq %rbp, 8(%r12)			; AVX512-NEXT: movq %rbp, 8(%r12)

llvm/test/CodeGen/X86/vec_ssubo.ll

	; AVX2-NEXT: movq %rbp, 24(%r11)			; AVX2-NEXT: movq %rbp, 24(%r11)
	; AVX2-NEXT: movq %rsi, 8(%r11)			; AVX2-NEXT: movq %rsi, 8(%r11)
	; AVX2-NEXT: popq %rbx			; AVX2-NEXT: popq %rbx
	; AVX2-NEXT: popq %rbp			; AVX2-NEXT: popq %rbp
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: ssubo_v2i128:			; AVX512-LABEL: ssubo_v2i128:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: pushq %r14			; AVX512-NEXT: pushq %rbp
	; AVX512-NEXT: pushq %rbx			; AVX512-NEXT: pushq %rbx
	; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r10
	; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r11			; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r11
	; AVX512-NEXT: subq {{[0-9]+}}(%rsp), %rdx			; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r10
	; AVX512-NEXT: movq %rcx, %r14
	; AVX512-NEXT: sbbq %r11, %r14
	; AVX512-NEXT: setns %bl
	; AVX512-NEXT: testq %rcx, %rcx
	; AVX512-NEXT: setns %cl
	; AVX512-NEXT: cmpb %bl, %cl
	; AVX512-NEXT: setne %bl
	; AVX512-NEXT: testq %r11, %r11
	; AVX512-NEXT: setns %al
	; AVX512-NEXT: cmpb %al, %cl
	; AVX512-NEXT: setne %al
	; AVX512-NEXT: andb %bl, %al
	; AVX512-NEXT: kmovd %eax, %k0
	; AVX512-NEXT: kshiftlw $1, %k0, %k0
	; AVX512-NEXT: testq %r9, %r9			; AVX512-NEXT: testq %r9, %r9
	; AVX512-NEXT: setns %al			; AVX512-NEXT: setns %al
	; AVX512-NEXT: testq %rsi, %rsi			; AVX512-NEXT: testq %rsi, %rsi
	; AVX512-NEXT: setns %cl			; AVX512-NEXT: setns %bl
	; AVX512-NEXT: cmpb %al, %cl			; AVX512-NEXT: cmpb %al, %bl
	; AVX512-NEXT: setne %al			; AVX512-NEXT: setne %bpl
	; AVX512-NEXT: subq %r8, %rdi			; AVX512-NEXT: subq %r8, %rdi
	; AVX512-NEXT: sbbq %r9, %rsi			; AVX512-NEXT: sbbq %r9, %rsi
				; AVX512-NEXT: setns %al
				; AVX512-NEXT: cmpb %al, %bl
				; AVX512-NEXT: setne %al
				; AVX512-NEXT: andb %bpl, %al
				; AVX512-NEXT: subq {{[0-9]+}}(%rsp), %rdx
				; AVX512-NEXT: movq %rcx, %rbp
				; AVX512-NEXT: sbbq %r10, %rbp
				; AVX512-NEXT: setns %bl
				; AVX512-NEXT: testq %rcx, %rcx
				; AVX512-NEXT: setns %cl
				; AVX512-NEXT: cmpb %bl, %cl
				; AVX512-NEXT: setne %r8b
				; AVX512-NEXT: testq %r10, %r10
	; AVX512-NEXT: setns %bl			; AVX512-NEXT: setns %bl
	; AVX512-NEXT: cmpb %bl, %cl			; AVX512-NEXT: cmpb %bl, %cl
	; AVX512-NEXT: setne %cl			; AVX512-NEXT: setne %cl
	; AVX512-NEXT: andb %al, %cl			; AVX512-NEXT: andb %r8b, %cl
	; AVX512-NEXT: andl $1, %ecx			; AVX512-NEXT: kmovd %ecx, %k0
	; AVX512-NEXT: kmovw %ecx, %k1			; AVX512-NEXT: kshiftlw $1, %k0, %k0
				; AVX512-NEXT: andl $1, %eax
				; AVX512-NEXT: kmovw %eax, %k1
	; AVX512-NEXT: korw %k0, %k1, %k1			; AVX512-NEXT: korw %k0, %k1, %k1
	; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}			; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
	; AVX512-NEXT: movq %rdx, 16(%r10)			; AVX512-NEXT: movq %rdx, 16(%r11)
	; AVX512-NEXT: movq %rdi, (%r10)			; AVX512-NEXT: movq %rdi, (%r11)
	; AVX512-NEXT: movq %r14, 24(%r10)			; AVX512-NEXT: movq %rbp, 24(%r11)
	; AVX512-NEXT: movq %rsi, 8(%r10)			; AVX512-NEXT: movq %rsi, 8(%r11)
	; AVX512-NEXT: popq %rbx			; AVX512-NEXT: popq %rbx
	; AVX512-NEXT: popq %r14			; AVX512-NEXT: popq %rbp
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%t = call {<2 x i128>, <2 x i1>} @llvm.ssub.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)			%t = call {<2 x i128>, <2 x i1>} @llvm.ssub.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)
	%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0			%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0
	%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1			%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1
	%res = sext <2 x i1> %obit to <2 x i32>			%res = sext <2 x i1> %obit to <2 x i32>
	store <2 x i128> %val, <2 x i128>* %p2			store <2 x i128> %val, <2 x i128>* %p2
	ret <2 x i32> %res			ret <2 x i32> %res
	}			}

llvm/test/CodeGen/X86/vec_uaddo.ll

	; AVX2-NEXT: movq %rdi, (%r10)			; AVX2-NEXT: movq %rdi, (%r10)
	; AVX2-NEXT: movq %rcx, 24(%r10)			; AVX2-NEXT: movq %rcx, 24(%r10)
	; AVX2-NEXT: movq %rsi, 8(%r10)			; AVX2-NEXT: movq %rsi, 8(%r10)
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: uaddo_v2i128:			; AVX512-LABEL: uaddo_v2i128:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r10			; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r10
				; AVX512-NEXT: addq %r8, %rdi
				; AVX512-NEXT: adcq %r9, %rsi
				; AVX512-NEXT: setb %r8b
	; AVX512-NEXT: addq {{[0-9]+}}(%rsp), %rdx			; AVX512-NEXT: addq {{[0-9]+}}(%rsp), %rdx
	; AVX512-NEXT: adcq {{[0-9]+}}(%rsp), %rcx			; AVX512-NEXT: adcq {{[0-9]+}}(%rsp), %rcx
	; AVX512-NEXT: setb %al			; AVX512-NEXT: setb %al
	; AVX512-NEXT: kmovd %eax, %k0			; AVX512-NEXT: kmovd %eax, %k0
	; AVX512-NEXT: kshiftlw $1, %k0, %k0			; AVX512-NEXT: kshiftlw $1, %k0, %k0
	; AVX512-NEXT: addq %r8, %rdi			; AVX512-NEXT: andl $1, %r8d
	; AVX512-NEXT: adcq %r9, %rsi			; AVX512-NEXT: kmovw %r8d, %k1
	; AVX512-NEXT: setb %al
	; AVX512-NEXT: andl $1, %eax
	; AVX512-NEXT: kmovw %eax, %k1
	; AVX512-NEXT: korw %k0, %k1, %k1			; AVX512-NEXT: korw %k0, %k1, %k1
	; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}			; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
	; AVX512-NEXT: movq %rdx, 16(%r10)			; AVX512-NEXT: movq %rdx, 16(%r10)
	; AVX512-NEXT: movq %rdi, (%r10)			; AVX512-NEXT: movq %rdi, (%r10)
	; AVX512-NEXT: movq %rcx, 24(%r10)			; AVX512-NEXT: movq %rcx, 24(%r10)
	; AVX512-NEXT: movq %rsi, 8(%r10)			; AVX512-NEXT: movq %rsi, 8(%r10)
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%t = call {<2 x i128>, <2 x i1>} @llvm.uadd.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)			%t = call {<2 x i128>, <2 x i1>} @llvm.uadd.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)
	%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0			%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0
	%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1			%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1
	%res = sext <2 x i1> %obit to <2 x i32>			%res = sext <2 x i1> %obit to <2 x i32>
	store <2 x i128> %val, <2 x i128>* %p2			store <2 x i128> %val, <2 x i128>* %p2
	ret <2 x i32> %res			ret <2 x i32> %res
	}			}

llvm/test/CodeGen/X86/vec_umulo.ll

	; AVX512-LABEL: umulo_v2i128:			; AVX512-LABEL: umulo_v2i128:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: pushq %rbp			; AVX512-NEXT: pushq %rbp
	; AVX512-NEXT: pushq %r15			; AVX512-NEXT: pushq %r15
	; AVX512-NEXT: pushq %r14			; AVX512-NEXT: pushq %r14
	; AVX512-NEXT: pushq %r13			; AVX512-NEXT: pushq %r13
	; AVX512-NEXT: pushq %r12			; AVX512-NEXT: pushq %r12
	; AVX512-NEXT: pushq %rbx			; AVX512-NEXT: pushq %rbx
	; AVX512-NEXT: movq %rcx, %rax			; AVX512-NEXT: movq %r9, %r10
	; AVX512-NEXT: movq %rdx, %r12			; AVX512-NEXT: movq %rcx, %r9
	; AVX512-NEXT: movq %rdi, %r11			; AVX512-NEXT: movq %rdx, %r11
				; AVX512-NEXT: movq %rsi, %rax
				; AVX512-NEXT: movq %rdi, %rsi
	; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r14			; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r14
	; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r15			; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r15
	; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r10			; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r12
	; AVX512-NEXT: testq %r10, %r10			; AVX512-NEXT: testq %r10, %r10
	; AVX512-NEXT: setne %dl			; AVX512-NEXT: setne %dl
	; AVX512-NEXT: testq %rcx, %rcx			; AVX512-NEXT: testq %rax, %rax
	; AVX512-NEXT: setne %r13b			; AVX512-NEXT: setne %bl
	; AVX512-NEXT: andb %dl, %r13b			; AVX512-NEXT: andb %dl, %bl
	; AVX512-NEXT: mulq %r15			; AVX512-NEXT: mulq %r8
	; AVX512-NEXT: movq %rax, %rdi			; AVX512-NEXT: movq %rax, %r13
	; AVX512-NEXT: seto %bpl			; AVX512-NEXT: seto %bpl
	; AVX512-NEXT: movq %r10, %rax			; AVX512-NEXT: movq %r10, %rax
	; AVX512-NEXT: mulq %r12			; AVX512-NEXT: mulq %rdi
	; AVX512-NEXT: movq %rax, %rbx			; AVX512-NEXT: movq %rax, %rdi
	; AVX512-NEXT: seto %cl			; AVX512-NEXT: seto %cl
	; AVX512-NEXT: orb %bpl, %cl			; AVX512-NEXT: orb %bpl, %cl
	; AVX512-NEXT: addq %rdi, %rbx			; AVX512-NEXT: addq %r13, %rdi
	; AVX512-NEXT: movq %r12, %rax
	; AVX512-NEXT: mulq %r15
	; AVX512-NEXT: movq %rax, %r10
	; AVX512-NEXT: movq %rdx, %r15
	; AVX512-NEXT: addq %rbx, %r15
	; AVX512-NEXT: setb %al
	; AVX512-NEXT: orb %cl, %al
	; AVX512-NEXT: orb %r13b, %al
	; AVX512-NEXT: kmovd %eax, %k0
	; AVX512-NEXT: kshiftlw $1, %k0, %k0
	; AVX512-NEXT: testq %r9, %r9
	; AVX512-NEXT: setne %al
	; AVX512-NEXT: testq %rsi, %rsi
	; AVX512-NEXT: setne %cl
	; AVX512-NEXT: andb %al, %cl
	; AVX512-NEXT: movq %rsi, %rax			; AVX512-NEXT: movq %rsi, %rax
	; AVX512-NEXT: mulq %r8			; AVX512-NEXT: mulq %r8
	; AVX512-NEXT: movq %rax, %rsi			; AVX512-NEXT: movq %rax, %r8
	; AVX512-NEXT: seto %bpl			; AVX512-NEXT: movq %rdx, %r10
				; AVX512-NEXT: addq %rdi, %r10
				; AVX512-NEXT: setb %sil
				; AVX512-NEXT: orb %cl, %sil
				; AVX512-NEXT: orb %bl, %sil
				; AVX512-NEXT: testq %r12, %r12
				; AVX512-NEXT: setne %al
				; AVX512-NEXT: testq %r9, %r9
				; AVX512-NEXT: setne %bpl
				; AVX512-NEXT: andb %al, %bpl
	; AVX512-NEXT: movq %r9, %rax			; AVX512-NEXT: movq %r9, %rax
	; AVX512-NEXT: mulq %r11			; AVX512-NEXT: mulq %r15
	; AVX512-NEXT: movq %rax, %rdi			; AVX512-NEXT: movq %rax, %rdi
	; AVX512-NEXT: seto %bl			; AVX512-NEXT: seto %r9b
	; AVX512-NEXT: orb %bpl, %bl			; AVX512-NEXT: movq %r12, %rax
	; AVX512-NEXT: addq %rsi, %rdi			; AVX512-NEXT: mulq %r11
				; AVX512-NEXT: movq %rax, %rbx
				; AVX512-NEXT: seto %cl
				; AVX512-NEXT: orb %r9b, %cl
				; AVX512-NEXT: addq %rdi, %rbx
	; AVX512-NEXT: movq %r11, %rax			; AVX512-NEXT: movq %r11, %rax
	; AVX512-NEXT: mulq %r8			; AVX512-NEXT: mulq %r15
	; AVX512-NEXT: addq %rdi, %rdx			; AVX512-NEXT: addq %rbx, %rdx
	; AVX512-NEXT: setb %sil			; AVX512-NEXT: setb %dil
	; AVX512-NEXT: orb %bl, %sil			; AVX512-NEXT: orb %cl, %dil
	; AVX512-NEXT: orb %cl, %sil			; AVX512-NEXT: orb %bpl, %dil
				; AVX512-NEXT: kmovd %edi, %k0
				; AVX512-NEXT: kshiftlw $1, %k0, %k0
	; AVX512-NEXT: andl $1, %esi			; AVX512-NEXT: andl $1, %esi
	; AVX512-NEXT: kmovw %esi, %k1			; AVX512-NEXT: kmovw %esi, %k1
	; AVX512-NEXT: korw %k0, %k1, %k1			; AVX512-NEXT: korw %k0, %k1, %k1
	; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}			; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
	; AVX512-NEXT: movq %r10, 16(%r14)			; AVX512-NEXT: movq %rax, 16(%r14)
	; AVX512-NEXT: movq %rax, (%r14)			; AVX512-NEXT: movq %r8, (%r14)
	; AVX512-NEXT: movq %r15, 24(%r14)			; AVX512-NEXT: movq %rdx, 24(%r14)
	; AVX512-NEXT: movq %rdx, 8(%r14)			; AVX512-NEXT: movq %r10, 8(%r14)
	; AVX512-NEXT: popq %rbx			; AVX512-NEXT: popq %rbx
				craig.topper (author, inline comment): By instruction count this is a regression, but I'm not sure exactly what the difference is.
	; AVX512-NEXT: popq %r12			; AVX512-NEXT: popq %r12
	; AVX512-NEXT: popq %r13			; AVX512-NEXT: popq %r13
	; AVX512-NEXT: popq %r14			; AVX512-NEXT: popq %r14
	; AVX512-NEXT: popq %r15			; AVX512-NEXT: popq %r15
	; AVX512-NEXT: popq %rbp			; AVX512-NEXT: popq %rbp
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%t = call {<2 x i128>, <2 x i1>} @llvm.umul.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)			%t = call {<2 x i128>, <2 x i1>} @llvm.umul.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)
	%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0			%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0
	%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1			%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1
	%res = sext <2 x i1> %obit to <2 x i32>			%res = sext <2 x i1> %obit to <2 x i32>
	store <2 x i128> %val, <2 x i128>* %p2			store <2 x i128> %val, <2 x i128>* %p2
	ret <2 x i32> %res			ret <2 x i32> %res
	}			}

llvm/test/CodeGen/X86/vec_usubo.ll

	; AVX2-NEXT: movq %rdi, (%r10)			; AVX2-NEXT: movq %rdi, (%r10)
	; AVX2-NEXT: movq %rcx, 24(%r10)			; AVX2-NEXT: movq %rcx, 24(%r10)
	; AVX2-NEXT: movq %rsi, 8(%r10)			; AVX2-NEXT: movq %rsi, 8(%r10)
	; AVX2-NEXT: retq			; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: usubo_v2i128:			; AVX512-LABEL: usubo_v2i128:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r10			; AVX512-NEXT: movq {{[0-9]+}}(%rsp), %r10
				; AVX512-NEXT: subq %r8, %rdi
				; AVX512-NEXT: sbbq %r9, %rsi
				; AVX512-NEXT: setb %r8b
	; AVX512-NEXT: subq {{[0-9]+}}(%rsp), %rdx			; AVX512-NEXT: subq {{[0-9]+}}(%rsp), %rdx
	; AVX512-NEXT: sbbq {{[0-9]+}}(%rsp), %rcx			; AVX512-NEXT: sbbq {{[0-9]+}}(%rsp), %rcx
	; AVX512-NEXT: setb %al			; AVX512-NEXT: setb %al
	; AVX512-NEXT: kmovd %eax, %k0			; AVX512-NEXT: kmovd %eax, %k0
	; AVX512-NEXT: kshiftlw $1, %k0, %k0			; AVX512-NEXT: kshiftlw $1, %k0, %k0
	; AVX512-NEXT: subq %r8, %rdi			; AVX512-NEXT: andl $1, %r8d
	; AVX512-NEXT: sbbq %r9, %rsi			; AVX512-NEXT: kmovw %r8d, %k1
	; AVX512-NEXT: setb %al
	; AVX512-NEXT: andl $1, %eax
	; AVX512-NEXT: kmovw %eax, %k1
	; AVX512-NEXT: korw %k0, %k1, %k1			; AVX512-NEXT: korw %k0, %k1, %k1
	; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0			; AVX512-NEXT: vpcmpeqd %xmm0, %xmm0, %xmm0
	; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}			; AVX512-NEXT: vmovdqa32 %xmm0, %xmm0 {%k1} {z}
	; AVX512-NEXT: movq %rdx, 16(%r10)			; AVX512-NEXT: movq %rdx, 16(%r10)
	; AVX512-NEXT: movq %rdi, (%r10)			; AVX512-NEXT: movq %rdi, (%r10)
	; AVX512-NEXT: movq %rcx, 24(%r10)			; AVX512-NEXT: movq %rcx, 24(%r10)
	; AVX512-NEXT: movq %rsi, 8(%r10)			; AVX512-NEXT: movq %rsi, 8(%r10)
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	%t = call {<2 x i128>, <2 x i1>} @llvm.usub.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)			%t = call {<2 x i128>, <2 x i1>} @llvm.usub.with.overflow.v2i128(<2 x i128> %a0, <2 x i128> %a1)
	%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0			%val = extractvalue {<2 x i128>, <2 x i1>} %t, 0
	%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1			%obit = extractvalue {<2 x i128>, <2 x i1>} %t, 1
	%res = sext <2 x i1> %obit to <2 x i32>			%res = sext <2 x i1> %obit to <2 x i32>
	store <2 x i128> %val, <2 x i128>* %p2			store <2 x i128> %val, <2 x i128>* %p2
	ret <2 x i32> %res			ret <2 x i32> %res
	}			}

Revision Contents

Diff 216229

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/lib/Target/X86/X86ISelLowering.cpp

llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll

llvm/test/CodeGen/X86/oddshuffles.ll

llvm/test/CodeGen/X86/vec_saddo.ll

llvm/test/CodeGen/X86/vec_smulo.ll

llvm/test/CodeGen/X86/vec_ssubo.ll

llvm/test/CodeGen/X86/vec_uaddo.ll

llvm/test/CodeGen/X86/vec_umulo.ll

llvm/test/CodeGen/X86/vec_usubo.ll
