This is an archive of the discontinued LLVM Phabricator instance.

Differential D56875

[DAGCombiner] narrow vector binop with 2 insert subvector operands
ClosedPublic

Authored by spatel on Jan 17 2019, 12:19 PM.

Details

Reviewers

craig.topper
RKSimon
efriedma

Commits

rGeffee52c59a0: [DAGCombiner] narrow vector binop with 2 insert subvector operands
rL351825: [DAGCombiner] narrow vector binop with 2 insert subvector operands

Summary

bo (ins undef, X, Z), (ins undef, Y, Z) --> ins undef, (bo X, Y), Z

This is another step in generic vector narrowing. It's also a step towards more horizontal op formation specifically for x86 (although we still failed to match those in the affected tests).

The scalarization cases are also not optimal (we should be scalarizing those), but it's still an improvement to use a narrower vector op when we know part of the result must be undef because both inputs are undef in some vector lanes.
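As a rough illustration of the fold in the summary, the standalone C++ sketch below models vector lanes as optional integers, with an empty optional standing in for undef. It is a toy model, not LLVM code: the names Lane, Vec, insertIntoUndef, binop, wideForm, and narrowForm are hypothetical, and it assumes the simplified rule that any undef input yields an undef lane (the review discussion below shows this does not hold for every opcode).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Toy model (assumption): a lane is a known integer or undef (std::nullopt).
using Lane = std::optional<int>;
using Vec = std::vector<Lane>;

// Models "ins undef, Sub, Idx": a Width-lane undef vector with Sub placed
// starting at lane Idx.
Vec insertIntoUndef(std::size_t Width, const Vec &Sub, std::size_t Idx) {
  assert(Idx + Sub.size() <= Width && "subvector must fit in the wide vector");
  Vec Result(Width, std::nullopt);
  for (std::size_t I = 0; I < Sub.size(); ++I)
    Result[Idx + I] = Sub[I];
  return Result;
}

// Lane-wise binop. Simplifying assumption: any undef input gives undef.
Vec binop(const std::function<int(int, int)> &Op, const Vec &A, const Vec &B) {
  assert(A.size() == B.size() && "operand widths must match");
  Vec Result(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    if (A[I] && B[I])
      Result[I] = Op(*A[I], *B[I]);
  return Result;
}

// Wide form: bo (ins undef, X, Z), (ins undef, Y, Z)
Vec wideForm(const std::function<int(int, int)> &Op, const Vec &X,
             const Vec &Y, std::size_t Width, std::size_t Z) {
  return binop(Op, insertIntoUndef(Width, X, Z), insertIntoUndef(Width, Y, Z));
}

// Narrow form: ins undef, (bo X, Y), Z
Vec narrowForm(const std::function<int(int, int)> &Op, const Vec &X,
               const Vec &Y, std::size_t Width, std::size_t Z) {
  return insertIntoUndef(Width, binop(Op, X, Y), Z);
}
```

With X = {1, 2}, Y = {10, 20}, a 4-lane result, Z = 2, and an add, both forms give {undef, undef, 11, 22} in this model, but the narrow form only applies the operation to 2 lanes.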

I think a similar match but checking for a constant operand might help some of the cases in D51553.

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Jan 17 2019, 12:19 PM

Herald added a subscriber: mcrosier. Jan 17 2019, 12:19 PM

LGTM with one minor comment.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18209 (On Diff #182372): (very minor) Move this comment to the outer if() and use the same terms as we used in the shuffle fold above. Also, explain that this is likely to occur in reduction patterns.

This revision is now accepted and ready to land. Jan 21 2019, 9:40 AM

Actually, this patch isn't correct as-is. We can't insert into an undef base vector because at least 'xor undef, undef --> 0' (not undef).
I know I avoided or fixed the similar problem in IR somewhere along the way.
We need to actually compute the constant vector for the specified binop's opcode.
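The problem can be seen in a small standalone model. The C++ sketch below is not LLVM code. It assumes a lane rule where xor of two undef lanes folds to 0 (as described above) and every other operation with an undef input stays undef. All names (Opcode, foldLane, insertSubvector, narrowOntoUndef, narrowOntoComputedBase) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy model (assumption): a lane is a known value or undef (std::nullopt).
using Lane = std::optional<unsigned>;
using Vec = std::vector<Lane>;

enum class Opcode { Add, Xor };

// Assumed lane rule: 'xor undef, undef' folds to 0; any other operation
// with an undef input produces undef.
Lane foldLane(Opcode Opc, Lane A, Lane B) {
  if (A && B)
    return Opc == Opcode::Add ? (*A + *B) : (*A ^ *B);
  if (!A && !B && Opc == Opcode::Xor)
    return 0u;
  return std::nullopt;
}

Vec binop(Opcode Opc, const Vec &A, const Vec &B) {
  assert(A.size() == B.size() && "operand widths must match");
  Vec Result(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    Result[I] = foldLane(Opc, A[I], B[I]);
  return Result;
}

// Returns a copy of Base with Sub written starting at lane Idx.
Vec insertSubvector(Vec Base, const Vec &Sub, std::size_t Idx) {
  assert(Idx + Sub.size() <= Base.size() && "subvector must fit in the base");
  for (std::size_t I = 0; I < Sub.size(); ++I)
    Base[Idx + I] = Sub[I];
  return Base;
}

// bo (ins undef, X, Z), (ins undef, Y, Z)
Vec wideForm(Opcode Opc, const Vec &X, const Vec &Y, std::size_t Width,
             std::size_t Z) {
  Vec Undef(Width);
  return binop(Opc, insertSubvector(Undef, X, Z), insertSubvector(Undef, Y, Z));
}

// First version of the fold: ins undef, (bo X, Y), Z
Vec narrowOntoUndef(Opcode Opc, const Vec &X, const Vec &Y, std::size_t Width,
                    std::size_t Z) {
  return insertSubvector(Vec(Width), binop(Opc, X, Y), Z);
}

// Conservative version: compute the base as (bo undef, undef) first.
Vec narrowOntoComputedBase(Opcode Opc, const Vec &X, const Vec &Y,
                           std::size_t Width, std::size_t Z) {
  Vec Undef(Width);
  Vec VecC = binop(Opc, Undef, Undef);
  return insertSubvector(VecC, binop(Opc, X, Y), Z);
}
```

For xor with X = {1, 2}, Y = {3, 3}, four lanes, and Z = 0, the wide form yields {2, 1, 0, 0} in this model. Inserting into undef yields {2, 1, undef, undef}, while inserting into the computed base matches the wide form. For add, all three agree.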

'xor undef, undef --> 0' (not undef).

Strictly speaking, xor undef, undef ---> undef is correct. It's just that as a practical matter, we try to fold obvious cases to zero so we don't have to argue with people who write silly constructs like __m128i a = _mm_xor_si128(a, a);. The fold here seems unlikely to cause problems in practice.

In D56875#1365621, @efriedma wrote:

'xor undef, undef --> 0' (not undef).

Strictly speaking, xor undef, undef ---> undef is correct. It's just that as a practical matter, we try to fold obvious cases to zero so we don't have to argue with people who write silly constructs like __m128i a = _mm_xor_si128(a, a);. The fold here seems unlikely to cause problems in practice.

Ah, interesting. Either way, I think we want a test to document the behavior, so I added 1 here:
rL351763

Now, the funny thing about this particular case is that we'll generate the optimal code with x86 AVX either way because 'vxorps' with 128-bit operands zeros the upper half...oops! So I added another test here:
rL351764

I think I should check in the more conservative version of the patch first since I already wrote it. Then, if we decide it's worth loosening to the form shown here currently, I can make that a follow-up.

After looking a bit more closely at other patterns that I was hoping to fix, I'm actually not sure if we will end up keeping this code. I think we might be better off trying more general vector-demanded-elements enhancements for binops which would make this explicit pattern-matching unnecessary.

Patch updated:

- Compute the constant vector that we're inserting into rather than assuming it is undef.
- Moved code comment as suggested.
- Rebased with diffs for 'xor' tests.

This revision is now accepted and ready to land. Jan 21 2019, 2:40 PM

LGTM!

Closed by commit rL351825: [DAGCombiner] narrow vector binop with 2 insert subvector operands (authored by spatel). Jan 22 2019, 6:26 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D57377: [CGP] Add support for sinking operands to their users, if they are free. Jan 30 2019, 8:59 AM

Revision Contents

Path                                                   Size
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp    25 lines
llvm/trunk/test/CodeGen/X86/avx512-hadd-hsub.ll        12 lines
llvm/trunk/test/CodeGen/X86/scalarize-fp.ll            25 lines
llvm/trunk/test/CodeGen/X86/vector-partial-undef.ll    10 lines

Diff 182906

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

	/// Visit a binary vector operation, like ADD.			/// Visit a binary vector operation, like ADD.
	SDValue DAGCombiner::SimplifyVBinOp(SDNode *N) {			SDValue DAGCombiner::SimplifyVBinOp(SDNode *N) {
	assert(N->getValueType(0).isVector() &&			assert(N->getValueType(0).isVector() &&
	"SimplifyVBinOp only works on vectors!");			"SimplifyVBinOp only works on vectors!");

	SDValue LHS = N->getOperand(0);			SDValue LHS = N->getOperand(0);
	SDValue RHS = N->getOperand(1);			SDValue RHS = N->getOperand(1);
	SDValue Ops[] = {LHS, RHS};			SDValue Ops[] = {LHS, RHS};
				EVT VT = N->getValueType(0);

	// See if we can constant fold the vector operation.			// See if we can constant fold the vector operation.
	if (SDValue Fold = DAG.FoldConstantVectorArithmetic(			if (SDValue Fold = DAG.FoldConstantVectorArithmetic(
	N->getOpcode(), SDLoc(LHS), LHS.getValueType(), Ops, N->getFlags()))			N->getOpcode(), SDLoc(LHS), LHS.getValueType(), Ops, N->getFlags()))
	return Fold;			return Fold;

	// Type legalization might introduce new shuffles in the DAG.			// Type legalization might introduce new shuffles in the DAG.
	// Fold (VBinOp (shuffle (A, Undef, Mask)), (shuffle (B, Undef, Mask)))			// Fold (VBinOp (shuffle (A, Undef, Mask)), (shuffle (B, Undef, Mask)))
	// -> (shuffle (VBinOp (A, B)), Undef, Mask).			// -> (shuffle (VBinOp (A, B)), Undef, Mask).
	if (LegalTypes && isa<ShuffleVectorSDNode>(LHS) &&			if (LegalTypes && isa<ShuffleVectorSDNode>(LHS) &&
	isa<ShuffleVectorSDNode>(RHS) && LHS.hasOneUse() && RHS.hasOneUse() &&			isa<ShuffleVectorSDNode>(RHS) && LHS.hasOneUse() && RHS.hasOneUse() &&
	LHS.getOperand(1).isUndef() &&			LHS.getOperand(1).isUndef() &&
	RHS.getOperand(1).isUndef()) {			RHS.getOperand(1).isUndef()) {
	ShuffleVectorSDNode *SVN0 = cast<ShuffleVectorSDNode>(LHS);			ShuffleVectorSDNode *SVN0 = cast<ShuffleVectorSDNode>(LHS);
	ShuffleVectorSDNode *SVN1 = cast<ShuffleVectorSDNode>(RHS);			ShuffleVectorSDNode *SVN1 = cast<ShuffleVectorSDNode>(RHS);

	if (SVN0->getMask().equals(SVN1->getMask())) {			if (SVN0->getMask().equals(SVN1->getMask())) {
	EVT VT = N->getValueType(0);
	SDValue UndefVector = LHS.getOperand(1);			SDValue UndefVector = LHS.getOperand(1);
	SDValue NewBinOp = DAG.getNode(N->getOpcode(), SDLoc(N), VT,			SDValue NewBinOp = DAG.getNode(N->getOpcode(), SDLoc(N), VT,
	LHS.getOperand(0), RHS.getOperand(0),			LHS.getOperand(0), RHS.getOperand(0),
	N->getFlags());			N->getFlags());
	AddUsersToWorklist(N);			AddUsersToWorklist(N);
	return DAG.getVectorShuffle(VT, SDLoc(N), NewBinOp, UndefVector,			return DAG.getVectorShuffle(VT, SDLoc(N), NewBinOp, UndefVector,
	SVN0->getMask());			SVN0->getMask());
	}			}
	}			}

				  // The following pattern is likely to emerge with vector reduction ops. Moving
				  // the binary operation ahead of insertion may allow using a narrower vector
				  // instruction that has better performance than the wide version of the op:
				  // VBinOp (ins undef, X, Z), (ins undef, Y, Z) --> ins VecC, (VBinOp X, Y), Z
				  if (LHS.getOpcode() == ISD::INSERT_SUBVECTOR && LHS.getOperand(0).isUndef() &&
				      RHS.getOpcode() == ISD::INSERT_SUBVECTOR && RHS.getOperand(0).isUndef() &&
				      LHS.getOperand(2) == RHS.getOperand(2) &&
				      (LHS.hasOneUse() || RHS.hasOneUse())) {
				    SDValue X = LHS.getOperand(1);
				    SDValue Y = RHS.getOperand(1);
				    SDValue Z = LHS.getOperand(2);
				    EVT NarrowVT = X.getValueType();
				    if (NarrowVT == Y.getValueType() &&
				        TLI.isOperationLegalOrCustomOrPromote(N->getOpcode(), NarrowVT)) {
				      // (binop undef, undef) may not return undef, so compute that result.
				      SDLoc DL(N);
				      SDValue VecC = DAG.getNode(N->getOpcode(), DL, VT, DAG.getUNDEF(VT),
				                                 DAG.getUNDEF(VT));
				      SDValue NarrowBO = DAG.getNode(N->getOpcode(), DL, NarrowVT, X, Y);
				      return DAG.getNode(ISD::INSERT_SUBVECTOR, DL, VT, VecC, NarrowBO, Z);
				    }
				  }

	return SDValue();			return SDValue();
	}			}

	SDValue DAGCombiner::SimplifySelect(const SDLoc &DL, SDValue N0, SDValue N1,			SDValue DAGCombiner::SimplifySelect(const SDLoc &DL, SDValue N0, SDValue N1,
	SDValue N2) {			SDValue N2) {
	assert(N0.getOpcode() ==ISD::SETCC && "First argument must be a SetCC node!");			assert(N0.getOpcode() ==ISD::SETCC && "First argument must be a SetCC node!");

	SDValue SCC = SimplifySelectCC(DL, N0.getOperand(0), N0.getOperand(1), N1, N2,			SDValue SCC = SimplifySelectCC(DL, N0.getOperand(0), N0.getOperand(1), N1, N2,

llvm/trunk/test/CodeGen/X86/avx512-hadd-hsub.ll

; SKX-NEXT: retq		; SKX-NEXT: retq
ret float %x230		ret float %x230
}		}

define <16 x i32> @hadd_16_3(<16 x i32> %x225, <16 x i32> %x227) {		define <16 x i32> @hadd_16_3(<16 x i32> %x225, <16 x i32> %x227) {
; KNL-LABEL: hadd_16_3:		; KNL-LABEL: hadd_16_3:
; KNL: # %bb.0:		; KNL: # %bb.0:
; KNL-NEXT: vshufps {{.*#+}} ymm2 = ymm0[0,2],ymm1[0,2],ymm0[4,6],ymm1[4,6]		; KNL-NEXT: vshufps {{.*#+}} ymm2 = ymm0[0,2],ymm1[0,2],ymm0[4,6],ymm1[4,6]
; KNL-NEXT: vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]		; KNL-NEXT: vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]
; KNL-NEXT: vpaddd %zmm0, %zmm2, %zmm0		; KNL-NEXT: vpaddd %ymm0, %ymm2, %ymm0
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: hadd_16_3:		; SKX-LABEL: hadd_16_3:
; SKX: # %bb.0:		; SKX: # %bb.0:
; SKX-NEXT: vshufps {{.*#+}} ymm2 = ymm0[0,2],ymm1[0,2],ymm0[4,6],ymm1[4,6]		; SKX-NEXT: vshufps {{.*#+}} ymm2 = ymm0[0,2],ymm1[0,2],ymm0[4,6],ymm1[4,6]
; SKX-NEXT: vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]		; SKX-NEXT: vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]
; SKX-NEXT: vpaddd %zmm0, %zmm2, %zmm0		; SKX-NEXT: vpaddd %ymm0, %ymm2, %ymm0
; SKX-NEXT: retq		; SKX-NEXT: retq
%x226 = shufflevector <16 x i32> %x225, <16 x i32> %x227, <16 x i32> <i32 0, i32 2, i32 16, i32 18		%x226 = shufflevector <16 x i32> %x225, <16 x i32> %x227, <16 x i32> <i32 0, i32 2, i32 16, i32 18
, i32 4, i32 6, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		, i32 4, i32 6, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%x228 = shufflevector <16 x i32> %x225, <16 x i32> %x227, <16 x i32> <i32 1, i32 3, i32 17, i32 19		%x228 = shufflevector <16 x i32> %x225, <16 x i32> %x227, <16 x i32> <i32 1, i32 3, i32 17, i32 19
, i32 5 , i32 7, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,		, i32 5 , i32 7, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
i32 undef, i32 undef>		i32 undef, i32 undef>
%x229 = add <16 x i32> %x226, %x228		%x229 = add <16 x i32> %x226, %x228
ret <16 x i32> %x229		ret <16 x i32> %x229
}		}

define <16 x float> @fhadd_16_3(<16 x float> %x225, <16 x float> %x227) {		define <16 x float> @fhadd_16_3(<16 x float> %x225, <16 x float> %x227) {
; KNL-LABEL: fhadd_16_3:		; KNL-LABEL: fhadd_16_3:
; KNL: # %bb.0:		; KNL: # %bb.0:
; KNL-NEXT: vshufps {{.*#+}} ymm2 = ymm0[0,2],ymm1[0,2],ymm0[4,6],ymm1[4,6]		; KNL-NEXT: vshufps {{.*#+}} ymm2 = ymm0[0,2],ymm1[0,2],ymm0[4,6],ymm1[4,6]
; KNL-NEXT: vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]		; KNL-NEXT: vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]
; KNL-NEXT: vaddps %zmm0, %zmm2, %zmm0		; KNL-NEXT: vaddps %ymm0, %ymm2, %ymm0
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: fhadd_16_3:		; SKX-LABEL: fhadd_16_3:
; SKX: # %bb.0:		; SKX: # %bb.0:
; SKX-NEXT: vshufps {{.*#+}} ymm2 = ymm0[0,2],ymm1[0,2],ymm0[4,6],ymm1[4,6]		; SKX-NEXT: vshufps {{.*#+}} ymm2 = ymm0[0,2],ymm1[0,2],ymm0[4,6],ymm1[4,6]
; SKX-NEXT: vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]		; SKX-NEXT: vshufps {{.*#+}} ymm0 = ymm0[1,3],ymm1[1,3],ymm0[5,7],ymm1[5,7]
; SKX-NEXT: vaddps %zmm0, %zmm2, %zmm0		; SKX-NEXT: vaddps %ymm0, %ymm2, %ymm0
; SKX-NEXT: retq		; SKX-NEXT: retq
%x226 = shufflevector <16 x float> %x225, <16 x float> %x227, <16 x i32> <i32 0, i32 2, i32 16, i32 18		%x226 = shufflevector <16 x float> %x225, <16 x float> %x227, <16 x i32> <i32 0, i32 2, i32 16, i32 18
, i32 4, i32 6, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		, i32 4, i32 6, i32 20, i32 22, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%x228 = shufflevector <16 x float> %x225, <16 x float> %x227, <16 x i32> <i32 1, i32 3, i32 17, i32 19		%x228 = shufflevector <16 x float> %x225, <16 x float> %x227, <16 x i32> <i32 1, i32 3, i32 17, i32 19
, i32 5 , i32 7, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		, i32 5 , i32 7, i32 21, i32 23, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
%x229 = fadd <16 x float> %x226, %x228		%x229 = fadd <16 x float> %x226, %x228
ret <16 x float> %x229		ret <16 x float> %x229
}		}

define <8 x double> @fhadd_16_4(<8 x double> %x225, <8 x double> %x227) {		define <8 x double> @fhadd_16_4(<8 x double> %x225, <8 x double> %x227) {
; KNL-LABEL: fhadd_16_4:		; KNL-LABEL: fhadd_16_4:
; KNL: # %bb.0:		; KNL: # %bb.0:
; KNL-NEXT: vunpcklpd {{.*#+}} ymm2 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]		; KNL-NEXT: vunpcklpd {{.*#+}} ymm2 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
; KNL-NEXT: vunpckhpd {{.*#+}} ymm0 = ymm0[1],ymm1[1],ymm0[3],ymm1[3]		; KNL-NEXT: vunpckhpd {{.*#+}} ymm0 = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
; KNL-NEXT: vaddpd %zmm0, %zmm2, %zmm0		; KNL-NEXT: vaddpd %ymm0, %ymm2, %ymm0
; KNL-NEXT: retq		; KNL-NEXT: retq
;		;
; SKX-LABEL: fhadd_16_4:		; SKX-LABEL: fhadd_16_4:
; SKX: # %bb.0:		; SKX: # %bb.0:
; SKX-NEXT: vunpcklpd {{.*#+}} ymm2 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]		; SKX-NEXT: vunpcklpd {{.*#+}} ymm2 = ymm0[0],ymm1[0],ymm0[2],ymm1[2]
; SKX-NEXT: vunpckhpd {{.*#+}} ymm0 = ymm0[1],ymm1[1],ymm0[3],ymm1[3]		; SKX-NEXT: vunpckhpd {{.*#+}} ymm0 = ymm0[1],ymm1[1],ymm0[3],ymm1[3]
; SKX-NEXT: vaddpd %zmm0, %zmm2, %zmm0		; SKX-NEXT: vaddpd %ymm0, %ymm2, %ymm0
; SKX-NEXT: retq		; SKX-NEXT: retq
%x226 = shufflevector <8 x double> %x225, <8 x double> %x227, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 undef, i32 undef, i32 undef, i32 undef>		%x226 = shufflevector <8 x double> %x225, <8 x double> %x227, <8 x i32> <i32 0, i32 8, i32 2, i32 10, i32 undef, i32 undef, i32 undef, i32 undef>
%x228 = shufflevector <8 x double> %x225, <8 x double> %x227, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 undef ,i32 undef, i32 undef, i32 undef>		%x228 = shufflevector <8 x double> %x225, <8 x double> %x227, <8 x i32> <i32 1, i32 9, i32 3, i32 11, i32 undef ,i32 undef, i32 undef, i32 undef>
%x229 = fadd <8 x double> %x226, %x228		%x229 = fadd <8 x double> %x226, %x228
ret <8 x double> %x229		ret <8 x double> %x229
}		}

define <4 x double> @fadd_noundef_low(<8 x double> %x225, <8 x double> %x227) {		define <4 x double> @fadd_noundef_low(<8 x double> %x225, <8 x double> %x227) {

llvm/trunk/test/CodeGen/X86/scalarize-fp.ll

	; SSE-LABEL: fadd_op1_constant_v4f64:			; SSE-LABEL: fadd_op1_constant_v4f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; SSE-NEXT: addpd %xmm1, %xmm0			; SSE-NEXT: addpd %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: fadd_op1_constant_v4f64:			; AVX-LABEL: fadd_op1_constant_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vaddpd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vaddpd %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%v = insertelement <4 x double> undef, double %x, i32 0			%v = insertelement <4 x double> undef, double %x, i32 0
	%b = fadd <4 x double> %v, <double 42.0, double undef, double undef, double undef>			%b = fadd <4 x double> %v, <double 42.0, double undef, double undef, double undef>
	ret <4 x double> %b			ret <4 x double> %b
	}			}

	define <4 x double> @load_fadd_op1_constant_v4f64(double* %p) nounwind {			define <4 x double> @load_fadd_op1_constant_v4f64(double* %p) nounwind {
	; SSE-LABEL: load_fadd_op1_constant_v4f64:			; SSE-LABEL: load_fadd_op1_constant_v4f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; SSE-NEXT: addpd %xmm1, %xmm0			; SSE-NEXT: addpd %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: load_fadd_op1_constant_v4f64:			; AVX-LABEL: load_fadd_op1_constant_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vaddpd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vaddpd %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load double, double* %p			%x = load double, double* %p
	%v = insertelement <4 x double> undef, double %x, i32 0			%v = insertelement <4 x double> undef, double %x, i32 0
	%b = fadd <4 x double> %v, <double 42.0, double undef, double undef, double undef>			%b = fadd <4 x double> %v, <double 42.0, double undef, double undef, double undef>
	ret <4 x double> %b			ret <4 x double> %b
	}			}

	define <4 x double> @fsub_op0_constant_v4f64(double %x) nounwind {			define <4 x double> @fsub_op0_constant_v4f64(double %x) nounwind {
	; SSE-LABEL: fsub_op0_constant_v4f64:			; SSE-LABEL: fsub_op0_constant_v4f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; SSE-NEXT: subpd %xmm0, %xmm1			; SSE-NEXT: subpd %xmm0, %xmm1
	; SSE-NEXT: movapd %xmm1, %xmm0			; SSE-NEXT: movapd %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: fsub_op0_constant_v4f64:			; AVX-LABEL: fsub_op0_constant_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vsubpd %ymm0, %ymm1, %ymm0			; AVX-NEXT: vsubpd %xmm0, %xmm1, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%v = insertelement <4 x double> undef, double %x, i32 0			%v = insertelement <4 x double> undef, double %x, i32 0
	%b = fsub <4 x double> <double 42.0, double undef, double undef, double undef>, %v			%b = fsub <4 x double> <double 42.0, double undef, double undef, double undef>, %v
	ret <4 x double> %b			ret <4 x double> %b
	}			}

	define <4 x double> @load_fsub_op0_constant_v4f64(double* %p) nounwind {			define <4 x double> @load_fsub_op0_constant_v4f64(double* %p) nounwind {
	; SSE-LABEL: load_fsub_op0_constant_v4f64:			; SSE-LABEL: load_fsub_op0_constant_v4f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; SSE-NEXT: subpd %xmm1, %xmm0			; SSE-NEXT: subpd %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: load_fsub_op0_constant_v4f64:			; AVX-LABEL: load_fsub_op0_constant_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vsubpd %ymm0, %ymm1, %ymm0			; AVX-NEXT: vsubpd %xmm0, %xmm1, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load double, double* %p			%x = load double, double* %p
	%v = insertelement <4 x double> undef, double %x, i32 0			%v = insertelement <4 x double> undef, double %x, i32 0
	%b = fsub <4 x double> <double 42.0, double undef, double undef, double undef>, %v			%b = fsub <4 x double> <double 42.0, double undef, double undef, double undef>, %v
	ret <4 x double> %b			ret <4 x double> %b
	}			}

	define <4 x double> @fmul_op1_constant_v4f64(double %x) nounwind {			define <4 x double> @fmul_op1_constant_v4f64(double %x) nounwind {
	; SSE-LABEL: fmul_op1_constant_v4f64:			; SSE-LABEL: fmul_op1_constant_v4f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; SSE-NEXT: mulpd %xmm1, %xmm0			; SSE-NEXT: mulpd %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: fmul_op1_constant_v4f64:			; AVX-LABEL: fmul_op1_constant_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vmulpd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vmulpd %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%v = insertelement <4 x double> undef, double %x, i32 0			%v = insertelement <4 x double> undef, double %x, i32 0
	%b = fmul <4 x double> %v, <double 42.0, double undef, double undef, double undef>			%b = fmul <4 x double> %v, <double 42.0, double undef, double undef, double undef>
	ret <4 x double> %b			ret <4 x double> %b
	}			}

	define <4 x double> @load_fmul_op1_constant_v4f64(double* %p) nounwind {			define <4 x double> @load_fmul_op1_constant_v4f64(double* %p) nounwind {
	; SSE-LABEL: load_fmul_op1_constant_v4f64:			; SSE-LABEL: load_fmul_op1_constant_v4f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; SSE-NEXT: mulpd %xmm1, %xmm0			; SSE-NEXT: mulpd %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: load_fmul_op1_constant_v4f64:			; AVX-LABEL: load_fmul_op1_constant_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vmulpd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vmulpd %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load double, double* %p			%x = load double, double* %p
	%v = insertelement <4 x double> undef, double %x, i32 0			%v = insertelement <4 x double> undef, double %x, i32 0
	%b = fmul <4 x double> %v, <double 42.0, double undef, double undef, double undef>			%b = fmul <4 x double> %v, <double 42.0, double undef, double undef, double undef>
	ret <4 x double> %b			ret <4 x double> %b
	}			}

	define <4 x double> @fdiv_op1_constant_v4f64(double %x) nounwind {			define <4 x double> @fdiv_op1_constant_v4f64(double %x) nounwind {
	; SSE-LABEL: fdiv_op1_constant_v4f64:			; SSE-LABEL: fdiv_op1_constant_v4f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; SSE-NEXT: divpd %xmm1, %xmm0			; SSE-NEXT: divpd %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: fdiv_op1_constant_v4f64:			; AVX-LABEL: fdiv_op1_constant_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vdivpd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vdivpd %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%v = insertelement <4 x double> undef, double %x, i32 0			%v = insertelement <4 x double> undef, double %x, i32 0
	%b = fdiv <4 x double> %v, <double 42.0, double undef, double undef, double undef>			%b = fdiv <4 x double> %v, <double 42.0, double undef, double undef, double undef>
	ret <4 x double> %b			ret <4 x double> %b
	}			}

	define <4 x double> @load_fdiv_op1_constant_v4f64(double* %p) nounwind {			define <4 x double> @load_fdiv_op1_constant_v4f64(double* %p) nounwind {
	; SSE-LABEL: load_fdiv_op1_constant_v4f64:			; SSE-LABEL: load_fdiv_op1_constant_v4f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; SSE-NEXT: divpd %xmm1, %xmm0			; SSE-NEXT: divpd %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: load_fdiv_op1_constant_v4f64:			; AVX-LABEL: load_fdiv_op1_constant_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vdivpd %ymm1, %ymm0, %ymm0			; AVX-NEXT: vdivpd %xmm1, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load double, double* %p			%x = load double, double* %p
	%v = insertelement <4 x double> undef, double %x, i32 0			%v = insertelement <4 x double> undef, double %x, i32 0
	%b = fdiv <4 x double> %v, <double 42.0, double undef, double undef, double undef>			%b = fdiv <4 x double> %v, <double 42.0, double undef, double undef, double undef>
	ret <4 x double> %b			ret <4 x double> %b
	}			}

	define <4 x double> @fdiv_op0_constant_v4f64(double %x) nounwind {			define <4 x double> @fdiv_op0_constant_v4f64(double %x) nounwind {
	; SSE-LABEL: fdiv_op0_constant_v4f64:			; SSE-LABEL: fdiv_op0_constant_v4f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; SSE-NEXT: divpd %xmm0, %xmm1			; SSE-NEXT: divpd %xmm0, %xmm1
	; SSE-NEXT: movapd %xmm1, %xmm0			; SSE-NEXT: movapd %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: fdiv_op0_constant_v4f64:			; AVX-LABEL: fdiv_op0_constant_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vdivpd %ymm0, %ymm1, %ymm0			; AVX-NEXT: vdivpd %xmm0, %xmm1, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%v = insertelement <4 x double> undef, double %x, i32 0			%v = insertelement <4 x double> undef, double %x, i32 0
	%b = fdiv <4 x double> <double 42.0, double undef, double undef, double undef>, %v			%b = fdiv <4 x double> <double 42.0, double undef, double undef, double undef>, %v
	ret <4 x double> %b			ret <4 x double> %b
	}			}

	define <4 x double> @load_fdiv_op0_constant_v4f64(double* %p) nounwind {			define <4 x double> @load_fdiv_op0_constant_v4f64(double* %p) nounwind {
	; SSE-LABEL: load_fdiv_op0_constant_v4f64:			; SSE-LABEL: load_fdiv_op0_constant_v4f64:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm1 = mem[0],zero
	; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero			; SSE-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
	; SSE-NEXT: divpd %xmm1, %xmm0			; SSE-NEXT: divpd %xmm1, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: load_fdiv_op0_constant_v4f64:			; AVX-LABEL: load_fdiv_op0_constant_v4f64:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
	; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero			; AVX-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero
	; AVX-NEXT: vdivpd %ymm0, %ymm1, %ymm0			; AVX-NEXT: vdivpd %xmm0, %xmm1, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%x = load double, double* %p			%x = load double, double* %p
	%v = insertelement <4 x double> undef, double %x, i32 0			%v = insertelement <4 x double> undef, double %x, i32 0
	%b = fdiv <4 x double> <double 42.0, double undef, double undef, double undef>, %v			%b = fdiv <4 x double> <double 42.0, double undef, double undef, double undef>, %v
	ret <4 x double> %b			ret <4 x double> %b
	}			}

llvm/trunk/test/CodeGen/X86/vector-partial-undef.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=x86_64-- -mattr=+sse2 | FileCheck %s --check-prefix=SSE			; RUN: llc < %s -mtriple=x86_64-- -mattr=+sse2 | FileCheck %s --check-prefix=SSE
	; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx2 | FileCheck %s --check-prefix=AVX			; RUN: llc < %s -mtriple=x86_64-- -mattr=+avx2 | FileCheck %s --check-prefix=AVX

	; xor undef, undef --> 0 because it's not worth fighting to make that return undef?			; xor undef, undef --> 0 because it's not worth fighting to make that return undef?

	define <4 x i64> @xor_insert_insert(<2 x i64> %x, <2 x i64> %y) {			define <4 x i64> @xor_insert_insert(<2 x i64> %x, <2 x i64> %y) {
	; SSE-LABEL: xor_insert_insert:			; SSE-LABEL: xor_insert_insert:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: xorps %xmm1, %xmm0			; SSE-NEXT: xorps %xmm1, %xmm0
	; SSE-NEXT: xorps %xmm1, %xmm1			; SSE-NEXT: xorps %xmm1, %xmm1
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: xor_insert_insert:			; AVX-LABEL: xor_insert_insert:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: # kill: def $xmm1 killed $xmm1 def $ymm1			; AVX-NEXT: vxorps %xmm1, %xmm0, %xmm0
	; AVX-NEXT: # kill: def $xmm0 killed $xmm0 def $ymm0
	; AVX-NEXT: vxorps %ymm1, %ymm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%xw = shufflevector <2 x i64> %x, <2 x i64> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			%xw = shufflevector <2 x i64> %x, <2 x i64> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	%yw = shufflevector <2 x i64> %y, <2 x i64> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			%yw = shufflevector <2 x i64> %y, <2 x i64> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	%r = xor <4 x i64> %xw, %yw			%r = xor <4 x i64> %xw, %yw
	ret <4 x i64> %r			ret <4 x i64> %r
	}			}

	define <4 x i64> @xor_insert_insert_high_half(<2 x i64> %x, <2 x i64> %y) {			define <4 x i64> @xor_insert_insert_high_half(<2 x i64> %x, <2 x i64> %y) {
	; SSE-LABEL: xor_insert_insert_high_half:			; SSE-LABEL: xor_insert_insert_high_half:
	; SSE: # %bb.0:			; SSE: # %bb.0:
	; SSE-NEXT: xorps %xmm0, %xmm1			; SSE-NEXT: xorps %xmm0, %xmm1
	; SSE-NEXT: xorps %xmm0, %xmm0			; SSE-NEXT: xorps %xmm0, %xmm0
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX-LABEL: xor_insert_insert_high_half:			; AVX-LABEL: xor_insert_insert_high_half:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0			; AVX-NEXT: vxorps %xmm1, %xmm0, %xmm0
	; AVX-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm1			; AVX-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; AVX-NEXT: vxorps %ymm1, %ymm0, %ymm0			; AVX-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%xw = shufflevector <2 x i64> %x, <2 x i64> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>			%xw = shufflevector <2 x i64> %x, <2 x i64> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
	%yw = shufflevector <2 x i64> %y, <2 x i64> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>			%yw = shufflevector <2 x i64> %y, <2 x i64> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
	%r = xor <4 x i64> %xw, %yw			%r = xor <4 x i64> %xw, %yw
	ret <4 x i64> %r			ret <4 x i64> %r
	}			}

	; All elements of the add are undefined:			; All elements of the add are undefined: