This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920)
Closed · Public

Authored by RKSimon on Apr 30 2019, 6:43 AM.

Details

Reviewers

andreadb
craig.topper
spatel

Commits

rZORG39a6f2b37ad9: [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920)
rZORGc3bc2bdfd8b9: [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920)
rG39a6f2b37ad9: [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920)
rGc3bc2bdfd8b9: [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920)
rG93bfa5af48db: [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920)
rL360360: [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920)

Summary

As reported in PR39920, targets with "slow horizontal ops" tend to expand them internally to 2*shuffle+add/sub. So if we can reduce 2*shuffle+add/sub to a hadd/sub then we should do it: port usage is similar but the instruction count is lower.

This works out in most cases, although the "PR22377" regression in vector-shuffle-combining.ll is annoying: it goes from 2*shuffle+add+shuffle to hadd+2*shuffle. I'm open to suggestions. I've been trying to think of ways to get foldShuffleOfHorizOp to work with general shuffles but haven't found anything yet.
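
For readers unfamiliar with the pattern, the equivalence the summary relies on can be sketched in plain C++. This is only an illustrative scalar model of a 4 x float horizontal add, not code from the patch; the names V4, shuffle, add, hadd and haddExpanded are hypothetical helpers introduced here.

```cpp
#include <array>
#include <cassert>

using V4 = std::array<float, 4>;

// Two-input shuffle: mask indices 0-3 select from a, 4-7 select from b.
V4 shuffle(const V4 &a, const V4 &b, const std::array<int, 4> &mask) {
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = mask[i] < 4 ? a[mask[i]] : b[mask[i] - 4];
  return r;
}

// Element-wise add.
V4 add(const V4 &x, const V4 &y) {
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = x[i] + y[i];
  return r;
}

// Reference semantics of a 128-bit horizontal add:
// < a0+a1, a2+a3, b0+b1, b2+b3 >.
V4 hadd(const V4 &a, const V4 &b) {
  return {a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]};
}

// The expanded form: two shuffles (even and odd elements) plus one add.
V4 haddExpanded(const V4 &a, const V4 &b) {
  V4 even = shuffle(a, b, {0, 2, 4, 6});
  V4 odd = shuffle(a, b, {1, 3, 5, 7});
  return add(even, odd);
}
```

Calling hadd(a, b) and haddExpanded(a, b) on the same inputs gives identical results, which is why folding the three-instruction form into one hadd does not change the computed value.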

Diff Detail

Repository: rL LLVM

Event Timeline

RKSimon created this revision. Apr 30 2019, 6:43 AM

Herald added a project: Restricted Project. Apr 30 2019, 6:43 AM

RKSimon retitled this revision from [X86][SSE] Fold add(shuffle(),shuffle()) on 'slow' targets (PR39920) to [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920). Apr 30 2019, 6:50 AM

rebase

ping?

Is the PR22377 test the only remaining problem? If so, do we have a new bug to track that (or reopen the old bug)?

In D61308#1494883, @spatel wrote:

Is the PR22377 test the only remaining problem? If so, do we have a new bug to track that (or reopen the old bug)?

Yes, it's just the PR22377 test case.

I've raised https://bugs.llvm.org/show_bug.cgi?id=41813 but I'm not totally happy with it being so vague on what the best thing to do is.

Thanks. We're probably not going to regress the actual motivating example (AVX codegen) in PR22377 (although we could do better), so I think we're fine here. LGTM.

This revision is now accepted and ready to land. May 9 2019, 8:27 AM

Closed by commit rG93bfa5af48db: [X86][SSE] Fold add(shuffle(),shuffle()) to hadd on 'slow' targets (PR39920) (authored by RKSimon). May 9 2019, 10:45 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: hiraditya. May 9 2019, 10:45 AM

Revision Contents

Path (base revision 359555)	Size
lib/Target/X86/X86ISelLowering.cpp	20 lines
test/CodeGen/X86/avx2-phaddsub.ll	32 lines
test/CodeGen/X86/haddsub-shuf.ll	658 lines
test/CodeGen/X86/haddsub-undef.ll	149 lines
test/CodeGen/X86/haddsub.ll	273 lines
test/CodeGen/X86/phaddsub.ll	497 lines
test/CodeGen/X86/vector-shuffle-combining.ll	39 lines

Diff 197313

lib/Target/X86/X86ISelLowering.cpp

/// A = < float a0, float a1, float a2, float a3 >		/// A = < float a0, float a1, float a2, float a3 >
/// and		/// and
/// B = < float b0, float b1, float b2, float b3 >		/// B = < float b0, float b1, float b2, float b3 >
/// then the result of doing a horizontal operation on A and B is		/// then the result of doing a horizontal operation on A and B is
/// A horizontal-op B = < a0 op a1, a2 op a3, b0 op b1, b2 op b3 >.		/// A horizontal-op B = < a0 op a1, a2 op a3, b0 op b1, b2 op b3 >.
/// In short, LHS and RHS are inspected to see if LHS op RHS is of the form		/// In short, LHS and RHS are inspected to see if LHS op RHS is of the form
/// A horizontal-op B, for some already available A and B, and if so then LHS is		/// A horizontal-op B, for some already available A and B, and if so then LHS is
/// set to A, RHS to B, and the routine returns 'true'.		/// set to A, RHS to B, and the routine returns 'true'.
static bool isHorizontalBinOp(SDValue &LHS, SDValue &RHS, bool IsCommutative) {		static bool isHorizontalBinOp(SDValue &LHS, SDValue &RHS, bool IsCommutative,
		unsigned &NumShuffles) {
// If either operand is undef, bail out. The binop should be simplified.		// If either operand is undef, bail out. The binop should be simplified.
if (LHS.isUndef() || RHS.isUndef())		if (LHS.isUndef() || RHS.isUndef())
return false;		return false;

// Look for the following pattern:		// Look for the following pattern:
// A = < float a0, float a1, float a2, float a3 >		// A = < float a0, float a1, float a2, float a3 >
// B = < float b0, float b1, float b2, float b3 >		// B = < float b0, float b1, float b2, float b3 >
// and		// and
[35 lines elided]	static bool isHorizontalBinOp(SDValue &LHS, SDValue &RHS, bool IsCommutative,
// View LHS in the form		// View LHS in the form
// LHS = VECTOR_SHUFFLE A, B, LMask		// LHS = VECTOR_SHUFFLE A, B, LMask
// If LHS is not a shuffle, then pretend it is the identity shuffle:		// If LHS is not a shuffle, then pretend it is the identity shuffle:
// LHS = VECTOR_SHUFFLE LHS, undef, <0, 1, ..., N-1>		// LHS = VECTOR_SHUFFLE LHS, undef, <0, 1, ..., N-1>
// NOTE: A default initialized SDValue represents an UNDEF of type VT.		// NOTE: A default initialized SDValue represents an UNDEF of type VT.
SDValue A, B;		SDValue A, B;
SmallVector<int, 16> LMask;		SmallVector<int, 16> LMask;
GetShuffle(LHS, A, B, LMask);		GetShuffle(LHS, A, B, LMask);
		NumShuffles = (LMask.empty() ? 0 : 1);

// Likewise, view RHS in the form		// Likewise, view RHS in the form
// RHS = VECTOR_SHUFFLE C, D, RMask		// RHS = VECTOR_SHUFFLE C, D, RMask
SDValue C, D;		SDValue C, D;
SmallVector<int, 16> RMask;		SmallVector<int, 16> RMask;
GetShuffle(RHS, C, D, RMask);		GetShuffle(RHS, C, D, RMask);
		NumShuffles += (RMask.empty() ? 0 : 1);

// At least one of the operands should be a vector shuffle.		// At least one of the operands should be a vector shuffle.
if (LMask.empty() && RMask.empty())		if (LMask.empty() && RMask.empty())
return false;		return false;

if (LMask.empty()) {		if (LMask.empty()) {
A = LHS;		A = LHS;
for (unsigned i = 0; i != NumElts; ++i)		for (unsigned i = 0; i != NumElts; ++i)
[61 lines elided]	static SDValue combineFaddFsub(SDNode *N, SelectionDAG &DAG,
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDValue LHS = N->getOperand(0);		SDValue LHS = N->getOperand(0);
SDValue RHS = N->getOperand(1);		SDValue RHS = N->getOperand(1);
bool IsFadd = N->getOpcode() == ISD::FADD;		bool IsFadd = N->getOpcode() == ISD::FADD;
auto HorizOpcode = IsFadd ? X86ISD::FHADD : X86ISD::FHSUB;		auto HorizOpcode = IsFadd ? X86ISD::FHADD : X86ISD::FHSUB;
assert((IsFadd || N->getOpcode() == ISD::FSUB) && "Wrong opcode");		assert((IsFadd || N->getOpcode() == ISD::FSUB) && "Wrong opcode");

// Try to synthesize horizontal add/sub from adds/subs of shuffles.		// Try to synthesize horizontal add/sub from adds/subs of shuffles.
		unsigned NumShuffles = 0;
if (((Subtarget.hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) ||		if (((Subtarget.hasSSE3() && (VT == MVT::v4f32 || VT == MVT::v2f64)) ||
(Subtarget.hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) &&		(Subtarget.hasAVX() && (VT == MVT::v8f32 || VT == MVT::v4f64))) &&
isHorizontalBinOp(LHS, RHS, IsFadd) &&		isHorizontalBinOp(LHS, RHS, IsFadd, NumShuffles) &&
shouldUseHorizontalOp(LHS == RHS, DAG, Subtarget))		shouldUseHorizontalOp(LHS == RHS && NumShuffles < 2, DAG, Subtarget))
return DAG.getNode(HorizOpcode, SDLoc(N), VT, DAG.getBitcast(VT, LHS),		return DAG.getNode(HorizOpcode, SDLoc(N), VT, DAG.getBitcast(VT, LHS),
DAG.getBitcast(VT, RHS));		DAG.getBitcast(VT, RHS));

return SDValue();		return SDValue();
}		}

/// Attempt to pre-truncate inputs to arithmetic ops if it will simplify		/// Attempt to pre-truncate inputs to arithmetic ops if it will simplify
/// the codegen.		/// the codegen.
[2,879 lines elided]	static SDValue combineAdd(SDNode *N, SelectionDAG &DAG,
SDValue Op1 = N->getOperand(1);		SDValue Op1 = N->getOperand(1);

if (SDValue MAdd = matchPMADDWD(DAG, Op0, Op1, SDLoc(N), VT, Subtarget))		if (SDValue MAdd = matchPMADDWD(DAG, Op0, Op1, SDLoc(N), VT, Subtarget))
return MAdd;		return MAdd;
if (SDValue MAdd = matchPMADDWD_2(DAG, Op0, Op1, SDLoc(N), VT, Subtarget))		if (SDValue MAdd = matchPMADDWD_2(DAG, Op0, Op1, SDLoc(N), VT, Subtarget))
return MAdd;		return MAdd;

// Try to synthesize horizontal adds from adds of shuffles.		// Try to synthesize horizontal adds from adds of shuffles.
		unsigned NumShuffles = 0;
if ((VT == MVT::v8i16 || VT == MVT::v4i32 || VT == MVT::v16i16 ||		if ((VT == MVT::v8i16 || VT == MVT::v4i32 || VT == MVT::v16i16 ||
VT == MVT::v8i32) &&		VT == MVT::v8i32) &&
Subtarget.hasSSSE3() && isHorizontalBinOp(Op0, Op1, true) &&		Subtarget.hasSSSE3() && isHorizontalBinOp(Op0, Op1, true, NumShuffles) &&
shouldUseHorizontalOp(Op0 == Op1, DAG, Subtarget)) {		shouldUseHorizontalOp(Op0 == Op1 && NumShuffles < 2, DAG, Subtarget)) {
auto HADDBuilder = [](SelectionDAG &DAG, const SDLoc &DL,		auto HADDBuilder = [](SelectionDAG &DAG, const SDLoc &DL,
ArrayRef<SDValue> Ops) {		ArrayRef<SDValue> Ops) {
return DAG.getNode(X86ISD::HADD, DL, Ops[0].getValueType(), Ops);		return DAG.getNode(X86ISD::HADD, DL, Ops[0].getValueType(), Ops);
};		};
Op0 = DAG.getBitcast(VT, Op0);		Op0 = DAG.getBitcast(VT, Op0);
Op1 = DAG.getBitcast(VT, Op1);		Op1 = DAG.getBitcast(VT, Op1);
return SplitOpsAndApply(DAG, Subtarget, SDLoc(N), VT, {Op0, Op1},		return SplitOpsAndApply(DAG, Subtarget, SDLoc(N), VT, {Op0, Op1},
HADDBuilder);		HADDBuilder);
[112 lines elided]	if (Op1->hasOneUse() && Op1.getOpcode() == ISD::XOR &&
Op1.getOperand(0),		Op1.getOperand(0),
DAG.getConstant(~XorC, SDLoc(Op1), VT));		DAG.getConstant(~XorC, SDLoc(Op1), VT));
return DAG.getNode(ISD::ADD, SDLoc(N), VT, NewXor,		return DAG.getNode(ISD::ADD, SDLoc(N), VT, NewXor,
DAG.getConstant(C->getAPIntValue() + 1, SDLoc(N), VT));		DAG.getConstant(C->getAPIntValue() + 1, SDLoc(N), VT));
}		}
}		}

// Try to synthesize horizontal subs from subs of shuffles.		// Try to synthesize horizontal subs from subs of shuffles.
		unsigned NumShuffles = 0;
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
if ((VT == MVT::v8i16 || VT == MVT::v4i32 || VT == MVT::v16i16 ||		if ((VT == MVT::v8i16 || VT == MVT::v4i32 || VT == MVT::v16i16 ||
VT == MVT::v8i32) &&		VT == MVT::v8i32) &&
Subtarget.hasSSSE3() && isHorizontalBinOp(Op0, Op1, false) &&		Subtarget.hasSSSE3() && isHorizontalBinOp(Op0, Op1, false, NumShuffles) &&
shouldUseHorizontalOp(Op0 == Op1, DAG, Subtarget)) {		shouldUseHorizontalOp(Op0 == Op1 && NumShuffles < 2, DAG, Subtarget)) {
auto HSUBBuilder = [](SelectionDAG &DAG, const SDLoc &DL,		auto HSUBBuilder = [](SelectionDAG &DAG, const SDLoc &DL,
ArrayRef<SDValue> Ops) {		ArrayRef<SDValue> Ops) {
return DAG.getNode(X86ISD::HSUB, DL, Ops[0].getValueType(), Ops);		return DAG.getNode(X86ISD::HSUB, DL, Ops[0].getValueType(), Ops);
};		};
Op0 = DAG.getBitcast(VT, Op0);		Op0 = DAG.getBitcast(VT, Op0);
Op1 = DAG.getBitcast(VT, Op1);		Op1 = DAG.getBitcast(VT, Op1);
return SplitOpsAndApply(DAG, Subtarget, SDLoc(N), VT, {Op0, Op1},		return SplitOpsAndApply(DAG, Subtarget, SDLoc(N), VT, {Op0, Op1},
HSUBBuilder);		HSUBBuilder);
[1,808 lines elided]
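
The effect of the new NumShuffles argument on the decision can be sketched with a small standalone model. The body of shouldUseHorizontalOp is not part of this diff, so the model below assumes it accepts a horizontal op unless the op is single-source on a target without fast horizontal ops and the function is not optimized for size. TargetModel, shouldUseHorizontalOpModel, useHopBefore and useHopAfter are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the subtarget/function state the real code queries.
struct TargetModel {
  bool hasFastHorizontalOps; // models the 'fast-hops' feature
  bool optForSize;           // models optimizing the function for size
};

// ASSUMPTION: models shouldUseHorizontalOp as "always use a horizontal op
// unless it is single-source on a slow target without size optimization".
bool shouldUseHorizontalOpModel(bool isSingleSource, const TargetModel &t) {
  return !isSingleSource || t.optForSize || t.hasFastHorizontalOps;
}

// Before the patch: only LHS == RHS decides whether the op is single-source.
bool useHopBefore(bool sameSource, unsigned numShuffles, const TargetModel &t) {
  (void)numShuffles; // the shuffle count was not tracked before the patch
  return shouldUseHorizontalOpModel(sameSource, t);
}

// After the patch: if both operands were real shuffles (NumShuffles == 2),
// the op is no longer treated as single-source, so the hadd/hsub is formed.
bool useHopAfter(bool sameSource, unsigned numShuffles, const TargetModel &t) {
  return shouldUseHorizontalOpModel(sameSource && numShuffles < 2, t);
}
```

Under this model, a slow target with LHS == RHS and two shuffles is rejected by useHopBefore but accepted by useHopAfter, which matches the test changes where the _SLOW and _FAST check prefixes collapse into one.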

test/CodeGen/X86/avx2-phaddsub.ll

	[63 lines elided]
	; X64-NEXT: retq			; X64-NEXT: retq
	%a = shufflevector <8 x i32> %x, <8 x i32> %y, <8 x i32> <i32 1, i32 2, i32 9, i32 10, i32 5, i32 6, i32 13, i32 14>			%a = shufflevector <8 x i32> %x, <8 x i32> %y, <8 x i32> <i32 1, i32 2, i32 9, i32 10, i32 5, i32 6, i32 13, i32 14>
	%b = shufflevector <8 x i32> %y, <8 x i32> %x, <8 x i32> <i32 8, i32 11, i32 0, i32 3, i32 12, i32 15, i32 4, i32 7>			%b = shufflevector <8 x i32> %y, <8 x i32> %x, <8 x i32> <i32 8, i32 11, i32 0, i32 3, i32 12, i32 15, i32 4, i32 7>
	%r = add <8 x i32> %a, %b			%r = add <8 x i32> %a, %b
	ret <8 x i32> %r			ret <8 x i32> %r
	}			}

	define <8 x i32> @phaddd3(<8 x i32> %x) {			define <8 x i32> @phaddd3(<8 x i32> %x) {
	; X32-SLOW-LABEL: phaddd3:			; X32-LABEL: phaddd3:
	; X32-SLOW: # %bb.0:			; X32: # %bb.0:
	; X32-SLOW-NEXT: vpshufd {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]			; X32-NEXT: vphaddd %ymm0, %ymm0, %ymm0
	; X32-SLOW-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]			; X32-NEXT: retl
	; X32-SLOW-NEXT: vpaddd %ymm0, %ymm1, %ymm0			;
	; X32-SLOW-NEXT: retl			; X64-LABEL: phaddd3:
	;			; X64: # %bb.0:
	; X32-FAST-LABEL: phaddd3:			; X64-NEXT: vphaddd %ymm0, %ymm0, %ymm0
	; X32-FAST: # %bb.0:			; X64-NEXT: retq
	; X32-FAST-NEXT: vphaddd %ymm0, %ymm0, %ymm0
	; X32-FAST-NEXT: retl
	;
	; X64-SLOW-LABEL: phaddd3:
	; X64-SLOW: # %bb.0:
	; X64-SLOW-NEXT: vpshufd {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]
	; X64-SLOW-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]
	; X64-SLOW-NEXT: vpaddd %ymm0, %ymm1, %ymm0
	; X64-SLOW-NEXT: retq
	;
	; X64-FAST-LABEL: phaddd3:
	; X64-FAST: # %bb.0:
	; X64-FAST-NEXT: vphaddd %ymm0, %ymm0, %ymm0
	; X64-FAST-NEXT: retq
	%a = shufflevector <8 x i32> %x, <8 x i32> undef, <8 x i32> <i32 undef, i32 2, i32 8, i32 10, i32 4, i32 6, i32 undef, i32 14>			%a = shufflevector <8 x i32> %x, <8 x i32> undef, <8 x i32> <i32 undef, i32 2, i32 8, i32 10, i32 4, i32 6, i32 undef, i32 14>
	%b = shufflevector <8 x i32> %x, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 9, i32 undef, i32 5, i32 7, i32 13, i32 15>			%b = shufflevector <8 x i32> %x, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 9, i32 undef, i32 5, i32 7, i32 13, i32 15>
	%r = add <8 x i32> %a, %b			%r = add <8 x i32> %a, %b
	ret <8 x i32> %r			ret <8 x i32> %r
	}			}

	define <16 x i16> @phsubw1(<16 x i16> %x, <16 x i16> %y) {			define <16 x i16> @phsubw1(<16 x i16> %x, <16 x i16> %y) {
	; X32-LABEL: phsubw1:			; X32-LABEL: phsubw1:
	[45 lines elided]
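
To see why a single vphaddd %ymm0, %ymm0, %ymm0 is a valid lowering for phaddd3, the following scalar model replays the two shufflevector masks from that test and compares the defined lanes against a per-128-bit-lane horizontal add. It is a sketch only: Lane, V8, shuffleUndefRhs, addLanes, vphadddSame and matchesOnDefinedLanes are hypothetical helpers, and undef is modelled simply as "any value is acceptable".

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A lane that is either defined (with a value) or undef.
struct Lane {
  bool defined = false;
  std::int32_t value = 0;
};
using V8 = std::array<Lane, 8>;
using I8 = std::array<std::int32_t, 8>;
constexpr int U = -1; // undef mask index

// shufflevector %x, undef, mask: indices 8-15 read the undef operand.
V8 shuffleUndefRhs(const I8 &x, const std::array<int, 8> &mask) {
  V8 r{};
  for (int i = 0; i < 8; ++i) {
    if (mask[i] >= 0 && mask[i] < 8) {
      r[i].defined = true;
      r[i].value = x[mask[i]];
    }
  }
  return r;
}

// Element-wise add; a lane is defined only if both inputs are defined.
V8 addLanes(const V8 &a, const V8 &b) {
  V8 r{};
  for (int i = 0; i < 8; ++i) {
    if (a[i].defined && b[i].defined) {
      r[i].defined = true;
      r[i].value = a[i].value + b[i].value;
    }
  }
  return r;
}

// 256-bit horizontal add with both operands equal to x; each 128-bit lane
// is processed independently.
I8 vphadddSame(const I8 &x) {
  I8 r{};
  for (int lane = 0; lane < 2; ++lane) {
    int base = lane * 4;
    r[base + 0] = x[base + 0] + x[base + 1];
    r[base + 1] = x[base + 2] + x[base + 3];
    r[base + 2] = x[base + 0] + x[base + 1];
    r[base + 3] = x[base + 2] + x[base + 3];
  }
  return r;
}

// The lowering is acceptable if it agrees on every defined lane.
bool matchesOnDefinedLanes(const V8 &ir, const I8 &hw) {
  for (int i = 0; i < 8; ++i)
    if (ir[i].defined && ir[i].value != hw[i])
      return false;
  return true;
}
```

With the masks <undef, 2, 8, 10, 4, 6, undef, 14> and <1, 3, 9, undef, 5, 7, 13, 15>, only lanes 1, 4 and 5 of the add are defined, and each of them agrees with the horizontal add of x with itself.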

test/CodeGen/X86/haddsub-shuf.ll

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ssse3 | FileCheck %s --check-prefixes=SSSE3,SSSE3_SLOW		; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ssse3 | FileCheck %s --check-prefixes=SSSE3,SSSE3_SLOW
; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ssse3,fast-hops | FileCheck %s --check-prefixes=SSSE3,SSSE3_FAST		; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+ssse3,fast-hops | FileCheck %s --check-prefixes=SSSE3,SSSE3_FAST
; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx | FileCheck %s --check-prefixes=AVX,AVX1,AVX1_SLOW		; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx | FileCheck %s --check-prefixes=AVX,AVX1,AVX1_SLOW
; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx,fast-hops | FileCheck %s --check-prefixes=AVX,AVX1,AVX1_FAST		; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx,fast-hops | FileCheck %s --check-prefixes=AVX,AVX1,AVX1_FAST
; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=AVX,AVX2,AVX2_SLOW		; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2 | FileCheck %s --check-prefixes=AVX,AVX2,AVX2_SLOW
; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2,fast-hops | FileCheck %s --check-prefixes=AVX,AVX2,AVX2_FAST		; RUN: llc < %s -mtriple=x86_64-unknown -mattr=+avx2,fast-hops | FileCheck %s --check-prefixes=AVX,AVX2,AVX2_FAST

; The next 8 tests check for matching the horizontal op and eliminating the shuffle.		; The next 8 tests check for matching the horizontal op and eliminating the shuffle.
; PR34111 - https://bugs.llvm.org/show_bug.cgi?id=34111		; PR34111 - https://bugs.llvm.org/show_bug.cgi?id=34111

define <4 x float> @hadd_v4f32(<4 x float> %a) {		define <4 x float> @hadd_v4f32(<4 x float> %a) {
; SSSE3_SLOW-LABEL: hadd_v4f32:		; SSSE3-LABEL: hadd_v4f32:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: movaps %xmm0, %xmm1		; SSSE3-NEXT: haddps %xmm0, %xmm0
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,2],xmm0[2,3]		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,3,2,3]
; SSSE3_SLOW-NEXT: addps %xmm1, %xmm0
; SSSE3_SLOW-NEXT: movddup {{.*#+}} xmm0 = xmm0[0,0]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hadd_v4f32:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: haddps %xmm0, %xmm0
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hadd_v4f32:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,2,2,3]
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[1,3,2,3]
; AVX1_SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
; AVX1_SLOW-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hadd_v4f32:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0
; AVX1_FAST-NEXT: retq
;
; AVX2_SLOW-LABEL: hadd_v4f32:
; AVX2_SLOW: # %bb.0:
; AVX2_SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,2,2,3]
; AVX2_SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[1,3,2,3]
; AVX2_SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
; AVX2_SLOW-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]
; AVX2_SLOW-NEXT: retq
;		;
; AVX2_FAST-LABEL: hadd_v4f32:		; AVX-LABEL: hadd_v4f32:
; AVX2_FAST: # %bb.0:		; AVX: # %bb.0:
; AVX2_FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0		; AVX-NEXT: vhaddps %xmm0, %xmm0, %xmm0
; AVX2_FAST-NEXT: retq		; AVX-NEXT: retq
%a02 = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 0, i32 2>		%a02 = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 0, i32 2>
%a13 = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 1, i32 3>		%a13 = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 1, i32 3>
%hop = fadd <2 x float> %a02, %a13		%hop = fadd <2 x float> %a02, %a13
%shuf = shufflevector <2 x float> %hop, <2 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>		%shuf = shufflevector <2 x float> %hop, <2 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
ret <4 x float> %shuf		ret <4 x float> %shuf
}		}

define <8 x float> @hadd_v8f32a(<8 x float> %a) {		define <8 x float> @hadd_v8f32a(<8 x float> %a) {
[22 lines elided]	; AVX2-NEXT: retq
%a0 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>		%a0 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%a1 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>		%a1 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
%hop = fadd <4 x float> %a0, %a1		%hop = fadd <4 x float> %a0, %a1
%shuf = shufflevector <4 x float> %hop, <4 x float> undef, <8 x i32> <i32 undef, i32 undef, i32 0, i32 1, i32 undef, i32 undef, i32 2, i32 3>		%shuf = shufflevector <4 x float> %hop, <4 x float> undef, <8 x i32> <i32 undef, i32 undef, i32 0, i32 1, i32 undef, i32 undef, i32 2, i32 3>
ret <8 x float> %shuf		ret <8 x float> %shuf
}		}

define <8 x float> @hadd_v8f32b(<8 x float> %a) {		define <8 x float> @hadd_v8f32b(<8 x float> %a) {
; SSSE3_SLOW-LABEL: hadd_v8f32b:		; SSSE3-LABEL: hadd_v8f32b:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: movaps %xmm0, %xmm2		; SSSE3-NEXT: haddps %xmm0, %xmm0
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2],xmm0[2,3]		; SSSE3-NEXT: haddps %xmm1, %xmm1
; SSSE3_SLOW-NEXT: movaps %xmm1, %xmm3		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm3 = xmm3[0,2],xmm1[2,3]
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,3,2,3]
; SSSE3_SLOW-NEXT: addps %xmm2, %xmm0
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3,2,3]
; SSSE3_SLOW-NEXT: addps %xmm3, %xmm1
; SSSE3_SLOW-NEXT: movddup {{.*#+}} xmm0 = xmm0[0,0]
; SSSE3_SLOW-NEXT: movddup {{.*#+}} xmm1 = xmm1[0,0]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hadd_v8f32b:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: haddps %xmm0, %xmm0
; SSSE3_FAST-NEXT: haddps %xmm1, %xmm1
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hadd_v8f32b:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]
; AVX1_SLOW-NEXT: vaddps %ymm0, %ymm1, %ymm0
; AVX1_SLOW-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hadd_v8f32b:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vhaddps %ymm0, %ymm0, %ymm0
; AVX1_FAST-NEXT: retq
;
; AVX2_SLOW-LABEL: hadd_v8f32b:
; AVX2_SLOW: # %bb.0:
; AVX2_SLOW-NEXT: vpermilps {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]
; AVX2_SLOW-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]
; AVX2_SLOW-NEXT: vaddps %ymm0, %ymm1, %ymm0
; AVX2_SLOW-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX2_SLOW-NEXT: retq
;		;
; AVX2_FAST-LABEL: hadd_v8f32b:		; AVX-LABEL: hadd_v8f32b:
; AVX2_FAST: # %bb.0:		; AVX: # %bb.0:
; AVX2_FAST-NEXT: vhaddps %ymm0, %ymm0, %ymm0		; AVX-NEXT: vhaddps %ymm0, %ymm0, %ymm0
; AVX2_FAST-NEXT: retq		; AVX-NEXT: retq
%a0 = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 4, i32 6, i32 undef, i32 undef>		%a0 = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 4, i32 6, i32 undef, i32 undef>
%a1 = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 5, i32 7, i32 undef, i32 undef>		%a1 = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 5, i32 7, i32 undef, i32 undef>
%hop = fadd <8 x float> %a0, %a1		%hop = fadd <8 x float> %a0, %a1
%shuf = shufflevector <8 x float> %hop, <8 x float> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 4, i32 5, i32 4, i32 5>		%shuf = shufflevector <8 x float> %hop, <8 x float> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 4, i32 5, i32 4, i32 5>
ret <8 x float> %shuf		ret <8 x float> %shuf
}		}

define <4 x float> @hsub_v4f32(<4 x float> %a) {		define <4 x float> @hsub_v4f32(<4 x float> %a) {
; SSSE3_SLOW-LABEL: hsub_v4f32:		; SSSE3-LABEL: hsub_v4f32:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: movaps %xmm0, %xmm1		; SSSE3-NEXT: hsubps %xmm0, %xmm0
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,2],xmm0[2,3]		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,3,2,3]
; SSSE3_SLOW-NEXT: subps %xmm0, %xmm1
; SSSE3_SLOW-NEXT: movddup {{.*#+}} xmm0 = xmm1[0,0]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hsub_v4f32:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: hsubps %xmm0, %xmm0
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hsub_v4f32:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,2,2,3]
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[1,3,2,3]
; AVX1_SLOW-NEXT: vsubps %xmm0, %xmm1, %xmm0
; AVX1_SLOW-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hsub_v4f32:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vhsubps %xmm0, %xmm0, %xmm0
; AVX1_FAST-NEXT: retq
;
; AVX2_SLOW-LABEL: hsub_v4f32:
; AVX2_SLOW: # %bb.0:
; AVX2_SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,2,2,3]
; AVX2_SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[1,3,2,3]
; AVX2_SLOW-NEXT: vsubps %xmm0, %xmm1, %xmm0
; AVX2_SLOW-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]
; AVX2_SLOW-NEXT: retq
;		;
; AVX2_FAST-LABEL: hsub_v4f32:		; AVX-LABEL: hsub_v4f32:
; AVX2_FAST: # %bb.0:		; AVX: # %bb.0:
; AVX2_FAST-NEXT: vhsubps %xmm0, %xmm0, %xmm0		; AVX-NEXT: vhsubps %xmm0, %xmm0, %xmm0
; AVX2_FAST-NEXT: retq		; AVX-NEXT: retq
%a02 = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 0, i32 2>		%a02 = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 0, i32 2>
%a13 = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 1, i32 3>		%a13 = shufflevector <4 x float> %a, <4 x float> undef, <2 x i32> <i32 1, i32 3>
%hop = fsub <2 x float> %a02, %a13		%hop = fsub <2 x float> %a02, %a13
%shuf = shufflevector <2 x float> %hop, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>		%shuf = shufflevector <2 x float> %hop, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
ret <4 x float> %shuf		ret <4 x float> %shuf
}		}

define <8 x float> @hsub_v8f32a(<8 x float> %a) {		define <8 x float> @hsub_v8f32a(<8 x float> %a) {
[22 lines elided]	; AVX2-NEXT: retq
%a0 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>		%a0 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%a1 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>		%a1 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
%hop = fsub <4 x float> %a0, %a1		%hop = fsub <4 x float> %a0, %a1
%shuf = shufflevector <4 x float> %hop, <4 x float> undef, <8 x i32> <i32 undef, i32 undef, i32 0, i32 1, i32 undef, i32 undef, i32 2, i32 3>		%shuf = shufflevector <4 x float> %hop, <4 x float> undef, <8 x i32> <i32 undef, i32 undef, i32 0, i32 1, i32 undef, i32 undef, i32 2, i32 3>
ret <8 x float> %shuf		ret <8 x float> %shuf
}		}

define <8 x float> @hsub_v8f32b(<8 x float> %a) {		define <8 x float> @hsub_v8f32b(<8 x float> %a) {
; SSSE3_SLOW-LABEL: hsub_v8f32b:		; SSSE3-LABEL: hsub_v8f32b:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: movaps %xmm0, %xmm2		; SSSE3-NEXT: hsubps %xmm0, %xmm0
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2],xmm0[2,3]		; SSSE3-NEXT: hsubps %xmm1, %xmm1
; SSSE3_SLOW-NEXT: movaps %xmm1, %xmm3		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm3 = xmm3[0,2],xmm1[2,3]
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,3,2,3]
; SSSE3_SLOW-NEXT: subps %xmm0, %xmm2
; SSSE3_SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3,2,3]
; SSSE3_SLOW-NEXT: subps %xmm1, %xmm3
; SSSE3_SLOW-NEXT: movddup {{.*#+}} xmm0 = xmm2[0,0]
; SSSE3_SLOW-NEXT: movddup {{.*#+}} xmm1 = xmm3[0,0]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hsub_v8f32b:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: hsubps %xmm0, %xmm0
; SSSE3_FAST-NEXT: hsubps %xmm1, %xmm1
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hsub_v8f32b:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]
; AVX1_SLOW-NEXT: vsubps %ymm0, %ymm1, %ymm0
; AVX1_SLOW-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hsub_v8f32b:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vhsubps %ymm0, %ymm0, %ymm0
; AVX1_FAST-NEXT: retq
;
; AVX2_SLOW-LABEL: hsub_v8f32b:
; AVX2_SLOW: # %bb.0:
; AVX2_SLOW-NEXT: vpermilps {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]
; AVX2_SLOW-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]
; AVX2_SLOW-NEXT: vsubps %ymm0, %ymm1, %ymm0
; AVX2_SLOW-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX2_SLOW-NEXT: retq
;		;
; AVX2_FAST-LABEL: hsub_v8f32b:		; AVX-LABEL: hsub_v8f32b:
; AVX2_FAST: # %bb.0:		; AVX: # %bb.0:
; AVX2_FAST-NEXT: vhsubps %ymm0, %ymm0, %ymm0		; AVX-NEXT: vhsubps %ymm0, %ymm0, %ymm0
; AVX2_FAST-NEXT: retq		; AVX-NEXT: retq
%a0 = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 4, i32 6, i32 undef, i32 undef>		%a0 = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 4, i32 6, i32 undef, i32 undef>
%a1 = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 5, i32 7, i32 undef, i32 undef>		%a1 = shufflevector <8 x float> %a, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 5, i32 7, i32 undef, i32 undef>
%hop = fsub <8 x float> %a0, %a1		%hop = fsub <8 x float> %a0, %a1
%shuf = shufflevector <8 x float> %hop, <8 x float> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 4, i32 5, i32 4, i32 5>		%shuf = shufflevector <8 x float> %hop, <8 x float> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 4, i32 5, i32 4, i32 5>
ret <8 x float> %shuf		ret <8 x float> %shuf
}		}

define <2 x double> @hadd_v2f64(<2 x double> %a) {		define <2 x double> @hadd_v2f64(<2 x double> %a) {
[258 lines elided]	; AVX2_FAST-NEXT: retq
%a0 = shufflevector <4 x double> %a, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 2, i32 undef>		%a0 = shufflevector <4 x double> %a, <4 x double> undef, <4 x i32> <i32 0, i32 undef, i32 2, i32 undef>
%a1 = shufflevector <4 x double> %a, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 3, i32 undef>		%a1 = shufflevector <4 x double> %a, <4 x double> undef, <4 x i32> <i32 1, i32 undef, i32 3, i32 undef>
%hop = fsub <4 x double> %a0, %a1		%hop = fsub <4 x double> %a0, %a1
%shuf = shufflevector <4 x double> %hop, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>		%shuf = shufflevector <4 x double> %hop, <4 x double> undef, <4 x i32> <i32 0, i32 0, i32 2, i32 2>
ret <4 x double> %shuf		ret <4 x double> %shuf
}		}

define <4 x i32> @hadd_v4i32(<4 x i32> %a) {		define <4 x i32> @hadd_v4i32(<4 x i32> %a) {
; SSSE3_SLOW-LABEL: hadd_v4i32:		; SSSE3-LABEL: hadd_v4i32:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]		; SSSE3-NEXT: phaddd %xmm0, %xmm0
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: paddd %xmm1, %xmm0
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hadd_v4i32:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: phaddd %xmm0, %xmm0
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hadd_v4i32:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]
; AVX1_SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]
; AVX1_SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
; AVX1_SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hadd_v4i32:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0
; AVX1_FAST-NEXT: retq
;
; AVX2_SLOW-LABEL: hadd_v4i32:
; AVX2_SLOW: # %bb.0:
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]
; AVX2_SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
; AVX2_SLOW-NEXT: vpbroadcastq %xmm0, %xmm0
; AVX2_SLOW-NEXT: retq
;		;
; AVX2_FAST-LABEL: hadd_v4i32:		; AVX-LABEL: hadd_v4i32:
; AVX2_FAST: # %bb.0:		; AVX: # %bb.0:
; AVX2_FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0		; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0
; AVX2_FAST-NEXT: retq		; AVX-NEXT: retq
%a02 = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		%a02 = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
%a13 = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		%a13 = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
%hop = add <4 x i32> %a02, %a13		%hop = add <4 x i32> %a02, %a13
%shuf = shufflevector <4 x i32> %hop, <4 x i32> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 1>		%shuf = shufflevector <4 x i32> %hop, <4 x i32> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 1>
ret <4 x i32> %shuf		ret <4 x i32> %shuf
}		}

define <8 x i32> @hadd_v8i32a(<8 x i32> %a) {		define <8 x i32> @hadd_v8i32a(<8 x i32> %a) {
[22 lines elided]	; AVX2-NEXT: retq
%a0 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>		%a0 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%a1 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>		%a1 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
%hop = add <4 x i32> %a0, %a1		%hop = add <4 x i32> %a0, %a1
%shuf = shufflevector <4 x i32> %hop, <4 x i32> undef, <8 x i32> <i32 undef, i32 undef, i32 0, i32 1, i32 undef, i32 undef, i32 2, i32 3>		%shuf = shufflevector <4 x i32> %hop, <4 x i32> undef, <8 x i32> <i32 undef, i32 undef, i32 0, i32 1, i32 undef, i32 undef, i32 2, i32 3>
ret <8 x i32> %shuf		ret <8 x i32> %shuf
}		}

define <8 x i32> @hadd_v8i32b(<8 x i32> %a) {		define <8 x i32> @hadd_v8i32b(<8 x i32> %a) {
; SSSE3_SLOW-LABEL: hadd_v8i32b:		; SSSE3-LABEL: hadd_v8i32b:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm2 = xmm0[0,2,2,3]		; SSSE3-NEXT: phaddd %xmm0, %xmm0
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm3 = xmm1[0,2,2,3]		; SSSE3-NEXT: phaddd %xmm1, %xmm1
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: paddd %xmm2, %xmm0
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm1[1,3,2,3]
; SSSE3_SLOW-NEXT: paddd %xmm3, %xmm1
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,1,0,1]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hadd_v8i32b:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: phaddd %xmm0, %xmm0
; SSSE3_FAST-NEXT: phaddd %xmm1, %xmm1
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hadd_v8i32b:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]
; AVX1_SLOW-NEXT: vextractf128 $1, %ymm0, %xmm2
; AVX1_SLOW-NEXT: vextractf128 $1, %ymm1, %xmm3
; AVX1_SLOW-NEXT: vpaddd %xmm2, %xmm3, %xmm2
; AVX1_SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
; AVX1_SLOW-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
; AVX1_SLOW-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hadd_v8i32b:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm1
; AVX1_FAST-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX1_FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0
; AVX1_FAST-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX1_FAST-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX1_FAST-NEXT: retq
;		;
; AVX2_SLOW-LABEL: hadd_v8i32b:		; AVX1-LABEL: hadd_v8i32b:
; AVX2_SLOW: # %bb.0:		; AVX1: # %bb.0:
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]		; AVX1-NEXT: vphaddd %xmm0, %xmm0, %xmm1
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]		; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX2_SLOW-NEXT: vpaddd %ymm0, %ymm1, %ymm0		; AVX1-NEXT: vphaddd %xmm0, %xmm0, %xmm0
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[0,1,0,1,4,5,4,5]		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX2_SLOW-NEXT: retq		; AVX1-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
		; AVX1-NEXT: retq
;		;
; AVX2_FAST-LABEL: hadd_v8i32b:		; AVX2-LABEL: hadd_v8i32b:
; AVX2_FAST: # %bb.0:		; AVX2: # %bb.0:
; AVX2_FAST-NEXT: vphaddd %ymm0, %ymm0, %ymm0		; AVX2-NEXT: vphaddd %ymm0, %ymm0, %ymm0
; AVX2_FAST-NEXT: retq		; AVX2-NEXT: retq
%a0 = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 4, i32 6, i32 undef, i32 undef>		%a0 = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 4, i32 6, i32 undef, i32 undef>
%a1 = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 5, i32 7, i32 undef, i32 undef>		%a1 = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 5, i32 7, i32 undef, i32 undef>
%hop = add <8 x i32> %a0, %a1		%hop = add <8 x i32> %a0, %a1
%shuf = shufflevector <8 x i32> %hop, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 4, i32 5, i32 4, i32 5>		%shuf = shufflevector <8 x i32> %hop, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 4, i32 5, i32 4, i32 5>
ret <8 x i32> %shuf		ret <8 x i32> %shuf
}		}

define <4 x i32> @hsub_v4i32(<4 x i32> %a) {		define <4 x i32> @hsub_v4i32(<4 x i32> %a) {
; SSSE3_SLOW-LABEL: hsub_v4i32:		; SSSE3-LABEL: hsub_v4i32:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]		; SSSE3-NEXT: phsubd %xmm0, %xmm0
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: psubd %xmm0, %xmm1
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,0,1]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hsub_v4i32:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: phsubd %xmm0, %xmm0
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hsub_v4i32:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]
; AVX1_SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]
; AVX1_SLOW-NEXT: vpsubd %xmm0, %xmm1, %xmm0
; AVX1_SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hsub_v4i32:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vphsubd %xmm0, %xmm0, %xmm0
; AVX1_FAST-NEXT: retq
;
; AVX2_SLOW-LABEL: hsub_v4i32:
; AVX2_SLOW: # %bb.0:
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]
; AVX2_SLOW-NEXT: vpsubd %xmm0, %xmm1, %xmm0
; AVX2_SLOW-NEXT: vpbroadcastq %xmm0, %xmm0
; AVX2_SLOW-NEXT: retq
;		;
; AVX2_FAST-LABEL: hsub_v4i32:		; AVX-LABEL: hsub_v4i32:
; AVX2_FAST: # %bb.0:		; AVX: # %bb.0:
; AVX2_FAST-NEXT: vphsubd %xmm0, %xmm0, %xmm0		; AVX-NEXT: vphsubd %xmm0, %xmm0, %xmm0
; AVX2_FAST-NEXT: retq		; AVX-NEXT: retq
%a02 = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>		%a02 = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
%a13 = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>		%a13 = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
%hop = sub <4 x i32> %a02, %a13		%hop = sub <4 x i32> %a02, %a13
%shuf = shufflevector <4 x i32> %hop, <4 x i32> undef, <4 x i32> <i32 undef, i32 1, i32 0, i32 undef>		%shuf = shufflevector <4 x i32> %hop, <4 x i32> undef, <4 x i32> <i32 undef, i32 1, i32 0, i32 undef>
ret <4 x i32> %shuf		ret <4 x i32> %shuf
}		}

define <8 x i32> @hsub_v8i32a(<8 x i32> %a) {		define <8 x i32> @hsub_v8i32a(<8 x i32> %a) {
[... 22 lines not shown ...]
; AVX2-NEXT: retq
%a0 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>		%a0 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%a1 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>		%a1 = shufflevector <8 x i32> %a, <8 x i32> undef, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
%hop = sub <4 x i32> %a0, %a1		%hop = sub <4 x i32> %a0, %a1
%shuf = shufflevector <4 x i32> %hop, <4 x i32> undef, <8 x i32> <i32 undef, i32 undef, i32 0, i32 1, i32 undef, i32 undef, i32 2, i32 3>		%shuf = shufflevector <4 x i32> %hop, <4 x i32> undef, <8 x i32> <i32 undef, i32 undef, i32 0, i32 1, i32 undef, i32 undef, i32 2, i32 3>
ret <8 x i32> %shuf		ret <8 x i32> %shuf
}		}

define <8 x i32> @hsub_v8i32b(<8 x i32> %a) {		define <8 x i32> @hsub_v8i32b(<8 x i32> %a) {
; SSSE3_SLOW-LABEL: hsub_v8i32b:		; SSSE3-LABEL: hsub_v8i32b:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm2 = xmm0[0,2,2,3]		; SSSE3-NEXT: phsubd %xmm0, %xmm0
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm3 = xmm1[0,2,2,3]		; SSSE3-NEXT: phsubd %xmm1, %xmm1
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: psubd %xmm0, %xmm2
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,3,2,3]
; SSSE3_SLOW-NEXT: psubd %xmm0, %xmm3
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm2[0,1,0,1]
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm3[0,1,0,1]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hsub_v8i32b:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: phsubd %xmm0, %xmm0
; SSSE3_FAST-NEXT: phsubd %xmm1, %xmm1
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hsub_v8i32b:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]
; AVX1_SLOW-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]
; AVX1_SLOW-NEXT: vextractf128 $1, %ymm0, %xmm2
; AVX1_SLOW-NEXT: vextractf128 $1, %ymm1, %xmm3
; AVX1_SLOW-NEXT: vpsubd %xmm2, %xmm3, %xmm2
; AVX1_SLOW-NEXT: vpsubd %xmm0, %xmm1, %xmm0
; AVX1_SLOW-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
; AVX1_SLOW-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hsub_v8i32b:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vphsubd %xmm0, %xmm0, %xmm1
; AVX1_FAST-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX1_FAST-NEXT: vphsubd %xmm0, %xmm0, %xmm0
; AVX1_FAST-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX1_FAST-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX1_FAST-NEXT: retq
;		;
; AVX2_SLOW-LABEL: hsub_v8i32b:		; AVX1-LABEL: hsub_v8i32b:
; AVX2_SLOW: # %bb.0:		; AVX1: # %bb.0:
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]		; AVX1-NEXT: vphsubd %xmm0, %xmm0, %xmm1
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]		; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX2_SLOW-NEXT: vpsubd %ymm0, %ymm1, %ymm0		; AVX1-NEXT: vphsubd %xmm0, %xmm0, %xmm0
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[0,1,0,1,4,5,4,5]		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX2_SLOW-NEXT: retq		; AVX1-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
		; AVX1-NEXT: retq
;		;
; AVX2_FAST-LABEL: hsub_v8i32b:		; AVX2-LABEL: hsub_v8i32b:
; AVX2_FAST: # %bb.0:		; AVX2: # %bb.0:
; AVX2_FAST-NEXT: vphsubd %ymm0, %ymm0, %ymm0		; AVX2-NEXT: vphsubd %ymm0, %ymm0, %ymm0
; AVX2_FAST-NEXT: retq		; AVX2-NEXT: retq
%a0 = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 4, i32 6, i32 undef, i32 undef>		%a0 = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 0, i32 2, i32 undef, i32 undef, i32 4, i32 6, i32 undef, i32 undef>
%a1 = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 5, i32 7, i32 undef, i32 undef>		%a1 = shufflevector <8 x i32> %a, <8 x i32> undef, <8 x i32> <i32 1, i32 3, i32 undef, i32 undef, i32 5, i32 7, i32 undef, i32 undef>
%hop = sub <8 x i32> %a0, %a1		%hop = sub <8 x i32> %a0, %a1
%shuf = shufflevector <8 x i32> %hop, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 4, i32 5, i32 4, i32 5>		%shuf = shufflevector <8 x i32> %hop, <8 x i32> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 4, i32 5, i32 4, i32 5>
ret <8 x i32> %shuf		ret <8 x i32> %shuf
}		}

define <8 x i16> @hadd_v8i16(<8 x i16> %a) {		define <8 x i16> @hadd_v8i16(<8 x i16> %a) {
; SSSE3_SLOW-LABEL: hadd_v8i16:		; SSSE3-LABEL: hadd_v8i16:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: movdqa %xmm0, %xmm1		; SSSE3-NEXT: phaddw %xmm0, %xmm0
; SSSE3_SLOW-NEXT: pshufb {{.*#+}} xmm1 = xmm1[0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: pshufb {{.*#+}} xmm0 = xmm0[2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
; SSSE3_SLOW-NEXT: paddw %xmm1, %xmm0
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hadd_v8i16:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: phaddw %xmm0, %xmm0
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hadd_v8i16:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vpshufb {{.*#+}} xmm1 = xmm0[0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
; AVX1_SLOW-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
; AVX1_SLOW-NEXT: vpaddw %xmm0, %xmm1, %xmm0
; AVX1_SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hadd_v8i16:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vphaddw %xmm0, %xmm0, %xmm0
; AVX1_FAST-NEXT: retq
;
; AVX2_SLOW-LABEL: hadd_v8i16:
; AVX2_SLOW: # %bb.0:
; AVX2_SLOW-NEXT: vpshufb {{.*#+}} xmm1 = xmm0[0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
; AVX2_SLOW-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
; AVX2_SLOW-NEXT: vpaddw %xmm0, %xmm1, %xmm0
; AVX2_SLOW-NEXT: vpbroadcastq %xmm0, %xmm0
; AVX2_SLOW-NEXT: retq
;		;
; AVX2_FAST-LABEL: hadd_v8i16:		; AVX-LABEL: hadd_v8i16:
; AVX2_FAST: # %bb.0:		; AVX: # %bb.0:
; AVX2_FAST-NEXT: vphaddw %xmm0, %xmm0, %xmm0		; AVX-NEXT: vphaddw %xmm0, %xmm0, %xmm0
; AVX2_FAST-NEXT: retq		; AVX-NEXT: retq
%a0246 = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>		%a0246 = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>
%a1357 = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		%a1357 = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
%hop = add <8 x i16> %a0246, %a1357		%hop = add <8 x i16> %a0246, %a1357
%shuf = shufflevector <8 x i16> %hop, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3>		%shuf = shufflevector <8 x i16> %hop, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3>
ret <8 x i16> %shuf		ret <8 x i16> %shuf
}		}

define <16 x i16> @hadd_v16i16a(<16 x i16> %a) {		define <16 x i16> @hadd_v16i16a(<16 x i16> %a) {
[... 22 lines not shown ...]
; AVX2-NEXT: retq
%a0 = shufflevector <16 x i16> %a, <16 x i16> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>		%a0 = shufflevector <16 x i16> %a, <16 x i16> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
%a1 = shufflevector <16 x i16> %a, <16 x i16> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%a1 = shufflevector <16 x i16> %a, <16 x i16> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%hop = add <8 x i16> %a0, %a1		%hop = add <8 x i16> %a0, %a1
%shuf = shufflevector <8 x i16> %hop, <8 x i16> undef, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 4, i32 5, i32 6, i32 7>		%shuf = shufflevector <8 x i16> %hop, <8 x i16> undef, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 4, i32 5, i32 6, i32 7>
ret <16 x i16> %shuf		ret <16 x i16> %shuf
}		}

define <16 x i16> @hadd_v16i16b(<16 x i16> %a) {		define <16 x i16> @hadd_v16i16b(<16 x i16> %a) {
; SSSE3_SLOW-LABEL: hadd_v16i16b:		; SSSE3-LABEL: hadd_v16i16b:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: movdqa {{.*#+}} xmm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]		; SSSE3-NEXT: phaddw %xmm0, %xmm0
; SSSE3_SLOW-NEXT: movdqa %xmm0, %xmm3		; SSSE3-NEXT: phaddw %xmm1, %xmm1
; SSSE3_SLOW-NEXT: pshufb %xmm2, %xmm3		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: movdqa %xmm1, %xmm4
; SSSE3_SLOW-NEXT: pshufb %xmm2, %xmm4
; SSSE3_SLOW-NEXT: movdqa {{.*#+}} xmm2 = [2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
; SSSE3_SLOW-NEXT: pshufb %xmm2, %xmm0
; SSSE3_SLOW-NEXT: paddw %xmm3, %xmm0
; SSSE3_SLOW-NEXT: pshufb %xmm2, %xmm1
; SSSE3_SLOW-NEXT: paddw %xmm4, %xmm1
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,1,0,1]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hadd_v16i16b:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: phaddw %xmm0, %xmm0
; SSSE3_FAST-NEXT: phaddw %xmm1, %xmm1
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hadd_v16i16b:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vmovdqa {{.*#+}} xmm1 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
; AVX1_SLOW-NEXT: vpshufb %xmm1, %xmm0, %xmm2
; AVX1_SLOW-NEXT: vextractf128 $1, %ymm0, %xmm3
; AVX1_SLOW-NEXT: vpshufb %xmm1, %xmm3, %xmm1
; AVX1_SLOW-NEXT: vmovdqa {{.*#+}} xmm4 = [2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
; AVX1_SLOW-NEXT: vpshufb %xmm4, %xmm0, %xmm0
; AVX1_SLOW-NEXT: vpaddw %xmm0, %xmm2, %xmm0
; AVX1_SLOW-NEXT: vpshufb %xmm4, %xmm3, %xmm2
; AVX1_SLOW-NEXT: vpaddw %xmm2, %xmm1, %xmm1
; AVX1_SLOW-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1_SLOW-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hadd_v16i16b:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vphaddw %xmm0, %xmm0, %xmm1
; AVX1_FAST-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX1_FAST-NEXT: vphaddw %xmm0, %xmm0, %xmm0
; AVX1_FAST-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX1_FAST-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX1_FAST-NEXT: retq
;		;
; AVX2_SLOW-LABEL: hadd_v16i16b:		; AVX1-LABEL: hadd_v16i16b:
; AVX2_SLOW: # %bb.0:		; AVX1: # %bb.0:
; AVX2_SLOW-NEXT: vpshufb {{.*#+}} ymm1 = ymm0[0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15,16,17,20,21,24,25,28,29,24,25,28,29,28,29,30,31]		; AVX1-NEXT: vphaddw %xmm0, %xmm0, %xmm1
; AVX2_SLOW-NEXT: vpshufb {{.*#+}} ymm0 = ymm0[2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15,18,19,22,23,26,27,30,31,30,31,26,27,28,29,30,31]		; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX2_SLOW-NEXT: vpaddw %ymm0, %ymm1, %ymm0		; AVX1-NEXT: vphaddw %xmm0, %xmm0, %xmm0
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[0,1,0,1,4,5,4,5]		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX2_SLOW-NEXT: retq		; AVX1-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
		; AVX1-NEXT: retq
;		;
; AVX2_FAST-LABEL: hadd_v16i16b:		; AVX2-LABEL: hadd_v16i16b:
; AVX2_FAST: # %bb.0:		; AVX2: # %bb.0:
; AVX2_FAST-NEXT: vphaddw %ymm0, %ymm0, %ymm0		; AVX2-NEXT: vphaddw %ymm0, %ymm0, %ymm0
; AVX2_FAST-NEXT: retq		; AVX2-NEXT: retq
%a0 = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef>		%a0 = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef>
%a1 = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef>		%a1 = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef>
%hop = add <16 x i16> %a0, %a1		%hop = add <16 x i16> %a0, %a1
%shuf = shufflevector <16 x i16> %hop, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11, i32 8, i32 9, i32 10, i32 11>		%shuf = shufflevector <16 x i16> %hop, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11, i32 8, i32 9, i32 10, i32 11>
ret <16 x i16> %shuf		ret <16 x i16> %shuf
}		}

define <8 x i16> @hsub_v8i16(<8 x i16> %a) {		define <8 x i16> @hsub_v8i16(<8 x i16> %a) {
; SSSE3_SLOW-LABEL: hsub_v8i16:		; SSSE3-LABEL: hsub_v8i16:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: movdqa %xmm0, %xmm1		; SSSE3-NEXT: phsubw %xmm0, %xmm0
; SSSE3_SLOW-NEXT: pshufb {{.*#+}} xmm1 = xmm1[0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: pshufb {{.*#+}} xmm0 = xmm0[2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
; SSSE3_SLOW-NEXT: psubw %xmm0, %xmm1
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm1[0,1,0,1]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hsub_v8i16:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: phsubw %xmm0, %xmm0
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hsub_v8i16:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vpshufb {{.*#+}} xmm1 = xmm0[0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
; AVX1_SLOW-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
; AVX1_SLOW-NEXT: vpsubw %xmm0, %xmm1, %xmm0
; AVX1_SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hsub_v8i16:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vphsubw %xmm0, %xmm0, %xmm0
; AVX1_FAST-NEXT: retq
;
; AVX2_SLOW-LABEL: hsub_v8i16:
; AVX2_SLOW: # %bb.0:
; AVX2_SLOW-NEXT: vpshufb {{.*#+}} xmm1 = xmm0[0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
; AVX2_SLOW-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
; AVX2_SLOW-NEXT: vpsubw %xmm0, %xmm1, %xmm0
; AVX2_SLOW-NEXT: vpbroadcastq %xmm0, %xmm0
; AVX2_SLOW-NEXT: retq
;		;
; AVX2_FAST-LABEL: hsub_v8i16:		; AVX-LABEL: hsub_v8i16:
; AVX2_FAST: # %bb.0:		; AVX: # %bb.0:
; AVX2_FAST-NEXT: vphsubw %xmm0, %xmm0, %xmm0		; AVX-NEXT: vphsubw %xmm0, %xmm0, %xmm0
; AVX2_FAST-NEXT: retq		; AVX-NEXT: retq
%a0246 = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>		%a0246 = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef>
%a1357 = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>		%a1357 = shufflevector <8 x i16> %a, <8 x i16> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
%hop = sub <8 x i16> %a0246, %a1357		%hop = sub <8 x i16> %a0246, %a1357
%shuf = shufflevector <8 x i16> %hop, <8 x i16> undef, <8 x i32> <i32 0, i32 undef, i32 2, i32 undef, i32 undef, i32 1, i32 undef, i32 3>		%shuf = shufflevector <8 x i16> %hop, <8 x i16> undef, <8 x i32> <i32 0, i32 undef, i32 2, i32 undef, i32 undef, i32 1, i32 undef, i32 3>
ret <8 x i16> %shuf		ret <8 x i16> %shuf
}		}

define <16 x i16> @hsub_v16i16a(<16 x i16> %a) {		define <16 x i16> @hsub_v16i16a(<16 x i16> %a) {
[... 22 lines not shown ...]
; AVX2-NEXT: retq
%a0 = shufflevector <16 x i16> %a, <16 x i16> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>		%a0 = shufflevector <16 x i16> %a, <16 x i16> undef, <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
%a1 = shufflevector <16 x i16> %a, <16 x i16> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>		%a1 = shufflevector <16 x i16> %a, <16 x i16> undef, <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 9, i32 11, i32 13, i32 15>
%hop = sub <8 x i16> %a0, %a1		%hop = sub <8 x i16> %a0, %a1
%shuf = shufflevector <8 x i16> %hop, <8 x i16> undef, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 4, i32 5, i32 6, i32 7>		%shuf = shufflevector <8 x i16> %hop, <8 x i16> undef, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 4, i32 5, i32 6, i32 7>
ret <16 x i16> %shuf		ret <16 x i16> %shuf
}		}

define <16 x i16> @hsub_v16i16b(<16 x i16> %a) {		define <16 x i16> @hsub_v16i16b(<16 x i16> %a) {
; SSSE3_SLOW-LABEL: hsub_v16i16b:		; SSSE3-LABEL: hsub_v16i16b:
; SSSE3_SLOW: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3_SLOW-NEXT: movdqa {{.*#+}} xmm2 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]		; SSSE3-NEXT: phsubw %xmm0, %xmm0
; SSSE3_SLOW-NEXT: movdqa %xmm0, %xmm3		; SSSE3-NEXT: phsubw %xmm1, %xmm1
; SSSE3_SLOW-NEXT: pshufb %xmm2, %xmm3		; SSSE3-NEXT: retq
; SSSE3_SLOW-NEXT: movdqa %xmm1, %xmm4
; SSSE3_SLOW-NEXT: pshufb %xmm2, %xmm4
; SSSE3_SLOW-NEXT: movdqa {{.*#+}} xmm2 = [2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
; SSSE3_SLOW-NEXT: pshufb %xmm2, %xmm0
; SSSE3_SLOW-NEXT: psubw %xmm0, %xmm3
; SSSE3_SLOW-NEXT: pshufb %xmm2, %xmm1
; SSSE3_SLOW-NEXT: psubw %xmm1, %xmm4
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm3[0,1,0,1]
; SSSE3_SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm4[0,1,0,1]
; SSSE3_SLOW-NEXT: retq
;
; SSSE3_FAST-LABEL: hsub_v16i16b:
; SSSE3_FAST: # %bb.0:
; SSSE3_FAST-NEXT: phsubw %xmm0, %xmm0
; SSSE3_FAST-NEXT: phsubw %xmm1, %xmm1
; SSSE3_FAST-NEXT: retq
;
; AVX1_SLOW-LABEL: hsub_v16i16b:
; AVX1_SLOW: # %bb.0:
; AVX1_SLOW-NEXT: vmovdqa {{.*#+}} xmm1 = [0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15]
; AVX1_SLOW-NEXT: vpshufb %xmm1, %xmm0, %xmm2
; AVX1_SLOW-NEXT: vextractf128 $1, %ymm0, %xmm3
; AVX1_SLOW-NEXT: vpshufb %xmm1, %xmm3, %xmm1
; AVX1_SLOW-NEXT: vmovdqa {{.*#+}} xmm4 = [2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15]
; AVX1_SLOW-NEXT: vpshufb %xmm4, %xmm0, %xmm0
; AVX1_SLOW-NEXT: vpsubw %xmm0, %xmm2, %xmm0
; AVX1_SLOW-NEXT: vpshufb %xmm4, %xmm3, %xmm2
; AVX1_SLOW-NEXT: vpsubw %xmm2, %xmm1, %xmm1
; AVX1_SLOW-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
; AVX1_SLOW-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX1_SLOW-NEXT: retq
;
; AVX1_FAST-LABEL: hsub_v16i16b:
; AVX1_FAST: # %bb.0:
; AVX1_FAST-NEXT: vphsubw %xmm0, %xmm0, %xmm1
; AVX1_FAST-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX1_FAST-NEXT: vphsubw %xmm0, %xmm0, %xmm0
; AVX1_FAST-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX1_FAST-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
; AVX1_FAST-NEXT: retq
;		;
; AVX2_SLOW-LABEL: hsub_v16i16b:		; AVX1-LABEL: hsub_v16i16b:
; AVX2_SLOW: # %bb.0:		; AVX1: # %bb.0:
; AVX2_SLOW-NEXT: vpshufb {{.*#+}} ymm1 = ymm0[0,1,4,5,8,9,12,13,8,9,12,13,12,13,14,15,16,17,20,21,24,25,28,29,24,25,28,29,28,29,30,31]		; AVX1-NEXT: vphsubw %xmm0, %xmm0, %xmm1
; AVX2_SLOW-NEXT: vpshufb {{.*#+}} ymm0 = ymm0[2,3,6,7,10,11,14,15,14,15,10,11,12,13,14,15,18,19,22,23,26,27,30,31,30,31,26,27,28,29,30,31]		; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX2_SLOW-NEXT: vpsubw %ymm0, %ymm1, %ymm0		; AVX1-NEXT: vphsubw %xmm0, %xmm0, %xmm0
; AVX2_SLOW-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[0,1,0,1,4,5,4,5]		; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
; AVX2_SLOW-NEXT: retq		; AVX1-NEXT: vmovddup {{.*#+}} ymm0 = ymm0[0,0,2,2]
		; AVX1-NEXT: retq
;		;
; AVX2_FAST-LABEL: hsub_v16i16b:		; AVX2-LABEL: hsub_v16i16b:
; AVX2_FAST: # %bb.0:		; AVX2: # %bb.0:
; AVX2_FAST-NEXT: vphsubw %ymm0, %ymm0, %ymm0		; AVX2-NEXT: vphsubw %ymm0, %ymm0, %ymm0
; AVX2_FAST-NEXT: retq		; AVX2-NEXT: retq
%a0 = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef>		%a0 = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 undef, i32 undef, i32 undef, i32 undef, i32 8, i32 10, i32 12, i32 14, i32 undef, i32 undef, i32 undef, i32 undef>
%a1 = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef>		%a1 = shufflevector <16 x i16> %a, <16 x i16> undef, <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 9, i32 11, i32 13, i32 15, i32 undef, i32 undef, i32 undef, i32 undef>
%hop = sub <16 x i16> %a0, %a1		%hop = sub <16 x i16> %a0, %a1
%shuf = shufflevector <16 x i16> %hop, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11, i32 8, i32 9, i32 10, i32 11>		%shuf = shufflevector <16 x i16> %hop, <16 x i16> undef, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11, i32 8, i32 9, i32 10, i32 11>
ret <16 x i16> %shuf		ret <16 x i16> %shuf
}		}

test/CodeGen/X86/haddsub-undef.ll

	[... 481 lines not shown ...]
	; AVX-FAST-NEXT: retq			; AVX-FAST-NEXT: retq
	%l = shufflevector <2 x double> %x, <2 x double> undef, <2 x i32> <i32 undef, i32 0>			%l = shufflevector <2 x double> %x, <2 x double> undef, <2 x i32> <i32 undef, i32 0>
	%add = fadd <2 x double> %l, %x			%add = fadd <2 x double> %l, %x
	%shuffle2 = shufflevector <2 x double> %add, <2 x double> undef, <2 x i32> <i32 1, i32 undef>			%shuffle2 = shufflevector <2 x double> %add, <2 x double> undef, <2 x i32> <i32 1, i32 undef>
	ret <2 x double> %shuffle2			ret <2 x double> %shuffle2
	}			}

	define <4 x float> @add_ps_007(<4 x float> %x) {			define <4 x float> @add_ps_007(<4 x float> %x) {
	; SSE-SLOW-LABEL: add_ps_007:			; SSE-LABEL: add_ps_007:
	; SSE-SLOW: # %bb.0:			; SSE: # %bb.0:
	; SSE-SLOW-NEXT: movaps %xmm0, %xmm1			; SSE-NEXT: haddps %xmm0, %xmm0
	; SSE-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,1],xmm0[0,2]			; SSE-NEXT: retq
	; SSE-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; SSE-SLOW-NEXT: addps %xmm1, %xmm0
	; SSE-SLOW-NEXT: retq
	;
	; SSE-FAST-LABEL: add_ps_007:
	; SSE-FAST: # %bb.0:
	; SSE-FAST-NEXT: haddps %xmm0, %xmm0
	; SSE-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: add_ps_007:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,1,0,2]
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; AVX-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: add_ps_007:			; AVX-LABEL: add_ps_007:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0			; AVX-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%l = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 2>			%l = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 2>
	%r = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 3>			%r = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 3>
	%add = fadd <4 x float> %l, %r			%add = fadd <4 x float> %l, %r
	ret <4 x float> %add			ret <4 x float> %add
	}			}

	define <4 x float> @add_ps_030(<4 x float> %x) {			define <4 x float> @add_ps_030(<4 x float> %x) {
	; SSE-SLOW-LABEL: add_ps_030:			; SSE-LABEL: add_ps_030:
	; SSE-SLOW: # %bb.0:			; SSE: # %bb.0:
	; SSE-SLOW-NEXT: movaps %xmm0, %xmm1			; SSE-NEXT: haddps %xmm0, %xmm0
	; SSE-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,1],xmm0[0,2]			; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,2,2,3]
	; SSE-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1,1,3]			; SSE-NEXT: retq
	; SSE-SLOW-NEXT: addps %xmm1, %xmm0
	; SSE-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,2,2,3]
	; SSE-SLOW-NEXT: retq
	;
	; SSE-FAST-LABEL: add_ps_030:
	; SSE-FAST: # %bb.0:
	; SSE-FAST-NEXT: haddps %xmm0, %xmm0
	; SSE-FAST-NEXT: shufps {{.*#+}} xmm0 = xmm0[3,2,2,3]
	; SSE-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: add_ps_030:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,1,0,2]
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; AVX-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,2,3]
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: add_ps_030:			; AVX-LABEL: add_ps_030:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0			; AVX-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,2,3]			; AVX-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[3,2,2,3]
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%l = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 2>			%l = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 2>
	%r = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 3>			%r = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 3>
	%add = fadd <4 x float> %l, %r			%add = fadd <4 x float> %l, %r
	%shuffle2 = shufflevector <4 x float> %add, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 undef, i32 undef>			%shuffle2 = shufflevector <4 x float> %add, <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 undef, i32 undef>
	ret <4 x float> %shuffle2			ret <4 x float> %shuffle2
	}			}

	define <4 x float> @add_ps_007_2(<4 x float> %x) {			define <4 x float> @add_ps_007_2(<4 x float> %x) {
	; SSE-SLOW-LABEL: add_ps_007_2:			; SSE-LABEL: add_ps_007_2:
	; SSE-SLOW: # %bb.0:			; SSE: # %bb.0:
	; SSE-SLOW-NEXT: movddup {{.*#+}} xmm1 = xmm0[0,0]			; SSE-NEXT: haddps %xmm0, %xmm0
	; SSE-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1,1,3]			; SSE-NEXT: retq
	; SSE-SLOW-NEXT: addps %xmm1, %xmm0
	; SSE-SLOW-NEXT: retq
	;
	; SSE-FAST-LABEL: add_ps_007_2:
	; SSE-FAST: # %bb.0:
	; SSE-FAST-NEXT: haddps %xmm0, %xmm0
	; SSE-FAST-NEXT: retq
	;
	; AVX1-SLOW-LABEL: add_ps_007_2:
	; AVX1-SLOW: # %bb.0:
	; AVX1-SLOW-NEXT: vmovddup {{.*#+}} xmm1 = xmm0[0,0]
	; AVX1-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; AVX1-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX1-SLOW-NEXT: retq
	;
	; AVX-FAST-LABEL: add_ps_007_2:
	; AVX-FAST: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq
	;			;
	; AVX512-SLOW-LABEL: add_ps_007_2:			; AVX-LABEL: add_ps_007_2:
	; AVX512-SLOW: # %bb.0:			; AVX: # %bb.0:
	; AVX512-SLOW-NEXT: vbroadcastss %xmm0, %xmm1			; AVX-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX512-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]			; AVX-NEXT: retq
	; AVX512-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX512-SLOW-NEXT: retq
	%l = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 undef>			%l = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 undef>
	%r = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 undef>			%r = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 undef>
	%add = fadd <4 x float> %l, %r			%add = fadd <4 x float> %l, %r
	ret <4 x float> %add			ret <4 x float> %add
	}			}
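
For reference, a minimal sketch of why a single `haddps %xmm0, %xmm0` can stand in for the two shuffles plus `addps` in `add_ps_007_2` above. It models lanes as Python lists and `undef` as `None`. The helper names (`shuffle`, `fadd`, `haddps`, `matches_defined_lanes`) are illustrative assumptions, not LLVM APIs, and `shuffle` only handles single-input masks with indices 0..3.

```python
UNDEF = None  # stands in for an IR 'undef' lane


def shuffle(x, mask):
    """Single-input shufflevector: lane i takes x[mask[i]], or stays undef."""
    return [UNDEF if m is UNDEF else x[m] for m in mask]


def fadd(a, b):
    """Lane-wise add; a lane is undef if either input lane is undef."""
    return [UNDEF if (p is UNDEF or q is UNDEF) else p + q for p, q in zip(a, b)]


def haddps(a, b):
    """128-bit horizontal add: adjacent pairs of a fill the low half, pairs of b the high half."""
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


def matches_defined_lanes(expected, actual):
    """True if 'actual' agrees with 'expected' on every lane that is not undef."""
    return all(e is UNDEF or e == v for e, v in zip(expected, actual))


x = [1.0, 2.0, 4.0, 8.0]

# IR from add_ps_007_2: only lane 2 of the result is defined (x[0] + x[1]).
l = shuffle(x, [UNDEF, UNDEF, 0, UNDEF])
r = shuffle(x, [UNDEF, UNDEF, 1, UNDEF])
add = fadd(l, r)

# Replacement: one horizontal add of x with itself.
h = haddps(x, x)

print(add)  # lanes the IR actually requires
print(h)    # lanes produced by haddps
print(matches_defined_lanes(add, h))
```

Because the undef lanes may take any value, the fold only has to agree on the defined lanes, which is the property checked by `matches_defined_lanes`.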

	define <4 x float> @add_ps_008(<4 x float> %x) {			define <4 x float> @add_ps_008(<4 x float> %x) {
	; SSE-SLOW-LABEL: add_ps_008:			; SSE-SLOW-LABEL: add_ps_008:
	[... 51 lines not shown ...]
	; AVX-FAST-NEXT: retq			; AVX-FAST-NEXT: retq
	%l = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 undef, i32 2>			%l = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 undef, i32 2>
	%add = fadd <4 x float> %l, %x			%add = fadd <4 x float> %l, %x
	%shuffle2 = shufflevector <4 x float> %add, <4 x float> undef, <4 x i32> <i32 3, i32 undef, i32 undef, i32 undef>			%shuffle2 = shufflevector <4 x float> %add, <4 x float> undef, <4 x i32> <i32 3, i32 undef, i32 undef, i32 undef>
	ret <4 x float> %shuffle2			ret <4 x float> %shuffle2
	}			}

	define <4 x float> @add_ps_018(<4 x float> %x) {			define <4 x float> @add_ps_018(<4 x float> %x) {
	; SSE-SLOW-LABEL: add_ps_018:			; SSE-LABEL: add_ps_018:
	; SSE-SLOW: # %bb.0:			; SSE: # %bb.0:
	; SSE-SLOW-NEXT: movddup {{.*#+}} xmm1 = xmm0[0,0]			; SSE-NEXT: haddps %xmm0, %xmm0
	; SSE-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1,1,3]			; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; SSE-SLOW-NEXT: addps %xmm1, %xmm0			; SSE-NEXT: retq
	; SSE-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; SSE-SLOW-NEXT: retq
	;
	; SSE-FAST-LABEL: add_ps_018:
	; SSE-FAST: # %bb.0:
	; SSE-FAST-NEXT: haddps %xmm0, %xmm0
	; SSE-FAST-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; SSE-FAST-NEXT: retq
	;
	; AVX1-SLOW-LABEL: add_ps_018:
	; AVX1-SLOW: # %bb.0:
	; AVX1-SLOW-NEXT: vmovddup {{.*#+}} xmm1 = xmm0[0,0]
	; AVX1-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; AVX1-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX1-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; AVX1-SLOW-NEXT: retq
	;
	; AVX-FAST-LABEL: add_ps_018:
	; AVX-FAST: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; AVX-FAST-NEXT: retq
	;			;
	; AVX512-SLOW-LABEL: add_ps_018:			; AVX-LABEL: add_ps_018:
	; AVX512-SLOW: # %bb.0:			; AVX: # %bb.0:
	; AVX512-SLOW-NEXT: vbroadcastss %xmm0, %xmm1			; AVX-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX512-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,1,3]			; AVX-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; AVX512-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0			; AVX-NEXT: retq
	; AVX512-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; AVX512-SLOW-NEXT: retq
	%l = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 undef>			%l = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 undef>
	%r = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 undef>			%r = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 undef>
	%add = fadd <4 x float> %l, %r			%add = fadd <4 x float> %l, %r
	%shuffle2 = shufflevector <4 x float> %add, <4 x float> undef, <4 x i32> <i32 undef, i32 2, i32 undef, i32 undef>			%shuffle2 = shufflevector <4 x float> %add, <4 x float> undef, <4 x i32> <i32 undef, i32 2, i32 undef, i32 undef>
	ret <4 x float> %shuffle2			ret <4 x float> %shuffle2
	}			}

	define <4 x float> @v8f32_inputs_v4f32_output_0101(<8 x float> %a, <8 x float> %b) {			define <4 x float> @v8f32_inputs_v4f32_output_0101(<8 x float> %a, <8 x float> %b) {
	[... remaining 173 lines not shown ...]

test/CodeGen/X86/haddsub.ll

	[... first 97 lines not shown ...]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x float> %x, <4 x float> %y, <4 x i32> <i32 1, i32 2, i32 5, i32 6>			%a = shufflevector <4 x float> %x, <4 x float> %y, <4 x i32> <i32 1, i32 2, i32 5, i32 6>
	%b = shufflevector <4 x float> %y, <4 x float> %x, <4 x i32> <i32 4, i32 7, i32 0, i32 3>			%b = shufflevector <4 x float> %y, <4 x float> %x, <4 x i32> <i32 4, i32 7, i32 0, i32 3>
	%r = fadd <4 x float> %a, %b			%r = fadd <4 x float> %a, %b
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <4 x float> @haddps3(<4 x float> %x) {			define <4 x float> @haddps3(<4 x float> %x) {
	; SSE3-SLOW-LABEL: haddps3:			; SSE3-LABEL: haddps3:
	; SSE3-SLOW: # %bb.0:			; SSE3: # %bb.0:
	; SSE3-SLOW-NEXT: movaps %xmm0, %xmm1			; SSE3-NEXT: haddps %xmm0, %xmm0
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,2],xmm0[2,3]			; SSE3-NEXT: retq
	; SSE3-SLOW-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
	; SSE3-SLOW-NEXT: addps %xmm1, %xmm0
	; SSE3-SLOW-NEXT: retq
	;
	; SSE3-FAST-LABEL: haddps3:
	; SSE3-FAST: # %bb.0:
	; SSE3-FAST-NEXT: haddps %xmm0, %xmm0
	; SSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: haddps3:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,2,2,3]
	; AVX-SLOW-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
	; AVX-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: haddps3:			; AVX-LABEL: haddps3:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0			; AVX-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 2, i32 4, i32 6>			%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 2, i32 4, i32 6>
	%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 3, i32 5, i32 7>			%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 3, i32 5, i32 7>
	%r = fadd <4 x float> %a, %b			%r = fadd <4 x float> %a, %b
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <4 x float> @haddps4(<4 x float> %x) {			define <4 x float> @haddps4(<4 x float> %x) {
	; SSE3-SLOW-LABEL: haddps4:			; SSE3-LABEL: haddps4:
	; SSE3-SLOW: # %bb.0:			; SSE3: # %bb.0:
	; SSE3-SLOW-NEXT: movaps %xmm0, %xmm1			; SSE3-NEXT: haddps %xmm0, %xmm0
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,2],xmm0[2,3]			; SSE3-NEXT: retq
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; SSE3-SLOW-NEXT: addps %xmm1, %xmm0
	; SSE3-SLOW-NEXT: retq
	;
	; SSE3-FAST-LABEL: haddps4:
	; SSE3-FAST: # %bb.0:
	; SSE3-FAST-NEXT: haddps %xmm0, %xmm0
	; SSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: haddps4:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,2,2,3]
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; AVX-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: haddps4:			; AVX-LABEL: haddps4:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0			; AVX-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>			%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
	%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>			%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
	%r = fadd <4 x float> %a, %b			%r = fadd <4 x float> %a, %b
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <4 x float> @haddps5(<4 x float> %x) {			define <4 x float> @haddps5(<4 x float> %x) {
	; SSE3-SLOW-LABEL: haddps5:			; SSE3-LABEL: haddps5:
	; SSE3-SLOW: # %bb.0:			; SSE3: # %bb.0:
	; SSE3-SLOW-NEXT: movaps %xmm0, %xmm1			; SSE3-NEXT: haddps %xmm0, %xmm0
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,3],xmm0[2,3]			; SSE3-NEXT: retq
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,2,2,3]
	; SSE3-SLOW-NEXT: addps %xmm1, %xmm0
	; SSE3-SLOW-NEXT: retq
	;
	; SSE3-FAST-LABEL: haddps5:
	; SSE3-FAST: # %bb.0:
	; SSE3-FAST-NEXT: haddps %xmm0, %xmm0
	; SSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: haddps5:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,3,2,3]
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[1,2,2,3]
	; AVX-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: haddps5:			; AVX-LABEL: haddps5:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0			; AVX-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 0, i32 3, i32 undef, i32 undef>			%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 0, i32 3, i32 undef, i32 undef>
	%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 1, i32 2, i32 undef, i32 undef>			%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 1, i32 2, i32 undef, i32 undef>
	%r = fadd <4 x float> %a, %b			%r = fadd <4 x float> %a, %b
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <4 x float> @haddps6(<4 x float> %x) {			define <4 x float> @haddps6(<4 x float> %x) {
	; SSE3-SLOW-LABEL: haddps6:			; SSE3-SLOW-LABEL: haddps6:
	[... 19 lines not shown ...]
	; AVX-FAST-NEXT: retq			; AVX-FAST-NEXT: retq
	%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>			%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
	%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	%r = fadd <4 x float> %a, %b			%r = fadd <4 x float> %a, %b
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <4 x float> @haddps7(<4 x float> %x) {			define <4 x float> @haddps7(<4 x float> %x) {
	; SSE3-SLOW-LABEL: haddps7:			; SSE3-LABEL: haddps7:
	; SSE3-SLOW: # %bb.0:			; SSE3: # %bb.0:
	; SSE3-SLOW-NEXT: movaps %xmm0, %xmm1			; SSE3-NEXT: haddps %xmm0, %xmm0
	; SSE3-SLOW-NEXT: unpckhpd {{.*#+}} xmm1 = xmm1[1],xmm0[1]			; SSE3-NEXT: retq
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; SSE3-SLOW-NEXT: addps %xmm1, %xmm0
	; SSE3-SLOW-NEXT: retq
	;
	; SSE3-FAST-LABEL: haddps7:
	; SSE3-FAST: # %bb.0:
	; SSE3-FAST-NEXT: haddps %xmm0, %xmm0
	; SSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: haddps7:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpermilpd {{.*#+}} xmm1 = xmm0[1,0]
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,2,3]
	; AVX-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: haddps7:			; AVX-LABEL: haddps7:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0			; AVX-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 3, i32 undef, i32 undef>			%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 3, i32 undef, i32 undef>
	%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 2, i32 undef, i32 undef>			%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 2, i32 undef, i32 undef>
	%r = fadd <4 x float> %a, %b			%r = fadd <4 x float> %a, %b
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <2 x double> @hsubpd1(<2 x double> %x, <2 x double> %y) {			define <2 x double> @hsubpd1(<2 x double> %x, <2 x double> %y) {
	; SSE3-LABEL: hsubpd1:			; SSE3-LABEL: hsubpd1:
	[... 52 lines not shown ...]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x float> %x, <4 x float> %y, <4 x i32> <i32 0, i32 2, i32 4, i32 6>			%a = shufflevector <4 x float> %x, <4 x float> %y, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
	%b = shufflevector <4 x float> %x, <4 x float> %y, <4 x i32> <i32 1, i32 3, i32 5, i32 7>			%b = shufflevector <4 x float> %x, <4 x float> %y, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
	%r = fsub <4 x float> %a, %b			%r = fsub <4 x float> %a, %b
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <4 x float> @hsubps2(<4 x float> %x) {			define <4 x float> @hsubps2(<4 x float> %x) {
	; SSE3-SLOW-LABEL: hsubps2:			; SSE3-LABEL: hsubps2:
	; SSE3-SLOW: # %bb.0:			; SSE3: # %bb.0:
	; SSE3-SLOW-NEXT: movaps %xmm0, %xmm1			; SSE3-NEXT: hsubps %xmm0, %xmm0
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,2],xmm0[2,3]			; SSE3-NEXT: retq
	; SSE3-SLOW-NEXT: movhlps {{.*#+}} xmm0 = xmm0[1,1]
	; SSE3-SLOW-NEXT: subps %xmm0, %xmm1
	; SSE3-SLOW-NEXT: movaps %xmm1, %xmm0
	; SSE3-SLOW-NEXT: retq
	;
	; SSE3-FAST-LABEL: hsubps2:
	; SSE3-FAST: # %bb.0:
	; SSE3-FAST-NEXT: hsubps %xmm0, %xmm0
	; SSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: hsubps2:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,2,2,3]
	; AVX-SLOW-NEXT: vpermilpd {{.*#+}} xmm0 = xmm0[1,0]
	; AVX-SLOW-NEXT: vsubps %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: hsubps2:			; AVX-LABEL: hsubps2:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vhsubps %xmm0, %xmm0, %xmm0			; AVX-NEXT: vhsubps %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 2, i32 4, i32 6>			%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 2, i32 4, i32 6>
	%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 3, i32 5, i32 7>			%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 undef, i32 3, i32 5, i32 7>
	%r = fsub <4 x float> %a, %b			%r = fsub <4 x float> %a, %b
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <4 x float> @hsubps3(<4 x float> %x) {			define <4 x float> @hsubps3(<4 x float> %x) {
	; SSE3-SLOW-LABEL: hsubps3:			; SSE3-LABEL: hsubps3:
	; SSE3-SLOW: # %bb.0:			; SSE3: # %bb.0:
	; SSE3-SLOW-NEXT: movaps %xmm0, %xmm1			; SSE3-NEXT: hsubps %xmm0, %xmm0
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,2],xmm0[2,3]			; SSE3-NEXT: retq
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; SSE3-SLOW-NEXT: subps %xmm0, %xmm1
	; SSE3-SLOW-NEXT: movaps %xmm1, %xmm0
	; SSE3-SLOW-NEXT: retq
	;
	; SSE3-FAST-LABEL: hsubps3:
	; SSE3-FAST: # %bb.0:
	; SSE3-FAST-NEXT: hsubps %xmm0, %xmm0
	; SSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: hsubps3:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,2,2,3]
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; AVX-SLOW-NEXT: vsubps %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: hsubps3:			; AVX-LABEL: hsubps3:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vhsubps %xmm0, %xmm0, %xmm0			; AVX-NEXT: vhsubps %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>			%a = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
	%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>			%b = shufflevector <4 x float> %x, <4 x float> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
	%r = fsub <4 x float> %a, %b			%r = fsub <4 x float> %a, %b
	ret <4 x float> %r			ret <4 x float> %r
	}			}

	define <4 x float> @hsubps4(<4 x float> %x) {			define <4 x float> @hsubps4(<4 x float> %x) {
	; SSE3-SLOW-LABEL: hsubps4:			; SSE3-SLOW-LABEL: hsubps4:
	[... 53 lines not shown ...]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <8 x float> %x, <8 x float> %y, <8 x i32> <i32 1, i32 2, i32 9, i32 10, i32 5, i32 6, i32 13, i32 14>			%a = shufflevector <8 x float> %x, <8 x float> %y, <8 x i32> <i32 1, i32 2, i32 9, i32 10, i32 5, i32 6, i32 13, i32 14>
	%b = shufflevector <8 x float> %y, <8 x float> %x, <8 x i32> <i32 8, i32 11, i32 0, i32 3, i32 12, i32 15, i32 4, i32 7>			%b = shufflevector <8 x float> %y, <8 x float> %x, <8 x i32> <i32 8, i32 11, i32 0, i32 3, i32 12, i32 15, i32 4, i32 7>
	%r = fadd <8 x float> %a, %b			%r = fadd <8 x float> %a, %b
	ret <8 x float> %r			ret <8 x float> %r
	}			}

	define <8 x float> @vhaddps3(<8 x float> %x) {			define <8 x float> @vhaddps3(<8 x float> %x) {
	; SSE3-SLOW-LABEL: vhaddps3:			; SSE3-LABEL: vhaddps3:
	; SSE3-SLOW: # %bb.0:			; SSE3: # %bb.0:
	; SSE3-SLOW-NEXT: movaps %xmm1, %xmm2			; SSE3-NEXT: haddps %xmm0, %xmm0
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2],xmm1[2,3]			; SSE3-NEXT: haddps %xmm1, %xmm1
	; SSE3-SLOW-NEXT: movaps %xmm0, %xmm3			; SSE3-NEXT: retq
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm3 = xmm3[0,2],xmm0[2,3]
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3,2,3]
	; SSE3-SLOW-NEXT: addps %xmm2, %xmm1
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; SSE3-SLOW-NEXT: addps %xmm3, %xmm0
	; SSE3-SLOW-NEXT: retq
	;
	; SSE3-FAST-LABEL: vhaddps3:
	; SSE3-FAST: # %bb.0:
	; SSE3-FAST-NEXT: haddps %xmm0, %xmm0
	; SSE3-FAST-NEXT: haddps %xmm1, %xmm1
	; SSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: vhaddps3:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]
	; AVX-SLOW-NEXT: vaddps %ymm0, %ymm1, %ymm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: vhaddps3:			; AVX-LABEL: vhaddps3:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vhaddps %ymm0, %ymm0, %ymm0			; AVX-NEXT: vhaddps %ymm0, %ymm0, %ymm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <8 x float> %x, <8 x float> undef, <8 x i32> <i32 undef, i32 2, i32 8, i32 10, i32 4, i32 6, i32 undef, i32 14>			%a = shufflevector <8 x float> %x, <8 x float> undef, <8 x i32> <i32 undef, i32 2, i32 8, i32 10, i32 4, i32 6, i32 undef, i32 14>
	%b = shufflevector <8 x float> %x, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 9, i32 undef, i32 5, i32 7, i32 13, i32 15>			%b = shufflevector <8 x float> %x, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 9, i32 undef, i32 5, i32 7, i32 13, i32 15>
	%r = fadd <8 x float> %a, %b			%r = fadd <8 x float> %a, %b
	ret <8 x float> %r			ret <8 x float> %r
	}			}

	define <8 x float> @vhsubps1(<8 x float> %x, <8 x float> %y) {			define <8 x float> @vhsubps1(<8 x float> %x, <8 x float> %y) {
	; SSE3-LABEL: vhsubps1:			; SSE3-LABEL: vhsubps1:
	; SSE3: # %bb.0:			; SSE3: # %bb.0:
	; SSE3-NEXT: hsubps %xmm2, %xmm0			; SSE3-NEXT: hsubps %xmm2, %xmm0
	; SSE3-NEXT: hsubps %xmm3, %xmm1			; SSE3-NEXT: hsubps %xmm3, %xmm1
	; SSE3-NEXT: retq			; SSE3-NEXT: retq
	;			;
	; AVX-LABEL: vhsubps1:			; AVX-LABEL: vhsubps1:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vhsubps %ymm1, %ymm0, %ymm0			; AVX-NEXT: vhsubps %ymm1, %ymm0, %ymm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <8 x float> %x, <8 x float> %y, <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>			%a = shufflevector <8 x float> %x, <8 x float> %y, <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
	%b = shufflevector <8 x float> %x, <8 x float> %y, <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>			%b = shufflevector <8 x float> %x, <8 x float> %y, <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
	%r = fsub <8 x float> %a, %b			%r = fsub <8 x float> %a, %b
	ret <8 x float> %r			ret <8 x float> %r
	}			}

	define <8 x float> @vhsubps3(<8 x float> %x) {			define <8 x float> @vhsubps3(<8 x float> %x) {
	; SSE3-SLOW-LABEL: vhsubps3:			; SSE3-LABEL: vhsubps3:
	; SSE3-SLOW: # %bb.0:			; SSE3: # %bb.0:
	; SSE3-SLOW-NEXT: movaps %xmm1, %xmm2			; SSE3-NEXT: hsubps %xmm0, %xmm0
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,2],xmm1[2,3]			; SSE3-NEXT: hsubps %xmm1, %xmm1
	; SSE3-SLOW-NEXT: movaps %xmm0, %xmm3			; SSE3-NEXT: retq
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm3 = xmm3[0,2],xmm0[2,3]
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3,2,3]
	; SSE3-SLOW-NEXT: subps %xmm1, %xmm2
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; SSE3-SLOW-NEXT: subps %xmm0, %xmm3
	; SSE3-SLOW-NEXT: movaps %xmm3, %xmm0
	; SSE3-SLOW-NEXT: movaps %xmm2, %xmm1
	; SSE3-SLOW-NEXT: retq
	;
	; SSE3-FAST-LABEL: vhsubps3:
	; SSE3-FAST: # %bb.0:
	; SSE3-FAST-NEXT: hsubps %xmm0, %xmm0
	; SSE3-FAST-NEXT: hsubps %xmm1, %xmm1
	; SSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: vhsubps3:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} ymm1 = ymm0[0,2,2,3,4,6,6,7]
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[1,3,2,3,5,7,6,7]
	; AVX-SLOW-NEXT: vsubps %ymm0, %ymm1, %ymm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: vhsubps3:			; AVX-LABEL: vhsubps3:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vhsubps %ymm0, %ymm0, %ymm0			; AVX-NEXT: vhsubps %ymm0, %ymm0, %ymm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <8 x float> %x, <8 x float> undef, <8 x i32> <i32 undef, i32 2, i32 8, i32 10, i32 4, i32 6, i32 undef, i32 14>			%a = shufflevector <8 x float> %x, <8 x float> undef, <8 x i32> <i32 undef, i32 2, i32 8, i32 10, i32 4, i32 6, i32 undef, i32 14>
	%b = shufflevector <8 x float> %x, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 9, i32 undef, i32 5, i32 7, i32 13, i32 15>			%b = shufflevector <8 x float> %x, <8 x float> undef, <8 x i32> <i32 1, i32 3, i32 9, i32 undef, i32 5, i32 7, i32 13, i32 15>
	%r = fsub <8 x float> %a, %b			%r = fsub <8 x float> %a, %b
	ret <8 x float> %r			ret <8 x float> %r
	}			}

	define <4 x double> @vhaddpd1(<4 x double> %x, <4 x double> %y) {			define <4 x double> @vhaddpd1(<4 x double> %x, <4 x double> %y) {
	; SSE3-LABEL: vhaddpd1:			; SSE3-LABEL: vhaddpd1:
	[... 1,072 lines not shown ...]
	; SSSE3-FAST-NEXT: haddps %xmm1, %xmm0			; SSSE3-FAST-NEXT: haddps %xmm1, %xmm0
	; SSSE3-FAST-NEXT: haddps %xmm0, %xmm0			; SSSE3-FAST-NEXT: haddps %xmm0, %xmm0
	; SSSE3-FAST-NEXT: haddps %xmm0, %xmm0			; SSSE3-FAST-NEXT: haddps %xmm0, %xmm0
	; SSSE3-FAST-NEXT: retq			; SSSE3-FAST-NEXT: retq
	;			;
	; SSE3-SLOW-LABEL: PR39936_v8f32:			; SSE3-SLOW-LABEL: PR39936_v8f32:
	; SSE3-SLOW: # %bb.0:			; SSE3-SLOW: # %bb.0:
	; SSE3-SLOW-NEXT: haddps %xmm1, %xmm0			; SSE3-SLOW-NEXT: haddps %xmm1, %xmm0
	; SSE3-SLOW-NEXT: movaps %xmm0, %xmm1			; SSE3-SLOW-NEXT: haddps %xmm0, %xmm0
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,2],xmm0[2,3]
	; SSE3-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; SSE3-SLOW-NEXT: addps %xmm1, %xmm0
	; SSE3-SLOW-NEXT: movshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]			; SSE3-SLOW-NEXT: movshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
	; SSE3-SLOW-NEXT: addss %xmm1, %xmm0			; SSE3-SLOW-NEXT: addss %xmm1, %xmm0
	; SSE3-SLOW-NEXT: retq			; SSE3-SLOW-NEXT: retq
	;			;
	; SSE3-FAST-LABEL: PR39936_v8f32:			; SSE3-FAST-LABEL: PR39936_v8f32:
	; SSE3-FAST: # %bb.0:			; SSE3-FAST: # %bb.0:
	; SSE3-FAST-NEXT: haddps %xmm1, %xmm0			; SSE3-FAST-NEXT: haddps %xmm1, %xmm0
	; SSE3-FAST-NEXT: haddps %xmm0, %xmm0			; SSE3-FAST-NEXT: haddps %xmm0, %xmm0
	; SSE3-FAST-NEXT: haddps %xmm0, %xmm0			; SSE3-FAST-NEXT: haddps %xmm0, %xmm0
	; SSE3-FAST-NEXT: retq			; SSE3-FAST-NEXT: retq
	;			;
	; AVX-SLOW-LABEL: PR39936_v8f32:			; AVX-SLOW-LABEL: PR39936_v8f32:
	; AVX-SLOW: # %bb.0:			; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX-SLOW-NEXT: vextractf128 $1, %ymm0, %xmm1
	; AVX-SLOW-NEXT: vhaddps %xmm1, %xmm0, %xmm0			; AVX-SLOW-NEXT: vhaddps %xmm1, %xmm0, %xmm0
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[0,2,2,3]			; AVX-SLOW-NEXT: vhaddps %xmm0, %xmm0, %xmm0
	; AVX-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; AVX-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]			; AVX-SLOW-NEXT: vmovshdup {{.*#+}} xmm1 = xmm0[1,1,3,3]
	; AVX-SLOW-NEXT: vaddss %xmm1, %xmm0, %xmm0			; AVX-SLOW-NEXT: vaddss %xmm1, %xmm0, %xmm0
	; AVX-SLOW-NEXT: vzeroupper			; AVX-SLOW-NEXT: vzeroupper
	; AVX-SLOW-NEXT: retq			; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: PR39936_v8f32:			; AVX-FAST-LABEL: PR39936_v8f32:
	; AVX-FAST: # %bb.0:			; AVX-FAST: # %bb.0:
	; AVX-FAST-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX-FAST-NEXT: vextractf128 $1, %ymm0, %xmm1
	[... remaining 16 lines not shown ...]

test/CodeGen/X86/phaddsub.ll

	[... first 65 lines not shown ...]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 1, i32 2, i32 5, i32 6>			%a = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 1, i32 2, i32 5, i32 6>
	%b = shufflevector <4 x i32> %y, <4 x i32> %x, <4 x i32> <i32 4, i32 7, i32 0, i32 3>			%b = shufflevector <4 x i32> %y, <4 x i32> %x, <4 x i32> <i32 4, i32 7, i32 0, i32 3>
	%r = add <4 x i32> %a, %b			%r = add <4 x i32> %a, %b
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @phaddd3(<4 x i32> %x) {			define <4 x i32> @phaddd3(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phaddd3:			; SSSE3-LABEL: phaddd3:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,2,3,3]			; SSSE3-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: paddd %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddd3:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: phaddd3:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,2,3,3]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
	; AVX-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: phaddd3:			; AVX-LABEL: phaddd3:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 2, i32 4, i32 6>			%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 2, i32 4, i32 6>
	%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 3, i32 5, i32 7>			%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 3, i32 5, i32 7>
	%r = add <4 x i32> %a, %b			%r = add <4 x i32> %a, %b
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @phaddd4(<4 x i32> %x) {			define <4 x i32> @phaddd4(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phaddd4:			; SSSE3-LABEL: phaddd4:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]			; SSSE3-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: paddd %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddd4:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: phaddd4:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; AVX-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: phaddd4:			; AVX-LABEL: phaddd4:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>			%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
	%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>			%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
	%r = add <4 x i32> %a, %b			%r = add <4 x i32> %a, %b
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @phaddd5(<4 x i32> %x) {			define <4 x i32> @phaddd5(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phaddd5:			; SSSE3-LABEL: phaddd5:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,3,2,3]			; SSSE3-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,2,2,3]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: paddd %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddd5:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: phaddd5:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,3,2,3]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,2,2,3]
	; AVX-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: phaddd5:			; AVX-LABEL: phaddd5:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 0, i32 3, i32 undef, i32 undef>			%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 0, i32 3, i32 undef, i32 undef>
	%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 1, i32 2, i32 undef, i32 undef>			%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 1, i32 2, i32 undef, i32 undef>
	%r = add <4 x i32> %a, %b			%r = add <4 x i32> %a, %b
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @phaddd6(<4 x i32> %x) {			define <4 x i32> @phaddd6(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phaddd6:			; SSSE3-SLOW-LABEL: phaddd6:
	[... 19 lines not shown ...]
	; AVX-FAST-NEXT: retq			; AVX-FAST-NEXT: retq
	%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>			%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
	%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>			%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	%r = add <4 x i32> %a, %b			%r = add <4 x i32> %a, %b
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @phaddd7(<4 x i32> %x) {			define <4 x i32> @phaddd7(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phaddd7:			; SSSE3-LABEL: phaddd7:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]			; SSSE3-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: paddd %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddd7:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: phaddd7:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
	; AVX-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: phaddd7:			; AVX-LABEL: phaddd7:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 3, i32 undef, i32 undef>			%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 3, i32 undef, i32 undef>
	%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 2, i32 undef, i32 undef>			%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 2, i32 undef, i32 undef>
	%r = add <4 x i32> %a, %b			%r = add <4 x i32> %a, %b
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <8 x i16> @phsubw1(<8 x i16> %x, <8 x i16> %y) {			define <8 x i16> @phsubw1(<8 x i16> %x, <8 x i16> %y) {
	; SSSE3-LABEL: phsubw1:			; SSSE3-LABEL: phsubw1:
	[23 lines not shown]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 0, i32 2, i32 4, i32 6>			%a = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
	%b = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 1, i32 3, i32 5, i32 7>			%b = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
	%r = sub <4 x i32> %a, %b			%r = sub <4 x i32> %a, %b
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @phsubd2(<4 x i32> %x) {			define <4 x i32> @phsubd2(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phsubd2:			; SSSE3-LABEL: phsubd2:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,2,3,3]			; SSSE3-NEXT: phsubd %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: psubd %xmm0, %xmm1
	; SSSE3-SLOW-NEXT: movdqa %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phsubd2:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phsubd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: phsubd2:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[2,2,3,3]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
	; AVX-SLOW-NEXT: vpsubd %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: phsubd2:			; AVX-LABEL: phsubd2:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vphsubd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vphsubd %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 2, i32 4, i32 6>			%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 2, i32 4, i32 6>
	%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 3, i32 5, i32 7>			%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 3, i32 5, i32 7>
	%r = sub <4 x i32> %a, %b			%r = sub <4 x i32> %a, %b
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @phsubd3(<4 x i32> %x) {			define <4 x i32> @phsubd3(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phsubd3:			; SSSE3-LABEL: phsubd3:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]			; SSSE3-NEXT: phsubd %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: psubd %xmm0, %xmm1
	; SSSE3-SLOW-NEXT: movdqa %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phsubd3:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phsubd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: phsubd3:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; AVX-SLOW-NEXT: vpsubd %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: phsubd3:			; AVX-LABEL: phsubd3:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vphsubd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vphsubd %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>			%a = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 0, i32 2, i32 undef, i32 undef>
	%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>			%b = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 undef, i32 undef>
	%r = sub <4 x i32> %a, %b			%r = sub <4 x i32> %a, %b
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @phsubd4(<4 x i32> %x) {			define <4 x i32> @phsubd4(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phsubd4:			; SSSE3-SLOW-LABEL: phsubd4:
	[76 lines not shown]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 1, i32 3, i32 5, i32 7>			%a = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
	%b = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 0, i32 2, i32 4, i32 6>			%b = shufflevector <4 x i32> %x, <4 x i32> %y, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
	%r = sub <4 x i32> %a, %b			%r = sub <4 x i32> %a, %b
	ret <4 x i32> %r			ret <4 x i32> %r
	}			}

	define <4 x i32> @phaddd_single_source1(<4 x i32> %x) {			define <4 x i32> @phaddd_single_source1(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phaddd_single_source1:			; SSSE3-LABEL: phaddd_single_source1:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,1,0,2]			; SSSE3-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: paddd %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddd_single_source1:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: phaddd_single_source1:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,1,0,2]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; AVX-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: phaddd_single_source1:			; AVX-LABEL: phaddd_single_source1:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%l = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 2>			%l = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 2>
	%r = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 3>			%r = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 3>
	%add = add <4 x i32> %l, %r			%add = add <4 x i32> %l, %r
	ret <4 x i32> %add			ret <4 x i32> %add
	}			}

	define <4 x i32> @phaddd_single_source2(<4 x i32> %x) {			define <4 x i32> @phaddd_single_source2(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phaddd_single_source2:			; SSSE3-LABEL: phaddd_single_source2:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,1,0,2]			; SSSE3-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]			; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,2,2,3]
	; SSSE3-SLOW-NEXT: paddd %xmm1, %xmm0			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,2,2,3]
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddd_single_source2:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: pshufd {{.*#+}} xmm0 = xmm0[3,2,2,3]
	; SSSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: phaddd_single_source2:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,1,0,2]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]
	; AVX-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[3,2,2,3]
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: phaddd_single_source2:			; AVX-LABEL: phaddd_single_source2:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0			; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[3,2,2,3]			; AVX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[3,2,2,3]
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%l = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 2>			%l = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 2>
	%r = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 3>			%r = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 3>
	%add = add <4 x i32> %l, %r			%add = add <4 x i32> %l, %r
	%shuffle2 = shufflevector <4 x i32> %add, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 undef, i32 undef>			%shuffle2 = shufflevector <4 x i32> %add, <4 x i32> undef, <4 x i32> <i32 3, i32 2, i32 undef, i32 undef>
	ret <4 x i32> %shuffle2			ret <4 x i32> %shuffle2
	}			}

	define <4 x i32> @phaddd_single_source3(<4 x i32> %x) {			define <4 x i32> @phaddd_single_source3(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phaddd_single_source3:			; SSSE3-LABEL: phaddd_single_source3:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,1,0,1]			; SSSE3-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: paddd %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddd_single_source3:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: retq
	;
	; AVX1-SLOW-LABEL: phaddd_single_source3:
	; AVX1-SLOW: # %bb.0:
	; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,1,0,1]
	; AVX1-SLOW-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
	; AVX1-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX1-SLOW-NEXT: retq
	;
	; AVX-FAST-LABEL: phaddd_single_source3:
	; AVX-FAST: # %bb.0:
	; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq
	;			;
	; AVX2-SLOW-LABEL: phaddd_single_source3:			; AVX-LABEL: phaddd_single_source3:
	; AVX2-SLOW: # %bb.0:			; AVX: # %bb.0:
	; AVX2-SLOW-NEXT: vpbroadcastd %xmm0, %xmm1			; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX2-SLOW-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero			; AVX-NEXT: retq
	; AVX2-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX2-SLOW-NEXT: retq
	%l = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 undef>			%l = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 undef>
	%r = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 undef>			%r = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 undef>
	%add = add <4 x i32> %l, %r			%add = add <4 x i32> %l, %r
	ret <4 x i32> %add			ret <4 x i32> %add
	}			}

	define <4 x i32> @phaddd_single_source4(<4 x i32> %x) {			define <4 x i32> @phaddd_single_source4(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phaddd_single_source4:			; SSSE3-SLOW-LABEL: phaddd_single_source4:
	[50 lines not shown]
	; AVX-FAST-NEXT: retq			; AVX-FAST-NEXT: retq
	%l = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 undef, i32 2>			%l = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 undef, i32 2>
	%add = add <4 x i32> %l, %x			%add = add <4 x i32> %l, %x
	%shuffle2 = shufflevector <4 x i32> %add, <4 x i32> undef, <4 x i32> <i32 3, i32 undef, i32 undef, i32 undef>			%shuffle2 = shufflevector <4 x i32> %add, <4 x i32> undef, <4 x i32> <i32 3, i32 undef, i32 undef, i32 undef>
	ret <4 x i32> %shuffle2			ret <4 x i32> %shuffle2
	}			}

	define <4 x i32> @phaddd_single_source6(<4 x i32> %x) {			define <4 x i32> @phaddd_single_source6(<4 x i32> %x) {
	; SSSE3-SLOW-LABEL: phaddd_single_source6:			; SSSE3-LABEL: phaddd_single_source6:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,1,0,1]			; SSSE3-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,1,3]			; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
	; SSSE3-SLOW-NEXT: paddd %xmm1, %xmm0			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddd_single_source6:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
	; SSSE3-FAST-NEXT: retq
	;
	; AVX1-SLOW-LABEL: phaddd_single_source6:
	; AVX1-SLOW: # %bb.0:
	; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,1,0,1]
	; AVX1-SLOW-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
	; AVX1-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
	; AVX1-SLOW-NEXT: retq
	;
	; AVX-FAST-LABEL: phaddd_single_source6:
	; AVX-FAST: # %bb.0:
	; AVX-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
	; AVX-FAST-NEXT: retq
	;			;
	; AVX2-SLOW-LABEL: phaddd_single_source6:			; AVX-LABEL: phaddd_single_source6:
	; AVX2-SLOW: # %bb.0:			; AVX: # %bb.0:
	; AVX2-SLOW-NEXT: vpbroadcastd %xmm0, %xmm1			; AVX-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX2-SLOW-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero			; AVX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
	; AVX2-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0			; AVX-NEXT: retq
	; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
	; AVX2-SLOW-NEXT: retq
	%l = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 undef>			%l = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 0, i32 undef>
	%r = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 undef>			%r = shufflevector <4 x i32> %x, <4 x i32> undef, <4 x i32> <i32 undef, i32 undef, i32 1, i32 undef>
	%add = add <4 x i32> %l, %r			%add = add <4 x i32> %l, %r
	%shuffle2 = shufflevector <4 x i32> %add, <4 x i32> undef, <4 x i32> <i32 undef, i32 2, i32 undef, i32 undef>			%shuffle2 = shufflevector <4 x i32> %add, <4 x i32> undef, <4 x i32> <i32 undef, i32 2, i32 undef, i32 undef>
	ret <4 x i32> %shuffle2			ret <4 x i32> %shuffle2
	}			}

	define <8 x i16> @phaddw_single_source1(<8 x i16> %x) {			define <8 x i16> @phaddw_single_source1(<8 x i16> %x) {
	; SSSE3-SLOW-LABEL: phaddw_single_source1:			; SSSE3-LABEL: phaddw_single_source1:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: movdqa %xmm0, %xmm1			; SSSE3-NEXT: phaddw %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufb {{.*#+}} xmm1 = xmm1[0,1,4,5,4,5,6,7,0,1,4,5,8,9,12,13]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: pshufb {{.*#+}} xmm0 = xmm0[6,7,2,3,4,5,6,7,2,3,6,7,10,11,14,15]
	; SSSE3-SLOW-NEXT: paddw %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddw_single_source1:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddw %xmm0, %xmm0
	; SSSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: phaddw_single_source1:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpshufb {{.*#+}} xmm1 = xmm0[0,1,4,5,4,5,6,7,0,1,4,5,8,9,12,13]
	; AVX-SLOW-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[6,7,2,3,4,5,6,7,2,3,6,7,10,11,14,15]
	; AVX-SLOW-NEXT: vpaddw %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: phaddw_single_source1:			; AVX-LABEL: phaddw_single_source1:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vphaddw %xmm0, %xmm0, %xmm0			; AVX-NEXT: vphaddw %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%l = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 2, i32 4, i32 6>			%l = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 2, i32 4, i32 6>
	%r = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 3, i32 5, i32 7>			%r = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 3, i32 5, i32 7>
	%add = add <8 x i16> %l, %r			%add = add <8 x i16> %l, %r
	ret <8 x i16> %add			ret <8 x i16> %add
	}			}

	define <8 x i16> @phaddw_single_source2(<8 x i16> %x) {			define <8 x i16> @phaddw_single_source2(<8 x i16> %x) {
	; SSSE3-SLOW-LABEL: phaddw_single_source2:			; SSSE3-LABEL: phaddw_single_source2:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshuflw {{.*#+}} xmm1 = xmm0[0,2,2,3,4,5,6,7]			; SSSE3-NEXT: phaddw %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,1,0,3]			; SSSE3-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,4,6,7]
	; SSSE3-SLOW-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[1,3,2,3,4,5,6,7]			; SSSE3-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,1,2,3]
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,3]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: paddw %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,4,6,7]
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,1,2,3]
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddw_single_source2:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddw %xmm0, %xmm0
	; SSSE3-FAST-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,4,6,7]
	; SSSE3-FAST-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,1,2,3]
	; SSSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: phaddw_single_source2:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpshuflw {{.*#+}} xmm1 = xmm0[0,2,2,3,4,5,6,7]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,3]
	; AVX-SLOW-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[1,3,2,3,4,5,6,7]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,3]
	; AVX-SLOW-NEXT: vpaddw %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: vpshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,4,6,7]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,1,2,3]
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: phaddw_single_source2:			; AVX-LABEL: phaddw_single_source2:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vphaddw %xmm0, %xmm0, %xmm0			; AVX-NEXT: vphaddw %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: vpshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,4,6,7]			; AVX-NEXT: vpshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,4,6,7]
	; AVX-FAST-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,1,2,3]			; AVX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,1,2,3]
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%l = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 2, i32 4, i32 6>			%l = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 2, i32 4, i32 6>
	%r = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 3, i32 5, i32 7>			%r = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 3, i32 5, i32 7>
	%add = add <8 x i16> %l, %r			%add = add <8 x i16> %l, %r
	%shuffle2 = shufflevector <8 x i16> %add, <8 x i16> undef, <8 x i32> <i32 5, i32 4, i32 3, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>			%shuffle2 = shufflevector <8 x i16> %add, <8 x i16> undef, <8 x i32> <i32 5, i32 4, i32 3, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <8 x i16> %shuffle2			ret <8 x i16> %shuffle2
	}			}

	define <8 x i16> @phaddw_single_source3(<8 x i16> %x) {			define <8 x i16> @phaddw_single_source3(<8 x i16> %x) {
	; SSSE3-SLOW-LABEL: phaddw_single_source3:			; SSSE3-LABEL: phaddw_single_source3:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshuflw {{.*#+}} xmm1 = xmm0[0,2,2,3,4,5,6,7]			; SSSE3-NEXT: phaddw %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,1,0,3]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[1,3,2,3,4,5,6,7]
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,3]
	; SSSE3-SLOW-NEXT: paddw %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddw_single_source3:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddw %xmm0, %xmm0
	; SSSE3-FAST-NEXT: retq
	;
	; AVX-SLOW-LABEL: phaddw_single_source3:
	; AVX-SLOW: # %bb.0:
	; AVX-SLOW-NEXT: vpshuflw {{.*#+}} xmm1 = xmm0[0,2,2,3,4,5,6,7]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[0,1,0,3]
	; AVX-SLOW-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[1,3,2,3,4,5,6,7]
	; AVX-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,3]
	; AVX-SLOW-NEXT: vpaddw %xmm0, %xmm1, %xmm0
	; AVX-SLOW-NEXT: retq
	;			;
	; AVX-FAST-LABEL: phaddw_single_source3:			; AVX-LABEL: phaddw_single_source3:
	; AVX-FAST: # %bb.0:			; AVX: # %bb.0:
	; AVX-FAST-NEXT: vphaddw %xmm0, %xmm0, %xmm0			; AVX-NEXT: vphaddw %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-NEXT: retq
	%l = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 2, i32 undef, i32 undef>			%l = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 2, i32 undef, i32 undef>
	%r = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 3, i32 undef, i32 undef>			%r = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 3, i32 undef, i32 undef>
	%add = add <8 x i16> %l, %r			%add = add <8 x i16> %l, %r
	ret <8 x i16> %add			ret <8 x i16> %add
	}			}

	define <8 x i16> @phaddw_single_source4(<8 x i16> %x) {			define <8 x i16> @phaddw_single_source4(<8 x i16> %x) {
	; SSSE3-SLOW-LABEL: phaddw_single_source4:			; SSSE3-SLOW-LABEL: phaddw_single_source4:
	[20 lines not shown]
	; AVX-FAST-NEXT: vphaddw %xmm0, %xmm0, %xmm0			; AVX-FAST-NEXT: vphaddw %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: retq			; AVX-FAST-NEXT: retq
	%l = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 6>			%l = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 6>
	%add = add <8 x i16> %l, %x			%add = add <8 x i16> %l, %x
	ret <8 x i16> %add			ret <8 x i16> %add
	}			}

	define <8 x i16> @phaddw_single_source6(<8 x i16> %x) {			define <8 x i16> @phaddw_single_source6(<8 x i16> %x) {
	; SSSE3-SLOW-LABEL: phaddw_single_source6:			; SSSE3-LABEL: phaddw_single_source6:
	; SSSE3-SLOW: # %bb.0:			; SSSE3: # %bb.0:
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,1,0,1]			; SSSE3-NEXT: phaddw %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,3]			; SSSE3-NEXT: psrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero
	; SSSE3-SLOW-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,5,5,6,7]			; SSSE3-NEXT: retq
	; SSSE3-SLOW-NEXT: paddw %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: psrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero
	; SSSE3-SLOW-NEXT: retq
	;
	; SSSE3-FAST-LABEL: phaddw_single_source6:
	; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddw %xmm0, %xmm0
	; SSSE3-FAST-NEXT: psrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero
	; SSSE3-FAST-NEXT: retq
	;
	; AVX1-SLOW-LABEL: phaddw_single_source6:
	; AVX1-SLOW: # %bb.0:
	; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,1,0,1]
	; AVX1-SLOW-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero
	; AVX1-SLOW-NEXT: vpaddw %xmm0, %xmm1, %xmm0
	; AVX1-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero
	; AVX1-SLOW-NEXT: retq
	;
	; AVX-FAST-LABEL: phaddw_single_source6:
	; AVX-FAST: # %bb.0:
	; AVX-FAST-NEXT: vphaddw %xmm0, %xmm0, %xmm0
	; AVX-FAST-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero
	; AVX-FAST-NEXT: retq
	;			;
	; AVX2-SLOW-LABEL: phaddw_single_source6:			; AVX-LABEL: phaddw_single_source6:
	; AVX2-SLOW: # %bb.0:			; AVX: # %bb.0:
	; AVX2-SLOW-NEXT: vpbroadcastw %xmm0, %xmm1			; AVX-NEXT: vphaddw %xmm0, %xmm0, %xmm0
	; AVX2-SLOW-NEXT: vpmovzxwq {{.*#+}} xmm0 = xmm0[0],zero,zero,zero,xmm0[1],zero,zero,zero			; AVX-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero
	; AVX2-SLOW-NEXT: vpaddw %xmm0, %xmm1, %xmm0			; AVX-NEXT: retq
	; AVX2-SLOW-NEXT: vpsrldq {{.*#+}} xmm0 = xmm0[6,7,8,9,10,11,12,13,14,15],zero,zero,zero,zero,zero,zero
	; AVX2-SLOW-NEXT: retq
	%l = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 undef, i32 undef, i32 undef>			%l = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 undef, i32 undef, i32 undef>
	%r = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 undef, i32 undef, i32 undef>			%r = shufflevector <8 x i16> %x, <8 x i16> undef, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 1, i32 undef, i32 undef, i32 undef>
	%add = add <8 x i16> %l, %r			%add = add <8 x i16> %l, %r
	%shuffle2 = shufflevector <8 x i16> %add, <8 x i16> undef, <8 x i32> <i32 undef, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			%shuffle2 = shufflevector <8 x i16> %add, <8 x i16> undef, <8 x i32> <i32 undef, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	ret <8 x i16> %shuffle2			ret <8 x i16> %shuffle2
	}			}

	; PR39921 + PR39936			; PR39921 + PR39936
	define i32 @PR39936_v8i32(<8 x i32>) {			define i32 @PR39936_v8i32(<8 x i32>) {
	; SSSE3-SLOW-LABEL: PR39936_v8i32:			; SSSE3-SLOW-LABEL: PR39936_v8i32:
	; SSSE3-SLOW: # %bb.0:			; SSSE3-SLOW: # %bb.0:
	; SSSE3-SLOW-NEXT: phaddd %xmm1, %xmm0			; SSSE3-SLOW-NEXT: phaddd %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]			; SSSE3-SLOW-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; SSSE3-SLOW-NEXT: paddd %xmm1, %xmm0
	; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]			; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
	; SSSE3-SLOW-NEXT: paddd %xmm0, %xmm1			; SSSE3-SLOW-NEXT: paddd %xmm0, %xmm1
	; SSSE3-SLOW-NEXT: movd %xmm1, %eax			; SSSE3-SLOW-NEXT: movd %xmm1, %eax
	; SSSE3-SLOW-NEXT: retq			; SSSE3-SLOW-NEXT: retq
	;			;
	; SSSE3-FAST-LABEL: PR39936_v8i32:			; SSSE3-FAST-LABEL: PR39936_v8i32:
	; SSSE3-FAST: # %bb.0:			; SSSE3-FAST: # %bb.0:
	; SSSE3-FAST-NEXT: phaddd %xmm1, %xmm0			; SSSE3-FAST-NEXT: phaddd %xmm1, %xmm0
	; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0			; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0			; SSSE3-FAST-NEXT: phaddd %xmm0, %xmm0
	; SSSE3-FAST-NEXT: movd %xmm0, %eax			; SSSE3-FAST-NEXT: movd %xmm0, %eax
	; SSSE3-FAST-NEXT: retq			; SSSE3-FAST-NEXT: retq
	;			;
	; AVX1-SLOW-LABEL: PR39936_v8i32:			; AVX1-SLOW-LABEL: PR39936_v8i32:
	; AVX1-SLOW: # %bb.0:			; AVX1-SLOW: # %bb.0:
	; AVX1-SLOW-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX1-SLOW-NEXT: vextractf128 $1, %ymm0, %xmm1
	; AVX1-SLOW-NEXT: vphaddd %xmm1, %xmm0, %xmm0			; AVX1-SLOW-NEXT: vphaddd %xmm1, %xmm0, %xmm0
	; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]			; AVX1-SLOW-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; AVX1-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]			; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
	; AVX1-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0			; AVX1-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX1-SLOW-NEXT: vmovd %xmm0, %eax			; AVX1-SLOW-NEXT: vmovd %xmm0, %eax
	; AVX1-SLOW-NEXT: vzeroupper			; AVX1-SLOW-NEXT: vzeroupper
	; AVX1-SLOW-NEXT: retq			; AVX1-SLOW-NEXT: retq
	;			;
	; AVX1-FAST-LABEL: PR39936_v8i32:			; AVX1-FAST-LABEL: PR39936_v8i32:
	; AVX1-FAST: # %bb.0:			; AVX1-FAST: # %bb.0:
	; AVX1-FAST-NEXT: vextractf128 $1, %ymm0, %xmm1			; AVX1-FAST-NEXT: vextractf128 $1, %ymm0, %xmm1
	; AVX1-FAST-NEXT: vphaddd %xmm1, %xmm0, %xmm0			; AVX1-FAST-NEXT: vphaddd %xmm1, %xmm0, %xmm0
	; AVX1-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0			; AVX1-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX1-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0			; AVX1-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX1-FAST-NEXT: vmovd %xmm0, %eax			; AVX1-FAST-NEXT: vmovd %xmm0, %eax
	; AVX1-FAST-NEXT: vzeroupper			; AVX1-FAST-NEXT: vzeroupper
	; AVX1-FAST-NEXT: retq			; AVX1-FAST-NEXT: retq
	;			;
	; AVX2-SLOW-LABEL: PR39936_v8i32:			; AVX2-SLOW-LABEL: PR39936_v8i32:
	; AVX2-SLOW: # %bb.0:			; AVX2-SLOW: # %bb.0:
	; AVX2-SLOW-NEXT: vextracti128 $1, %ymm0, %xmm1			; AVX2-SLOW-NEXT: vextracti128 $1, %ymm0, %xmm1
	; AVX2-SLOW-NEXT: vphaddd %xmm1, %xmm0, %xmm0			; AVX2-SLOW-NEXT: vphaddd %xmm1, %xmm0, %xmm0
	; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[0,2,2,3]			; AVX2-SLOW-NEXT: vphaddd %xmm0, %xmm0, %xmm0
	; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[1,3,2,3]
	; AVX2-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]			; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
	; AVX2-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0			; AVX2-SLOW-NEXT: vpaddd %xmm0, %xmm1, %xmm0
	; AVX2-SLOW-NEXT: vmovd %xmm0, %eax			; AVX2-SLOW-NEXT: vmovd %xmm0, %eax
	; AVX2-SLOW-NEXT: vzeroupper			; AVX2-SLOW-NEXT: vzeroupper
	; AVX2-SLOW-NEXT: retq			; AVX2-SLOW-NEXT: retq
	;			;
	; AVX2-FAST-LABEL: PR39936_v8i32:			; AVX2-FAST-LABEL: PR39936_v8i32:
	; AVX2-FAST: # %bb.0:			; AVX2-FAST: # %bb.0:
	[19 lines not shown]

test/CodeGen/X86/vector-shuffle-combining.ll

	[2,695 lines not shown]
	; AVX-NEXT: vpinsrd $0, %edi, %xmm0, %xmm0			; AVX-NEXT: vpinsrd $0, %edi, %xmm0, %xmm0
	; AVX-NEXT: retq			; AVX-NEXT: retq
	%a0 = insertelement <4 x i32> undef, i32 %f, i32 0			%a0 = insertelement <4 x i32> undef, i32 %f, i32 0
	%ret = shufflevector <4 x i32> %a0, <4 x i32> <i32 undef, i32 4, i32 5, i32 30>, <4 x i32> <i32 0, i32 5, i32 6, i32 7>			%ret = shufflevector <4 x i32> %a0, <4 x i32> <i32 undef, i32 4, i32 5, i32 30>, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
	ret <4 x i32> %ret			ret <4 x i32> %ret
	}			}

	define <4 x float> @PR22377(<4 x float> %a, <4 x float> %b) {			define <4 x float> @PR22377(<4 x float> %a, <4 x float> %b) {
	; SSE-LABEL: PR22377:			; SSE2-LABEL: PR22377:
	; SSE: # %bb.0: # %entry			; SSE2: # %bb.0: # %entry
	; SSE-NEXT: movaps %xmm0, %xmm1			; SSE2-NEXT: movaps %xmm0, %xmm1
	; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3],xmm0[2,3]			; SSE2-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3],xmm0[2,3]
	; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,0,2]			; SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,0,2]
	; SSE-NEXT: addps %xmm0, %xmm1			; SSE2-NEXT: addps %xmm0, %xmm1
	; SSE-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]			; SSE2-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	; SSE-NEXT: retq			; SSE2-NEXT: retq
				;
				; SSSE3-LABEL: PR22377:
				; SSSE3: # %bb.0: # %entry
				; SSSE3-NEXT: movaps %xmm0, %xmm1
				; SSSE3-NEXT: haddps %xmm0, %xmm1
				; SSSE3-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,1]
				; SSSE3-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,1,3]
				; SSSE3-NEXT: retq
				;
				; SSE41-LABEL: PR22377:
				; SSE41: # %bb.0: # %entry
				; SSE41-NEXT: movaps %xmm0, %xmm1
				; SSE41-NEXT: haddps %xmm0, %xmm1
				; SSE41-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,1]
				; SSE41-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,1,3]
				; SSE41-NEXT: retq
	;			;
	; AVX-LABEL: PR22377:			; AVX-LABEL: PR22377:
	; AVX: # %bb.0: # %entry			; AVX: # %bb.0: # %entry
	; AVX-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[1,3,2,3]			; AVX-NEXT: vhaddps %xmm0, %xmm0, %xmm1
	; AVX-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,0,2]			; AVX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,1]
	; AVX-NEXT: vaddps %xmm0, %xmm1, %xmm1			; AVX-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3]
	; AVX-NEXT: vunpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
	; AVX-NEXT: retq			; AVX-NEXT: retq
	entry:			entry:
	%s1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 1, i32 3, i32 1, i32 3>			%s1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 1, i32 3, i32 1, i32 3>
	%s2 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 0, i32 2, i32 0, i32 2>			%s2 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 0, i32 2, i32 0, i32 2>
	%r2 = fadd <4 x float> %s1, %s2			%r2 = fadd <4 x float> %s1, %s2
	%s3 = shufflevector <4 x float> %s2, <4 x float> %r2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>			%s3 = shufflevector <4 x float> %s2, <4 x float> %r2, <4 x i32> <i32 0, i32 4, i32 1, i32 5>
	ret <4 x float> %s3			ret <4 x float> %s3
	}			}
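
The lane arithmetic behind the PR22377 change above can be checked with a small Python model. This is only an illustrative sketch and not part of the patch: the helper names `shuffle`, `add`, `hadd` and `matches` are invented here, undef lanes are modelled as `None`, and integers stand in for the float lanes.

```python
UNDEF = None  # stands in for an IR undef lane (an assumption of this sketch)

def shuffle(a, b, mask):
    """Model shufflevector: index i < n selects a[i], otherwise b[i - n]."""
    src = list(a) + list(b)
    return [UNDEF if i is UNDEF else src[i] for i in mask]

def add(a, b):
    """Lane-wise add; a lane is undef if either input lane is undef."""
    return [UNDEF if x is UNDEF or y is UNDEF else x + y for x, y in zip(a, b)]

def hadd(a, b):
    """Model (v)haddps / (v)phaddd: sums of adjacent pairs of a, then of b."""
    def pairs(v):
        return [v[i] + v[i + 1] for i in range(0, len(v), 2)]
    return pairs(a) + pairs(b)

def matches(expected, actual):
    """True if actual agrees with expected on every defined lane."""
    return all(e is UNDEF or e == x for e, x in zip(expected, actual))

a = [1, 2, 4, 8]
undef4 = [UNDEF] * 4

# phaddd_single_source1: add(shuffle<u,u,0,2>, shuffle<u,u,1,3>) vs. phaddd x, x
l = shuffle(a, undef4, [UNDEF, UNDEF, 0, 2])
r = shuffle(a, undef4, [UNDEF, UNDEF, 1, 3])
single_source_ok = matches(add(l, r), hadd(a, a))

# PR22377 as written in the IR: two shuffles, an add, then an interleave
s1 = shuffle(a, undef4, [1, 3, 1, 3])
s2 = shuffle(a, undef4, [0, 2, 0, 2])
ir_result = shuffle(s2, add(s1, s2), [0, 4, 1, 5])

# PR22377 after the patch (AVX column): vhaddps, vshufps, vpermilps
h = hadd(a, a)
t = shuffle(a, h, [0, 2, 4, 5])          # xmm0[0,2],xmm1[0,1]
asm_result = shuffle(t, undef4, [0, 2, 1, 3])
```

In this model both forms produce the same lanes, which is consistent with the summary's description of the regression as going from 2*shuffle+add+shuffle to hadd+2*shuffle: the result is unchanged, only the instruction mix differs.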
	[119 lines not shown]