This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Fix incorrect cost tree calculation.
Abandoned · Public

Authored by dtemirbulatov on Feb 3 2019, 4:10 PM.

Details

Reviewers

ABataev
RKSimon
spatel
anton-afanasyev
hfinkel

Summary

I found that during tree cost calculation, the algorithm includes tree entries that were not supposed to be vectorized and were rejected at an early stage, yet we still estimate those entries as part of the whole-tree estimation. The following change fixes this issue.
Here is SPEC CPU2006 data before and after this change on:
...
vendor_id : GenuineIntel
cpu family : 6
model : 94
model name : Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
stepping : 3
microcode : 0xc6
cpu MHz : 2429.650
cache size : 6144 KB
....
Before (columns: benchmark, reference time, run time, ratio, status):
400.perlbench NR
401.bzip2 9650 502 19.2 S
401.bzip2 9650 481 20.1 S
401.bzip2 9650 500 19.3 *
403.gcc 8050 248 32.5 S
403.gcc 8050 244 33.0 S
403.gcc 8050 245 32.9 *
429.mcf 9120 328 27.8 *
429.mcf 9120 322 28.3 S
429.mcf 9120 331 27.5 S
445.gobmk 10490 468 22.4 *
445.gobmk 10490 475 22.1 S
445.gobmk 10490 466 22.5 S
456.hmmer 9330 349 26.7 S
456.hmmer 9330 348 26.8 S
456.hmmer 9330 349 26.7 *
458.sjeng 12100 458 26.4 S
458.sjeng 12100 588 20.6 S
458.sjeng 12100 467 25.9 *
462.libquantum 20720 269 77.1 *
462.libquantum 20720 312 66.4 S
462.libquantum 20720 267 77.7 S
464.h264ref 22130 516 42.9 *
464.h264ref 22130 516 42.9 S
464.h264ref 22130 515 43.0 S
471.omnetpp 6250 327 19.1 S
471.omnetpp 6250 330 18.9 *
471.omnetpp 6250 333 18.8 S
473.astar 7020 -- CE

483.xalancbmk 6900 -- CE

400.perlbench NR
401.bzip2 9650 500 19.3 *
403.gcc 8050 245 32.9 *
429.mcf 9120 328 27.8 *
445.gobmk 10490 468 22.4 *
456.hmmer 9330 349 26.7 *
458.sjeng 12100 467 25.9 *
462.libquantum 20720 269 77.1 *
464.h264ref 22130 516 42.9 *
471.omnetpp 6250 330 18.9 *
473.astar NR
483.xalancbmk NR

After (columns: benchmark, reference time, run time, ratio, status):
400.perlbench NR
401.bzip2 9650 493 19.6 S
401.bzip2 9650 491 19.6 S
401.bzip2 9650 492 19.6 *
403.gcc 8050 254 31.7 S
403.gcc 8050 253 31.8 S
403.gcc 8050 254 31.7 *
429.mcf 9120 329 27.7 S
429.mcf 9120 328 27.8 *
429.mcf 9120 327 27.9 S
445.gobmk 10490 469 22.4 S
445.gobmk 10490 468 22.4 S
445.gobmk 10490 468 22.4 *
456.hmmer 9330 347 26.9 S
456.hmmer 9330 427 21.8 S
456.hmmer 9330 348 26.8 *
458.sjeng 12100 460 26.3 S
458.sjeng 12100 662 18.3 S
458.sjeng 12100 460 26.3 *
462.libquantum 20720 268 77.3 *
462.libquantum 20720 268 77.4 S
462.libquantum 20720 341 60.8 S
464.h264ref 22130 504 43.9 S
464.h264ref 22130 500 44.2 S
464.h264ref 22130 503 44.0 *
471.omnetpp 6250 325 19.3 *
471.omnetpp 6250 324 19.3 S
471.omnetpp 6250 328 19.1 S
473.astar 7020 -- CE

483.xalancbmk 6900 -- CE

400.perlbench NR
401.bzip2 9650 492 19.6 *
403.gcc 8050 254 31.7 *
429.mcf 9120 328 27.8 *
445.gobmk 10490 468 22.4 *
456.hmmer 9330 348 26.8 *
458.sjeng 12100 460 26.3 *
462.libquantum 20720 268 77.3 *
464.h264ref 22130 503 44.0 *
471.omnetpp 6250 325 19.3 *
473.astar NR
483.xalancbmk NR
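
To make the before/after comparison easier to read, the selected runs (the rows marked `*`) can be compared as percentage changes in run time. The sketch below assumes the columns are reference time, run time in seconds, and ratio (consistent with 9650 / 500 = 19.3 for 401.bzip2). `SelectedRun`, `selectedRuns`, and `percentChange` are hypothetical helper names for illustration, not part of the patch.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One selected ('*') run per benchmark, copied from the tables above.
// Assumption: the third column of each table row is run time in seconds.
struct SelectedRun {
  std::string Benchmark;
  double BeforeSec;
  double AfterSec;
};

std::vector<SelectedRun> selectedRuns() {
  return {
      {"401.bzip2", 500, 492},      {"403.gcc", 245, 254},
      {"429.mcf", 328, 328},        {"445.gobmk", 468, 468},
      {"456.hmmer", 349, 348},      {"458.sjeng", 467, 460},
      {"462.libquantum", 269, 268}, {"464.h264ref", 516, 503},
      {"471.omnetpp", 330, 325},
  };
}

// Relative change in run time, in percent.
// Negative means the "after" build ran faster.
double percentChange(double Before, double After) {
  assert(Before > 0 && "run time must be positive");
  return (After - Before) / Before * 100.0;
}
```

Under these assumptions, 403.gcc goes from 245 to 254 (about 3.7% slower) and 464.h264ref from 516 to 503 (about 2.5% faster). The remaining benchmarks move by less than 2%, which may be within run-to-run noise given the outliers among the non-selected runs.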

Diff Detail

Event Timeline

dtemirbulatov created this revision. Feb 3 2019, 4:10 PM

Herald added a subscriber: javed.absar. Feb 3 2019, 4:10 PM

Looks ok to me (after that comment fix), but @ABataev should probably have the final say.

lib/Transforms/Vectorize/SLPVectorizer.cpp
704	Please make this comment more explanatory.

It does not look correct to me. It seems you're throwing away the cost of the gather nodes.
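
The objection above can be illustrated with a toy model. In the patch, `newTreeEntry` sets `NeedToGather = !Vectorized`, so skipping entries with `!TE.Vectorized` skips exactly the gather entries. The sketch below uses a hypothetical `ToyEntry` type and made-up costs (not LLVM's `TreeEntry` or real cost-model numbers) to show how the total shifts when gather costs are left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for an SLP tree entry.
struct ToyEntry {
  bool NeedToGather; // true if the scalars must be gathered into a vector
  int Cost;          // vector cost minus scalar cost; negative is profitable
};

// Sum the entry costs. When SkipGathers is true, gather entries contribute
// nothing, which mirrors the `if (!TE.Vectorized) continue;` check.
int toyTreeCost(const std::vector<ToyEntry> &Tree, bool SkipGathers) {
  int Cost = 0;
  for (const ToyEntry &TE : Tree) {
    if (SkipGathers && TE.NeedToGather)
      continue;
    Cost += TE.Cost;
  }
  return Cost;
}

// Made-up example: two vectorizable bundles and one gather entry.
std::vector<ToyEntry> toyExampleTree() {
  return {{false, -3}, {false, -2}, {true, 4}};
}
```

With these made-up numbers the full sum is -1, while the sum without the gather entry is -5, so the tree looks more profitable than it is once the cost of building the gathered vector is ignored. This matches the direction of the expectation changes in the tests (for example, the reported cost in gather-cost.ll moves from -7 to -16).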

dtemirbulatov abandoned this revision. Feb 4 2019, 6:11 PM

Revision Contents

Path	Size
lib/Transforms/Vectorize/SLPVectorizer.cpp	8 lines
test/Transforms/SLPVectorizer/AArch64/	2 lines, 8 lines, 249 lines, 74 lines
test/Transforms/SLPVectorizer/X86/	15 lines, 93 lines, 38 lines, 46 lines, 120 lines, 9 lines, 21 lines, 26 lines, 107 lines, 12 lines, 13 lines, 13 lines, 14 lines, 18 lines, 21 lines, 32 lines, 171 lines, 136 lines, 171 lines, 59 lines, 12 lines, 24 lines, 15 lines

Diff 184983

lib/Transforms/Vectorize/SLPVectorizer.cpp

   struct TreeEntry {
 ...
     /// to be a pointer and needs to be able to initialize the child iterator.
     /// Thus we need a reference back to the container to translate the indices
     /// to entries.
     std::vector<TreeEntry> &Container;

     /// The TreeEntry index containing the user of this entry. We can actually
     /// have multiple users so the data structure is not truly a tree.
     SmallVector<int, 1> UserTreeIndices;

+    /// Is this entry finally should be vectorized.
         RKSimon (inline comment): Please make this comment more explanatory.
+    bool Vectorized = false;
   };

   /// Create a new VectorizableTree entry.
   void newTreeEntry(ArrayRef<Value *> VL, bool Vectorized, int &UserTreeIdx,
                     ArrayRef<unsigned> ReuseShuffleIndices = None,
                     ArrayRef<unsigned> ReorderIndices = None) {
     VectorizableTree.emplace_back(VectorizableTree);
     int idx = VectorizableTree.size() - 1;
     TreeEntry *Last = &VectorizableTree[idx];
     Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());
     Last->NeedToGather = !Vectorized;
     Last->ReuseShuffleIndices.append(ReuseShuffleIndices.begin(),
                                      ReuseShuffleIndices.end());
     Last->ReorderIndices = ReorderIndices;
+    Last->Vectorized = Vectorized;
     if (Vectorized) {
       for (int i = 0, e = VL.size(); i != e; ++i) {
         assert(!getTreeEntry(VL[i]) && "Scalar already in tree!");
         ScalarToTreeEntry[VL[i]] = idx;
       }
     } else {
       MustGather.insert(VL.begin(), VL.end());
     }
 ...
   for (unsigned I = 0, E = VectorizableTree.size(); I < E; ++I) {
 ...
     //
     if (TE.NeedToGather &&
         std::any_of(std::next(VectorizableTree.begin(), I + 1),
                     VectorizableTree.end(), [TE](TreeEntry &Entry) {
                       return Entry.NeedToGather && Entry.isSame(TE.Scalars);
                     }))
       continue;

+    // Avoid calculating cost for non vectorizable enties.
+    if (!TE.Vectorized)
+      continue;
+
     int C = getEntryCost(&TE);
     LLVM_DEBUG(dbgs() << "SLP: Adding cost " << C
                       << " for bundle that starts with " << *TE.Scalars[0]
                       << ".\n");
     Cost += C;
   }

   SmallPtrSet<Value *, 16> ExtractCostCalculated;
 ...

test/Transforms/SLPVectorizer/AArch64/gather-cost.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -S -slp-vectorizer -instcombine -pass-remarks-output=%t | FileCheck %s
 ; RUN: cat %t | FileCheck -check-prefix=REMARK %s
 ; RUN: opt < %s -S -passes='slp-vectorizer,instcombine' -pass-remarks-output=%t | FileCheck %s
 ; RUN: cat %t | FileCheck -check-prefix=REMARK %s

 target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
 target triple = "aarch64--linux-gnu"

 ; REMARK-LABEL: Function: gather_multiple_use
 ; REMARK: Args:
 ; REMARK-NEXT: - String: 'Vectorized horizontal reduction with cost '
-; REMARK-NEXT: - Cost: '-7'
+; REMARK-NEXT: - Cost: '-16'
 ;
 define internal i32 @gather_multiple_use(i32 %a, i32 %b, i32 %c, i32 %d) {
 ; CHECK-LABEL: @gather_multiple_use(
 ; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> undef, i32 [[C:%.*]], i32 0
 ; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[A:%.*]], i32 1
 ; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[B:%.*]], i32 2
 ; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[D:%.*]], i32 3
 ; CHECK-NEXT: [[TMP5:%.*]] = lshr <4 x i32> [[TMP4]], <i32 15, i32 15, i32 15, i32 15>
 ...

test/Transforms/SLPVectorizer/AArch64/getelementptr.ll

 ...
 ;

 ; YAML: --- !Passed
 ; YAML-NEXT: Pass: slp-vectorizer
 ; YAML-NEXT: Name: VectorizedList
 ; YAML-NEXT: Function: getelementptr_4x32
 ; YAML-NEXT: Args:
 ; YAML-NEXT: - String: 'SLP vectorized with cost '
-; YAML-NEXT: - Cost: '11'
+; YAML-NEXT: - Cost: '5'
 ; YAML-NEXT: - String: ' and with tree size '
 ; YAML-NEXT: - TreeSize: '5'

 ; YAML: --- !Passed
 ; YAML-NEXT: Pass: slp-vectorizer
 ; YAML-NEXT: Name: VectorizedList
 ; YAML-NEXT: Function: getelementptr_4x32
 ; YAML-NEXT: Args:
 ; YAML-NEXT: - String: 'SLP vectorized with cost '
-; YAML-NEXT: - Cost: '16'
+; YAML-NEXT: - Cost: '6'
 ; YAML-NEXT: - String: ' and with tree size '
 ; YAML-NEXT: - TreeSize: '3'

 define i32 @getelementptr_4x32(i32* nocapture readonly %g, i32 %n, i32 %x, i32 %y, i32 %z) {
 ; CHECK-LABEL: @getelementptr_4x32(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[CMP31:%.*]] = icmp sgt i32 [[N:%.*]], 0
 ; CHECK-NEXT: br i1 [[CMP31]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
 ...
 }

 ; YAML: --- !Passed
 ; YAML-NEXT: Pass: slp-vectorizer
 ; YAML-NEXT: Name: VectorizedList
 ; YAML-NEXT: Function: getelementptr_2x32
 ; YAML-NEXT: Args:
 ; YAML-NEXT: - String: 'SLP vectorized with cost '
-; YAML-NEXT: - Cost: '11'
+; YAML-NEXT: - Cost: '5'
 ; YAML-NEXT: - String: ' and with tree size '
 ; YAML-NEXT: - TreeSize: '5'

 ; YAML: --- !Passed
 ; YAML-NEXT: Pass: slp-vectorizer
 ; YAML-NEXT: Name: VectorizedList
 ; YAML-NEXT: Function: getelementptr_2x32
 ; YAML-NEXT: Args:
 ; YAML-NEXT: - String: 'SLP vectorized with cost '
-; YAML-NEXT: - Cost: '6'
+; YAML-NEXT: - Cost: '2'
 ; YAML-NEXT: - String: ' and with tree size '
 ; YAML-NEXT: - TreeSize: '3'

 define i32 @getelementptr_2x32(i32* nocapture readonly %g, i32 %n, i32 %x, i32 %y, i32 %z) {
 ; CHECK-LABEL: @getelementptr_2x32(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[CMP31:%.*]] = icmp sgt i32 [[N:%.*]], 0
 ; CHECK-NEXT: br i1 [[CMP31]], label [[FOR_BODY_PREHEADER:%.*]], label [[FOR_COND_CLEANUP:%.*]]
 ...

test/Transforms/SLPVectorizer/AArch64/horizontal.ll

	...

	define i32 @test_select(i32* noalias nocapture readonly %blk1, i32* noalias nocapture readonly %blk2, i32 %lx, i32 %h) {			define i32 @test_select(i32* noalias nocapture readonly %blk1, i32* noalias nocapture readonly %blk2, i32 %lx, i32 %h) {
	; CHECK-LABEL: @test_select(			; CHECK-LABEL: @test_select(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP_22:%.]] = icmp sgt i32 [[H:%.]], 0			; CHECK-NEXT: [[CMP_22:%.]] = icmp sgt i32 [[H:%.]], 0
	; CHECK-NEXT: br i1 [[CMP_22]], label [[FOR_BODY_LR_PH:%.]], label [[FOR_END:%.]]			; CHECK-NEXT: br i1 [[CMP_22]], label [[FOR_BODY_LR_PH:%.]], label [[FOR_END:%.]]
	; CHECK: for.body.lr.ph:			; CHECK: for.body.lr.ph:
	; CHECK-NEXT: [[IDX_EXT:%.]] = sext i32 [[LX:%.]] to i64			; CHECK-NEXT: [[IDX_EXT:%.]] = sext i32 [[LX:%.]] to i64
				; CHECK-NEXT: [[TMP0:%.]] = insertelement <2 x i32> undef, i32* [[BLK2:%.*]], i32 0
				; CHECK-NEXT: [[TMP1:%.]] = insertelement <2 x i32> [[TMP0]], i32* [[BLK1:%.*]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[S_026:%.]] = phi i32 [ 0, [[FOR_BODY_LR_PH]] ], [ [[OP_EXTRA:%.]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP2:%.]] = phi <2 x i32> [ zeroinitializer, [[FOR_BODY_LR_PH]] ], [ [[TMP20:%.]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[J_025:%.]] = phi i32 [ 0, [[FOR_BODY_LR_PH]] ], [ [[INC:%.]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP3:%.]] = phi <2 x i32> [ [[TMP1]], [[FOR_BODY_LR_PH]] ], [ [[TMP23:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[P2_024:%.]] = phi i32 [ [[BLK2:%.]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR29:%.]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.]] = extractelement <2 x i32> [[TMP3]], i32 1
	; CHECK-NEXT: [[P1_023:%.]] = phi i32 [ [[BLK1:%.]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR:%.]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP5:%.]] = load i32, i32 [[TMP4]], align 4
	; CHECK-NEXT: [[ARRAYIDX4:%.]] = getelementptr inbounds i32, i32 [[P1_023]], i64 1			; CHECK-NEXT: [[TMP6:%.]] = extractelement <2 x i32> [[TMP3]], i32 0
	; CHECK-NEXT: [[ARRAYIDX5:%.]] = getelementptr inbounds i32, i32 [[P2_024]], i64 1			; CHECK-NEXT: [[TMP7:%.]] = load i32, i32 [[TMP6]], align 4
	; CHECK-NEXT: [[ARRAYIDX12:%.]] = getelementptr inbounds i32, i32 [[P1_023]], i64 2			; CHECK-NEXT: [[SUB:%.*]] = sub nsw i32 [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[ARRAYIDX13:%.]] = getelementptr inbounds i32, i32 [[P2_024]], i64 2			; CHECK-NEXT: [[CMP2:%.*]] = icmp slt i32 [[SUB]], 0
	; CHECK-NEXT: [[ARRAYIDX20:%.]] = getelementptr inbounds i32, i32 [[P1_023]], i64 3			; CHECK-NEXT: [[SUB3:%.*]] = sub nsw i32 0, [[SUB]]
	; CHECK-NEXT: [[TMP0:%.]] = bitcast i32 [[P1_023]] to <4 x i32>*			; CHECK-NEXT: [[SUB3_SUB:%.*]] = select i1 [[CMP2]], i32 [[SUB3]], i32 [[SUB]]
	; CHECK-NEXT: [[TMP1:%.]] = load <4 x i32>, <4 x i32> [[TMP0]], align 4			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
	; CHECK-NEXT: [[ARRAYIDX21:%.]] = getelementptr inbounds i32, i32 [[P2_024]], i64 3			; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[SUB3_SUB]], [[TMP8]]
	; CHECK-NEXT: [[TMP2:%.]] = bitcast i32 [[P2_024]] to <4 x i32>*			; CHECK-NEXT: [[ARRAYIDX4:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i64 1
	; CHECK-NEXT: [[TMP3:%.]] = load <4 x i32>, <4 x i32> [[TMP2]], align 4			; CHECK-NEXT: [[TMP9:%.]] = load i32, i32 [[ARRAYIDX4]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <4 x i32> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[ARRAYIDX5:%.]] = getelementptr inbounds i32, i32 [[TMP6]], i64 1
	; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], zeroinitializer			; CHECK-NEXT: [[TMP10:%.]] = load i32, i32 [[ARRAYIDX5]], align 4
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <4 x i32> zeroinitializer, [[TMP4]]			; CHECK-NEXT: [[SUB6:%.*]] = sub nsw i32 [[TMP9]], [[TMP10]]
	; CHECK-NEXT: [[TMP7:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> [[TMP6]], <4 x i32> [[TMP4]]			; CHECK-NEXT: [[CMP7:%.*]] = icmp slt i32 [[SUB6]], 0
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, [[S_026]]			; CHECK-NEXT: [[SUB9:%.*]] = sub nsw i32 0, [[SUB6]]
	; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[ADD]], undef			; CHECK-NEXT: [[V_1:%.*]] = select i1 [[CMP7]], i32 [[SUB9]], i32 [[SUB6]]
	; CHECK-NEXT: [[ADD19:%.*]] = add nsw i32 [[ADD11]], undef			; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[ADD]], [[V_1]]
	; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32> [[TMP7]])			; CHECK-NEXT: [[ARRAYIDX12:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i64 2
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP8]], [[S_026]]			; CHECK-NEXT: [[TMP11:%.]] = load i32, i32 [[ARRAYIDX12]], align 4
	; CHECK-NEXT: [[ADD27:%.*]] = add nsw i32 [[ADD19]], undef			; CHECK-NEXT: [[ARRAYIDX13:%.]] = getelementptr inbounds i32, i32 [[TMP6]], i64 2
	; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, i32* [[P1_023]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[TMP12:%.]] = load i32, i32 [[ARRAYIDX13]], align 4
	; CHECK-NEXT: [[ADD_PTR29]] = getelementptr inbounds i32, i32* [[P2_024]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[TMP11]], [[TMP12]]
	; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[J_025]], 1			; CHECK-NEXT: [[CMP15:%.*]] = icmp slt i32 [[SUB14]], 0
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[H]]			; CHECK-NEXT: [[SUB17:%.*]] = sub nsw i32 0, [[SUB14]]
				; CHECK-NEXT: [[SUB17_SUB14:%.*]] = select i1 [[CMP15]], i32 [[SUB17]], i32 [[SUB14]]
				; CHECK-NEXT: [[ADD19:%.*]] = add nsw i32 [[ADD11]], [[SUB17_SUB14]]
				; CHECK-NEXT: [[ARRAYIDX20:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i64 3
				; CHECK-NEXT: [[TMP13:%.]] = load i32, i32 [[ARRAYIDX20]], align 4
				; CHECK-NEXT: [[ARRAYIDX21:%.]] = getelementptr inbounds i32, i32 [[TMP6]], i64 3
				; CHECK-NEXT: [[TMP14:%.]] = load i32, i32 [[ARRAYIDX21]], align 4
				; CHECK-NEXT: [[SUB22:%.*]] = sub nsw i32 [[TMP13]], [[TMP14]]
				; CHECK-NEXT: [[CMP23:%.*]] = icmp slt i32 [[SUB22]], 0
				; CHECK-NEXT: [[SUB25:%.*]] = sub nsw i32 0, [[SUB22]]
				; CHECK-NEXT: [[V_3:%.*]] = select i1 [[CMP23]], i32 [[SUB25]], i32 [[SUB22]]
				; CHECK-NEXT: [[ADD_PTR:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i64 [[IDX_EXT]]
				; CHECK-NEXT: [[ADD_PTR29:%.]] = getelementptr inbounds i32, i32 [[TMP6]], i64 [[IDX_EXT]]
				; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> undef, i32 [[ADD19]], i32 0
				; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
				; CHECK-NEXT: [[TMP17:%.*]] = insertelement <2 x i32> [[TMP15]], i32 [[TMP16]], i32 1
				; CHECK-NEXT: [[TMP18:%.*]] = insertelement <2 x i32> undef, i32 [[V_3]], i32 0
				; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x i32> [[TMP18]], i32 1, i32 1
				; CHECK-NEXT: [[TMP20]] = add nsw <2 x i32> [[TMP17]], [[TMP19]]
				; CHECK-NEXT: [[TMP21:%.*]] = extractelement <2 x i32> [[TMP20]], i32 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[TMP21]], [[H]]
				; CHECK-NEXT: [[TMP22:%.]] = insertelement <2 x i32> undef, i32* [[ADD_PTR29]], i32 0
				; CHECK-NEXT: [[TMP23]] = insertelement <2 x i32> [[TMP22]], i32 [[ADD_PTR]], i32 1
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END_LOOPEXIT:%.*]], label [[FOR_BODY]]
	; CHECK: for.end.loopexit:			; CHECK: for.end.loopexit:
				; CHECK-NEXT: [[TMP24:%.*]] = extractelement <2 x i32> [[TMP20]], i32 0
	; CHECK-NEXT: br label [[FOR_END]]			; CHECK-NEXT: br label [[FOR_END]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[S_0_LCSSA:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ [[OP_EXTRA]], [[FOR_END_LOOPEXIT]] ]			; CHECK-NEXT: [[S_0_LCSSA:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ [[TMP24]], [[FOR_END_LOOPEXIT]] ]
	; CHECK-NEXT: ret i32 [[S_0_LCSSA]]			; CHECK-NEXT: ret i32 [[S_0_LCSSA]]
	;			;
	entry:			entry:
	%cmp.22 = icmp sgt i32 %h, 0			%cmp.22 = icmp sgt i32 %h, 0
	br i1 %cmp.22, label %for.body.lr.ph, label %for.end			br i1 %cmp.22, label %for.body.lr.ph, label %for.end

	for.body.lr.ph: ; preds = %entry			for.body.lr.ph: ; preds = %entry
	%idx.ext = sext i32 %lx to i64			%idx.ext = sext i32 %lx to i64
	...
	; YAML-NEXT: - String: ' and with tree size '			; YAML-NEXT: - String: ' and with tree size '
	; YAML-NEXT: - TreeSize: '3'			; YAML-NEXT: - TreeSize: '3'
	; CHECK-LABEL: @reduction_with_br(			; CHECK-LABEL: @reduction_with_br(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP_16:%.]] = icmp sgt i32 [[H:%.]], 0			; CHECK-NEXT: [[CMP_16:%.]] = icmp sgt i32 [[H:%.]], 0
	; CHECK-NEXT: br i1 [[CMP_16]], label [[FOR_BODY_LR_PH:%.]], label [[FOR_END:%.]]			; CHECK-NEXT: br i1 [[CMP_16]], label [[FOR_BODY_LR_PH:%.]], label [[FOR_END:%.]]
	; CHECK: for.body.lr.ph:			; CHECK: for.body.lr.ph:
	; CHECK-NEXT: [[IDX_EXT:%.]] = sext i32 [[LX:%.]] to i64			; CHECK-NEXT: [[IDX_EXT:%.]] = sext i32 [[LX:%.]] to i64
				; CHECK-NEXT: [[TMP0:%.]] = insertelement <2 x i32> undef, i32* [[BLK2:%.*]], i32 0
				; CHECK-NEXT: [[TMP1:%.]] = insertelement <2 x i32> [[TMP0]], i32* [[BLK1:%.*]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[S_020:%.]] = phi i32 [ 0, [[FOR_BODY_LR_PH]] ], [ [[OP_EXTRA:%.]], [[IF_END:%.*]] ]			; CHECK-NEXT: [[TMP2:%.]] = phi <2 x i32> [ zeroinitializer, [[FOR_BODY_LR_PH]] ], [ [[TMP27:%.]], [[IF_END:%.*]] ]
	; CHECK-NEXT: [[J_019:%.]] = phi i32 [ 0, [[FOR_BODY_LR_PH]] ], [ [[INC:%.]], [[IF_END]] ]			; CHECK-NEXT: [[TMP3:%.]] = phi <2 x i32> [ [[TMP1]], [[FOR_BODY_LR_PH]] ], [ [[TMP29:%.*]], [[IF_END]] ]
	; CHECK-NEXT: [[P2_018:%.]] = phi i32 [ [[BLK2:%.]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR16:%.]], [[IF_END]] ]			; CHECK-NEXT: [[TMP4:%.]] = extractelement <2 x i32> [[TMP3]], i32 1
	; CHECK-NEXT: [[P1_017:%.]] = phi i32 [ [[BLK1:%.]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR:%.]], [[IF_END]] ]			; CHECK-NEXT: [[TMP5:%.]] = load i32, i32 [[TMP4]], align 4
	; CHECK-NEXT: [[ARRAYIDX2:%.]] = getelementptr inbounds i32, i32 [[P1_017]], i64 1			; CHECK-NEXT: [[TMP6:%.]] = extractelement <2 x i32> [[TMP3]], i32 0
	; CHECK-NEXT: [[ARRAYIDX3:%.]] = getelementptr inbounds i32, i32 [[P2_018]], i64 1			; CHECK-NEXT: [[TMP7:%.]] = load i32, i32 [[TMP6]], align 4
	; CHECK-NEXT: [[ARRAYIDX6:%.]] = getelementptr inbounds i32, i32 [[P1_017]], i64 2			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
	; CHECK-NEXT: [[ARRAYIDX7:%.]] = getelementptr inbounds i32, i32 [[P2_018]], i64 2			; CHECK-NEXT: [[ARRAYIDX2:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i64 1
	; CHECK-NEXT: [[ARRAYIDX10:%.]] = getelementptr inbounds i32, i32 [[P1_017]], i64 3			; CHECK-NEXT: [[TMP9:%.]] = load i32, i32 [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[TMP0:%.]] = bitcast i32 [[P1_017]] to <4 x i32>*			; CHECK-NEXT: [[ARRAYIDX3:%.]] = getelementptr inbounds i32, i32 [[TMP6]], i64 1
	; CHECK-NEXT: [[TMP1:%.]] = load <4 x i32>, <4 x i32> [[TMP0]], align 4			; CHECK-NEXT: [[TMP10:%.]] = load i32, i32 [[ARRAYIDX3]], align 4
	; CHECK-NEXT: [[ARRAYIDX11:%.]] = getelementptr inbounds i32, i32 [[P2_018]], i64 3			; CHECK-NEXT: [[ARRAYIDX6:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i64 2
	; CHECK-NEXT: [[TMP2:%.]] = bitcast i32 [[P2_018]] to <4 x i32>*			; CHECK-NEXT: [[TMP11:%.]] = load i32, i32 [[ARRAYIDX6]], align 4
	; CHECK-NEXT: [[TMP3:%.]] = load <4 x i32>, <4 x i32> [[TMP2]], align 4			; CHECK-NEXT: [[ARRAYIDX7:%.]] = getelementptr inbounds i32, i32 [[TMP6]], i64 2
	; CHECK-NEXT: [[TMP4:%.*]] = mul nsw <4 x i32> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP12:%.]] = load i32, i32 [[ARRAYIDX7]], align 4
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, [[S_020]]			; CHECK-NEXT: [[ARRAYIDX10:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i64 3
				; CHECK-NEXT: [[TMP13:%.]] = load i32, i32 [[ARRAYIDX10]], align 4
				; CHECK-NEXT: [[ARRAYIDX11:%.]] = getelementptr inbounds i32, i32 [[TMP6]], i64 3
				; CHECK-NEXT: [[TMP14:%.]] = load i32, i32 [[ARRAYIDX11]], align 4
				; CHECK-NEXT: [[TMP15:%.*]] = insertelement <4 x i32> undef, i32 [[TMP7]], i32 0
				; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x i32> [[TMP15]], i32 [[TMP10]], i32 1
				; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x i32> [[TMP16]], i32 [[TMP12]], i32 2
				; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x i32> [[TMP17]], i32 [[TMP14]], i32 3
				; CHECK-NEXT: [[TMP19:%.*]] = insertelement <4 x i32> undef, i32 [[TMP5]], i32 0
				; CHECK-NEXT: [[TMP20:%.*]] = insertelement <4 x i32> [[TMP19]], i32 [[TMP9]], i32 1
				; CHECK-NEXT: [[TMP21:%.*]] = insertelement <4 x i32> [[TMP20]], i32 [[TMP11]], i32 2
				; CHECK-NEXT: [[TMP22:%.*]] = insertelement <4 x i32> [[TMP21]], i32 [[TMP13]], i32 3
				; CHECK-NEXT: [[TMP23:%.*]] = mul nsw <4 x i32> [[TMP18]], [[TMP22]]
				; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, [[TMP8]]
	; CHECK-NEXT: [[ADD5:%.*]] = add nsw i32 [[ADD]], undef			; CHECK-NEXT: [[ADD5:%.*]] = add nsw i32 [[ADD]], undef
	; CHECK-NEXT: [[ADD9:%.*]] = add nsw i32 [[ADD5]], undef			; CHECK-NEXT: [[ADD9:%.*]] = add nsw i32 [[ADD5]], undef
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP24:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v4i32(<4 x i32> [[TMP23]])
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP5]], [[S_020]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = add nsw i32 [[TMP24]], [[TMP8]]
	; CHECK-NEXT: [[ADD13:%.*]] = add nsw i32 [[ADD9]], undef			; CHECK-NEXT: [[ADD13:%.*]] = add nsw i32 [[ADD9]], undef
	; CHECK-NEXT: [[CMP14:%.]] = icmp slt i32 [[OP_EXTRA]], [[LIM:%.]]			; CHECK-NEXT: [[CMP14:%.]] = icmp slt i32 [[OP_EXTRA]], [[LIM:%.]]
	; CHECK-NEXT: br i1 [[CMP14]], label [[IF_END]], label [[FOR_END_LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 [[CMP14]], label [[IF_END]], label [[FOR_END_LOOPEXIT:%.*]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i32, i32* [[P1_017]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR:%.]] = getelementptr inbounds i32, i32 [[TMP4]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[ADD_PTR16]] = getelementptr inbounds i32, i32* [[P2_018]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR16:%.]] = getelementptr inbounds i32, i32 [[TMP6]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[J_019]], 1			; CHECK-NEXT: [[TMP25:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
				; CHECK-NEXT: [[INC:%.*]] = add nuw nsw i32 [[TMP25]], 1
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[H]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[H]]
				; CHECK-NEXT: [[TMP26:%.*]] = insertelement <2 x i32> undef, i32 [[OP_EXTRA]], i32 0
				; CHECK-NEXT: [[TMP27]] = insertelement <2 x i32> [[TMP26]], i32 [[INC]], i32 1
				; CHECK-NEXT: [[TMP28:%.]] = insertelement <2 x i32> undef, i32* [[ADD_PTR16]], i32 0
				; CHECK-NEXT: [[TMP29]] = insertelement <2 x i32> [[TMP28]], i32 [[ADD_PTR]], i32 1
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT]]
	; CHECK: for.end.loopexit:			; CHECK: for.end.loopexit:
	; CHECK-NEXT: br label [[FOR_END]]			; CHECK-NEXT: br label [[FOR_END]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[S_1:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ [[OP_EXTRA]], [[FOR_END_LOOPEXIT]] ]			; CHECK-NEXT: [[S_1:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ [[OP_EXTRA]], [[FOR_END_LOOPEXIT]] ]
	; CHECK-NEXT: ret i32 [[S_1]]			; CHECK-NEXT: ret i32 [[S_1]]
	;			;
	entry:			entry:
	...

	define i32 @test_unrolled_select(i8* noalias nocapture readonly %blk1, i8* noalias nocapture readonly %blk2, i32 %lx, i32 %h, i32 %lim) #0 {			define i32 @test_unrolled_select(i8* noalias nocapture readonly %blk1, i8* noalias nocapture readonly %blk2, i32 %lx, i32 %h, i32 %lim) #0 {
	; CHECK-LABEL: @test_unrolled_select(			; CHECK-LABEL: @test_unrolled_select(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CMP_43:%.]] = icmp sgt i32 [[H:%.]], 0			; CHECK-NEXT: [[CMP_43:%.]] = icmp sgt i32 [[H:%.]], 0
	; CHECK-NEXT: br i1 [[CMP_43]], label [[FOR_BODY_LR_PH:%.]], label [[FOR_END:%.]]			; CHECK-NEXT: br i1 [[CMP_43]], label [[FOR_BODY_LR_PH:%.]], label [[FOR_END:%.]]
	; CHECK: for.body.lr.ph:			; CHECK: for.body.lr.ph:
	; CHECK-NEXT: [[IDX_EXT:%.]] = sext i32 [[LX:%.]] to i64			; CHECK-NEXT: [[IDX_EXT:%.]] = sext i32 [[LX:%.]] to i64
				; CHECK-NEXT: [[TMP0:%.]] = insertelement <2 x i8> undef, i8* [[BLK2:%.*]], i32 0
				; CHECK-NEXT: [[TMP1:%.]] = insertelement <2 x i8> [[TMP0]], i8* [[BLK1:%.*]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[S_047:%.*]] = phi i32 [ 0, [[FOR_BODY_LR_PH]] ], [ [[OP_EXTRA:%.*]], [[IF_END_86:%.*]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x i32> [ zeroinitializer, [[FOR_BODY_LR_PH]] ], [ [[TMP48:%.*]], [[IF_END_86:%.*]] ]
	; CHECK-NEXT: [[J_046:%.*]] = phi i32 [ 0, [[FOR_BODY_LR_PH]] ], [ [[INC:%.*]], [[IF_END_86]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x i8*> [ [[TMP1]], [[FOR_BODY_LR_PH]] ], [ [[TMP50:%.*]], [[IF_END_86]] ]
	; CHECK-NEXT: [[P2_045:%.*]] = phi i8* [ [[BLK2:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR88:%.*]], [[IF_END_86]] ]			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i8*> [[TMP3]], i32 1
	; CHECK-NEXT: [[P1_044:%.*]] = phi i8* [ [[BLK1:%.*]], [[FOR_BODY_LR_PH]] ], [ [[ADD_PTR:%.*]], [[IF_END_86]] ]			; CHECK-NEXT: [[TMP5:%.*]] = load i8, i8* [[TMP4]], align 1
	; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 1			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8*> [[TMP3]], i32 0
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 1			; CHECK-NEXT: [[TMP7:%.*]] = load i8, i8* [[TMP6]], align 1
	; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 2			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
	; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 2			; CHECK-NEXT: [[ARRAYIDX6:%.*]] = getelementptr inbounds i8, i8* [[TMP4]], i64 1
	; CHECK-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 3			; CHECK-NEXT: [[TMP9:%.*]] = load i8, i8* [[ARRAYIDX6]], align 1
	; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 3			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds i8, i8* [[TMP6]], i64 1
	; CHECK-NEXT: [[ARRAYIDX39:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 4			; CHECK-NEXT: [[TMP10:%.*]] = load i8, i8* [[ARRAYIDX8]], align 1
	; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 4			; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds i8, i8* [[TMP4]], i64 2
	; CHECK-NEXT: [[ARRAYIDX50:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 5			; CHECK-NEXT: [[TMP11:%.*]] = load i8, i8* [[ARRAYIDX17]], align 1
	; CHECK-NEXT: [[ARRAYIDX52:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 5			; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds i8, i8* [[TMP6]], i64 2
	; CHECK-NEXT: [[ARRAYIDX61:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 6			; CHECK-NEXT: [[TMP12:%.*]] = load i8, i8* [[ARRAYIDX19]], align 1
	; CHECK-NEXT: [[ARRAYIDX63:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 6			; CHECK-NEXT: [[ARRAYIDX28:%.*]] = getelementptr inbounds i8, i8* [[TMP4]], i64 3
	; CHECK-NEXT: [[ARRAYIDX72:%.*]] = getelementptr inbounds i8, i8* [[P1_044]], i64 7			; CHECK-NEXT: [[TMP13:%.*]] = load i8, i8* [[ARRAYIDX28]], align 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i8* [[P1_044]] to <8 x i8>*			; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds i8, i8* [[TMP6]], i64 3
	; CHECK-NEXT: [[TMP1:%.*]] = load <8 x i8>, <8 x i8>* [[TMP0]], align 1			; CHECK-NEXT: [[TMP14:%.*]] = load i8, i8* [[ARRAYIDX30]], align 1
	; CHECK-NEXT: [[TMP2:%.*]] = zext <8 x i8> [[TMP1]] to <8 x i32>			; CHECK-NEXT: [[ARRAYIDX39:%.*]] = getelementptr inbounds i8, i8* [[TMP4]], i64 4
	; CHECK-NEXT: [[ARRAYIDX74:%.*]] = getelementptr inbounds i8, i8* [[P2_045]], i64 7			; CHECK-NEXT: [[TMP15:%.*]] = load i8, i8* [[ARRAYIDX39]], align 1
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i8* [[P2_045]] to <8 x i8>*			; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds i8, i8* [[TMP6]], i64 4
	; CHECK-NEXT: [[TMP4:%.*]] = load <8 x i8>, <8 x i8>* [[TMP3]], align 1			; CHECK-NEXT: [[TMP16:%.*]] = load i8, i8* [[ARRAYIDX41]], align 1
	; CHECK-NEXT: [[TMP5:%.*]] = zext <8 x i8> [[TMP4]] to <8 x i32>			; CHECK-NEXT: [[ARRAYIDX50:%.*]] = getelementptr inbounds i8, i8* [[TMP4]], i64 5
	; CHECK-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; CHECK-NEXT: [[TMP17:%.*]] = load i8, i8* [[ARRAYIDX50]], align 1
	; CHECK-NEXT: [[TMP7:%.*]] = icmp slt <8 x i32> [[TMP6]], zeroinitializer			; CHECK-NEXT: [[ARRAYIDX52:%.*]] = getelementptr inbounds i8, i8* [[TMP6]], i64 5
	; CHECK-NEXT: [[TMP8:%.*]] = sub nsw <8 x i32> zeroinitializer, [[TMP6]]			; CHECK-NEXT: [[TMP18:%.*]] = load i8, i8* [[ARRAYIDX52]], align 1
	; CHECK-NEXT: [[TMP9:%.*]] = select <8 x i1> [[TMP7]], <8 x i32> [[TMP8]], <8 x i32> [[TMP6]]			; CHECK-NEXT: [[ARRAYIDX61:%.*]] = getelementptr inbounds i8, i8* [[TMP4]], i64 6
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, [[S_047]]			; CHECK-NEXT: [[TMP19:%.*]] = load i8, i8* [[ARRAYIDX61]], align 1
				; CHECK-NEXT: [[ARRAYIDX63:%.*]] = getelementptr inbounds i8, i8* [[TMP6]], i64 6
				; CHECK-NEXT: [[TMP20:%.*]] = load i8, i8* [[ARRAYIDX63]], align 1
				; CHECK-NEXT: [[ARRAYIDX72:%.*]] = getelementptr inbounds i8, i8* [[TMP4]], i64 7
				; CHECK-NEXT: [[TMP21:%.*]] = load i8, i8* [[ARRAYIDX72]], align 1
				; CHECK-NEXT: [[TMP22:%.*]] = insertelement <8 x i8> undef, i8 [[TMP5]], i32 0
				; CHECK-NEXT: [[TMP23:%.*]] = insertelement <8 x i8> [[TMP22]], i8 [[TMP9]], i32 1
				; CHECK-NEXT: [[TMP24:%.*]] = insertelement <8 x i8> [[TMP23]], i8 [[TMP11]], i32 2
				; CHECK-NEXT: [[TMP25:%.*]] = insertelement <8 x i8> [[TMP24]], i8 [[TMP13]], i32 3
				; CHECK-NEXT: [[TMP26:%.*]] = insertelement <8 x i8> [[TMP25]], i8 [[TMP15]], i32 4
				; CHECK-NEXT: [[TMP27:%.*]] = insertelement <8 x i8> [[TMP26]], i8 [[TMP17]], i32 5
				; CHECK-NEXT: [[TMP28:%.*]] = insertelement <8 x i8> [[TMP27]], i8 [[TMP19]], i32 6
				; CHECK-NEXT: [[TMP29:%.*]] = insertelement <8 x i8> [[TMP28]], i8 [[TMP21]], i32 7
				; CHECK-NEXT: [[TMP30:%.*]] = zext <8 x i8> [[TMP29]] to <8 x i32>
				; CHECK-NEXT: [[ARRAYIDX74:%.*]] = getelementptr inbounds i8, i8* [[TMP6]], i64 7
				; CHECK-NEXT: [[TMP31:%.*]] = load i8, i8* [[ARRAYIDX74]], align 1
				; CHECK-NEXT: [[TMP32:%.*]] = insertelement <8 x i8> undef, i8 [[TMP7]], i32 0
				; CHECK-NEXT: [[TMP33:%.*]] = insertelement <8 x i8> [[TMP32]], i8 [[TMP10]], i32 1
				; CHECK-NEXT: [[TMP34:%.*]] = insertelement <8 x i8> [[TMP33]], i8 [[TMP12]], i32 2
				; CHECK-NEXT: [[TMP35:%.*]] = insertelement <8 x i8> [[TMP34]], i8 [[TMP14]], i32 3
				; CHECK-NEXT: [[TMP36:%.*]] = insertelement <8 x i8> [[TMP35]], i8 [[TMP16]], i32 4
				; CHECK-NEXT: [[TMP37:%.*]] = insertelement <8 x i8> [[TMP36]], i8 [[TMP18]], i32 5
				; CHECK-NEXT: [[TMP38:%.*]] = insertelement <8 x i8> [[TMP37]], i8 [[TMP20]], i32 6
				; CHECK-NEXT: [[TMP39:%.*]] = insertelement <8 x i8> [[TMP38]], i8 [[TMP31]], i32 7
				; CHECK-NEXT: [[TMP40:%.*]] = zext <8 x i8> [[TMP39]] to <8 x i32>
				; CHECK-NEXT: [[TMP41:%.*]] = sub nsw <8 x i32> [[TMP30]], [[TMP40]]
				; CHECK-NEXT: [[TMP42:%.*]] = icmp slt <8 x i32> [[TMP41]], zeroinitializer
				; CHECK-NEXT: [[TMP43:%.*]] = sub nsw <8 x i32> zeroinitializer, [[TMP41]]
				; CHECK-NEXT: [[TMP44:%.*]] = select <8 x i1> [[TMP42]], <8 x i32> [[TMP43]], <8 x i32> [[TMP41]]
				; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 undef, [[TMP8]]
	; CHECK-NEXT: [[ADD16:%.*]] = add nsw i32 [[ADD]], undef			; CHECK-NEXT: [[ADD16:%.*]] = add nsw i32 [[ADD]], undef
	; CHECK-NEXT: [[ADD27:%.*]] = add nsw i32 [[ADD16]], undef			; CHECK-NEXT: [[ADD27:%.*]] = add nsw i32 [[ADD16]], undef
	; CHECK-NEXT: [[ADD38:%.*]] = add nsw i32 [[ADD27]], undef			; CHECK-NEXT: [[ADD38:%.*]] = add nsw i32 [[ADD27]], undef
	; CHECK-NEXT: [[ADD49:%.*]] = add nsw i32 [[ADD38]], undef			; CHECK-NEXT: [[ADD49:%.*]] = add nsw i32 [[ADD38]], undef
	; CHECK-NEXT: [[ADD60:%.*]] = add nsw i32 [[ADD49]], undef			; CHECK-NEXT: [[ADD60:%.*]] = add nsw i32 [[ADD49]], undef
	; CHECK-NEXT: [[ADD71:%.*]] = add nsw i32 [[ADD60]], undef			; CHECK-NEXT: [[ADD71:%.*]] = add nsw i32 [[ADD60]], undef
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v8i32(<8 x i32> [[TMP9]])			; CHECK-NEXT: [[TMP45:%.*]] = call i32 @llvm.experimental.vector.reduce.add.i32.v8i32(<8 x i32> [[TMP44]])
	; CHECK-NEXT: [[OP_EXTRA]] = add nsw i32 [[TMP10]], [[S_047]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = add nsw i32 [[TMP45]], [[TMP8]]
	; CHECK-NEXT: [[ADD82:%.*]] = add nsw i32 [[ADD71]], undef			; CHECK-NEXT: [[ADD82:%.*]] = add nsw i32 [[ADD71]], undef
	; CHECK-NEXT: [[CMP83:%.*]] = icmp slt i32 [[OP_EXTRA]], [[LIM:%.*]]			; CHECK-NEXT: [[CMP83:%.*]] = icmp slt i32 [[OP_EXTRA]], [[LIM:%.*]]
	; CHECK-NEXT: br i1 [[CMP83]], label [[IF_END_86]], label [[FOR_END_LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 [[CMP83]], label [[IF_END_86]], label [[FOR_END_LOOPEXIT:%.*]]
	; CHECK: if.end.86:			; CHECK: if.end.86:
	; CHECK-NEXT: [[ADD_PTR]] = getelementptr inbounds i8, i8* [[P1_044]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i8, i8* [[TMP4]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[ADD_PTR88]] = getelementptr inbounds i8, i8* [[P2_045]], i64 [[IDX_EXT]]			; CHECK-NEXT: [[ADD_PTR88:%.*]] = getelementptr inbounds i8, i8* [[TMP6]], i64 [[IDX_EXT]]
	; CHECK-NEXT: [[INC]] = add nuw nsw i32 [[J_046]], 1			; CHECK-NEXT: [[TMP46:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
				; CHECK-NEXT: [[INC:%.*]] = add nuw nsw i32 [[TMP46]], 1
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[H]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[H]]
				; CHECK-NEXT: [[TMP47:%.*]] = insertelement <2 x i32> undef, i32 [[OP_EXTRA]], i32 0
				; CHECK-NEXT: [[TMP48]] = insertelement <2 x i32> [[TMP47]], i32 [[INC]], i32 1
				; CHECK-NEXT: [[TMP49:%.*]] = insertelement <2 x i8*> undef, i8* [[ADD_PTR88]], i32 0
				; CHECK-NEXT: [[TMP50]] = insertelement <2 x i8*> [[TMP49]], i8* [[ADD_PTR]], i32 1
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END_LOOPEXIT]]
	; CHECK: for.end.loopexit:			; CHECK: for.end.loopexit:
	; CHECK-NEXT: br label [[FOR_END]]			; CHECK-NEXT: br label [[FOR_END]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[S_1:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA]], [[FOR_END_LOOPEXIT]] ]			; CHECK-NEXT: [[S_1:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[OP_EXTRA]], [[FOR_END_LOOPEXIT]] ]
	; CHECK-NEXT: ret i32 [[S_1]]			; CHECK-NEXT: ret i32 [[S_1]]
	;			;
	entry:			entry:

test/Transforms/SLPVectorizer/AArch64/transpose.ll

%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1		%tmp3.1 = insertelement <4 x i32> %tmp3.0, i32 %tmp2.1, i32 1
%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.0, i32 2		%tmp3.2 = insertelement <4 x i32> %tmp3.1, i32 %tmp2.0, i32 2
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.1, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.1, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_reuse_1(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_reuse_1(		; CHECK-LABEL: @build_vec_v4i32_reuse_1(
; CHECK-NEXT: [[V0_0:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 0		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> undef, <2 x i32> zeroinitializer
; CHECK-NEXT: [[V0_1:%.*]] = extractelement <2 x i32> [[V0]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> undef, <2 x i32> zeroinitializer
; CHECK-NEXT: [[V1_0:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[V1_1:%.*]] = extractelement <2 x i32> [[V1]], i32 1		; CHECK-NEXT: [[TMP4:%.*]] = xor <2 x i32> [[TMP1]], [[TMP2]]
; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> undef, <4 x i32> <i32 undef, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[V0]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>
; CHECK-NEXT: [[TMP0_2:%.*]] = xor i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i32> [[V1]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>
; CHECK-NEXT: [[TMP0_3:%.*]] = xor i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> undef, i32 [[TMP0_0]], i32 0		; CHECK-NEXT: [[TMP9:%.*]] = xor <2 x i32> [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> undef, i32 [[TMP0_1]], i32 0		; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> undef, <4 x i32> <i32 undef, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3:%.*]] = sub <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <4 x i32> <i32 0, i32 0, i32 3, i32 undef>
; CHECK-NEXT: [[TMP1_2:%.*]] = sub i32 [[TMP0_2]], [[TMP0_3]]		; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <4 x i32> [[TMP11]], <4 x i32> [[TMP10]], <4 x i32> <i32 0, i32 1, i32 2, i32 5>
; CHECK-NEXT: [[TMP1_3:%.*]] = sub i32 [[TMP0_3]], [[TMP0_2]]		; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> [[TMP9]], <4 x i32> <i32 0, i32 0, i32 3, i32 undef>
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 0		; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <4 x i32> [[TMP13]], <4 x i32> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 2, i32 5>
; CHECK-NEXT: [[TMP2_0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP4]], i32 0		; CHECK-NEXT: [[TMP15:%.*]] = sub <4 x i32> [[TMP12]], [[TMP14]]
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i32> [[TMP3]], i32 0		; CHECK-NEXT: ret <4 x i32> [[TMP15]]
; CHECK-NEXT: [[TMP2_1:%.*]] = insertelement <4 x i32> [[TMP2_0]], i32 [[TMP5]], i32 1
; CHECK-NEXT: [[TMP2_2:%.*]] = insertelement <4 x i32> [[TMP2_1]], i32 [[TMP1_2]], i32 2
; CHECK-NEXT: [[TMP2_3:%.*]] = insertelement <4 x i32> [[TMP2_2]], i32 [[TMP1_3]], i32 3
; CHECK-NEXT: ret <4 x i32> [[TMP2_3]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp0.2 = xor i32 %v0.0, %v1.0		%tmp0.2 = xor i32 %v0.0, %v1.0
%tmp0.3 = xor i32 %v0.1, %v1.1		%tmp0.3 = xor i32 %v0.1, %v1.1
%tmp1.0 = sub i32 %tmp0.0, %tmp0.1		%tmp1.0 = sub i32 %tmp0.0, %tmp0.1
%tmp1.1 = sub i32 %tmp0.0, %tmp0.1		%tmp1.1 = sub i32 %tmp0.0, %tmp0.1
%tmp1.2 = sub i32 %tmp0.2, %tmp0.3		%tmp1.2 = sub i32 %tmp0.2, %tmp0.3
%tmp1.3 = sub i32 %tmp0.3, %tmp0.2		%tmp1.3 = sub i32 %tmp0.3, %tmp0.2
%tmp2.0 = insertelement <4 x i32> undef, i32 %tmp1.0, i32 0		%tmp2.0 = insertelement <4 x i32> undef, i32 %tmp1.0, i32 0
%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1		%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1
%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2		%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2
%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3		%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3
ret <4 x i32> %tmp2.3		ret <4 x i32> %tmp2.3
}		}

define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_3_binops(		; CHECK-LABEL: @build_vec_v4i32_3_binops(
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i32> [[V0:%.*]], <2 x i32> undef, <2 x i32> zeroinitializer		; CHECK-NEXT: [[V0_0:%.*]] = extractelement <2 x i32> [[V0:%.*]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[V1:%.*]], <2 x i32> undef, <2 x i32> zeroinitializer		; CHECK-NEXT: [[V0_1:%.*]] = extractelement <2 x i32> [[V0]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = add <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[V1_0:%.*]] = extractelement <2 x i32> [[V1:%.*]], i32 0
; CHECK-NEXT: [[TMP4:%.*]] = mul <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[V1_1:%.*]] = extractelement <2 x i32> [[V1]], i32 1
; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> [[TMP4]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[V0_0]], [[V1_0]]
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[V0]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>		; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[V0_1]], [[V1_1]]
; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x i32> [[V1]], <2 x i32> undef, <2 x i32> <i32 1, i32 1>		; CHECK-NEXT: [[TMP0_2:%.*]] = xor i32 [[V0_0]], [[V1_0]]
; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[TMP6]], [[TMP7]]		; CHECK-NEXT: [[TMP0_3:%.*]] = xor i32 [[V0_1]], [[V1_1]]
; CHECK-NEXT: [[TMP9:%.*]] = mul <2 x i32> [[TMP6]], [[TMP7]]		; CHECK-NEXT: [[TMP1_0:%.*]] = mul i32 [[V0_0]], [[V1_0]]
; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> [[TMP9]], <2 x i32> <i32 0, i32 3>		; CHECK-NEXT: [[TMP1_1:%.*]] = mul i32 [[V0_1]], [[V1_1]]
; CHECK-NEXT: [[TMP11:%.*]] = xor <2 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP1_2:%.*]] = xor i32 [[V0_0]], [[V1_0]]
; CHECK-NEXT: [[TMP12:%.*]] = xor <2 x i32> [[TMP6]], [[TMP7]]		; CHECK-NEXT: [[TMP1_3:%.*]] = xor i32 [[V0_1]], [[V1_1]]
; CHECK-NEXT: [[TMP13:%.*]] = add <2 x i32> [[TMP5]], [[TMP10]]		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> undef, i32 [[TMP0_0]], i32 0
; CHECK-NEXT: [[TMP14:%.*]] = add <2 x i32> [[TMP11]], [[TMP12]]		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[TMP1_0]], i32 1
; CHECK-NEXT: [[TMP3_3:%.*]] = shufflevector <2 x i32> [[TMP13]], <2 x i32> [[TMP14]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[TMP0_2]], i32 2
; CHECK-NEXT: ret <4 x i32> [[TMP3_3]]		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[TMP1_2]], i32 3
		; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> undef, i32 [[TMP0_1]], i32 0
		; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[TMP1_1]], i32 1
		; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x i32> [[TMP6]], i32 [[TMP0_3]], i32 2
		; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> [[TMP7]], i32 [[TMP1_3]], i32 3
		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP4]], [[TMP8]]
		; CHECK-NEXT: ret <4 x i32> [[TMP9]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
%tmp0.2 = xor i32 %v0.0, %v1.0		%tmp0.2 = xor i32 %v0.0, %v1.0

test/Transforms/SLPVectorizer/X86/PR36280.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 | FileCheck %s			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 | FileCheck %s

	define float @jacobi(float* %p, float %x, float %y, float %z) {			define float @jacobi(float* %p, float %x, float %y, float %z) {
	; CHECK-LABEL: @jacobi(			; CHECK-LABEL: @jacobi(
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr float, float* [[P:%.*]], i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr float, float* [[P:%.*]], i64 1
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr float, float* [[P]], i64 2			; CHECK-NEXT: [[GEP2:%.*]] = getelementptr float, float* [[P]], i64 2
	; CHECK-NEXT: [[P1:%.*]] = load float, float* [[GEP1]]			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP1]] to <2 x float>*
	; CHECK-NEXT: [[P2:%.*]] = load float, float* [[GEP2]]			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[MUL1:%.*]] = fmul float [[P1]], [[X:%.*]]			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> undef, float [[X:%.*]], i32 0
	; CHECK-NEXT: [[MUL2:%.*]] = fmul float [[P2]], [[Y:%.*]]			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[Y:%.*]], i32 1
	; CHECK-NEXT: [[ADD1:%.*]] = fadd float [[MUL1]], [[Z:%.*]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[ADD2:%.*]] = fadd float [[MUL2]], [[ADD1]]			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0
				; CHECK-NEXT: [[ADD1:%.*]] = fadd float [[TMP6]], [[Z:%.*]]
				; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 1
				; CHECK-NEXT: [[ADD2:%.*]] = fadd float [[TMP7]], [[ADD1]]
	; CHECK-NEXT: ret float [[ADD2]]			; CHECK-NEXT: ret float [[ADD2]]
	;			;
	%gep1 = getelementptr float, float* %p, i64 1			%gep1 = getelementptr float, float* %p, i64 1
	%gep2 = getelementptr float, float* %p, i64 2			%gep2 = getelementptr float, float* %p, i64 2
	%p1 = load float, float* %gep1			%p1 = load float, float* %gep1
	%p2 = load float, float* %gep2			%p2 = load float, float* %gep2
	%mul1 = fmul float %p1, %x			%mul1 = fmul float %p1, %x
	%mul2 = fmul float %p2, %y			%mul2 = fmul float %p2, %y
	%add1 = fadd float %mul1, %z			%add1 = fadd float %mul1, %z
	%add2 = fadd float %mul2, %add1			%add2 = fadd float %mul2, %add1
	ret float %add2			ret float %add2
	}			}

test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 | FileCheck %s --check-prefixes=ALL,CHECK			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 | FileCheck %s --check-prefixes=ALL,CHECK
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefixes=ALL,FORCE_REDUCTION			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefixes=ALL,FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> undef, i32 [[TMP0:%.*]], i32 0
				; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 12529, i32 1
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP15:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x i32> [ [[TMP19:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1			; CHECK-NEXT: [[VAL_0:%.*]] = add i32 [[TMP4]], 0
	; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>, [[SHUFFLE]]			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1
	; CHECK-NEXT: [[VAL_1:%.*]] = and i32 [[TMP2]], undef			; CHECK-NEXT: [[VAL_1:%.*]] = and i32 [[TMP5]], [[VAL_0]]
	; CHECK-NEXT: [[VAL_2:%.*]] = and i32 [[VAL_1]], [[TMP0:%.*]]			; CHECK-NEXT: [[VAL_2:%.*]] = and i32 [[VAL_1]], [[TMP0]]
	; CHECK-NEXT: [[VAL_3:%.*]] = and i32 [[VAL_2]], [[TMP0]]			; CHECK-NEXT: [[VAL_3:%.*]] = and i32 [[VAL_2]], [[TMP0]]
	; CHECK-NEXT: [[VAL_4:%.*]] = and i32 [[VAL_3]], [[TMP0]]			; CHECK-NEXT: [[VAL_4:%.*]] = and i32 [[VAL_3]], [[TMP0]]
	; CHECK-NEXT: [[VAL_5:%.*]] = and i32 [[VAL_4]], [[TMP0]]			; CHECK-NEXT: [[VAL_5:%.*]] = and i32 [[VAL_4]], [[TMP0]]
	; CHECK-NEXT: [[VAL_7:%.*]] = and i32 [[VAL_5]], undef			; CHECK-NEXT: [[VAL_6:%.*]] = add i32 [[TMP5]], 55
				; CHECK-NEXT: [[VAL_7:%.*]] = and i32 [[VAL_5]], [[VAL_6]]
	; CHECK-NEXT: [[VAL_8:%.*]] = and i32 [[VAL_7]], [[TMP0]]			; CHECK-NEXT: [[VAL_8:%.*]] = and i32 [[VAL_7]], [[TMP0]]
	; CHECK-NEXT: [[VAL_9:%.*]] = and i32 [[VAL_8]], [[TMP0]]			; CHECK-NEXT: [[VAL_9:%.*]] = and i32 [[VAL_8]], [[TMP0]]
	; CHECK-NEXT: [[VAL_10:%.*]] = and i32 [[VAL_9]], [[TMP0]]			; CHECK-NEXT: [[VAL_10:%.*]] = and i32 [[VAL_9]], [[TMP0]]
	; CHECK-NEXT: [[VAL_12:%.*]] = and i32 [[VAL_10]], undef			; CHECK-NEXT: [[VAL_11:%.*]] = add i32 [[TMP5]], 285
				; CHECK-NEXT: [[VAL_12:%.*]] = and i32 [[VAL_10]], [[VAL_11]]
	; CHECK-NEXT: [[VAL_13:%.*]] = and i32 [[VAL_12]], [[TMP0]]			; CHECK-NEXT: [[VAL_13:%.*]] = and i32 [[VAL_12]], [[TMP0]]
	; CHECK-NEXT: [[VAL_14:%.*]] = and i32 [[VAL_13]], [[TMP0]]			; CHECK-NEXT: [[VAL_14:%.*]] = and i32 [[VAL_13]], [[TMP0]]
	; CHECK-NEXT: [[VAL_15:%.*]] = and i32 [[VAL_14]], [[TMP0]]			; CHECK-NEXT: [[VAL_15:%.*]] = and i32 [[VAL_14]], [[TMP0]]
	; CHECK-NEXT: [[VAL_16:%.*]] = and i32 [[VAL_15]], [[TMP0]]			; CHECK-NEXT: [[VAL_16:%.*]] = and i32 [[VAL_15]], [[TMP0]]
	; CHECK-NEXT: [[VAL_17:%.*]] = and i32 [[VAL_16]], [[TMP0]]			; CHECK-NEXT: [[VAL_17:%.*]] = and i32 [[VAL_16]], [[TMP0]]
	; CHECK-NEXT: [[VAL_19:%.*]] = and i32 [[VAL_17]], undef			; CHECK-NEXT: [[VAL_18:%.*]] = add i32 [[TMP5]], 1240
	; CHECK-NEXT: [[VAL_21:%.*]] = and i32 [[VAL_19]], undef			; CHECK-NEXT: [[VAL_19:%.*]] = and i32 [[VAL_17]], [[VAL_18]]
				; CHECK-NEXT: [[VAL_20:%.*]] = add i32 [[TMP5]], 1496
				; CHECK-NEXT: [[VAL_21:%.*]] = and i32 [[VAL_19]], [[VAL_20]]
	; CHECK-NEXT: [[VAL_22:%.*]] = and i32 [[VAL_21]], [[TMP0]]			; CHECK-NEXT: [[VAL_22:%.*]] = and i32 [[VAL_21]], [[TMP0]]
	; CHECK-NEXT: [[VAL_23:%.*]] = and i32 [[VAL_22]], [[TMP0]]			; CHECK-NEXT: [[VAL_23:%.*]] = and i32 [[VAL_22]], [[TMP0]]
	; CHECK-NEXT: [[VAL_24:%.*]] = and i32 [[VAL_23]], [[TMP0]]			; CHECK-NEXT: [[VAL_24:%.*]] = and i32 [[VAL_23]], [[TMP0]]
	; CHECK-NEXT: [[VAL_25:%.*]] = and i32 [[VAL_24]], [[TMP0]]			; CHECK-NEXT: [[VAL_25:%.*]] = and i32 [[VAL_24]], [[TMP0]]
	; CHECK-NEXT: [[VAL_26:%.*]] = and i32 [[VAL_25]], [[TMP0]]			; CHECK-NEXT: [[VAL_26:%.*]] = and i32 [[VAL_25]], [[TMP0]]
	; CHECK-NEXT: [[VAL_27:%.*]] = and i32 [[VAL_26]], [[TMP0]]			; CHECK-NEXT: [[VAL_27:%.*]] = and i32 [[VAL_26]], [[TMP0]]
	; CHECK-NEXT: [[VAL_28:%.*]] = and i32 [[VAL_27]], [[TMP0]]			; CHECK-NEXT: [[VAL_28:%.*]] = and i32 [[VAL_27]], [[TMP0]]
	; CHECK-NEXT: [[VAL_29:%.*]] = and i32 [[VAL_28]], [[TMP0]]			; CHECK-NEXT: [[VAL_29:%.*]] = and i32 [[VAL_28]], [[TMP0]]
	; CHECK-NEXT: [[VAL_30:%.*]] = and i32 [[VAL_29]], [[TMP0]]			; CHECK-NEXT: [[VAL_30:%.*]] = and i32 [[VAL_29]], [[TMP0]]
	; CHECK-NEXT: [[VAL_31:%.*]] = and i32 [[VAL_30]], [[TMP0]]			; CHECK-NEXT: [[VAL_31:%.*]] = and i32 [[VAL_30]], [[TMP0]]
	; CHECK-NEXT: [[VAL_32:%.*]] = and i32 [[VAL_31]], [[TMP0]]			; CHECK-NEXT: [[VAL_32:%.*]] = and i32 [[VAL_31]], [[TMP0]]
	; CHECK-NEXT: [[VAL_33:%.*]] = and i32 [[VAL_32]], [[TMP0]]			; CHECK-NEXT: [[VAL_33:%.*]] = and i32 [[VAL_32]], [[TMP0]]
	; CHECK-NEXT: [[VAL_35:%.*]] = and i32 [[VAL_33]], undef			; CHECK-NEXT: [[VAL_34:%.*]] = add i32 [[TMP5]], 8555
				; CHECK-NEXT: [[VAL_35:%.*]] = and i32 [[VAL_33]], [[VAL_34]]
	; CHECK-NEXT: [[VAL_36:%.*]] = and i32 [[VAL_35]], [[TMP0]]			; CHECK-NEXT: [[VAL_36:%.*]] = and i32 [[VAL_35]], [[TMP0]]
	; CHECK-NEXT: [[VAL_37:%.*]] = and i32 [[VAL_36]], [[TMP0]]			; CHECK-NEXT: [[VAL_37:%.*]] = and i32 [[VAL_36]], [[TMP0]]
	; CHECK-NEXT: [[VAL_38:%.*]] = and i32 [[VAL_37]], [[TMP0]]			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> undef, i32 [[VAL_37]], i32 0
	; CHECK-NEXT: [[VAL_40:%.*]] = and i32 [[VAL_38]], undef			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> [[TMP6]], i32 [[TMP5]], i32 1
	; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = and <2 x i32> [[TMP7]], [[TMP2]]
	; CHECK-NEXT: [[BIN_RDX:%.*]] = and <8 x i32> [[TMP3]], [[RDX_SHUF]]			; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP7]], [[TMP2]]
	; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <8 x i32> [[BIN_RDX]], <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> [[TMP9]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[BIN_RDX2:%.*]] = and <8 x i32> [[BIN_RDX]], [[RDX_SHUF1]]			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i32> [[TMP10]], i32 0
	; CHECK-NEXT: [[RDX_SHUF3:%.*]] = shufflevector <8 x i32> [[BIN_RDX2]], <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP10]], i32 1
	; CHECK-NEXT: [[BIN_RDX4:%.*]] = and <8 x i32> [[BIN_RDX2]], [[RDX_SHUF3]]			; CHECK-NEXT: [[VAL_40:%.*]] = and i32 [[TMP11]], [[TMP12]]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[BIN_RDX4]], i32 0			; CHECK-NEXT: [[VAL_41:%.*]] = add i32 [[TMP5]], 13685
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP4]], [[TMP0]]			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> undef, i32 [[VAL_40]], i32 0
	; CHECK-NEXT: [[OP_EXTRA5:%.*]] = and i32 [[OP_EXTRA]], [[TMP0]]			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP5]], i32 1
	; CHECK-NEXT: [[OP_EXTRA6:%.*]] = and i32 [[OP_EXTRA5]], [[TMP0]]			; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x i32> undef, i32 [[VAL_41]], i32 0
	; CHECK-NEXT: [[OP_EXTRA7:%.*]] = and i32 [[OP_EXTRA6]], [[TMP0]]			; CHECK-NEXT: [[TMP16:%.*]] = insertelement <2 x i32> [[TMP15]], i32 14910, i32 1
	; CHECK-NEXT: [[OP_EXTRA8:%.*]] = and i32 [[OP_EXTRA7]], [[TMP0]]			; CHECK-NEXT: [[TMP17:%.*]] = and <2 x i32> [[TMP14]], [[TMP16]]
	; CHECK-NEXT: [[OP_EXTRA9:%.*]] = and i32 [[OP_EXTRA8]], [[TMP0]]			; CHECK-NEXT: [[TMP18:%.*]] = add <2 x i32> [[TMP14]], [[TMP16]]
	; CHECK-NEXT: [[OP_EXTRA10:%.*]] = and i32 [[OP_EXTRA9]], [[TMP0]]			; CHECK-NEXT: [[TMP19]] = shufflevector <2 x i32> [[TMP17]], <2 x i32> [[TMP18]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[OP_EXTRA11:%.*]] = and i32 [[OP_EXTRA10]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA12:%.*]] = and i32 [[OP_EXTRA11]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA13:%.*]] = and i32 [[OP_EXTRA12]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA14:%.*]] = and i32 [[OP_EXTRA13]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA15:%.*]] = and i32 [[OP_EXTRA14]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA16:%.*]] = and i32 [[OP_EXTRA15]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA17:%.*]] = and i32 [[OP_EXTRA16]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA18:%.*]] = and i32 [[OP_EXTRA17]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA19:%.*]] = and i32 [[OP_EXTRA18]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA20:%.*]] = and i32 [[OP_EXTRA19]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA21:%.*]] = and i32 [[OP_EXTRA20]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA22:%.*]] = and i32 [[OP_EXTRA21]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA23:%.*]] = and i32 [[OP_EXTRA22]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA24:%.*]] = and i32 [[OP_EXTRA23]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA25:%.*]] = and i32 [[OP_EXTRA24]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA26:%.*]] = and i32 [[OP_EXTRA25]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA27:%.*]] = and i32 [[OP_EXTRA26]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA28:%.*]] = and i32 [[OP_EXTRA27]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA29:%.*]] = and i32 [[OP_EXTRA28]], [[TMP0]]
	; CHECK-NEXT: [[OP_EXTRA30:%.*]] = and i32 [[OP_EXTRA29]], [[TMP0]]
	; CHECK-NEXT: [[VAL_42:%.*]] = and i32 [[VAL_40]], undef
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> undef, i32 [[OP_EXTRA30]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> undef, i32 [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> [[TMP7]], i32 14910, i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = and <2 x i32> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32> [[TMP11]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x i32> undef, i32 [[TMP12]], i32 0
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x i32> [[TMP11]], i32 1
	; CHECK-NEXT: [[TMP15]] = insertelement <2 x i32> [[TMP13]], i32 [[TMP14]], i32 1
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
	; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]			; FORCE_REDUCTION-NEXT: br label [[LOOP:%.*]]
	; FORCE_REDUCTION: loop:			; FORCE_REDUCTION: loop:
	; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP13:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP13:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; FORCE_REDUCTION-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> undef, <4 x i32> <i32 0, i32 1, i32 1, i32 1>

test/Transforms/SLPVectorizer/X86/addsub.ll

	; fc[1] = fa[1]-fb[1];			; fc[1] = fa[1]-fb[1];
	; fc[2] = fa[2]+fb[2];			; fc[2] = fa[2]+fb[2];
	; fc[3] = fb[3]-fa[3];			; fc[3] = fb[3]-fa[3];
	; In the above code we can swap the 1st and 2nd operation as fadd is commutative			; In the above code we can swap the 1st and 2nd operation as fadd is commutative
	; but not 2nd or 4th as fsub is not commutative.			; but not 2nd or 4th as fsub is not commutative.

	define void @no_vec_shuff_reorder() #0 {			define void @no_vec_shuff_reorder() #0 {
	; CHECK-LABEL: @no_vec_shuff_reorder(			; CHECK-LABEL: @no_vec_shuff_reorder(
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 0), align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([4 x float]* @fa to <2 x float>*), align 4
	; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 0), align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([4 x float]* @fb to <2 x float>*), align 4
	; CHECK-NEXT: [[TMP3:%.*]] = fadd float [[TMP1]], [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 2), align 4
	; CHECK-NEXT: store float [[TMP3]], float* getelementptr inbounds ([4 x float], [4 x float]* @fc, i32 0, i64 0), align 4			; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 2), align 4
	; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 1), align 4			; CHECK-NEXT: [[TMP5:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 3), align 4
	; CHECK-NEXT: [[TMP5:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 1), align 4			; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 3), align 4
	; CHECK-NEXT: [[TMP6:%.*]] = fsub float [[TMP4]], [[TMP5]]			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
	; CHECK-NEXT: store float [[TMP6]], float* getelementptr inbounds ([4 x float], [4 x float]* @fc, i32 0, i64 1), align 4			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> undef, float [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 2), align 4			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 2), align 4			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x float> [[TMP8]], float [[TMP9]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = fadd float [[TMP7]], [[TMP8]]			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x float> [[TMP10]], float [[TMP4]], i32 2
	; CHECK-NEXT: store float [[TMP9]], float* getelementptr inbounds ([4 x float], [4 x float]* @fc, i32 0, i64 2), align 4			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x float> [[TMP11]], float [[TMP5]], i32 3
	; CHECK-NEXT: [[TMP10:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 3), align 4			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 3), align 4			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> undef, float [[TMP13]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = fsub float [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	; CHECK-NEXT: store float [[TMP12]], float* getelementptr inbounds ([4 x float], [4 x float]* @fc, i32 0, i64 3), align 4			; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x float> [[TMP14]], float [[TMP15]], i32 1
				; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x float> [[TMP16]], float [[TMP3]], i32 2
				; CHECK-NEXT: [[TMP18:%.*]] = insertelement <4 x float> [[TMP17]], float [[TMP6]], i32 3
				; CHECK-NEXT: [[TMP19:%.*]] = fadd <4 x float> [[TMP12]], [[TMP18]]
				; CHECK-NEXT: [[TMP20:%.*]] = fsub <4 x float> [[TMP12]], [[TMP18]]
				; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <4 x float> [[TMP19]], <4 x float> [[TMP20]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
				; CHECK-NEXT: store <4 x float> [[TMP21]], <4 x float>* bitcast ([4 x float]* @fc to <4 x float>*), align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%1 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 0), align 4			%1 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 0), align 4
	%2 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 0), align 4			%2 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 0), align 4
	%3 = fadd float %1, %2			%3 = fadd float %1, %2
	store float %3, float* getelementptr inbounds ([4 x float], [4 x float]* @fc, i32 0, i64 0), align 4			store float %3, float* getelementptr inbounds ([4 x float], [4 x float]* @fc, i32 0, i64 0), align 4
	%4 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 1), align 4			%4 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 1), align 4
	%5 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 1), align 4			%5 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 1), align 4
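As a sanity check on the new expected output for `@no_vec_shuff_reorder`, the vectorized sequence can be modelled lane by lane. The following is only a sketch in Python, not part of the patch: the helper names `shufflevector`, `scalar_reference`, and `vectorized` are hypothetical, and the operand layout is read off the CHECK lines above (`TMP12` and `TMP18` are the two gathered `<4 x float>` operands).

```python
def shufflevector(v1, v2, mask):
    """Pick lanes from the concatenation of v1 and v2, like LLVM's shufflevector."""
    both = list(v1) + list(v2)
    return [both[i] for i in mask]


def scalar_reference(fa, fb):
    """Scalar form of the test body: fadd, fsub, fadd, fsub with mixed operand order."""
    return [fb[0] + fa[0], fa[1] - fb[1], fa[2] + fb[2], fb[3] - fa[3]]


def vectorized(fa, fb):
    """Model of the vectorized CHECK lines (names follow the FileCheck variables)."""
    tmp12 = [fa[0], fa[1], fb[2], fb[3]]  # left operand vector
    tmp18 = [fb[0], fb[1], fa[2], fa[3]]  # right operand vector
    tmp19 = [a + b for a, b in zip(tmp12, tmp18)]  # fadd <4 x float>
    tmp20 = [a - b for a, b in zip(tmp12, tmp18)]  # fsub <4 x float>
    # Mask <0, 5, 2, 7>: even lanes from the fadd result, odd lanes from the fsub result.
    return shufflevector(tmp19, tmp20, [0, 5, 2, 7])


if __name__ == "__main__":
    fa = [1.0, 2.0, 3.0, 4.0]
    fb = [10.0, 20.0, 30.0, 40.0]
    assert vectorized(fa, fb) == scalar_reference(fa, fb)
```

The model only covers the arithmetic. It says nothing about whether the extra `extractelement`/`insertelement` gathers make this form profitable, which is the cost-model question the patch is about.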

test/Transforms/SLPVectorizer/X86/alternate-fp.ll

	define <8 x float> @fmul_fdiv_v8f32(<8 x float> %a, <8 x float> %b) {			define <8 x float> @fmul_fdiv_v8f32(<8 x float> %a, <8 x float> %b) {
	; SSE-LABEL: @fmul_fdiv_v8f32(			; SSE-LABEL: @fmul_fdiv_v8f32(
	; SSE-NEXT: [[TMP1:%.*]] = fmul <8 x float> [[A:%.*]], [[B:%.*]]			; SSE-NEXT: [[TMP1:%.*]] = fmul <8 x float> [[A:%.*]], [[B:%.*]]
	; SSE-NEXT: [[TMP2:%.*]] = fdiv <8 x float> [[A]], [[B]]			; SSE-NEXT: [[TMP2:%.*]] = fdiv <8 x float> [[A]], [[B]]
	; SSE-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[TMP2]], <8 x float> [[TMP1]], <8 x i32> <i32 8, i32 1, i32 2, i32 11, i32 12, i32 5, i32 6, i32 15>			; SSE-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[TMP2]], <8 x float> [[TMP1]], <8 x i32> <i32 8, i32 1, i32 2, i32 11, i32 12, i32 5, i32 6, i32 15>
	; SSE-NEXT: ret <8 x float> [[R7]]			; SSE-NEXT: ret <8 x float> [[R7]]
	;			;
	; SLM-LABEL: @fmul_fdiv_v8f32(			; SLM-LABEL: @fmul_fdiv_v8f32(
	; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; SLM-NEXT: [[A0:%.*]] = extractelement <8 x float> [[A:%.*]], i32 0
	; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[B:%.*]], <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			; SLM-NEXT: [[A1:%.*]] = extractelement <8 x float> [[A]], i32 1
	; SLM-NEXT: [[TMP3:%.*]] = fmul <4 x float> [[TMP1]], [[TMP2]]			; SLM-NEXT: [[A2:%.*]] = extractelement <8 x float> [[A]], i32 2
	; SLM-NEXT: [[TMP4:%.*]] = fdiv <4 x float> [[TMP1]], [[TMP2]]			; SLM-NEXT: [[A3:%.*]] = extractelement <8 x float> [[A]], i32 3
	; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>			; SLM-NEXT: [[A4:%.*]] = extractelement <8 x float> [[A]], i32 4
	; SLM-NEXT: [[TMP6:%.*]] = shufflevector <8 x float> [[B]], <8 x float> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>			; SLM-NEXT: [[A5:%.*]] = extractelement <8 x float> [[A]], i32 5
	; SLM-NEXT: [[TMP7:%.*]] = fmul <4 x float> [[TMP5]], [[TMP6]]			; SLM-NEXT: [[A6:%.*]] = extractelement <8 x float> [[A]], i32 6
	; SLM-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> [[TMP7]], <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[A7:%.*]] = extractelement <8 x float> [[A]], i32 7
	; SLM-NEXT: [[TMP9:%.*]] = fdiv <4 x float> [[TMP5]], [[TMP6]]			; SLM-NEXT: [[B0:%.*]] = extractelement <8 x float> [[B:%.*]], i32 0
	; SLM-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> undef, <8 x i32> <i32 undef, i32 1, i32 2, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[B1:%.*]] = extractelement <8 x float> [[B]], i32 1
	; SLM-NEXT: [[R3:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> [[TMP3]], <8 x i32> <i32 4, i32 1, i32 2, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[B2:%.*]] = extractelement <8 x float> [[B]], i32 2
	; SLM-NEXT: [[R4:%.*]] = shufflevector <8 x float> [[R3]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 undef, i32 undef, i32 undef>			; SLM-NEXT: [[B3:%.*]] = extractelement <8 x float> [[B]], i32 3
	; SLM-NEXT: [[R6:%.*]] = shufflevector <8 x float> [[R4]], <8 x float> [[TMP10]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 9, i32 10, i32 undef>			; SLM-NEXT: [[B4:%.*]] = extractelement <8 x float> [[B]], i32 4
	; SLM-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[R6]], <8 x float> [[TMP8]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 11>			; SLM-NEXT: [[B5:%.*]] = extractelement <8 x float> [[B]], i32 5
				; SLM-NEXT: [[B6:%.*]] = extractelement <8 x float> [[B]], i32 6
				; SLM-NEXT: [[B7:%.*]] = extractelement <8 x float> [[B]], i32 7
				; SLM-NEXT: [[AB0:%.*]] = fmul float [[A0]], [[B0]]
				; SLM-NEXT: [[AB1:%.*]] = fdiv float [[A1]], [[B1]]
				; SLM-NEXT: [[AB2:%.*]] = fdiv float [[A2]], [[B2]]
				; SLM-NEXT: [[AB3:%.*]] = fmul float [[A3]], [[B3]]
				; SLM-NEXT: [[AB4:%.*]] = fmul float [[A4]], [[B4]]
				; SLM-NEXT: [[AB5:%.*]] = fdiv float [[A5]], [[B5]]
				; SLM-NEXT: [[AB6:%.*]] = fdiv float [[A6]], [[B6]]
				; SLM-NEXT: [[AB7:%.*]] = fmul float [[A7]], [[B7]]
				; SLM-NEXT: [[R0:%.*]] = insertelement <8 x float> undef, float [[AB0]], i32 0
				; SLM-NEXT: [[R1:%.*]] = insertelement <8 x float> [[R0]], float [[AB1]], i32 1
				; SLM-NEXT: [[R2:%.*]] = insertelement <8 x float> [[R1]], float [[AB2]], i32 2
				; SLM-NEXT: [[R3:%.*]] = insertelement <8 x float> [[R2]], float [[AB3]], i32 3
				; SLM-NEXT: [[R4:%.*]] = insertelement <8 x float> [[R3]], float [[AB4]], i32 4
				; SLM-NEXT: [[R5:%.*]] = insertelement <8 x float> [[R4]], float [[AB5]], i32 5
				; SLM-NEXT: [[R6:%.*]] = insertelement <8 x float> [[R5]], float [[AB6]], i32 6
				; SLM-NEXT: [[R7:%.*]] = insertelement <8 x float> [[R6]], float [[AB7]], i32 7
	; SLM-NEXT: ret <8 x float> [[R7]]			; SLM-NEXT: ret <8 x float> [[R7]]
	;			;
	; AVX-LABEL: @fmul_fdiv_v8f32(			; AVX-LABEL: @fmul_fdiv_v8f32(
	; AVX-NEXT: [[TMP1:%.*]] = fmul <8 x float> [[A:%.*]], [[B:%.*]]			; AVX-NEXT: [[TMP1:%.*]] = fmul <8 x float> [[A:%.*]], [[B:%.*]]
	; AVX-NEXT: [[TMP2:%.*]] = fdiv <8 x float> [[A]], [[B]]			; AVX-NEXT: [[TMP2:%.*]] = fdiv <8 x float> [[A]], [[B]]
	; AVX-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[TMP2]], <8 x float> [[TMP1]], <8 x i32> <i32 8, i32 1, i32 2, i32 11, i32 12, i32 5, i32 6, i32 15>			; AVX-NEXT: [[R7:%.*]] = shufflevector <8 x float> [[TMP2]], <8 x float> [[TMP1]], <8 x i32> <i32 8, i32 1, i32 2, i32 11, i32 12, i32 5, i32 6, i32 15>
	; AVX-NEXT: ret <8 x float> [[R7]]			; AVX-NEXT: ret <8 x float> [[R7]]
	;			;
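The SSE and AVX expectations for `@fmul_fdiv_v8f32` blend a full-width `fmul` and `fdiv` with the mask `<8, 1, 2, 11, 12, 5, 6, 15>`, while the new SLM expectation spells out one scalar operation per lane. A small Python sketch (the helper `lane_sources` is hypothetical, written only for this illustration) decodes the mask and shows that both forms assign the same operation to each lane:

```python
def lane_sources(mask, first_name, second_name, width):
    """Map each result lane of a two-operand shuffle to (operand name, source lane)."""
    sources = []
    for idx in mask:
        if idx < width:
            sources.append((first_name, idx))           # index into the first operand
        else:
            sources.append((second_name, idx - width))  # index into the second operand
    return sources


# In the CHECK lines the first shuffle operand is the fdiv result (TMP2)
# and the second is the fmul result (TMP1).
MASK = [8, 1, 2, 11, 12, 5, 6, 15]
SOURCES = lane_sources(MASK, "fdiv", "fmul", 8)

# Per-lane operations taken from the scalar SLM expectation (AB0 .. AB7).
SCALAR_OPS = ["fmul", "fdiv", "fdiv", "fmul", "fmul", "fdiv", "fdiv", "fmul"]

if __name__ == "__main__":
    assert [name for name, _ in SOURCES] == SCALAR_OPS
    # Every lane reads the same lane position of its source: a pure blend.
    assert all(lane == i for i, (_, lane) in enumerate(SOURCES))
```

Because every mask element keeps its lane position, the shuffle is a blend rather than a permutation. The sketch does not model how any target prices it.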

test/Transforms/SLPVectorizer/X86/alternate-int.ll

	}			}

	define <8 x i32> @ashr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {			define <8 x i32> @ashr_shl_v8i32(<8 x i32> %a, <8 x i32> %b) {
	; SSE-LABEL: @ashr_shl_v8i32(			; SSE-LABEL: @ashr_shl_v8i32(
	; SSE-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0			; SSE-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0
	; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1			; SSE-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1
	; SSE-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2			; SSE-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
	; SSE-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3			; SSE-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
				; SSE-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A]], i32 4
				; SSE-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
				; SSE-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
				; SSE-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
	; SSE-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0			; SSE-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0
	; SSE-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1			; SSE-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1
	; SSE-NEXT: [[B2:%.*]] = extractelement <8 x i32> [[B]], i32 2			; SSE-NEXT: [[B2:%.*]] = extractelement <8 x i32> [[B]], i32 2
	; SSE-NEXT: [[B3:%.*]] = extractelement <8 x i32> [[B]], i32 3			; SSE-NEXT: [[B3:%.*]] = extractelement <8 x i32> [[B]], i32 3
				; SSE-NEXT: [[B4:%.*]] = extractelement <8 x i32> [[B]], i32 4
				; SSE-NEXT: [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
				; SSE-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
				; SSE-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
	; SSE-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]			; SSE-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]
	; SSE-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]			; SSE-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]
	; SSE-NEXT: [[AB2:%.*]] = ashr i32 [[A2]], [[B2]]			; SSE-NEXT: [[AB2:%.*]] = ashr i32 [[A2]], [[B2]]
	; SSE-NEXT: [[AB3:%.*]] = ashr i32 [[A3]], [[B3]]			; SSE-NEXT: [[AB3:%.*]] = ashr i32 [[A3]], [[B3]]
	; SSE-NEXT: [[TMP1:%.*]] = shl <8 x i32> [[A]], [[B]]			; SSE-NEXT: [[AB4:%.*]] = shl i32 [[A4]], [[B4]]
				; SSE-NEXT: [[AB5:%.*]] = shl i32 [[A5]], [[B5]]
				; SSE-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
				; SSE-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
	; SSE-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0			; SSE-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0
	; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1			; SSE-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
	; SSE-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2			; SSE-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
	; SSE-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3			; SSE-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
	; SSE-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[R3]], <8 x i32> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>			; SSE-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4
				; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
				; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
				; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
	; SSE-NEXT: ret <8 x i32> [[R7]]			; SSE-NEXT: ret <8 x i32> [[R7]]
	;			;
	; SLM-LABEL: @ashr_shl_v8i32(			; SLM-LABEL: @ashr_shl_v8i32(
	; SLM-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]			; SLM-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0
	; SLM-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]			; SLM-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1
	; SLM-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>			; SLM-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
				; SLM-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
				; SLM-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0
				; SLM-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1
				; SLM-NEXT: [[B2:%.*]] = extractelement <8 x i32> [[B]], i32 2
				; SLM-NEXT: [[B3:%.*]] = extractelement <8 x i32> [[B]], i32 3
				; SLM-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]
				; SLM-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]
				; SLM-NEXT: [[AB2:%.*]] = ashr i32 [[A2]], [[B2]]
				; SLM-NEXT: [[AB3:%.*]] = ashr i32 [[A3]], [[B3]]
				; SLM-NEXT: [[TMP1:%.*]] = shl <8 x i32> [[A]], [[B]]
				; SLM-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0
				; SLM-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
				; SLM-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
				; SLM-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
				; SLM-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[R3]], <8 x i32> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
	; SLM-NEXT: ret <8 x i32> [[R7]]			; SLM-NEXT: ret <8 x i32> [[R7]]
	;			;
	; AVX-LABEL: @ashr_shl_v8i32(			; AVX1-LABEL: @ashr_shl_v8i32(
	; AVX-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]			; AVX1-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0
	; AVX-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]			; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1
	; AVX-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>			; AVX1-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
	; AVX-NEXT: ret <8 x i32> [[R7]]			; AVX1-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
				; AVX1-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0
				; AVX1-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1
				; AVX1-NEXT: [[B2:%.*]] = extractelement <8 x i32> [[B]], i32 2
				; AVX1-NEXT: [[B3:%.*]] = extractelement <8 x i32> [[B]], i32 3
				; AVX1-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]
				; AVX1-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]
				; AVX1-NEXT: [[AB2:%.*]] = ashr i32 [[A2]], [[B2]]
				; AVX1-NEXT: [[AB3:%.*]] = ashr i32 [[A3]], [[B3]]
				; AVX1-NEXT: [[TMP1:%.*]] = shl <8 x i32> [[A]], [[B]]
				; AVX1-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0
				; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
				; AVX1-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
				; AVX1-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
				; AVX1-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[R3]], <8 x i32> [[TMP1]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
				; AVX1-NEXT: ret <8 x i32> [[R7]]
				;
				; AVX2-LABEL: @ashr_shl_v8i32(
				; AVX2-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]
				; AVX2-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]
				; AVX2-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
				; AVX2-NEXT: ret <8 x i32> [[R7]]
	;			;
	; AVX512-LABEL: @ashr_shl_v8i32(			; AVX512-LABEL: @ashr_shl_v8i32(
	; AVX512-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]			; AVX512-NEXT: [[TMP1:%.*]] = ashr <8 x i32> [[A:%.*]], [[B:%.*]]
	; AVX512-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]			; AVX512-NEXT: [[TMP2:%.*]] = shl <8 x i32> [[A]], [[B]]
	; AVX512-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>			; AVX512-NEXT: [[R7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> [[TMP2]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 13, i32 14, i32 15>
	; AVX512-NEXT: ret <8 x i32> [[R7]]			; AVX512-NEXT: ret <8 x i32> [[R7]]
	;			;
	%a0 = extractelement <8 x i32> %a, i32 0			%a0 = extractelement <8 x i32> %a, i32 0
	...
	; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5			; SSE-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
	; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6			; SSE-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
	; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7			; SSE-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
	; SSE-NEXT: ret <8 x i32> [[R7]]			; SSE-NEXT: ret <8 x i32> [[R7]]
	;			;
	; SLM-LABEL: @ashr_lshr_shl_v8i32(			; SLM-LABEL: @ashr_lshr_shl_v8i32(
	; SLM-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0			; SLM-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0
	; SLM-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1			; SLM-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1
				; SLM-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
				; SLM-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
				; SLM-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A]], i32 4
				; SLM-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
	; SLM-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6			; SLM-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
	; SLM-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7			; SLM-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
	; SLM-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0			; SLM-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0
	; SLM-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1			; SLM-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1
				; SLM-NEXT: [[B2:%.*]] = extractelement <8 x i32> [[B]], i32 2
				; SLM-NEXT: [[B3:%.*]] = extractelement <8 x i32> [[B]], i32 3
				; SLM-NEXT: [[B4:%.*]] = extractelement <8 x i32> [[B]], i32 4
				; SLM-NEXT: [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
	; SLM-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6			; SLM-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
	; SLM-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7			; SLM-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
	; SLM-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]			; SLM-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]
	; SLM-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]			; SLM-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]
	; SLM-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[A]], [[B]]			; SLM-NEXT: [[AB2:%.*]] = lshr i32 [[A2]], [[B2]]
				; SLM-NEXT: [[AB3:%.*]] = lshr i32 [[A3]], [[B3]]
				; SLM-NEXT: [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
				; SLM-NEXT: [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
	; SLM-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]			; SLM-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
	; SLM-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]			; SLM-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
	; SLM-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0			; SLM-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0
	; SLM-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1			; SLM-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
	; SLM-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[TMP1]], i32 2			; SLM-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
	; SLM-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[TMP2]], i32 2			; SLM-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
	; SLM-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[TMP1]], i32 3			; SLM-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4
	; SLM-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP3]], i32 3			; SLM-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
	; SLM-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP1]], i32 4
	; SLM-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP4]], i32 4
	; SLM-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP1]], i32 5
	; SLM-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP5]], i32 5
	; SLM-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6			; SLM-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
	; SLM-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7			; SLM-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
	; SLM-NEXT: ret <8 x i32> [[R7]]			; SLM-NEXT: ret <8 x i32> [[R7]]
	;			;
	; AVX1-LABEL: @ashr_lshr_shl_v8i32(			; AVX1-LABEL: @ashr_lshr_shl_v8i32(
	; AVX1-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0			; AVX1-NEXT: [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0
	; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1			; AVX1-NEXT: [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1
				; AVX1-NEXT: [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
				; AVX1-NEXT: [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
				; AVX1-NEXT: [[A4:%.*]] = extractelement <8 x i32> [[A]], i32 4
				; AVX1-NEXT: [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
	; AVX1-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6			; AVX1-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
	; AVX1-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7			; AVX1-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
	; AVX1-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0			; AVX1-NEXT: [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0
	; AVX1-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1			; AVX1-NEXT: [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1
				; AVX1-NEXT: [[B2:%.*]] = extractelement <8 x i32> [[B]], i32 2
				; AVX1-NEXT: [[B3:%.*]] = extractelement <8 x i32> [[B]], i32 3
				; AVX1-NEXT: [[B4:%.*]] = extractelement <8 x i32> [[B]], i32 4
				; AVX1-NEXT: [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
	; AVX1-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6			; AVX1-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
	; AVX1-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7			; AVX1-NEXT: [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
	; AVX1-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]			; AVX1-NEXT: [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]
	; AVX1-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]			; AVX1-NEXT: [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]
	; AVX1-NEXT: [[TMP1:%.*]] = lshr <8 x i32> [[A]], [[B]]			; AVX1-NEXT: [[AB2:%.*]] = lshr i32 [[A2]], [[B2]]
				; AVX1-NEXT: [[AB3:%.*]] = lshr i32 [[A3]], [[B3]]
				; AVX1-NEXT: [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
				; AVX1-NEXT: [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
	; AVX1-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]			; AVX1-NEXT: [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
	; AVX1-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]			; AVX1-NEXT: [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
	; AVX1-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0			; AVX1-NEXT: [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0
	; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1			; AVX1-NEXT: [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
	; AVX1-NEXT: [[TMP2:%.*]] = extractelement <8 x i32> [[TMP1]], i32 2			; AVX1-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
	; AVX1-NEXT: [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[TMP2]], i32 2			; AVX1-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
	; AVX1-NEXT: [[TMP3:%.*]] = extractelement <8 x i32> [[TMP1]], i32 3			; AVX1-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4
	; AVX1-NEXT: [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP3]], i32 3			; AVX1-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
	; AVX1-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[TMP1]], i32 4
	; AVX1-NEXT: [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP4]], i32 4
	; AVX1-NEXT: [[TMP5:%.*]] = extractelement <8 x i32> [[TMP1]], i32 5
	; AVX1-NEXT: [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP5]], i32 5
	; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6			; AVX1-NEXT: [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
	; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7			; AVX1-NEXT: [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
	; AVX1-NEXT: ret <8 x i32> [[R7]]			; AVX1-NEXT: ret <8 x i32> [[R7]]
	;			;
	; AVX2-LABEL: @ashr_lshr_shl_v8i32(			; AVX2-LABEL: @ashr_lshr_shl_v8i32(
	; AVX2-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6			; AVX2-NEXT: [[A6:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 6
	; AVX2-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7			; AVX2-NEXT: [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
	; AVX2-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6			; AVX2-NEXT: [[B6:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 6

test/Transforms/SLPVectorizer/X86/bad_types.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basicaa -slp-vectorizer -S -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basicaa -slp-vectorizer -S -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define void @test1(x86_mmx %a, x86_mmx %b, i64* %ptr) {			define void @test1(x86_mmx %a, x86_mmx %b, i64* %ptr) {
	; Ensure we can handle x86_mmx values which are primitive and can be bitcast			; Ensure we can handle x86_mmx values which are primitive and can be bitcast
	; with integer types but can't be put into a vector.			; with integer types but can't be put into a vector.
	;			;
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[A_CAST:%.*]] = bitcast x86_mmx [[A:%.*]] to i64			; CHECK-NEXT: [[A_CAST:%.*]] = bitcast x86_mmx [[A:%.*]] to i64
	; CHECK-NEXT: [[B_CAST:%.*]] = bitcast x86_mmx [[B:%.*]] to i64			; CHECK-NEXT: [[B_CAST:%.*]] = bitcast x86_mmx [[B:%.*]] to i64
	; CHECK-NEXT: [[A_AND:%.*]] = and i64 [[A_CAST]], 42			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> undef, i64 [[A_CAST]], i32 0
	; CHECK-NEXT: [[B_AND:%.*]] = and i64 [[B_CAST]], 42			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> [[TMP0]], i64 [[B_CAST]], i32 1
				; CHECK-NEXT: [[TMP2:%.*]] = and <2 x i64> <i64 42, i64 42>, [[TMP1]]
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i64, i64* [[PTR:%.*]], i32 1			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i64, i64* [[PTR:%.*]], i32 1
	; CHECK-NEXT: store i64 [[A_AND]], i64* [[PTR]]			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64* [[PTR]] to <2 x i64>*
	; CHECK-NEXT: store i64 [[B_AND]], i64* [[GEP]]			; CHECK-NEXT: store <2 x i64> [[TMP2]], <2 x i64>* [[TMP3]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%a.cast = bitcast x86_mmx %a to i64			%a.cast = bitcast x86_mmx %a to i64
	%b.cast = bitcast x86_mmx %b to i64			%b.cast = bitcast x86_mmx %b to i64
	%a.and = and i64 %a.cast, 42			%a.and = and i64 %a.cast, 42
	%b.and = and i64 %b.cast, 42			%b.and = and i64 %b.cast, 42
	%gep = getelementptr i64, i64* %ptr, i32 1			%gep = getelementptr i64, i64* %ptr, i32 1

test/Transforms/SLPVectorizer/X86/blending-shuffle.ll

;		;
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

define i8 @k_bb(<4 x i8> %x) {		define i8 @k_bb(<4 x i8> %x) {
; CHECK-LABEL: @k_bb(		; CHECK-LABEL: @k_bb(
; CHECK-NEXT: [[X0:%.*]] = extractelement <4 x i8> [[X:%.*]], i32 0
; CHECK-NEXT: br label [[BB1:%.*]]		; CHECK-NEXT: br label [[BB1:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[X3:%.*]] = extractelement <4 x i8> [[X]], i32 3		; CHECK-NEXT: [[TMP1:%.*]] = mul <4 x i8> [[X:%.*]], [[X]]
; CHECK-NEXT: [[X1:%.*]] = extractelement <4 x i8> [[X]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i8> [[TMP1]], <4 x i8> undef, <2 x i32> <i32 0, i32 1>
; CHECK-NEXT: [[X2:%.*]] = extractelement <4 x i8> [[X]], i32 2		; CHECK-NEXT: [[TMP3:%.*]] = mul <4 x i8> [[X]], [[X]]
; CHECK-NEXT: [[X0X0:%.*]] = mul i8 [[X0]], [[X0]]		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i8> [[TMP3]], <4 x i8> undef, <2 x i32> <i32 3, i32 2>
; CHECK-NEXT: [[X3X3:%.*]] = mul i8 [[X3]], [[X3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i8> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[X1X1:%.*]] = mul i8 [[X1]], [[X1]]		; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x i8> [[TMP5]], i32 0
; CHECK-NEXT: [[X2X2:%.*]] = mul i8 [[X2]], [[X2]]		; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i8> [[TMP5]], i32 1
; CHECK-NEXT: [[TMP1:%.*]] = add i8 [[X0X0]], [[X3X3]]		; CHECK-NEXT: [[TMP8:%.*]] = sdiv i8 [[TMP6]], [[TMP7]]
; CHECK-NEXT: [[TMP2:%.*]] = add i8 [[X1X1]], [[X2X2]]		; CHECK-NEXT: ret i8 [[TMP8]]
; CHECK-NEXT: [[TMP3:%.*]] = sdiv i8 [[TMP1]], [[TMP2]]
; CHECK-NEXT: ret i8 [[TMP3]]
;		;
%x0 = extractelement <4 x i8> %x, i32 0		%x0 = extractelement <4 x i8> %x, i32 0
br label %bb1		br label %bb1
bb1:		bb1:
%x3 = extractelement <4 x i8> %x, i32 3		%x3 = extractelement <4 x i8> %x, i32 3
%x1 = extractelement <4 x i8> %x, i32 1		%x1 = extractelement <4 x i8> %x, i32 1
%x2 = extractelement <4 x i8> %x, i32 2		%x2 = extractelement <4 x i8> %x, i32 2
%x0x0 = mul i8 %x0, %x0		%x0x0 = mul i8 %x0, %x0
%x3x3 = mul i8 %x3, %x3		%x3x3 = mul i8 %x3, %x3
%x1x1 = mul i8 %x1, %x1		%x1x1 = mul i8 %x1, %x1
%x2x2 = mul i8 %x2, %x2		%x2x2 = mul i8 %x2, %x2
%1 = add i8 %x0x0, %x3x3		%1 = add i8 %x0x0, %x3x3
%2 = add i8 %x1x1, %x2x2		%2 = add i8 %x1x1, %x2x2
%3 = sdiv i8 %1, %2		%3 = sdiv i8 %1, %2
ret i8 %3		ret i8 %3
}		}

test/Transforms/SLPVectorizer/X86/crash_binaryop.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basicaa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basicaa -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-darwin13.3.0"			target triple = "x86_64-apple-darwin13.3.0"

	@a = common global double 0.000000e+00, align 8			@a = common global double 0.000000e+00, align 8

	define i32 @fn1() {			define i32 @fn1() {
	; CHECK-LABEL: @fn1(			; CHECK-LABEL: @fn1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[INIT:%.*]] = load double, double* @a, align 8			; CHECK-NEXT: [[INIT:%.*]] = load double, double* @a, align 8
				; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> undef, double [[INIT]], i32 0
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[INIT]], i32 1
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[PHI:%.*]] = phi double [ [[ADD2:%.*]], [[LOOP]] ], [ [[INIT]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[PHI:%.*]] = phi double [ [[ADD2:%.*]], [[LOOP]] ], [ [[INIT]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[POSTADD1_PHI:%.*]] = phi double [ [[POSTADD1:%.*]], [[LOOP]] ], [ [[INIT]], [[ENTRY]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x double> [ [[TMP10:%.*]], [[LOOP]] ], [ [[TMP1]], [[ENTRY]] ]
	; CHECK-NEXT: [[POSTADD2_PHI:%.*]] = phi double [ [[POSTADD2:%.*]], [[LOOP]] ], [ [[INIT]], [[ENTRY]] ]			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
	; CHECK-NEXT: [[ADD1:%.*]] = fadd double [[POSTADD1_PHI]], undef			; CHECK-NEXT: [[ADD1:%.*]] = fadd double [[TMP3]], undef
	; CHECK-NEXT: [[ADD2]] = fadd double [[POSTADD2_PHI]], [[PHI]]			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP2]], i32 1
				; CHECK-NEXT: [[ADD2]] = fadd double [[TMP4]], [[PHI]]
	; CHECK-NEXT: [[MUL2:%.*]] = fmul double [[ADD2]], 0.000000e+00			; CHECK-NEXT: [[MUL2:%.*]] = fmul double [[ADD2]], 0.000000e+00
	; CHECK-NEXT: [[BINARYOP_B:%.*]] = fadd double [[POSTADD1_PHI]], [[MUL2]]			; CHECK-NEXT: [[BINARYOP_B:%.*]] = fadd double [[TMP3]], [[MUL2]]
	; CHECK-NEXT: [[MUL1:%.*]] = fmul double [[ADD1]], 0.000000e+00			; CHECK-NEXT: [[MUL1:%.*]] = fmul double [[ADD1]], 0.000000e+00
	; CHECK-NEXT: [[TMP:%.*]] = fadd double [[POSTADD2_PHI]], 0.000000e+00			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> undef, double [[MUL1]], i32 0
	; CHECK-NEXT: [[BINARY_V:%.*]] = fadd double [[MUL1]], [[BINARYOP_B]]			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[TMP4]], i32 1
	; CHECK-NEXT: [[POSTADD1]] = fadd double [[BINARY_V]], 0.000000e+00			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> undef, double [[BINARYOP_B]], i32 0
	; CHECK-NEXT: [[POSTADD2]] = fadd double [[TMP]], 1.000000e+00			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double 0.000000e+00, i32 1
	; CHECK-NEXT: [[TOBOOL:%.*]] = fcmp une double [[POSTADD1]], 0.000000e+00			; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP6]], [[TMP8]]
				; CHECK-NEXT: [[TMP10]] = fadd <2 x double> <double 0.000000e+00, double 1.000000e+00>, [[TMP9]]
				; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x double> [[TMP10]], i32 0
				; CHECK-NEXT: [[TOBOOL:%.*]] = fcmp une double [[TMP11]], 0.000000e+00
	; CHECK-NEXT: br i1 [[TOBOOL]], label [[EXIT:%.*]], label [[LOOP]]			; CHECK-NEXT: br i1 [[TOBOOL]], label [[EXIT:%.*]], label [[LOOP]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret i32 1			; CHECK-NEXT: ret i32 1
	;			;
	entry:			entry:
	%init = load double, double* @a, align 8			%init = load double, double* @a, align 8
	br label %loop			br label %loop


test/Transforms/SLPVectorizer/X86/crash_cmpop.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basicaa -slp-vectorizer -S | FileCheck %s -check-prefixes=CHECK,SSE			; RUN: opt < %s -basicaa -slp-vectorizer -S | FileCheck %s -check-prefixes=CHECK,SSE
	; RUN: opt < %s -basicaa -slp-vectorizer -S -mattr=+avx | FileCheck %s -check-prefixes=CHECK,AVX			; RUN: opt < %s -basicaa -slp-vectorizer -S -mattr=+avx | FileCheck %s -check-prefixes=CHECK,AVX

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.10.0"			target triple = "x86_64-apple-macosx10.10.0"

	define void @testfunc(float* nocapture %dest, float* nocapture readonly %src) {			define void @testfunc(float* nocapture %dest, float* nocapture readonly %src) {
	; SSE-LABEL: @testfunc(			; SSE-LABEL: @testfunc(
	; SSE-NEXT: entry:			; SSE-NEXT: entry:
	; SSE-NEXT: br label [[FOR_BODY:%.*]]			; SSE-NEXT: br label [[FOR_BODY:%.*]]
	; SSE: for.body:			; SSE: for.body:
	; SSE-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; SSE-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; SSE-NEXT: [[ACC1_056:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[ADD13:%.*]], [[FOR_BODY]] ]			; SSE-NEXT: [[ACC1_056:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[ADD13:%.*]], [[FOR_BODY]] ]
	; SSE-NEXT: [[S1_055:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[COND_I40:%.*]], [[FOR_BODY]] ]			; SSE-NEXT: [[TMP0:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP23:%.*]], [[FOR_BODY]] ]
	; SSE-NEXT: [[S0_054:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[COND_I44:%.*]], [[FOR_BODY]] ]
	; SSE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 [[INDVARS_IV]]			; SSE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 [[INDVARS_IV]]
	; SSE-NEXT: [[TMP0:%.*]] = load float, float* [[ARRAYIDX]], align 4			; SSE-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4
	; SSE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; SSE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; SSE-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[DEST:%.*]], i64 [[INDVARS_IV]]			; SSE-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[DEST:%.*]], i64 [[INDVARS_IV]]
	; SSE-NEXT: store float [[ACC1_056]], float* [[ARRAYIDX2]], align 4			; SSE-NEXT: store float [[ACC1_056]], float* [[ARRAYIDX2]], align 4
	; SSE-NEXT: [[ADD:%.*]] = fadd float [[S0_054]], [[TMP0]]			; SSE-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP0]], i32 1
	; SSE-NEXT: [[ADD3:%.*]] = fadd float [[S1_055]], [[TMP0]]			; SSE-NEXT: [[TMP3:%.*]] = insertelement <2 x float> undef, float [[TMP2]], i32 0
	; SSE-NEXT: [[MUL:%.*]] = fmul float [[S0_054]], 0.000000e+00			; SSE-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP0]], i32 0
	; SSE-NEXT: [[ADD4:%.*]] = fadd float [[MUL]], [[ADD3]]			; SSE-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP3]], float [[TMP4]], i32 1
	; SSE-NEXT: [[MUL5:%.*]] = fmul float [[S1_055]], 0.000000e+00			; SSE-NEXT: [[TMP6:%.*]] = insertelement <2 x float> undef, float [[TMP1]], i32 0
	; SSE-NEXT: [[ADD6:%.*]] = fadd float [[MUL5]], [[ADD]]			; SSE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[TMP1]], i32 1
	; SSE-NEXT: [[CMP_I:%.*]] = fcmp olt float [[ADD6]], 1.000000e+00			; SSE-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP5]], [[TMP7]]
	; SSE-NEXT: [[COND_I:%.*]] = select i1 [[CMP_I]], float [[ADD6]], float 1.000000e+00			; SSE-NEXT: [[TMP9:%.*]] = fmul <2 x float> zeroinitializer, [[TMP0]]
	; SSE-NEXT: [[CMP_I51:%.*]] = fcmp olt float [[COND_I]], -1.000000e+00			; SSE-NEXT: [[TMP10:%.*]] = fadd <2 x float> [[TMP9]], [[TMP8]]
	; SSE-NEXT: [[CMP_I49:%.*]] = fcmp olt float [[ADD4]], 1.000000e+00			; SSE-NEXT: [[TMP11:%.*]] = fcmp olt <2 x float> [[TMP10]], <float 1.000000e+00, float 1.000000e+00>
	; SSE-NEXT: [[COND_I50:%.*]] = select i1 [[CMP_I49]], float [[ADD4]], float 1.000000e+00			; SSE-NEXT: [[TMP12:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP10]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
	; SSE-NEXT: [[CMP_I47:%.*]] = fcmp olt float [[COND_I50]], -1.000000e+00			; SSE-NEXT: [[TMP13:%.*]] = fcmp olt <2 x float> [[TMP12]], <float -1.000000e+00, float -1.000000e+00>
	; SSE-NEXT: [[COND_I_OP:%.*]] = fmul float [[COND_I]], 0.000000e+00			; SSE-NEXT: [[TMP14:%.*]] = fmul <2 x float> zeroinitializer, [[TMP12]]
	; SSE-NEXT: [[MUL10:%.*]] = select i1 [[CMP_I51]], float -0.000000e+00, float [[COND_I_OP]]			; SSE-NEXT: [[TMP15:%.*]] = select <2 x i1> [[TMP13]], <2 x float> <float -0.000000e+00, float -0.000000e+00>, <2 x float> [[TMP14]]
	; SSE-NEXT: [[COND_I50_OP:%.*]] = fmul float [[COND_I50]], 0.000000e+00			; SSE-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP15]], i32 0
	; SSE-NEXT: [[MUL11:%.*]] = select i1 [[CMP_I47]], float -0.000000e+00, float [[COND_I50_OP]]			; SSE-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP15]], i32 1
	; SSE-NEXT: [[ADD13]] = fadd float [[MUL10]], [[MUL11]]			; SSE-NEXT: [[ADD13]] = fadd float [[TMP16]], [[TMP17]]
	; SSE-NEXT: [[CMP_I45:%.*]] = fcmp olt float [[ADD13]], 1.000000e+00			; SSE-NEXT: [[TMP18:%.*]] = insertelement <2 x float> undef, float [[TMP17]], i32 0
	; SSE-NEXT: [[COND_I46:%.*]] = select i1 [[CMP_I45]], float [[ADD13]], float 1.000000e+00			; SSE-NEXT: [[TMP19:%.*]] = insertelement <2 x float> [[TMP18]], float [[ADD13]], i32 1
	; SSE-NEXT: [[CMP_I43:%.*]] = fcmp olt float [[COND_I46]], -1.000000e+00			; SSE-NEXT: [[TMP20:%.*]] = fcmp olt <2 x float> [[TMP19]], <float 1.000000e+00, float 1.000000e+00>
	; SSE-NEXT: [[COND_I44]] = select i1 [[CMP_I43]], float -1.000000e+00, float [[COND_I46]]			; SSE-NEXT: [[TMP21:%.*]] = select <2 x i1> [[TMP20]], <2 x float> [[TMP19]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
	; SSE-NEXT: [[CMP_I41:%.*]] = fcmp olt float [[MUL11]], 1.000000e+00			; SSE-NEXT: [[TMP22:%.*]] = fcmp olt <2 x float> [[TMP21]], <float -1.000000e+00, float -1.000000e+00>
	; SSE-NEXT: [[COND_I42:%.*]] = select i1 [[CMP_I41]], float [[MUL11]], float 1.000000e+00			; SSE-NEXT: [[TMP23]] = select <2 x i1> [[TMP22]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP21]]
	; SSE-NEXT: [[CMP_I39:%.*]] = fcmp olt float [[COND_I42]], -1.000000e+00
	; SSE-NEXT: [[COND_I40]] = select i1 [[CMP_I39]], float -1.000000e+00, float [[COND_I42]]
	; SSE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32			; SSE-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32
	; SSE-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; SSE-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; SSE: for.end:			; SSE: for.end:
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @testfunc(			; AVX-LABEL: @testfunc(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: br label [[FOR_BODY:%.*]]			; AVX-NEXT: br label [[FOR_BODY:%.*]]
	; AVX: for.body:			; AVX: for.body:
	; AVX-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; AVX-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; AVX-NEXT: [[ACC1_056:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[ADD13:%.*]], [[FOR_BODY]] ]			; AVX-NEXT: [[ACC1_056:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[ADD13:%.*]], [[FOR_BODY]] ]
	; AVX-NEXT: [[TMP0:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP23:%.*]], [[FOR_BODY]] ]			; AVX-NEXT: [[TMP0:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP27:%.*]], [[FOR_BODY]] ]
	; AVX-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 [[INDVARS_IV]]			; AVX-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 [[INDVARS_IV]]
	; AVX-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4			; AVX-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4
	; AVX-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; AVX-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; AVX-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[DEST:%.*]], i64 [[INDVARS_IV]]			; AVX-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[DEST:%.*]], i64 [[INDVARS_IV]]
	; AVX-NEXT: store float [[ACC1_056]], float* [[ARRAYIDX2]], align 4			; AVX-NEXT: store float [[ACC1_056]], float* [[ARRAYIDX2]], align 4
	; AVX-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP0]], i32 1			; AVX-NEXT: [[TMP2:%.*]] = extractelement <2 x float> [[TMP0]], i32 0
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x float> undef, float [[TMP2]], i32 0			; AVX-NEXT: [[ADD3:%.*]] = fadd float [[TMP2]], [[TMP1]]
	; AVX-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP0]], i32 0			; AVX-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP0]], i32 1
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP3]], float [[TMP4]], i32 1			; AVX-NEXT: [[MUL:%.*]] = fmul float [[TMP3]], 0.000000e+00
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x float> undef, float [[TMP1]], i32 0			; AVX-NEXT: [[TMP4:%.*]] = insertelement <2 x float> <float 0.000000e+00, float undef>, float [[TMP1]], i32 1
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[TMP1]], i32 1			; AVX-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP0]], [[TMP4]]
	; AVX-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP5]], [[TMP7]]			; AVX-NEXT: [[TMP6:%.*]] = fadd <2 x float> [[TMP0]], [[TMP4]]
	; AVX-NEXT: [[TMP9:%.*]] = fmul <2 x float> zeroinitializer, [[TMP0]]			; AVX-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> [[TMP6]], <2 x i32> <i32 0, i32 3>
	; AVX-NEXT: [[TMP10:%.*]] = fadd <2 x float> [[TMP9]], [[TMP8]]			; AVX-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0
	; AVX-NEXT: [[TMP11:%.*]] = fcmp olt <2 x float> [[TMP10]], <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP7]], i32 1
	; AVX-NEXT: [[TMP12:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP10]], <2 x float> <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP10:%.*]] = insertelement <2 x float> undef, float [[TMP8]], i32 0
	; AVX-NEXT: [[TMP13:%.*]] = fcmp olt <2 x float> [[TMP12]], <float -1.000000e+00, float -1.000000e+00>			; AVX-NEXT: [[TMP11:%.*]] = insertelement <2 x float> [[TMP10]], float [[MUL]], i32 1
	; AVX-NEXT: [[TMP14:%.*]] = fmul <2 x float> zeroinitializer, [[TMP12]]			; AVX-NEXT: [[TMP12:%.*]] = insertelement <2 x float> undef, float [[TMP9]], i32 0
	; AVX-NEXT: [[TMP15:%.*]] = select <2 x i1> [[TMP13]], <2 x float> <float -0.000000e+00, float -0.000000e+00>, <2 x float> [[TMP14]]			; AVX-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[ADD3]], i32 1
	; AVX-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP15]], i32 0			; AVX-NEXT: [[TMP14:%.*]] = fadd <2 x float> [[TMP11]], [[TMP13]]
	; AVX-NEXT: [[TMP17:%.*]] = extractelement <2 x float> [[TMP15]], i32 1			; AVX-NEXT: [[TMP15:%.*]] = fcmp olt <2 x float> [[TMP14]], <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[ADD13]] = fadd float [[TMP16]], [[TMP17]]			; AVX-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP15]], <2 x float> [[TMP14]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP18:%.*]] = insertelement <2 x float> undef, float [[TMP17]], i32 0			; AVX-NEXT: [[TMP17:%.*]] = fcmp olt <2 x float> [[TMP16]], <float -1.000000e+00, float -1.000000e+00>
	; AVX-NEXT: [[TMP19:%.*]] = insertelement <2 x float> [[TMP18]], float [[ADD13]], i32 1			; AVX-NEXT: [[TMP18:%.*]] = fmul <2 x float> zeroinitializer, [[TMP16]]
	; AVX-NEXT: [[TMP20:%.*]] = fcmp olt <2 x float> [[TMP19]], <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP19:%.*]] = select <2 x i1> [[TMP17]], <2 x float> <float -0.000000e+00, float -0.000000e+00>, <2 x float> [[TMP18]]
	; AVX-NEXT: [[TMP21:%.*]] = select <2 x i1> [[TMP20]], <2 x float> [[TMP19]], <2 x float> <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP20:%.*]] = extractelement <2 x float> [[TMP19]], i32 0
	; AVX-NEXT: [[TMP22:%.*]] = fcmp olt <2 x float> [[TMP21]], <float -1.000000e+00, float -1.000000e+00>			; AVX-NEXT: [[TMP21:%.*]] = extractelement <2 x float> [[TMP19]], i32 1
	; AVX-NEXT: [[TMP23]] = select <2 x i1> [[TMP22]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP21]]			; AVX-NEXT: [[ADD13]] = fadd float [[TMP20]], [[TMP21]]
				; AVX-NEXT: [[TMP22:%.*]] = insertelement <2 x float> undef, float [[TMP21]], i32 0
				; AVX-NEXT: [[TMP23:%.*]] = insertelement <2 x float> [[TMP22]], float [[ADD13]], i32 1
				; AVX-NEXT: [[TMP24:%.*]] = fcmp olt <2 x float> [[TMP23]], <float 1.000000e+00, float 1.000000e+00>
				; AVX-NEXT: [[TMP25:%.*]] = select <2 x i1> [[TMP24]], <2 x float> [[TMP23]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
				; AVX-NEXT: [[TMP26:%.*]] = fcmp olt <2 x float> [[TMP25]], <float -1.000000e+00, float -1.000000e+00>
				; AVX-NEXT: [[TMP27]] = select <2 x i1> [[TMP26]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP25]]
	; AVX-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32			; AVX-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32
	; AVX-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; AVX-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; AVX: for.end:			; AVX: for.end:
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body


test/Transforms/SLPVectorizer/X86/crash_dequeue.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basicaa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basicaa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"
	%"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731" = type { double, double, double, double* }			%"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731" = type { double, double, double, double* }

	; Function Attrs: nounwind ssp uwtable			; Function Attrs: nounwind ssp uwtable
	define void @_ZSt6uniqueISt15_Deque_iteratorIdRdPdEET_S4_S4_(%"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* %__first, %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* nocapture %__last) {			define void @_ZSt6uniqueISt15_Deque_iteratorIdRdPdEET_S4_S4_(%"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* %__first, %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* nocapture %__last) {
	; CHECK-LABEL: @_ZSt6uniqueISt15_Deque_iteratorIdRdPdEET_S4_S4_(			; CHECK-LABEL: @_ZSt6uniqueISt15_Deque_iteratorIdRdPdEET_S4_S4_(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[_M_CUR2_I_I:%.*]] = getelementptr inbounds %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731", %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* [[__FIRST:%.*]], i64 0, i32 0			; CHECK-NEXT: [[_M_CUR2_I_I:%.*]] = getelementptr inbounds %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731", %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* [[__FIRST:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[TMP0:%.*]] = load double*, double** [[_M_CUR2_I_I]], align 8			; CHECK-NEXT: [[TMP0:%.*]] = load double*, double** [[_M_CUR2_I_I]], align 8
	; CHECK-NEXT: [[_M_FIRST3_I_I:%.*]] = getelementptr inbounds %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731", %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* [[__FIRST]], i64 0, i32 1
	; CHECK-NEXT: [[_M_CUR2_I_I81:%.*]] = getelementptr inbounds %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731", %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* [[__LAST:%.*]], i64 0, i32 0			; CHECK-NEXT: [[_M_CUR2_I_I81:%.*]] = getelementptr inbounds %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731", %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* [[__LAST:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = load double*, double** [[_M_CUR2_I_I81]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load double*, double** [[_M_CUR2_I_I81]], align 8
	; CHECK-NEXT: [[_M_FIRST3_I_I83:%.*]] = getelementptr inbounds %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731", %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* [[__LAST]], i64 0, i32 1			; CHECK-NEXT: [[_M_FIRST3_I_I83:%.*]] = getelementptr inbounds %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731", %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* [[__LAST]], i64 0, i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = load double*, double** [[_M_FIRST3_I_I83]], align 8			; CHECK-NEXT: [[TMP2:%.*]] = load double*, double** [[_M_FIRST3_I_I83]], align 8
				; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double*> undef, double* [[TMP0]], i32 0
				; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double*> [[TMP3]], double* [[TMP2]], i32 1
	; CHECK-NEXT: br i1 undef, label [[_ZST13ADJACENT_FINDIST15_DEQUE_ITERATORIDRDPDEET_S4_S4__EXIT:%.*]], label [[WHILE_COND_I_PREHEADER:%.*]]			; CHECK-NEXT: br i1 undef, label [[_ZST13ADJACENT_FINDIST15_DEQUE_ITERATORIDRDPDEET_S4_S4__EXIT:%.*]], label [[WHILE_COND_I_PREHEADER:%.*]]
	; CHECK: while.cond.i.preheader:			; CHECK: while.cond.i.preheader:
				; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double*> undef, double* [[TMP1]], i32 0
				; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double*> [[TMP5]], double* [[TMP2]], i32 1
	; CHECK-NEXT: br label [[WHILE_COND_I:%.*]]			; CHECK-NEXT: br label [[WHILE_COND_I:%.*]]
	; CHECK: while.cond.i:			; CHECK: while.cond.i:
	; CHECK-NEXT: br i1 undef, label [[_ZST13ADJACENT_FINDIST15_DEQUE_ITERATORIDRDPDEET_S4_S4__EXIT]], label [[WHILE_BODY_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[_ZST13ADJACENT_FINDIST15_DEQUE_ITERATORIDRDPDEET_S4_S4__EXIT]], label [[WHILE_BODY_I:%.*]]
	; CHECK: while.body.i:			; CHECK: while.body.i:
	; CHECK-NEXT: br i1 undef, label [[_ZST13ADJACENT_FINDIST15_DEQUE_ITERATORIDRDPDEET_S4_S4__EXIT]], label [[WHILE_COND_I]]			; CHECK-NEXT: br i1 undef, label [[_ZST13ADJACENT_FINDIST15_DEQUE_ITERATORIDRDPDEET_S4_S4__EXIT]], label [[WHILE_COND_I]]
	; CHECK: _ZSt13adjacent_findISt15_Deque_iteratorIdRdPdEET_S4_S4_.exit:			; CHECK: _ZSt13adjacent_findISt15_Deque_iteratorIdRdPdEET_S4_S4_.exit:
	; CHECK-NEXT: [[TMP3:%.*]] = phi double* [ [[TMP2]], [[ENTRY:%.*]] ], [ [[TMP2]], [[WHILE_COND_I]] ], [ undef, [[WHILE_BODY_I]] ]			; CHECK-NEXT: [[TMP7:%.*]] = phi <2 x double*> [ [[TMP4]], [[ENTRY:%.*]] ], [ [[TMP6]], [[WHILE_COND_I]] ], [ undef, [[WHILE_BODY_I]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = phi double* [ [[TMP0]], [[ENTRY]] ], [ [[TMP1]], [[WHILE_COND_I]] ], [ undef, [[WHILE_BODY_I]] ]			; CHECK-NEXT: [[TMP8:%.*]] = bitcast double** [[_M_CUR2_I_I]] to <2 x double*>*
	; CHECK-NEXT: store double* [[TMP4]], double** [[_M_CUR2_I_I]], align 8			; CHECK-NEXT: store <2 x double*> [[TMP7]], <2 x double*>* [[TMP8]], align 8
	; CHECK-NEXT: store double* [[TMP3]], double** [[_M_FIRST3_I_I]], align 8
	; CHECK-NEXT: br i1 undef, label [[IF_THEN_I55:%.*]], label [[WHILE_COND:%.*]]			; CHECK-NEXT: br i1 undef, label [[IF_THEN_I55:%.*]], label [[WHILE_COND:%.*]]
	; CHECK: if.then.i55:			; CHECK: if.then.i55:
	; CHECK-NEXT: br label [[WHILE_COND]]			; CHECK-NEXT: br label [[WHILE_COND]]
	; CHECK: while.cond:			; CHECK: while.cond:
	; CHECK-NEXT: br label [[WHILE_COND]]			; CHECK-NEXT: br label [[WHILE_COND]]
	;			;
	entry:			entry:
	%_M_cur2.i.i = getelementptr inbounds %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731", %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* %__first, i64 0, i32 0			%_M_cur2.i.i = getelementptr inbounds %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731", %"struct.std::_Deque_iterator.4.157.174.208.259.276.344.731"* %__first, i64 0, i32 0

test/Transforms/SLPVectorizer/X86/crash_flop7.ll

	; CHECK: for.end80:			; CHECK: for.end80:
	; CHECK-NEXT: br i1 undef, label [[FOR_END300:%.*]], label [[FOR_BODY267_LR_PH:%.*]]			; CHECK-NEXT: br i1 undef, label [[FOR_END300:%.*]], label [[FOR_BODY267_LR_PH:%.*]]
	; CHECK: for.body267.lr.ph:			; CHECK: for.body267.lr.ph:
	; CHECK-NEXT: br label [[FOR_BODY267:%.*]]			; CHECK-NEXT: br label [[FOR_BODY267:%.*]]
	; CHECK: for.body267:			; CHECK: for.body267:
	; CHECK-NEXT: [[S_71010:%.*]] = phi double [ 0.000000e+00, [[FOR_BODY267_LR_PH]] ], [ [[ADD297:%.*]], [[FOR_BODY267]] ]			; CHECK-NEXT: [[S_71010:%.*]] = phi double [ 0.000000e+00, [[FOR_BODY267_LR_PH]] ], [ [[ADD297:%.*]], [[FOR_BODY267]] ]
	; CHECK-NEXT: [[MUL269:%.*]] = fmul double undef, undef			; CHECK-NEXT: [[MUL269:%.*]] = fmul double undef, undef
	; CHECK-NEXT: [[MUL270:%.*]] = fmul double [[MUL269]], [[MUL269]]			; CHECK-NEXT: [[MUL270:%.*]] = fmul double [[MUL269]], [[MUL269]]
	; CHECK-NEXT: [[ADD282:%.*]] = fadd double undef, undef			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> undef, double [[MUL269]], i32 0
	; CHECK-NEXT: [[MUL283:%.*]] = fmul double [[MUL269]], [[ADD282]]			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[MUL270]], i32 1
	; CHECK-NEXT: [[ADD293:%.*]] = fadd double undef, undef			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], undef
	; CHECK-NEXT: [[MUL294:%.*]] = fmul double [[MUL270]], [[ADD293]]			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[TMP2]], i32 1
	; CHECK-NEXT: [[ADD295:%.*]] = fadd double undef, [[MUL294]]			; CHECK-NEXT: [[ADD295:%.*]] = fadd double undef, [[TMP3]]
	; CHECK-NEXT: [[DIV296:%.*]] = fdiv double [[MUL283]], [[ADD295]]			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
				; CHECK-NEXT: [[DIV296:%.*]] = fdiv double [[TMP4]], [[ADD295]]
	; CHECK-NEXT: [[ADD297]] = fadd double [[S_71010]], [[DIV296]]			; CHECK-NEXT: [[ADD297]] = fadd double [[S_71010]], [[DIV296]]
	; CHECK-NEXT: br i1 undef, label [[FOR_BODY267]], label [[FOR_END300]]			; CHECK-NEXT: br i1 undef, label [[FOR_BODY267]], label [[FOR_END300]]
	; CHECK: for.end300:			; CHECK: for.end300:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	br i1 undef, label %while.body, label %while.end			br i1 undef, label %while.body, label %while.end


test/Transforms/SLPVectorizer/X86/crash_gep.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basicaa -slp-vectorizer -dce -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s			; RUN: opt < %s -basicaa -slp-vectorizer -dce -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s

	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	@a = common global i64* null, align 8			@a = common global i64* null, align 8

	; Function Attrs: nounwind uwtable			; Function Attrs: nounwind uwtable
	define i32 @fn1() {			define i32 @fn1() {
	; CHECK-LABEL: @fn1(			; CHECK-LABEL: @fn1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** @a, align 8			; CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** @a, align 8
	; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds i64, i64* [[TMP0]], i64 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64*> undef, i64* [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint i64* [[ADD_PTR]] to i64			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64*> [[TMP1]], i64* [[TMP0]], i32 1
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i64, i64* [[TMP0]], i64 2			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, <2 x i64*> [[TMP2]], <2 x i64> <i64 2, i64 1>
	; CHECK-NEXT: store i64 [[TMP1]], i64* [[ARRAYIDX]], align 8			; CHECK-NEXT: [[TMP4:%.*]] = ptrtoint <2 x i64*> [[TMP3]] to <2 x i64>
	; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint i64* [[ARRAYIDX]] to i64			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i64*> [[TMP3]], i32 0
	; CHECK-NEXT: store i64 [[TMP2]], i64* [[ADD_PTR]], align 8			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*
				; CHECK-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP6]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%0 = load i64*, i64** @a, align 8			%0 = load i64*, i64** @a, align 8
	%add.ptr = getelementptr inbounds i64, i64* %0, i64 1			%add.ptr = getelementptr inbounds i64, i64* %0, i64 1
	%1 = ptrtoint i64* %add.ptr to i64			%1 = ptrtoint i64* %add.ptr to i64
	%arrayidx = getelementptr inbounds i64, i64* %0, i64 2			%arrayidx = getelementptr inbounds i64, i64* %0, i64 2
	store i64 %1, i64* %arrayidx, align 8			store i64 %1, i64* %arrayidx, align 8
	%2 = ptrtoint i64* %arrayidx to i64			%2 = ptrtoint i64* %arrayidx to i64
	store i64 %2, i64* %add.ptr, align 8			store i64 %2, i64* %add.ptr, align 8
	ret i32 undef			ret i32 undef
	}			}

test/Transforms/SLPVectorizer/X86/crash_lencod.ll

entry:
%arrayidx156 = getelementptr inbounds [13 x i16], [13 x i16]* undef, i64 0, i64 10		%arrayidx156 = getelementptr inbounds [13 x i16], [13 x i16]* undef, i64 0, i64 10
store i16 %conv153, i16* %arrayidx156, align 4		store i16 %conv153, i16* %arrayidx156, align 4
ret void		ret void
}		}

define fastcc void @dct36(double* %inbuf) {		define fastcc void @dct36(double* %inbuf) {
; CHECK-LABEL: @dct36(		; CHECK-LABEL: @dct36(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[ARRAYIDX41:%.*]] = getelementptr inbounds double, double* [[INBUF:%.*]], i64 2		; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, double* [[INBUF:%.*]], i64 1
; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, double* [[INBUF]], i64 1
; CHECK-NEXT: [[TMP0:%.*]] = load double, double* [[ARRAYIDX44]], align 8		; CHECK-NEXT: [[TMP0:%.*]] = load double, double* [[ARRAYIDX44]], align 8
; CHECK-NEXT: [[ADD46:%.*]] = fadd double [[TMP0]], undef
; CHECK-NEXT: store double [[ADD46]], double* [[ARRAYIDX41]], align 8
; CHECK-NEXT: [[TMP1:%.*]] = load double, double* [[INBUF]], align 8		; CHECK-NEXT: [[TMP1:%.*]] = load double, double* [[INBUF]], align 8
; CHECK-NEXT: [[ADD49:%.*]] = fadd double [[TMP1]], [[TMP0]]		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> undef, double [[TMP1]], i32 0
; CHECK-NEXT: store double [[ADD49]], double* [[ARRAYIDX44]], align 8		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double undef, i32 1
		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> undef, double [[TMP0]], i32 0
		; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[TMP0]], i32 1
		; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP3]], [[TMP5]]
		; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[ARRAYIDX44]] to <2 x double>*
		; CHECK-NEXT: store <2 x double> [[TMP6]], <2 x double>* [[TMP7]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%arrayidx41 = getelementptr inbounds double, double* %inbuf, i64 2		%arrayidx41 = getelementptr inbounds double, double* %inbuf, i64 2
%arrayidx44 = getelementptr inbounds double, double* %inbuf, i64 1		%arrayidx44 = getelementptr inbounds double, double* %inbuf, i64 1
%0 = load double, double* %arrayidx44, align 8		%0 = load double, double* %arrayidx44, align 8
%add46 = fadd double %0, undef		%add46 = fadd double %0, undef
store double %add46, double* %arrayidx41, align 8		store double %add46, double* %arrayidx41, align 8
%1 = load double, double* %inbuf, align 8		%1 = load double, double* %inbuf, align 8
%add49 = fadd double %1, %0		%add49 = fadd double %1, %0
store double %add49, double* %arrayidx44, align 8		store double %add49, double* %arrayidx44, align 8
ret void		ret void
}		}

test/Transforms/SLPVectorizer/X86/crash_scheduling.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basicaa -disable-verify -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 \| FileCheck %s			; RUN: opt < %s -basicaa -disable-verify -slp-vectorizer -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 \| FileCheck %s

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-darwin13.3.0"			target triple = "x86_64-apple-darwin13.3.0"

	define void @_foo(double %p1, double %p2, double %p3) #0 {			define void @_foo(double %p1, double %p2, double %p3) #0 {
	; CHECK-LABEL: @_foo(			; CHECK-LABEL: @_foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TAB1:%.*]] = alloca [256 x i32], align 16			; CHECK-NEXT: [[TAB1:%.*]] = alloca [256 x i32], align 16
	; CHECK-NEXT: [[TAB2:%.*]] = alloca [256 x i32], align 16			; CHECK-NEXT: [[TAB2:%.*]] = alloca [256 x i32], align 16
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[MUL19:%.*]] = fmul double [[P1:%.*]], 1.638400e+04
	; CHECK-NEXT: [[MUL20:%.*]] = fmul double [[P3:%.*]], 1.638400e+04			; CHECK-NEXT: [[MUL20:%.*]] = fmul double [[P3:%.*]], 1.638400e+04
	; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03			; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03
	; CHECK-NEXT: [[MUL21:%.*]] = fmul double [[P2:%.*]], 1.638400e+04			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> undef, double [[P1:%.*]], i32 0
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[P2:%.*]], i32 1
				; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.638400e+04, double 1.638400e+04>
				; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> <double 0.000000e+00, double undef>, double [[ADD]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[T_0259:%.*]] = phi double [ 0.000000e+00, [[BB1]] ], [ [[ADD27:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x double> [ [[TMP3]], [[BB1]] ], [ [[TMP7:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[P3_ADDR_0258:%.*]] = phi double [ [[ADD]], [[BB1]] ], [ [[ADD28:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
	; CHECK-NEXT: [[VECINIT_I_I237:%.*]] = insertelement <2 x double> undef, double [[T_0259]], i32 0			; CHECK-NEXT: [[VECINIT_I_I237:%.*]] = insertelement <2 x double> undef, double [[TMP5]], i32 0
	; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I237]])			; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I237]])
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, !tbaa !0			; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, !tbaa !0
	; CHECK-NEXT: [[VECINIT_I_I:%.*]] = insertelement <2 x double> undef, double [[P3_ADDR_0258]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
				; CHECK-NEXT: [[VECINIT_I_I:%.*]] = insertelement <2 x double> undef, double [[TMP6]], i32 0
	; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I]])			; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[VECINIT_I_I]])
	; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, !tbaa !0			; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, !tbaa !0
	; CHECK-NEXT: [[ADD27]] = fadd double [[MUL19]], [[T_0259]]			; CHECK-NEXT: [[TMP7]] = fadd <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[ADD28]] = fadd double [[MUL21]], [[P3_ADDR_0258]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%tab1 = alloca [256 x i32], align 16			%tab1 = alloca [256 x i32], align 16

test/Transforms/SLPVectorizer/X86/cse.ll

	define i32 @test(double* nocapture %G) {			define i32 @test(double* nocapture %G) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[G:%.*]], i64 5
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[G]], i64 6			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds double, double* [[G]], i64 6
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> <double 4.000000e+00, double 3.000000e+00>, [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> <double 4.000000e+00, double 3.000000e+00>, [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> <double 1.000000e+00, double 6.000000e+00>, [[TMP2]]
	; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[G]], i64 1			; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[G]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[G]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
	; CHECK-NEXT: [[ADD8:%.*]] = fadd double [[TMP5]], 7.000000e+00
	; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2			; CHECK-NEXT: [[ARRAYIDX9:%.*]] = getelementptr inbounds double, double* [[G]], i64 2
	; CHECK-NEXT: store double [[ADD8]], double* [[ARRAYIDX9]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP1]], i32 1			; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP3]], 4.000000e+00
	; CHECK-NEXT: [[MUL11:%.*]] = fmul double [[TMP6]], 4.000000e+00			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP2]], i32 0
	; CHECK-NEXT: [[ADD12:%.*]] = fadd double [[MUL11]], 8.000000e+00			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> undef, double [[TMP4]], i32 0
				; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP2]], i32 1
				; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x double> [[TMP5]], double [[TMP6]], i32 1
				; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x double> [[TMP7]], double [[TMP4]], i32 2
				; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x double> [[TMP8]], double [[MUL11]], i32 3
				; CHECK-NEXT: [[TMP10:%.*]] = fadd <4 x double> <double 1.000000e+00, double 6.000000e+00, double 7.000000e+00, double 8.000000e+00>, [[TMP9]]
	; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds double, double* [[G]], i64 3			; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds double, double* [[G]], i64 3
	; CHECK-NEXT: store double [[ADD12]], double* [[ARRAYIDX13]], align 8			; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[G]] to <4 x double>*
				; CHECK-NEXT: store <4 x double> [[TMP10]], <4 x double>* [[TMP11]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds double, double* %G, i64 5			%arrayidx = getelementptr inbounds double, double* %G, i64 5
	%0 = load double, double* %arrayidx, align 8			%0 = load double, double* %arrayidx, align 8
	%mul = fmul double %0, 4.000000e+00			%mul = fmul double %0, 4.000000e+00
	%add = fadd double %mul, 1.000000e+00			%add = fadd double %mul, 1.000000e+00
	store double %add, double* %G, align 8			store double %add, double* %G, align 8

test/Transforms/SLPVectorizer/X86/external_user.ll

	; PR18129			; PR18129

	define i32 @needtogather(double *noalias %a, i32 *noalias %b, float * noalias %c,			define i32 @needtogather(double *noalias %a, i32 *noalias %b, float * noalias %c,
	; CHECK-LABEL: @needtogather(			; CHECK-LABEL: @needtogather(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[D:%.*]], align 4			; CHECK-NEXT: [[TMP0:%.*]] = load i32, i32* [[D:%.*]], align 4
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[TMP0]] to float
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[C:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[C:%.*]]
	; CHECK-NEXT: [[SUB:%.*]] = fsub float 0.000000e+00, [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> undef, float [[TMP1]], i32 0
	; CHECK-NEXT: [[MUL:%.*]] = fmul float [[SUB]], 0.000000e+00			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[CONV]], [[MUL]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x float> <float 1.000000e+00, float 0.000000e+00>, [[TMP3]]
	; CHECK-NEXT: [[CONV1:%.*]] = fpext float [[ADD]] to double			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> zeroinitializer, [[TMP4]]
	; CHECK-NEXT: [[SUB3:%.*]] = fsub float 1.000000e+00, [[TMP1]]			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> undef, float [[CONV]], i32 0
	; CHECK-NEXT: [[MUL4:%.*]] = fmul float [[SUB3]], 0.000000e+00			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[CONV]], i32 1
	; CHECK-NEXT: [[ADD5:%.*]] = fadd float [[CONV]], [[MUL4]]			; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP7]], [[TMP5]]
	; CHECK-NEXT: [[CONV6:%.*]] = fpext float [[ADD5]] to double			; CHECK-NEXT: [[TMP9:%.*]] = fpext <2 x float> [[TMP8]] to <2 x double>
	; CHECK-NEXT: [[TOBOOL:%.*]] = fcmp une float [[ADD]], 0.000000e+00			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
				; CHECK-NEXT: [[TOBOOL:%.*]] = fcmp une float [[TMP10]], 0.000000e+00
				; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x double> [[TMP9]], i32 1
				; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x double> undef, double [[TMP11]], i32 0
				; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x double> [[TMP9]], i32 0
				; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP12]], double [[TMP13]], i32 1
	; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[IF_END:%.*]]			; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_THEN:%.*]], label [[IF_END:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: br label [[IF_END]]			; CHECK-NEXT: br label [[IF_END]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[STOREMERGE:%.*]] = phi double [ [[CONV6]], [[IF_THEN]] ], [ [[CONV1]], [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP15:%.*]] = phi <2 x double> [ [[TMP9]], [[IF_THEN]] ], [ [[TMP14]], [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[E_0:%.*]] = phi double [ [[CONV1]], [[IF_THEN]] ], [ [[CONV6]], [[ENTRY]] ]			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x double> [[TMP15]], i32 0
	; CHECK-NEXT: store double [[STOREMERGE]], double* [[A:%.*]], align 8			; CHECK-NEXT: store double [[TMP16]], double* [[A:%.*]], align 8
	; CHECK-NEXT: [[CONV7:%.*]] = fptosi double [[E_0]] to i32			; CHECK-NEXT: [[TMP17:%.*]] = extractelement <2 x double> [[TMP15]], i32 1
				; CHECK-NEXT: [[CONV7:%.*]] = fptosi double [[TMP17]] to i32
	; CHECK-NEXT: store i32 [[CONV7]], i32* [[B:%.*]], align 4			; CHECK-NEXT: store i32 [[CONV7]], i32* [[B:%.*]], align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	i32 * noalias %d) {			i32 * noalias %d) {
	entry:			entry:
	%0 = load i32, i32* %d, align 4			%0 = load i32, i32* %d, align 4
	%conv = sitofp i32 %0 to float			%conv = sitofp i32 %0 to float
	%1 = load float, float* %c			%1 = load float, float* %c

test/Transforms/SLPVectorizer/X86/hadd.ll

}		}

;		;
; 256-bit vectors		; 256-bit vectors
;		;

define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {
; SSE-LABEL: @test_v4f64(		; SSE-LABEL: @test_v4f64(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; SSE-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = fadd <4 x double> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SSE-NEXT: ret <4 x double> [[TMP3]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R03:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: ret <4 x double> [[R03]]
;		;
; SLM-LABEL: @test_v4f64(		; SLM-LABEL: @test_v4f64(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
; SLM-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP4]], [[TMP5]]
Show All 27 Lines	;
%r00 = insertelement <4 x double> undef, double %r0, i32 0		%r00 = insertelement <4 x double> undef, double %r0, i32 0
%r01 = insertelement <4 x double> %r00, double %r1, i32 1		%r01 = insertelement <4 x double> %r00, double %r1, i32 1
%r02 = insertelement <4 x double> %r01, double %r2, i32 2		%r02 = insertelement <4 x double> %r01, double %r2, i32 2
%r03 = insertelement <4 x double> %r02, double %r3, i32 3		%r03 = insertelement <4 x double> %r02, double %r3, i32 3
ret <4 x double> %r03		ret <4 x double> %r03
}		}

define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {
; SSE-LABEL: @test_v8f32(		; CHECK-LABEL: @test_v8f32(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; CHECK-NEXT: ret <8 x float> [[TMP3]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R07:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x float> [[R07]]
;
; SLM-LABEL: @test_v8f32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; SLM-NEXT: [[TMP3:%.*]] = fadd <4 x float> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SLM-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R07:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x float> [[R07]]
;
; AVX-LABEL: @test_v8f32(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <8 x float> [[TMP3]]
;
; AVX512-LABEL: @test_v8f32(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX512-NEXT: [[TMP3:%.*]] = fadd <8 x float> [[TMP1]], [[TMP2]]
; AVX512-NEXT: ret <8 x float> [[TMP3]]
;		;
%a0 = extractelement <8 x float> %a, i32 0		%a0 = extractelement <8 x float> %a, i32 0
%a1 = extractelement <8 x float> %a, i32 1		%a1 = extractelement <8 x float> %a, i32 1
%a2 = extractelement <8 x float> %a, i32 2		%a2 = extractelement <8 x float> %a, i32 2
%a3 = extractelement <8 x float> %a, i32 3		%a3 = extractelement <8 x float> %a, i32 3
%a4 = extractelement <8 x float> %a, i32 4		%a4 = extractelement <8 x float> %a, i32 4
%a5 = extractelement <8 x float> %a, i32 5		%a5 = extractelement <8 x float> %a, i32 5
%a6 = extractelement <8 x float> %a, i32 6		%a6 = extractelement <8 x float> %a, i32 6
Show All 22 Lines	;
%r05 = insertelement <8 x float> %r04, float %r5, i32 5		%r05 = insertelement <8 x float> %r04, float %r5, i32 5
%r06 = insertelement <8 x float> %r05, float %r6, i32 6		%r06 = insertelement <8 x float> %r05, float %r6, i32 6
%r07 = insertelement <8 x float> %r06, float %r7, i32 7		%r07 = insertelement <8 x float> %r06, float %r7, i32 7
ret <8 x float> %r07		ret <8 x float> %r07
}		}

define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {		define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {
; SSE-LABEL: @test_v4i64(		; SSE-LABEL: @test_v4i64(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; SSE-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>		; SSE-NEXT: ret <4 x i64> [[TMP3]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: ret <4 x i64> [[R03]]
;		;
; SLM-LABEL: @test_v4i64(		; SLM-LABEL: @test_v4i64(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SLM-NEXT: [[A0:%.*]] = extractelement <4 x i64> [[A:%.*]], i32 0
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>		; SLM-NEXT: [[A1:%.*]] = extractelement <4 x i64> [[A]], i32 1
; SLM-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[A2:%.*]] = extractelement <4 x i64> [[A]], i32 2
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>		; SLM-NEXT: [[A3:%.*]] = extractelement <4 x i64> [[A]], i32 3
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>		; SLM-NEXT: [[B0:%.*]] = extractelement <4 x i64> [[B:%.*]], i32 0
; SLM-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[B1:%.*]] = extractelement <4 x i64> [[B]], i32 1
; SLM-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[B2:%.*]] = extractelement <4 x i64> [[B]], i32 2
		; SLM-NEXT: [[B3:%.*]] = extractelement <4 x i64> [[B]], i32 3
		; SLM-NEXT: [[R0:%.*]] = add i64 [[A0]], [[A1]]
		; SLM-NEXT: [[R1:%.*]] = add i64 [[B0]], [[B1]]
		; SLM-NEXT: [[R2:%.*]] = add i64 [[A2]], [[A3]]
		; SLM-NEXT: [[R3:%.*]] = add i64 [[B2]], [[B3]]
		; SLM-NEXT: [[R00:%.*]] = insertelement <4 x i64> undef, i64 [[R0]], i32 0
		; SLM-NEXT: [[R01:%.*]] = insertelement <4 x i64> [[R00]], i64 [[R1]], i32 1
		; SLM-NEXT: [[R02:%.*]] = insertelement <4 x i64> [[R01]], i64 [[R2]], i32 2
		; SLM-NEXT: [[R03:%.*]] = insertelement <4 x i64> [[R02]], i64 [[R3]], i32 3
; SLM-NEXT: ret <4 x i64> [[R03]]		; SLM-NEXT: ret <4 x i64> [[R03]]
;		;
; AVX-LABEL: @test_v4i64(		; AVX1-LABEL: @test_v4i64(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>
; AVX-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <4 x i64> [[TMP3]]		; AVX1-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
		; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
		; AVX1-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP4]], [[TMP5]]
		; AVX1-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; AVX1-NEXT: ret <4 x i64> [[R03]]
		;
		; AVX2-LABEL: @test_v4i64(
		; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
		; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
		; AVX2-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]
		; AVX2-NEXT: ret <4 x i64> [[TMP3]]
;		;
; AVX512-LABEL: @test_v4i64(		; AVX512-LABEL: @test_v4i64(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; AVX512-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = add <4 x i64> [[TMP1]], [[TMP2]]
; AVX512-NEXT: ret <4 x i64> [[TMP3]]		; AVX512-NEXT: ret <4 x i64> [[TMP3]]
;		;
%a0 = extractelement <4 x i64> %a, i32 0		%a0 = extractelement <4 x i64> %a, i32 0
Show All 11 Lines	;
%r00 = insertelement <4 x i64> undef, i64 %r0, i32 0		%r00 = insertelement <4 x i64> undef, i64 %r0, i32 0
%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1		%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1
%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2		%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2
%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3		%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3
ret <4 x i64> %r03		ret <4 x i64> %r03
}		}

define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @test_v8i32(		; CHECK-LABEL: @test_v8i32(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; CHECK-NEXT: ret <8 x i32> [[TMP3]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R07:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R07]]
;
; SLM-LABEL: @test_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; SLM-NEXT: [[TMP3:%.*]] = add <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SLM-NEXT: [[TMP6:%.*]] = add <4 x i32> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R07:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R07]]
;
; AVX-LABEL: @test_v8i32(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <8 x i32> [[TMP3]]
;
; AVX512-LABEL: @test_v8i32(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX512-NEXT: [[TMP3:%.*]] = add <8 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: ret <8 x i32> [[TMP3]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6
Show All 21 Lines	;
%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4		%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4
%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5		%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5
%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6		%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6
%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7		%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7
ret <8 x i32> %r07		ret <8 x i32> %r07
}		}

define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {
; SSE-LABEL: @test_v16i16(		; CHECK-LABEL: @test_v16i16(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; SSE-NEXT: [[TMP3:%.*]] = add <8 x i16> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; CHECK-NEXT: ret <16 x i16> [[TMP3]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; SSE-NEXT: [[TMP6:%.*]] = add <8 x i16> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[RV15:%.*]] = shufflevector <8 x i16> [[TMP3]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; SSE-NEXT: ret <16 x i16> [[RV15]]
;
; SLM-LABEL: @test_v16i16(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; SLM-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]
; SLM-NEXT: ret <16 x i16> [[TMP3]]
;
; AVX-LABEL: @test_v16i16(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; AVX-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <16 x i16> [[TMP3]]
;
; AVX512-LABEL: @test_v16i16(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; AVX512-NEXT: [[TMP3:%.*]] = add <16 x i16> [[TMP1]], [[TMP2]]
; AVX512-NEXT: ret <16 x i16> [[TMP3]]
;		;
%a0 = extractelement <16 x i16> %a, i32 0		%a0 = extractelement <16 x i16> %a, i32 0
%a1 = extractelement <16 x i16> %a, i32 1		%a1 = extractelement <16 x i16> %a, i32 1
%a2 = extractelement <16 x i16> %a, i32 2		%a2 = extractelement <16 x i16> %a, i32 2
%a3 = extractelement <16 x i16> %a, i32 3		%a3 = extractelement <16 x i16> %a, i32 3
%a4 = extractelement <16 x i16> %a, i32 4		%a4 = extractelement <16 x i16> %a, i32 4
%a5 = extractelement <16 x i16> %a, i32 5		%a5 = extractelement <16 x i16> %a, i32 5
%a6 = extractelement <16 x i16> %a, i32 6		%a6 = extractelement <16 x i16> %a, i32 6

test/Transforms/SLPVectorizer/X86/horizontal.ll

	; CHECK-NEXT: [[CMP1495:%.*]] = icmp eq i32 [[ARG_B:%.*]], 0			; CHECK-NEXT: [[CMP1495:%.*]] = icmp eq i32 [[ARG_B:%.*]], 0
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.cond.cleanup:			; CHECK: for.cond.cleanup:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_COND_CLEANUP15:%.*]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_COND_CLEANUP15:%.*]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = shl i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[TMP0:%.*]] = shl i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[ARRAY:%.*]], i64 [[TMP0]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[ARRAY:%.*]], i64 [[TMP0]]
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = or i64 [[TMP0]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = or i64 [[TMP0]], 1			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP1]]
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP2]]			; CHECK-NEXT: [[TMP2:%.*]] = or i64 [[TMP0]], 2
	; CHECK-NEXT: [[TMP3:%.*]] = load float, float* [[ARRAYIDX4]], align 4			; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = or i64 [[TMP0]], 2			; CHECK-NEXT: [[TMP3:%.*]] = or i64 [[TMP0]], 3
	; CHECK-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP4]]			; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX8]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; CHECK-NEXT: [[TMP6:%.*]] = or i64 [[TMP0]], 3			; CHECK-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* [[TMP4]], align 4
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX12]], align 4
	; CHECK-NEXT: br i1 [[CMP1495]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16_LR_PH:%.*]]			; CHECK-NEXT: br i1 [[CMP1495]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16_LR_PH:%.*]]
	; CHECK: for.body16.lr.ph:			; CHECK: for.body16.lr.ph:
	; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds float, float* [[ARG_A:%.*]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds float, float* [[ARG_A:%.*]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP8:%.*]] = load float, float* [[ADD_PTR]], align 4			; CHECK-NEXT: [[TMP6:%.*]] = load float, float* [[ADD_PTR]], align 4
	; CHECK-NEXT: br label [[FOR_BODY16:%.*]]			; CHECK-NEXT: br label [[FOR_BODY16:%.*]]
	; CHECK: for.cond.cleanup15:			; CHECK: for.cond.cleanup15:
	; CHECK-NEXT: [[W2_0_LCSSA:%.*]] = phi float [ [[TMP5]], [[FOR_BODY]] ], [ [[SUB28:%.*]], [[FOR_BODY16]] ]			; CHECK-NEXT: [[TMP7:%.*]] = phi <4 x float> [ [[TMP5]], [[FOR_BODY]] ], [ [[TMP19:%.*]], [[FOR_BODY16]] ]
	; CHECK-NEXT: [[W3_0_LCSSA:%.*]] = phi float [ [[TMP7]], [[FOR_BODY]] ], [ [[W2_096:%.*]], [[FOR_BODY16]] ]			; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; CHECK-NEXT: [[W1_0_LCSSA:%.*]] = phi float [ [[TMP3]], [[FOR_BODY]] ], [ [[W0_0100:%.*]], [[FOR_BODY16]] ]			; CHECK-NEXT: store <4 x float> [[TMP7]], <4 x float>* [[TMP8]], align 4
	; CHECK-NEXT: [[W0_0_LCSSA:%.*]] = phi float [ [[TMP1]], [[FOR_BODY]] ], [ [[SUB19:%.*]], [[FOR_BODY16]] ]
	; CHECK-NEXT: store float [[W0_0_LCSSA]], float* [[ARRAYIDX]], align 4
	; CHECK-NEXT: store float [[W1_0_LCSSA]], float* [[ARRAYIDX4]], align 4
	; CHECK-NEXT: store float [[W2_0_LCSSA]], float* [[ARRAYIDX8]], align 4
	; CHECK-NEXT: store float [[W3_0_LCSSA]], float* [[ARRAYIDX12]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[EXITCOND109:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 6			; CHECK-NEXT: [[EXITCOND109:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 6
	; CHECK-NEXT: br i1 [[EXITCOND109]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND109]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
	; CHECK: for.body16:			; CHECK: for.body16:
	; CHECK-NEXT: [[W0_0100]] = phi float [ [[TMP1]], [[FOR_BODY16_LR_PH]] ], [ [[SUB19]], [[FOR_BODY16]] ]
	; CHECK-NEXT: [[W1_099:%.*]] = phi float [ [[TMP3]], [[FOR_BODY16_LR_PH]] ], [ [[W0_0100]], [[FOR_BODY16]] ]
	; CHECK-NEXT: [[J_098:%.*]] = phi i32 [ 0, [[FOR_BODY16_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY16]] ]			; CHECK-NEXT: [[J_098:%.*]] = phi i32 [ 0, [[FOR_BODY16_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY16]] ]
	; CHECK-NEXT: [[W3_097:%.*]] = phi float [ [[TMP7]], [[FOR_BODY16_LR_PH]] ], [ [[W2_096]], [[FOR_BODY16]] ]			; CHECK-NEXT: [[TMP9:%.*]] = phi <4 x float> [ [[TMP5]], [[FOR_BODY16_LR_PH]] ], [ [[TMP19]], [[FOR_BODY16]] ]
	; CHECK-NEXT: [[W2_096]] = phi float [ [[TMP5]], [[FOR_BODY16_LR_PH]] ], [ [[SUB28]], [[FOR_BODY16]] ]			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[TMP9]], i32 0
	; CHECK-NEXT: [[MUL17:%.*]] = fmul fast float [[W0_0100]], 0x3FF19999A0000000			; CHECK-NEXT: [[MUL17:%.*]] = fmul fast float [[TMP10]], 0x3FF19999A0000000
	; CHECK-NEXT: [[MUL18_NEG:%.*]] = fmul fast float [[W1_099]], 0xBFF3333340000000			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP9]], i32 1
				; CHECK-NEXT: [[MUL18_NEG:%.*]] = fmul fast float [[TMP11]], 0xBFF3333340000000
	; CHECK-NEXT: [[SUB92:%.*]] = fadd fast float [[MUL17]], [[MUL18_NEG]]			; CHECK-NEXT: [[SUB92:%.*]] = fadd fast float [[MUL17]], [[MUL18_NEG]]
	; CHECK-NEXT: [[SUB19]] = fadd fast float [[SUB92]], [[TMP8]]			; CHECK-NEXT: [[SUB19:%.*]] = fadd fast float [[SUB92]], [[TMP6]]
	; CHECK-NEXT: [[MUL20:%.*]] = fmul fast float [[SUB19]], 0x4000CCCCC0000000			; CHECK-NEXT: [[MUL20:%.*]] = fmul fast float [[SUB19]], 0x4000CCCCC0000000
	; CHECK-NEXT: [[MUL21_NEG:%.*]] = fmul fast float [[W0_0100]], 0xC0019999A0000000			; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <4 x float> <float 0xC0019999A0000000, float 0x4002666660000000, float 0x4008CCCCC0000000, float 0xC0099999A0000000>, [[TMP9]]
	; CHECK-NEXT: [[MUL23:%.*]] = fmul fast float [[W1_099]], 0x4002666660000000			; CHECK-NEXT: [[ADD2293:%.*]] = fadd fast float undef, undef
	; CHECK-NEXT: [[MUL25:%.*]] = fmul fast float [[W2_096]], 0x4008CCCCC0000000			; CHECK-NEXT: [[ADD24:%.*]] = fadd fast float [[ADD2293]], undef
	; CHECK-NEXT: [[MUL27_NEG:%.*]] = fmul fast float [[W3_097]], 0xC0099999A0000000			; CHECK-NEXT: [[SUB2694:%.*]] = fadd fast float [[ADD24]], undef
	; CHECK-NEXT: [[ADD2293:%.*]] = fadd fast float [[MUL27_NEG]], [[MUL25]]			; CHECK-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP12]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; CHECK-NEXT: [[ADD24:%.*]] = fadd fast float [[ADD2293]], [[MUL23]]			; CHECK-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP12]], [[RDX_SHUF]]
	; CHECK-NEXT: [[SUB2694:%.*]] = fadd fast float [[ADD24]], [[MUL21_NEG]]			; CHECK-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[SUB28]] = fadd fast float [[SUB2694]], [[MUL20]]			; CHECK-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
				; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
				; CHECK-NEXT: [[TMP14:%.*]] = fadd fast float [[TMP13]], [[MUL20]]
				; CHECK-NEXT: [[SUB28:%.*]] = fadd fast float [[SUB2694]], [[MUL20]]
	; CHECK-NEXT: [[INC]] = add nuw i32 [[J_098]], 1			; CHECK-NEXT: [[INC]] = add nuw i32 [[J_098]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[ARG_B]]			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[ARG_B]]
				; CHECK-NEXT: [[TMP15:%.*]] = insertelement <4 x float> undef, float [[SUB19]], i32 0
				; CHECK-NEXT: [[TMP16:%.*]] = insertelement <4 x float> [[TMP15]], float [[TMP10]], i32 1
				; CHECK-NEXT: [[TMP17:%.*]] = insertelement <4 x float> [[TMP16]], float [[TMP14]], i32 2
				; CHECK-NEXT: [[TMP18:%.*]] = extractelement <4 x float> [[TMP9]], i32 2
				; CHECK-NEXT: [[TMP19]] = insertelement <4 x float> [[TMP17]], float [[TMP18]], i32 3
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16]]
	;			;
	; STORE-LABEL: @foo(			; STORE-LABEL: @foo(
	; STORE-NEXT: entry:			; STORE-NEXT: entry:
	; STORE-NEXT: [[CMP1495:%.*]] = icmp eq i32 [[ARG_B:%.*]], 0			; STORE-NEXT: [[CMP1495:%.*]] = icmp eq i32 [[ARG_B:%.*]], 0
	; STORE-NEXT: br label [[FOR_BODY:%.*]]			; STORE-NEXT: br label [[FOR_BODY:%.*]]
	; STORE: for.cond.cleanup:			; STORE: for.cond.cleanup:
	; STORE-NEXT: ret void			; STORE-NEXT: ret void
	; STORE: for.body:			; STORE: for.body:
	; STORE-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_COND_CLEANUP15:%.*]] ]			; STORE-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_COND_CLEANUP15:%.*]] ]
	; STORE-NEXT: [[TMP0:%.*]] = shl i64 [[INDVARS_IV]], 2			; STORE-NEXT: [[TMP0:%.*]] = shl i64 [[INDVARS_IV]], 2
	; STORE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[ARRAY:%.*]], i64 [[TMP0]]			; STORE-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[ARRAY:%.*]], i64 [[TMP0]]
	; STORE-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4			; STORE-NEXT: [[TMP1:%.*]] = or i64 [[TMP0]], 1
	; STORE-NEXT: [[TMP2:%.*]] = or i64 [[TMP0]], 1			; STORE-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP1]]
	; STORE-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP2]]			; STORE-NEXT: [[TMP2:%.*]] = or i64 [[TMP0]], 2
	; STORE-NEXT: [[TMP3:%.*]] = load float, float* [[ARRAYIDX4]], align 4			; STORE-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP2]]
	; STORE-NEXT: [[TMP4:%.*]] = or i64 [[TMP0]], 2			; STORE-NEXT: [[TMP3:%.*]] = or i64 [[TMP0]], 3
	; STORE-NEXT: [[ARRAYIDX8:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP4]]			; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP3]]
	; STORE-NEXT: [[TMP5:%.*]] = load float, float* [[ARRAYIDX8]], align 4			; STORE-NEXT: [[TMP4:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; STORE-NEXT: [[TMP6:%.*]] = or i64 [[TMP0]], 3			; STORE-NEXT: [[TMP5:%.*]] = load <4 x float>, <4 x float>* [[TMP4]], align 4
	; STORE-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[ARRAY]], i64 [[TMP6]]
	; STORE-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX12]], align 4
	; STORE-NEXT: br i1 [[CMP1495]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16_LR_PH:%.*]]			; STORE-NEXT: br i1 [[CMP1495]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16_LR_PH:%.*]]
	; STORE: for.body16.lr.ph:			; STORE: for.body16.lr.ph:
	; STORE-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds float, float* [[ARG_A:%.*]], i64 [[INDVARS_IV]]			; STORE-NEXT: [[ADD_PTR:%.*]] = getelementptr inbounds float, float* [[ARG_A:%.*]], i64 [[INDVARS_IV]]
	; STORE-NEXT: [[TMP8:%.*]] = load float, float* [[ADD_PTR]], align 4			; STORE-NEXT: [[TMP6:%.*]] = load float, float* [[ADD_PTR]], align 4
	; STORE-NEXT: br label [[FOR_BODY16:%.*]]			; STORE-NEXT: br label [[FOR_BODY16:%.*]]
	; STORE: for.cond.cleanup15:			; STORE: for.cond.cleanup15:
	; STORE-NEXT: [[W2_0_LCSSA:%.*]] = phi float [ [[TMP5]], [[FOR_BODY]] ], [ [[SUB28:%.*]], [[FOR_BODY16]] ]			; STORE-NEXT: [[TMP7:%.*]] = phi <4 x float> [ [[TMP5]], [[FOR_BODY]] ], [ [[TMP19:%.*]], [[FOR_BODY16]] ]
	; STORE-NEXT: [[W3_0_LCSSA:%.*]] = phi float [ [[TMP7]], [[FOR_BODY]] ], [ [[W2_096:%.*]], [[FOR_BODY16]] ]			; STORE-NEXT: [[TMP8:%.*]] = bitcast float* [[ARRAYIDX]] to <4 x float>*
	; STORE-NEXT: [[W1_0_LCSSA:%.*]] = phi float [ [[TMP3]], [[FOR_BODY]] ], [ [[W0_0100:%.*]], [[FOR_BODY16]] ]			; STORE-NEXT: store <4 x float> [[TMP7]], <4 x float>* [[TMP8]], align 4
	; STORE-NEXT: [[W0_0_LCSSA:%.*]] = phi float [ [[TMP1]], [[FOR_BODY]] ], [ [[SUB19:%.*]], [[FOR_BODY16]] ]
	; STORE-NEXT: store float [[W0_0_LCSSA]], float* [[ARRAYIDX]], align 4
	; STORE-NEXT: store float [[W1_0_LCSSA]], float* [[ARRAYIDX4]], align 4
	; STORE-NEXT: store float [[W2_0_LCSSA]], float* [[ARRAYIDX8]], align 4
	; STORE-NEXT: store float [[W3_0_LCSSA]], float* [[ARRAYIDX12]], align 4
	; STORE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; STORE-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; STORE-NEXT: [[EXITCOND109:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 6			; STORE-NEXT: [[EXITCOND109:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 6
	; STORE-NEXT: br i1 [[EXITCOND109]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]			; STORE-NEXT: br i1 [[EXITCOND109]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]
	; STORE: for.body16:			; STORE: for.body16:
	; STORE-NEXT: [[W0_0100]] = phi float [ [[TMP1]], [[FOR_BODY16_LR_PH]] ], [ [[SUB19]], [[FOR_BODY16]] ]
	; STORE-NEXT: [[W1_099:%.*]] = phi float [ [[TMP3]], [[FOR_BODY16_LR_PH]] ], [ [[W0_0100]], [[FOR_BODY16]] ]
	; STORE-NEXT: [[J_098:%.*]] = phi i32 [ 0, [[FOR_BODY16_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY16]] ]			; STORE-NEXT: [[J_098:%.*]] = phi i32 [ 0, [[FOR_BODY16_LR_PH]] ], [ [[INC:%.*]], [[FOR_BODY16]] ]
	; STORE-NEXT: [[W3_097:%.*]] = phi float [ [[TMP7]], [[FOR_BODY16_LR_PH]] ], [ [[W2_096]], [[FOR_BODY16]] ]			; STORE-NEXT: [[TMP9:%.*]] = phi <4 x float> [ [[TMP5]], [[FOR_BODY16_LR_PH]] ], [ [[TMP19]], [[FOR_BODY16]] ]
	; STORE-NEXT: [[W2_096]] = phi float [ [[TMP5]], [[FOR_BODY16_LR_PH]] ], [ [[SUB28]], [[FOR_BODY16]] ]			; STORE-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[TMP9]], i32 0
	; STORE-NEXT: [[MUL17:%.*]] = fmul fast float [[W0_0100]], 0x3FF19999A0000000			; STORE-NEXT: [[MUL17:%.*]] = fmul fast float [[TMP10]], 0x3FF19999A0000000
	; STORE-NEXT: [[MUL18_NEG:%.*]] = fmul fast float [[W1_099]], 0xBFF3333340000000			; STORE-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP9]], i32 1
				; STORE-NEXT: [[MUL18_NEG:%.*]] = fmul fast float [[TMP11]], 0xBFF3333340000000
	; STORE-NEXT: [[SUB92:%.*]] = fadd fast float [[MUL17]], [[MUL18_NEG]]			; STORE-NEXT: [[SUB92:%.*]] = fadd fast float [[MUL17]], [[MUL18_NEG]]
	; STORE-NEXT: [[SUB19]] = fadd fast float [[SUB92]], [[TMP8]]			; STORE-NEXT: [[SUB19:%.*]] = fadd fast float [[SUB92]], [[TMP6]]
	; STORE-NEXT: [[MUL20:%.*]] = fmul fast float [[SUB19]], 0x4000CCCCC0000000			; STORE-NEXT: [[MUL20:%.*]] = fmul fast float [[SUB19]], 0x4000CCCCC0000000
	; STORE-NEXT: [[MUL21_NEG:%.*]] = fmul fast float [[W0_0100]], 0xC0019999A0000000			; STORE-NEXT: [[TMP12:%.*]] = fmul fast <4 x float> <float 0xC0019999A0000000, float 0x4002666660000000, float 0x4008CCCCC0000000, float 0xC0099999A0000000>, [[TMP9]]
	; STORE-NEXT: [[MUL23:%.*]] = fmul fast float [[W1_099]], 0x4002666660000000			; STORE-NEXT: [[ADD2293:%.*]] = fadd fast float undef, undef
	; STORE-NEXT: [[MUL25:%.*]] = fmul fast float [[W2_096]], 0x4008CCCCC0000000			; STORE-NEXT: [[ADD24:%.*]] = fadd fast float [[ADD2293]], undef
	; STORE-NEXT: [[MUL27_NEG:%.*]] = fmul fast float [[W3_097]], 0xC0099999A0000000			; STORE-NEXT: [[SUB2694:%.*]] = fadd fast float [[ADD24]], undef
	; STORE-NEXT: [[ADD2293:%.*]] = fadd fast float [[MUL27_NEG]], [[MUL25]]			; STORE-NEXT: [[RDX_SHUF:%.*]] = shufflevector <4 x float> [[TMP12]], <4 x float> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
	; STORE-NEXT: [[ADD24:%.*]] = fadd fast float [[ADD2293]], [[MUL23]]			; STORE-NEXT: [[BIN_RDX:%.*]] = fadd fast <4 x float> [[TMP12]], [[RDX_SHUF]]
	; STORE-NEXT: [[SUB2694:%.*]] = fadd fast float [[ADD24]], [[MUL21_NEG]]			; STORE-NEXT: [[RDX_SHUF1:%.*]] = shufflevector <4 x float> [[BIN_RDX]], <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
	; STORE-NEXT: [[SUB28]] = fadd fast float [[SUB2694]], [[MUL20]]			; STORE-NEXT: [[BIN_RDX2:%.*]] = fadd fast <4 x float> [[BIN_RDX]], [[RDX_SHUF1]]
				; STORE-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[BIN_RDX2]], i32 0
				; STORE-NEXT: [[TMP14:%.*]] = fadd fast float [[TMP13]], [[MUL20]]
				; STORE-NEXT: [[SUB28:%.*]] = fadd fast float [[SUB2694]], [[MUL20]]
	; STORE-NEXT: [[INC]] = add nuw i32 [[J_098]], 1			; STORE-NEXT: [[INC]] = add nuw i32 [[J_098]], 1
	; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[ARG_B]]			; STORE-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[ARG_B]]
				; STORE-NEXT: [[TMP15:%.*]] = insertelement <4 x float> undef, float [[SUB19]], i32 0
				; STORE-NEXT: [[TMP16:%.*]] = insertelement <4 x float> [[TMP15]], float [[TMP10]], i32 1
				; STORE-NEXT: [[TMP17:%.*]] = insertelement <4 x float> [[TMP16]], float [[TMP14]], i32 2
				; STORE-NEXT: [[TMP18:%.*]] = extractelement <4 x float> [[TMP9]], i32 2
				; STORE-NEXT: [[TMP19]] = insertelement <4 x float> [[TMP17]], float [[TMP18]], i32 3
	; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16]]			; STORE-NEXT: br i1 [[EXITCOND]], label [[FOR_COND_CLEANUP15]], label [[FOR_BODY16]]
	;			;
	entry:			entry:
	%cmp1495 = icmp eq i32 %arg_B, 0			%cmp1495 = icmp eq i32 %arg_B, 0
	br label %for.body			br label %for.body

	for.cond.cleanup: ; preds = %for.cond.cleanup15			for.cond.cleanup: ; preds = %for.cond.cleanup15
	ret void			ret void

test/Transforms/SLPVectorizer/X86/hsub.ll

}		}

;		;
; 256-bit vectors		; 256-bit vectors
;		;

define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {		define <4 x double> @test_v4f64(<4 x double> %a, <4 x double> %b) {
; SSE-LABEL: @test_v4f64(		; SSE-LABEL: @test_v4f64(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; SSE-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = fsub <4 x double> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SSE-NEXT: ret <4 x double> [[TMP3]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R03:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: ret <4 x double> [[R03]]
;		;
; SLM-LABEL: @test_v4f64(		; SLM-LABEL: @test_v4f64(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x double> [[A:%.*]], <4 x double> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>		; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 1, i32 5>
; SLM-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>		; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 2, i32 6>
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>		; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x double> [[A]], <4 x double> [[B]], <2 x i32> <i32 3, i32 7>
; SLM-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[TMP6:%.*]] = fsub <2 x double> [[TMP4]], [[TMP5]]
Show All 27 Lines	;
%r00 = insertelement <4 x double> undef, double %r0, i32 0		%r00 = insertelement <4 x double> undef, double %r0, i32 0
%r01 = insertelement <4 x double> %r00, double %r1, i32 1		%r01 = insertelement <4 x double> %r00, double %r1, i32 1
%r02 = insertelement <4 x double> %r01, double %r2, i32 2		%r02 = insertelement <4 x double> %r01, double %r2, i32 2
%r03 = insertelement <4 x double> %r02, double %r3, i32 3		%r03 = insertelement <4 x double> %r02, double %r3, i32 3
ret <4 x double> %r03		ret <4 x double> %r03
}		}

define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {		define <8 x float> @test_v8f32(<8 x float> %a, <8 x float> %b) {
; SSE-LABEL: @test_v8f32(		; CHECK-LABEL: @test_v8f32(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP3:%.*]] = fsub <4 x float> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; CHECK-NEXT: ret <8 x float> [[TMP3]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP6:%.*]] = fsub <4 x float> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R07:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x float> [[R07]]
;
; SLM-LABEL: @test_v8f32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; SLM-NEXT: [[TMP3:%.*]] = fsub <4 x float> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SLM-NEXT: [[TMP6:%.*]] = fsub <4 x float> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R07:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x float> [[R07]]
;
; AVX-LABEL: @test_v8f32(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <8 x float> [[TMP3]]
;
; AVX512-LABEL: @test_v8f32(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x float> [[A:%.*]], <8 x float> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x float> [[A]], <8 x float> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX512-NEXT: [[TMP3:%.*]] = fsub <8 x float> [[TMP1]], [[TMP2]]
; AVX512-NEXT: ret <8 x float> [[TMP3]]
;		;
%a0 = extractelement <8 x float> %a, i32 0		%a0 = extractelement <8 x float> %a, i32 0
%a1 = extractelement <8 x float> %a, i32 1		%a1 = extractelement <8 x float> %a, i32 1
%a2 = extractelement <8 x float> %a, i32 2		%a2 = extractelement <8 x float> %a, i32 2
%a3 = extractelement <8 x float> %a, i32 3		%a3 = extractelement <8 x float> %a, i32 3
%a4 = extractelement <8 x float> %a, i32 4		%a4 = extractelement <8 x float> %a, i32 4
%a5 = extractelement <8 x float> %a, i32 5		%a5 = extractelement <8 x float> %a, i32 5
%a6 = extractelement <8 x float> %a, i32 6		%a6 = extractelement <8 x float> %a, i32 6
Show All 22 Lines	;
%r05 = insertelement <8 x float> %r04, float %r5, i32 5		%r05 = insertelement <8 x float> %r04, float %r5, i32 5
%r06 = insertelement <8 x float> %r05, float %r6, i32 6		%r06 = insertelement <8 x float> %r05, float %r6, i32 6
%r07 = insertelement <8 x float> %r06, float %r7, i32 7		%r07 = insertelement <8 x float> %r06, float %r7, i32 7
ret <8 x float> %r07		ret <8 x float> %r07
}		}

define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {		define <4 x i64> @test_v4i64(<4 x i64> %a, <4 x i64> %b) {
; SSE-LABEL: @test_v4i64(		; SSE-LABEL: @test_v4i64(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SSE-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>		; SSE-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; SSE-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]		; SSE-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>		; SSE-NEXT: ret <4 x i64> [[TMP3]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
; SSE-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; SSE-NEXT: ret <4 x i64> [[R03]]
;		;
; SLM-LABEL: @test_v4i64(		; SLM-LABEL: @test_v4i64(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>		; SLM-NEXT: [[A0:%.*]] = extractelement <4 x i64> [[A:%.*]], i32 0
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>		; SLM-NEXT: [[A1:%.*]] = extractelement <4 x i64> [[A]], i32 1
; SLM-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]		; SLM-NEXT: [[A2:%.*]] = extractelement <4 x i64> [[A]], i32 2
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>		; SLM-NEXT: [[A3:%.*]] = extractelement <4 x i64> [[A]], i32 3
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>		; SLM-NEXT: [[B0:%.*]] = extractelement <4 x i64> [[B:%.*]], i32 0
; SLM-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP4]], [[TMP5]]		; SLM-NEXT: [[B1:%.*]] = extractelement <4 x i64> [[B]], i32 1
; SLM-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>		; SLM-NEXT: [[B2:%.*]] = extractelement <4 x i64> [[B]], i32 2
		; SLM-NEXT: [[B3:%.*]] = extractelement <4 x i64> [[B]], i32 3
		; SLM-NEXT: [[R0:%.*]] = sub i64 [[A0]], [[A1]]
		; SLM-NEXT: [[R1:%.*]] = sub i64 [[B0]], [[B1]]
		; SLM-NEXT: [[R2:%.*]] = sub i64 [[A2]], [[A3]]
		; SLM-NEXT: [[R3:%.*]] = sub i64 [[B2]], [[B3]]
		; SLM-NEXT: [[R00:%.*]] = insertelement <4 x i64> undef, i64 [[R0]], i32 0
		; SLM-NEXT: [[R01:%.*]] = insertelement <4 x i64> [[R00]], i64 [[R1]], i32 1
		; SLM-NEXT: [[R02:%.*]] = insertelement <4 x i64> [[R01]], i64 [[R2]], i32 2
		; SLM-NEXT: [[R03:%.*]] = insertelement <4 x i64> [[R02]], i64 [[R3]], i32 3
; SLM-NEXT: ret <4 x i64> [[R03]]		; SLM-NEXT: ret <4 x i64> [[R03]]
;		;
; AVX-LABEL: @test_v4i64(		; AVX1-LABEL: @test_v4i64(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; AVX1-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <2 x i32> <i32 0, i32 4>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; AVX1-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 1, i32 5>
; AVX-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]		; AVX1-NEXT: [[TMP3:%.*]] = sub <2 x i64> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <4 x i64> [[TMP3]]		; AVX1-NEXT: [[TMP4:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 2, i32 6>
		; AVX1-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <2 x i32> <i32 3, i32 7>
		; AVX1-NEXT: [[TMP6:%.*]] = sub <2 x i64> [[TMP4]], [[TMP5]]
		; AVX1-NEXT: [[R03:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
		; AVX1-NEXT: ret <4 x i64> [[R03]]
		;
		; AVX2-LABEL: @test_v4i64(
		; AVX2-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
		; AVX2-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
		; AVX2-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]
		; AVX2-NEXT: ret <4 x i64> [[TMP3]]
;		;
; AVX512-LABEL: @test_v4i64(		; AVX512-LABEL: @test_v4i64(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>		; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <4 x i64> [[A:%.*]], <4 x i64> [[B:%.*]], <4 x i32> <i32 0, i32 4, i32 2, i32 6>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>		; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <4 x i64> [[A]], <4 x i64> [[B]], <4 x i32> <i32 1, i32 5, i32 3, i32 7>
; AVX512-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]		; AVX512-NEXT: [[TMP3:%.*]] = sub <4 x i64> [[TMP1]], [[TMP2]]
; AVX512-NEXT: ret <4 x i64> [[TMP3]]		; AVX512-NEXT: ret <4 x i64> [[TMP3]]
;		;
%a0 = extractelement <4 x i64> %a, i32 0		%a0 = extractelement <4 x i64> %a, i32 0
[11 lines not shown]
%r00 = insertelement <4 x i64> undef, i64 %r0, i32 0		%r00 = insertelement <4 x i64> undef, i64 %r0, i32 0
%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1		%r01 = insertelement <4 x i64> %r00, i64 %r1, i32 1
%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2		%r02 = insertelement <4 x i64> %r01, i64 %r2, i32 2
%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3		%r03 = insertelement <4 x i64> %r02, i64 %r3, i32 3
ret <4 x i64> %r03		ret <4 x i64> %r03
}		}

define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {		define <8 x i32> @test_v8i32(<8 x i32> %a, <8 x i32> %b) {
; SSE-LABEL: @test_v8i32(		; CHECK-LABEL: @test_v8i32(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>		; CHECK-NEXT: ret <8 x i32> [[TMP3]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SSE-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[R07:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SSE-NEXT: ret <8 x i32> [[R07]]
;
; SLM-LABEL: @test_v8i32(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <4 x i32> <i32 0, i32 2, i32 8, i32 10>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 1, i32 3, i32 9, i32 11>
; SLM-NEXT: [[TMP3:%.*]] = sub <4 x i32> [[TMP1]], [[TMP2]]
; SLM-NEXT: [[TMP4:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 4, i32 6, i32 12, i32 14>
; SLM-NEXT: [[TMP5:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <4 x i32> <i32 5, i32 7, i32 13, i32 15>
; SLM-NEXT: [[TMP6:%.*]] = sub <4 x i32> [[TMP4]], [[TMP5]]
; SLM-NEXT: [[R07:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
; SLM-NEXT: ret <8 x i32> [[R07]]
;
; AVX-LABEL: @test_v8i32(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <8 x i32> [[TMP3]]
;
; AVX512-LABEL: @test_v8i32(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 8, i32 10, i32 4, i32 6, i32 12, i32 14>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> [[B]], <8 x i32> <i32 1, i32 3, i32 9, i32 11, i32 5, i32 7, i32 13, i32 15>
; AVX512-NEXT: [[TMP3:%.*]] = sub <8 x i32> [[TMP1]], [[TMP2]]
; AVX512-NEXT: ret <8 x i32> [[TMP3]]
;		;
%a0 = extractelement <8 x i32> %a, i32 0		%a0 = extractelement <8 x i32> %a, i32 0
%a1 = extractelement <8 x i32> %a, i32 1		%a1 = extractelement <8 x i32> %a, i32 1
%a2 = extractelement <8 x i32> %a, i32 2		%a2 = extractelement <8 x i32> %a, i32 2
%a3 = extractelement <8 x i32> %a, i32 3		%a3 = extractelement <8 x i32> %a, i32 3
%a4 = extractelement <8 x i32> %a, i32 4		%a4 = extractelement <8 x i32> %a, i32 4
%a5 = extractelement <8 x i32> %a, i32 5		%a5 = extractelement <8 x i32> %a, i32 5
%a6 = extractelement <8 x i32> %a, i32 6		%a6 = extractelement <8 x i32> %a, i32 6
[21 lines not shown]
%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4		%r04 = insertelement <8 x i32> %r03, i32 %r4, i32 4
%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5		%r05 = insertelement <8 x i32> %r04, i32 %r5, i32 5
%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6		%r06 = insertelement <8 x i32> %r05, i32 %r6, i32 6
%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7		%r07 = insertelement <8 x i32> %r06, i32 %r7, i32 7
ret <8 x i32> %r07		ret <8 x i32> %r07
}		}

define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {		define <16 x i16> @test_v16i16(<16 x i16> %a, <16 x i16> %b) {
; SSE-LABEL: @test_v16i16(		; CHECK-LABEL: @test_v16i16(
; SSE-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <8 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22>		; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; SSE-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23>		; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; SSE-NEXT: [[TMP3:%.*]] = sub <8 x i16> [[TMP1]], [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]
; SSE-NEXT: [[TMP4:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>		; CHECK-NEXT: ret <16 x i16> [[TMP3]]
; SSE-NEXT: [[TMP5:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <8 x i32> <i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; SSE-NEXT: [[TMP6:%.*]] = sub <8 x i16> [[TMP4]], [[TMP5]]
; SSE-NEXT: [[RV15:%.*]] = shufflevector <8 x i16> [[TMP3]], <8 x i16> [[TMP6]], <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
; SSE-NEXT: ret <16 x i16> [[RV15]]
;
; SLM-LABEL: @test_v16i16(
; SLM-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; SLM-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; SLM-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]
; SLM-NEXT: ret <16 x i16> [[TMP3]]
;
; AVX-LABEL: @test_v16i16(
; AVX-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; AVX-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; AVX-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]
; AVX-NEXT: ret <16 x i16> [[TMP3]]
;
; AVX512-LABEL: @test_v16i16(
; AVX512-NEXT: [[TMP1:%.*]] = shufflevector <16 x i16> [[A:%.*]], <16 x i16> [[B:%.*]], <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 16, i32 18, i32 20, i32 22, i32 8, i32 10, i32 12, i32 14, i32 24, i32 26, i32 28, i32 30>
; AVX512-NEXT: [[TMP2:%.*]] = shufflevector <16 x i16> [[A]], <16 x i16> [[B]], <16 x i32> <i32 1, i32 3, i32 5, i32 7, i32 17, i32 19, i32 21, i32 23, i32 9, i32 11, i32 13, i32 15, i32 25, i32 27, i32 29, i32 31>
; AVX512-NEXT: [[TMP3:%.*]] = sub <16 x i16> [[TMP1]], [[TMP2]]
; AVX512-NEXT: ret <16 x i16> [[TMP3]]
;		;
%a0 = extractelement <16 x i16> %a, i32 0		%a0 = extractelement <16 x i16> %a, i32 0
%a1 = extractelement <16 x i16> %a, i32 1		%a1 = extractelement <16 x i16> %a, i32 1
%a2 = extractelement <16 x i16> %a, i32 2		%a2 = extractelement <16 x i16> %a, i32 2
%a3 = extractelement <16 x i16> %a, i32 3		%a3 = extractelement <16 x i16> %a, i32 3
%a4 = extractelement <16 x i16> %a, i32 4		%a4 = extractelement <16 x i16> %a, i32 4
%a5 = extractelement <16 x i16> %a, i32 5		%a5 = extractelement <16 x i16> %a, i32 5
%a6 = extractelement <16 x i16> %a, i32 6		%a6 = extractelement <16 x i16> %a, i32 6
[59 lines not shown]

test/Transforms/SLPVectorizer/X86/reorder_phi.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basicaa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basicaa -slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=corei7-avx | FileCheck %s

	%struct.complex = type { float, float }			%struct.complex = type { float, float }

	define void @foo (%struct.complex* %A, %struct.complex* %B, %struct.complex* %Result) {			define void @foo (%struct.complex* %A, %struct.complex* %B, %struct.complex* %Result) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 256, 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 256, 0
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP20:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP25:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[TMP19:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP24:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[TMP18:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX:%.*]], %struct.complex* [[A:%.*]], i64 [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX:%.*]], %struct.complex* [[A:%.*]], i64 [[TMP1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[A]], i64 [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4			; CHECK-NEXT: [[TMP5:%.*]] = bitcast float* [[TMP3]] to <2 x float>*
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[A]], i64 [[TMP1]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* [[TMP5]], align 4
	; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B:%.*]], i64 [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B:%.*]], i64 [[TMP1]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = load float, float* [[TMP7]], align 4
	; CHECK-NEXT: [[TMP9:%.*]] = load float, float* [[TMP8]], align 4			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B]], i64 [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B]], i64 [[TMP1]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[TMP9]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = load float, float* [[TMP10]], align 4			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x float> undef, float [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = fmul float [[TMP5]], [[TMP9]]			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[TMP8]], i32 1
	; CHECK-NEXT: [[TMP13:%.*]] = fmul float [[TMP7]], [[TMP11]]			; CHECK-NEXT: [[TMP13:%.*]] = fmul <2 x float> [[TMP6]], [[TMP12]]
	; CHECK-NEXT: [[TMP14:%.*]] = fsub float [[TMP12]], [[TMP13]]			; CHECK-NEXT: [[TMP14:%.*]] = extractelement <2 x float> [[TMP6]], i32 1
	; CHECK-NEXT: [[TMP15:%.*]] = fmul float [[TMP7]], [[TMP9]]			; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> undef, float [[TMP14]], i32 0
	; CHECK-NEXT: [[TMP16:%.*]] = fmul float [[TMP5]], [[TMP11]]			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP6]], i32 0
	; CHECK-NEXT: [[TMP17:%.*]] = fadd float [[TMP15]], [[TMP16]]			; CHECK-NEXT: [[TMP17:%.*]] = insertelement <2 x float> [[TMP15]], float [[TMP16]], i32 1
	; CHECK-NEXT: [[TMP18]] = fadd float [[TMP3]], [[TMP14]]			; CHECK-NEXT: [[TMP18:%.*]] = insertelement <2 x float> undef, float [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP19]] = fadd float [[TMP2]], [[TMP17]]			; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x float> [[TMP18]], float [[TMP10]], i32 1
	; CHECK-NEXT: [[TMP20]] = add nuw nsw i64 [[TMP1]], 1			; CHECK-NEXT: [[TMP20:%.*]] = fmul <2 x float> [[TMP17]], [[TMP19]]
	; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[TMP20]], [[TMP0]]			; CHECK-NEXT: [[TMP21:%.*]] = fsub <2 x float> [[TMP13]], [[TMP20]]
	; CHECK-NEXT: br i1 [[TMP21]], label [[EXIT:%.*]], label [[LOOP]]			; CHECK-NEXT: [[TMP22:%.*]] = fadd <2 x float> [[TMP13]], [[TMP20]]
				; CHECK-NEXT: [[TMP23:%.*]] = shufflevector <2 x float> [[TMP21]], <2 x float> [[TMP22]], <2 x i32> <i32 0, i32 3>
				; CHECK-NEXT: [[TMP24]] = fadd <2 x float> [[TMP2]], [[TMP23]]
				; CHECK-NEXT: [[TMP25]] = add nuw nsw i64 [[TMP1]], 1
				; CHECK-NEXT: [[TMP26:%.*]] = icmp eq i64 [[TMP25]], [[TMP0]]
				; CHECK-NEXT: br i1 [[TMP26]], label [[EXIT:%.*]], label [[LOOP]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT:%.*]], i32 0, i32 0			; CHECK-NEXT: [[TMP27:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT:%.*]], i32 0, i32 0
	; CHECK-NEXT: store float [[TMP18]], float* [[TMP22]], align 4			; CHECK-NEXT: [[TMP28:%.*]] = extractelement <2 x float> [[TMP24]], i32 0
	; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT]], i32 0, i32 1			; CHECK-NEXT: store float [[TMP28]], float* [[TMP27]], align 4
	; CHECK-NEXT: store float [[TMP19]], float* [[TMP23]], align 4			; CHECK-NEXT: [[TMP29:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT]], i32 0, i32 1
				; CHECK-NEXT: [[TMP30:%.*]] = extractelement <2 x float> [[TMP24]], i32 1
				; CHECK-NEXT: store float [[TMP30]], float* [[TMP29]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = add i64 256, 0			%0 = add i64 256, 0
	br label %loop			br label %loop

	loop:			loop:
	%1 = phi i64 [ 0, %entry ], [ %20, %loop ]			%1 = phi i64 [ 0, %entry ], [ %20, %loop ]
	[30 lines not shown]

test/Transforms/SLPVectorizer/X86/simplebb.ll

[58 lines not shown]
ret void		ret void
}		}

; Don't vectorize volatile loads.		; Don't vectorize volatile loads.
define void @test_volatile_load(double* %a, double* %b, double* %c) {		define void @test_volatile_load(double* %a, double* %b, double* %c) {
; CHECK-LABEL: @test_volatile_load(		; CHECK-LABEL: @test_volatile_load(
; CHECK-NEXT: [[I0:%.*]] = load volatile double, double* [[A:%.*]], align 8		; CHECK-NEXT: [[I0:%.*]] = load volatile double, double* [[A:%.*]], align 8
; CHECK-NEXT: [[I1:%.*]] = load volatile double, double* [[B:%.*]], align 8		; CHECK-NEXT: [[I1:%.*]] = load volatile double, double* [[B:%.*]], align 8
; CHECK-NEXT: [[MUL:%.*]] = fmul double [[I0]], [[I1]]
; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A]], i64 1		; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A]], i64 1
; CHECK-NEXT: [[I3:%.*]] = load double, double* [[ARRAYIDX3]], align 8		; CHECK-NEXT: [[I3:%.*]] = load double, double* [[ARRAYIDX3]], align 8
; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds double, double* [[B]], i64 1		; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds double, double* [[B]], i64 1
; CHECK-NEXT: [[I4:%.*]] = load double, double* [[ARRAYIDX4]], align 8		; CHECK-NEXT: [[I4:%.*]] = load double, double* [[ARRAYIDX4]], align 8
; CHECK-NEXT: [[MUL5:%.*]] = fmul double [[I3]], [[I4]]		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> undef, double [[I0]], i32 0
; CHECK-NEXT: store double [[MUL]], double* [[C:%.*]], align 8		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[I3]], i32 1
; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds double, double* [[C]], i64 1		; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> undef, double [[I1]], i32 0
; CHECK-NEXT: store double [[MUL5]], double* [[ARRAYIDX5]], align 8		; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[I4]], i32 1
		; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]
		; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[C:%.*]] to <2 x double>*
		; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%i0 = load volatile double, double* %a, align 8		%i0 = load volatile double, double* %a, align 8
%i1 = load volatile double, double* %b, align 8		%i1 = load volatile double, double* %b, align 8
%mul = fmul double %i0, %i1		%mul = fmul double %i0, %i1
%arrayidx3 = getelementptr inbounds double, double* %a, i64 1		%arrayidx3 = getelementptr inbounds double, double* %a, i64 1
%i3 = load double, double* %arrayidx3, align 8		%i3 = load double, double* %arrayidx3, align 8
%arrayidx4 = getelementptr inbounds double, double* %b, i64 1		%arrayidx4 = getelementptr inbounds double, double* %b, i64 1
[38 lines not shown]

test/Transforms/SLPVectorizer/X86/treecost.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=bdver2 < %s | FileCheck %s

				%class.b = type { [0 x double] }

				@d = dso_local local_unnamed_addr global double 0.000000e+00, align 8
				@e = dso_local local_unnamed_addr global %class.b zeroinitializer, align 8

				; Function Attrs: norecurse nounwind uwtable
				define dso_local void @foo() local_unnamed_addr #0 {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = load i64, i64* bitcast (double* @d to i64*), align 8
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64> undef, i64 [[TMP0]], i32 0
				; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> [[TMP1]], i64 [[TMP0]], i32 1
				; CHECK-NEXT: store <2 x i64> [[TMP2]], <2 x i64>* bitcast (%class.b* @e to <2 x i64>*), align 8
				; CHECK-NEXT: ret void
				;
				entry:
				%0 = load i64, i64* bitcast (double* @d to i64*), align 8
				store i64 %0, i64* bitcast (double* getelementptr inbounds (%class.b, %class.b* @e, i64 0, i32 0, i64 1) to i64*), align 8
				store i64 %0, i64* bitcast (%class.b* @e to i64*), align 8
				ret void
				}

test/Transforms/SLPVectorizer/X86/unreachable.ll

	[15 lines not shown]
	; CHECK-NEXT: [[T4:%.*]] = load i32, i32* [[T3]], align 4			; CHECK-NEXT: [[T4:%.*]] = load i32, i32* [[T3]], align 4
	; CHECK-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 5			; CHECK-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 5
	; CHECK-NEXT: [[T6:%.*]] = load i32, i32* [[T5]], align 4			; CHECK-NEXT: [[T6:%.*]] = load i32, i32* [[T5]], align 4
	; CHECK-NEXT: [[BAD:%.*]] = fadd float [[BAD]], 0.000000e+00			; CHECK-NEXT: [[BAD:%.*]] = fadd float [[BAD]], 0.000000e+00
	; CHECK-NEXT: [[T7:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 6			; CHECK-NEXT: [[T7:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 6
	; CHECK-NEXT: [[T8:%.*]] = load i32, i32* [[T7]], align 4			; CHECK-NEXT: [[T8:%.*]] = load i32, i32* [[T7]], align 4
	; CHECK-NEXT: [[T9:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 7			; CHECK-NEXT: [[T9:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 7
	; CHECK-NEXT: [[T10:%.*]] = load i32, i32* [[T9]], align 4			; CHECK-NEXT: [[T10:%.*]] = load i32, i32* [[T9]], align 4
				; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> undef, i32 [[T4]], i32 0
				; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[T6]], i32 1
				; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[T8]], i32 2
				; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[T10]], i32 3
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[T1_0:%.*]] = phi i32 [ [[T4]], [[BB1:%.*]] ], [ 2, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i32> [ [[TMP3]], [[BB1:%.*]] ], [ <i32 2, i32 2, i32 2, i32 2>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[T2_0:%.*]] = phi i32 [ [[T6]], [[BB1]] ], [ 2, [[ENTRY]] ]
	; CHECK-NEXT: [[T3_0:%.*]] = phi i32 [ [[T8]], [[BB1]] ], [ 2, [[ENTRY]] ]
	; CHECK-NEXT: [[T4_0:%.*]] = phi i32 [ [[T10]], [[BB1]] ], [ 2, [[ENTRY]] ]
	; CHECK-NEXT: store i32 [[T1_0]], i32* [[X]], align 4
	; CHECK-NEXT: [[T12:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 1			; CHECK-NEXT: [[T12:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 1
	; CHECK-NEXT: store i32 [[T2_0]], i32* [[T12]], align 4
	; CHECK-NEXT: [[T13:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 2			; CHECK-NEXT: [[T13:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 2
	; CHECK-NEXT: store i32 [[T3_0]], i32* [[T13]], align 4
	; CHECK-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 3			; CHECK-NEXT: [[T14:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 3
	; CHECK-NEXT: store i32 [[T4_0]], i32* [[T14]], align 4			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[X]] to <4 x i32>*
				; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %bb2			br label %bb2

	bb1: ; an unreachable block			bb1: ; an unreachable block
	%t3 = getelementptr inbounds i32, i32* %x, i64 4			%t3 = getelementptr inbounds i32, i32* %x, i64 4
	%t4 = load i32, i32* %t3, align 4			%t4 = load i32, i32* %t3, align 4
	[25 lines not shown]

Revision Contents

Diff 184983

lib/Transforms/Vectorize/SLPVectorizer.cpp

test/Transforms/SLPVectorizer/AArch64/gather-cost.ll

test/Transforms/SLPVectorizer/AArch64/getelementptr.ll

test/Transforms/SLPVectorizer/AArch64/horizontal.ll

test/Transforms/SLPVectorizer/AArch64/transpose.ll

test/Transforms/SLPVectorizer/X86/PR36280.ll

test/Transforms/SLPVectorizer/X86/PR39774.ll

test/Transforms/SLPVectorizer/X86/addsub.ll

test/Transforms/SLPVectorizer/X86/alternate-fp.ll

test/Transforms/SLPVectorizer/X86/alternate-int.ll

test/Transforms/SLPVectorizer/X86/bad_types.ll

test/Transforms/SLPVectorizer/X86/blending-shuffle.ll

test/Transforms/SLPVectorizer/X86/crash_binaryop.ll

test/Transforms/SLPVectorizer/X86/crash_cmpop.ll

test/Transforms/SLPVectorizer/X86/crash_dequeue.ll

test/Transforms/SLPVectorizer/X86/crash_flop7.ll

test/Transforms/SLPVectorizer/X86/crash_gep.ll

test/Transforms/SLPVectorizer/X86/crash_lencod.ll

test/Transforms/SLPVectorizer/X86/crash_scheduling.ll

test/Transforms/SLPVectorizer/X86/cse.ll

test/Transforms/SLPVectorizer/X86/external_user.ll

test/Transforms/SLPVectorizer/X86/hadd.ll

test/Transforms/SLPVectorizer/X86/horizontal.ll

test/Transforms/SLPVectorizer/X86/hsub.ll

test/Transforms/SLPVectorizer/X86/reorder_phi.ll

test/Transforms/SLPVectorizer/X86/simplebb.ll

test/Transforms/SLPVectorizer/X86/treecost.ll

test/Transforms/SLPVectorizer/X86/unreachable.ll
