This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Redesign vectorization of the gather nodes.
ClosedPublic

Authored by ABataev on Oct 4 2022, 9:10 AM.

Details

Reviewers

RKSimon
vdmitrie

Commits

rGb505fd559dcf: [SLP]Redesign vectorization of the gather nodes.
rG8ddd1ccdf893: [SLP]Redesign vectorization of the gather nodes.

Summary

Gather nodes are currently vectorized as a plain vector of the scalars instead
of relying on the actual node. As a result, in some cases an incorrect
transformation can go unnoticed: a non-matching set of scalars simply ends up
as a gather node instead of a possible vector/gather node. It is better to
rely on the actual nodes, which improves stability and makes missed cases
easier to detect.
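
To make the idea in the summary concrete, here is a minimal, self-contained sketch of looking up the gather node that belongs to a given operand edge instead of rebuilding a vector from a raw list of scalars. It is a toy model and not the patch itself: `Entry`, `Edge`, `isOperandGather` and `findOperandGather` are hypothetical names, scalars are plain integer ids, and the extra emptiness check is an assumption added for safety. Only the matching rule is modeled on `isOperandGatherNode` from the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins for the SLP graph types; all names are illustrative only.
enum class State { Vectorize, NeedToGather };

struct Edge {
  std::size_t UserIdx; // index of the user entry in the tree
  unsigned EdgeIdx;    // which operand of the user this entry feeds
};

struct Entry {
  State St;
  std::vector<int> Scalars; // scalars, identified by integer ids
  std::vector<Edge> Users;  // edges to the entries that use this one
};

// A gather entry belongs to an operand if its first user edge matches.
// The emptiness check is an assumption of this sketch.
bool isOperandGather(const Entry &E, const Edge &UserEdge) {
  return E.St == State::NeedToGather && !E.Users.empty() &&
         E.Users.front().UserIdx == UserEdge.UserIdx &&
         E.Users.front().EdgeIdx == UserEdge.EdgeIdx;
}

// Returns the index of the gather entry for the operand, or -1 if the
// graph has no such entry (the case the redesign wants to surface).
int findOperandGather(const std::vector<Entry> &Tree, const Edge &UserEdge) {
  for (std::size_t I = 0; I < Tree.size(); ++I)
    if (isOperandGather(Tree[I], UserEdge))
      return static_cast<int>(I);
  return -1;
}
```

With a lookup of this kind, a missing or mismatched gather node shows up as a failed search or a failed scalar comparison, rather than being silently replaced by a freshly built vector.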

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Oct 4 2022, 9:10 AM

Herald added a project: Restricted Project. Oct 4 2022, 9:10 AM

Herald added subscribers: kosarev, vporpo, kerbowa and 2 others.

ABataev requested review of this revision. Oct 4 2022, 9:10 AM

Herald added a subscriber: pcwang-thead.

Harbormaster completed remote builds in B190235: Diff 465041. Oct 4 2022, 10:26 AM

Rebase

Harbormaster completed remote builds in B194788: Diff 471335. Oct 27 2022, 6:02 PM

Rebase

Harbormaster completed remote builds in B195309: Diff 472073. Oct 31 2022, 12:49 PM

Rebase

Harbormaster completed remote builds in B195699: Diff 472607. Nov 2 2022, 8:23 AM

As usual, there's a lot going on in this patch! For starters - it looks like there are a number of cleanup changes that can be pulled out?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3911	Independent change?
3938	Independent change?
3940	Independent change?
8100	Idx shadow variable?

In D135174#3903274, @RKSimon wrote:

As usual, there's a lot going on in this patch! For starters - it looks like there are a number of cleanup changes that can be pulled out?

It was split already into several parts that were committed, just forgot to extract smaller things, will do.

Most of the things are cleanup, one is the bugfix, IIRC.

Rebase + address comments

a few minors

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3920–3921	Split + update comments
8072	We do this matching in a couple of places now - worth adding as a TreeEntry helper method?
8139	Add some comments describing what's happening in this method.

Harbormaster completed remote builds in B195921: Diff 472921. Nov 3 2022, 7:33 AM

ABataev added inline comments. Nov 3 2022, 9:06 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3920–3921	The problem is that this part does not affect the current vectorization; it works only with the redesigned version. Originally we did not use gather nodes for the vectorization, just the list of scalars to produce the buildvector. That's the reason I think this must be part of this change.
8072	Will try to do it in a separate patch.
8139	Will do

RKSimon added inline comments. Nov 3 2022, 9:41 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3920–3921	Sorry, I meant that the comment should be updated/split as the reuses mask is always reordered now, not just for the early out.

ABataev added inline comments. Nov 3 2022, 9:52 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3920–3921	I see, thanks

Address comments

Harbormaster completed remote builds in B195970: Diff 472993. Nov 3 2022, 12:30 PM

RKSimon added inline comments. Nov 4 2022, 7:53 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
8209	Please can you explain what this poisonvalue is for and why we don't need freeze for it.

ABataev added inline comments. Nov 4 2022, 8:00 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
8209	It is to build the shuffle with the poison only. On line 8271 the corresponding shuffle position is set to the non-poisoned element from the buildvector. The freeze is not required because we already checked that the scalar in the Pos position is non-poisoned.

LGTM with one minor - cheers

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
8209	OK - please add a comment explaining that.
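
The exchange about line 8209 can be illustrated with a small simulation of how a splat with undef lanes might be planned. This is only a sketch under stated assumptions: `Lane`, `SplatPlan` and `planSplat` are hypothetical names, values are modeled as tagged integer ids, and the `SplatIsNonPoison` flag stands in for the real non-poison check performed in the patch. It mirrors the two branches visible in the diff: either redirect the undef lanes to the known non-poison scalar, so no freeze is needed, or leave them undefined in the mask and request a freeze.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int UndefMaskElem = -1; // sentinel for "don't care" mask lanes

enum class Kind { Poison, Undef, Scalar };
struct Lane {
  Kind K;
  int Id; // only meaningful for Kind::Scalar
};

struct SplatPlan {
  std::vector<Lane> Gathered; // values to put into the build vector
  std::vector<int> ReuseMask; // shuffle mask applied afterwards
  bool NeedFreeze = false;
};

// Assumes VL is a splat: all Scalar lanes carry the same Id.
SplatPlan planSplat(const std::vector<Lane> &VL, bool SplatIsNonPoison) {
  const std::size_t VF = VL.size();
  SplatPlan P;
  P.Gathered.assign(VF, Lane{Kind::Poison, 0});
  P.ReuseMask.assign(VF, UndefMaskElem);
  std::vector<std::size_t> UndefPos;
  for (std::size_t I = 0; I < VF; ++I) {
    if (VL[I].K == Kind::Undef) {
      // Non-poison undef: keep it in place for now and remember the lane.
      P.Gathered[I] = VL[I];
      P.ReuseMask[I] = static_cast<int>(I);
      UndefPos.push_back(I);
    } else if (VL[I].K == Kind::Scalar) {
      // Splat: the scalar lives in lane 0 and is broadcast via the mask.
      P.Gathered[0] = VL[I];
      P.ReuseMask[I] = 0;
    }
  }
  if (UndefPos.empty())
    return P;
  // Position of the first real scalar in the build vector, if any.
  std::size_t Pos = VF;
  for (std::size_t I = 0; I < VF && Pos == VF; ++I)
    if (P.Gathered[I].K == Kind::Scalar)
      Pos = I;
  if (Pos != VF && SplatIsNonPoison) {
    // Undef lanes read the non-poison scalar; their slots become poison.
    for (std::size_t I : UndefPos) {
      P.ReuseMask[I] = static_cast<int>(Pos);
      if (I != Pos)
        P.Gathered[I] = Lane{Kind::Poison, 0};
    }
  } else {
    // No safe scalar: leave lanes undefined in the mask and freeze later.
    for (std::size_t I : UndefPos) {
      P.ReuseMask[I] = UndefMaskElem;
      if (P.Gathered[I].K == Kind::Undef)
        P.Gathered[I] = Lane{Kind::Poison, 0};
    }
    P.NeedFreeze = true;
  }
  return P;
}
```

In the first branch the poison placed into the build vector is never selected by the mask, which is the reason given in the review for not needing a freeze there.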

This revision is now accepted and ready to land. Nov 4 2022, 8:22 AM

Closed by commit rG8ddd1ccdf893: [SLP]Redesign vectorization of the gather nodes. (authored by ABataev). Nov 7 2022, 7:06 AM

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG8ddd1ccdf893: [SLP]Redesign vectorization of the gather nodes..

ABataev added a reverting change: rGecd0b5a5327a: Revert "[SLP]Redesign vectorization of the gather nodes.". Nov 7 2022, 8:38 AM

ABataev added a commit: rGb505fd559dcf: [SLP]Redesign vectorization of the gather nodes.. Nov 10 2022, 11:00 AM

this change is causing a crash on the following IR

target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.16.0"

%"class.(anonymous namespace)::ESMatrix" = type { [4 x [4 x float]] }

define void @"?Multiply@ESMatrix@?A0x78950DC3@@QEAAXPEAV1?A0x78950DC3@@0@Z"(ptr %a, ptr %b) {
entry:
  %result = alloca %"class.(anonymous namespace)::ESMatrix", i32 0, align 4
  %arrayidx11 = getelementptr [4 x [4 x float]], ptr %b, i64 0, i64 1
  %0 = load float, ptr %arrayidx11, align 4
  %1 = load float, ptr null, align 4
  %arrayidx120 = getelementptr [4 x float], ptr %b, i64 0, i64 3
  %2 = load float, ptr %arrayidx120, align 4
  br label %for.body

for.body:                                         ; preds = %for.body, %entry
  %3 = load float, ptr %a, align 4
  %mul = fmul float %3, 0.000000e+00
  %arrayidx9 = getelementptr [4 x [4 x float]], ptr %a, i64 0, i64 0, i64 1
  %4 = load float, ptr %arrayidx9, align 4
  %mul13 = fmul float %4, %0
  %add = fadd float %mul, %mul13
  %add22 = fadd float %add, 0.000000e+00
  store float %add22, ptr %result, align 4
  %5 = load float, ptr null, align 4
  %mul43 = fmul float %3, %5
  %mul51 = fmul float %4, 0.000000e+00
  %add52 = fadd float %mul43, %mul51
  %add61 = fadd float %add52, 0.000000e+00
  %arrayidx74 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 1
  store float %add61, ptr %arrayidx74, align 4
  %mul82 = fmul float %3, 0.000000e+00
  %mul90 = fmul float %4, %1
  %add91 = fadd float %mul82, %mul90
  %add100 = fadd float %add91, 0.000000e+00
  %arrayidx113 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 2
  store float %add100, ptr %arrayidx113, align 4
  %mul121 = fmul float %3, %2
  %mul129 = fmul float %4, 0.000000e+00
  %add130 = fadd float %mul121, %mul129
  %add139 = fadd float %add130, 0.000000e+00
  %arrayidx152 = getelementptr [4 x [4 x float]], ptr %result, i64 0, i64 0, i64 3
  store float %add139, ptr %arrayidx152, align 4
  br label %for.body
}

$ opt -p slp-vectorizer /tmp/a.ll -disable-output
opt: ../../llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:8150: Value *llvm::slpvectorizer::BoUpSLP::vectorizeOperand(TreeEntry *, unsigned int): Assertion `(any_of(VE->UserTreeIndices, [E, NodeIdx](const EdgeInfo &EI) { return EI.EdgeIdx == NodeIdx && EI.UserTE == E; }) || any_of(VectorizableTree, [E, NodeIdx, VE](const std::unique_ptr<TreeEntry> &TE) { return TE->isOperandGatherNode({E, NodeIdx}) && VE->isSame(TE->Scalars); })) && "Expected same vectorizable node."' failed.

In D135174#3928081, @aeubanks wrote:

this change is causing a crash on the following IR

Must be fixed in 0a33ceee0105c94060c8a6089a2e489a8a7a5cb7

This patch caused https://github.com/llvm/llvm-project/issues/59693.

Revision Contents

Path    Size

llvm/
  lib/
    Transforms/
      Vectorize/
        SLPVectorizer.cpp    336 lines
  test/
    Transforms/
      SLPVectorizer/
        AArch64/
          (name not shown)    66 lines
          (name not shown)    140 lines
          (name not shown)    64 lines
          (name not shown)    29 lines
          vectorizable-selects-uniform-cmps.ll    16 lines
          vectorize-free-extracts-inserts.ll    46 lines
        AMDGPU/
          packed-math.ll    8 lines
        X86/
          PR35777.ll    22 lines
          PR39774.ll    18 lines
          alternate-cmp-swapped-pred.ll    4 lines
          broadcast_long.ll    5 lines
          buildvector-shuffle.ll    6 lines
          (name not shown)    20 lines
          (name not shown)    8 lines
          (name not shown)    6 lines
          (name not shown)    22 lines
          (name not shown)    40 lines
          crash_exceed_scheduling.ll    34 lines
          cse.ll    36 lines
          extract-scalar-from-undef.ll    9 lines
          extract_in_tree_user.ll    30 lines
          extractelement-multiple-uses.ll    10 lines
          (name not shown)    20 lines
          (name not shown)    20 lines
          (name not shown)    24 lines
          (name not shown)    7 lines
          jumbled-load-multiuse.ll    9 lines
          lookahead.ll    84 lines
          matched-shuffled-entries.ll    24 lines
          (name not shown)    22 lines
          (name not shown)    35 lines
          (name not shown)    12 lines
          (name not shown)    32 lines
          remark_extract_broadcast.ll    2 lines
          reorder_phi.ll    36 lines
          reorder_with_external_users.ll    68 lines
          reused-undefs.ll    4 lines
          scatter-vectorize-reused-pointer.ll    16 lines
          vectorize-widest-phis.ll    2 lines

Diff 473656

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

... 2,261 lines not shown ...	private:
/// returns false, setting \p CurrentOrder to either an empty vector or a		/// returns false, setting \p CurrentOrder to either an empty vector or a
/// non-identity permutation that allows to reuse extract instructions.		/// non-identity permutation that allows to reuse extract instructions.
bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,		bool canReuseExtract(ArrayRef<Value *> VL, Value *OpValue,
SmallVectorImpl<unsigned> &CurrentOrder) const;		SmallVectorImpl<unsigned> &CurrentOrder) const;

/// Vectorize a single entry in the tree.		/// Vectorize a single entry in the tree.
Value *vectorizeTree(TreeEntry *E);		Value *vectorizeTree(TreeEntry *E);

/// Vectorize a single entry in the tree, starting in \p VL.		/// Vectorize a single entry in the tree, the \p Idx-th operand of the entry
Value *vectorizeTree(ArrayRef<Value *> VL);		/// \p E.
		Value *vectorizeOperand(TreeEntry *E, unsigned NodeIdx);

/// Create a new vector from a list of scalar values. Produces a sequence		/// Create a new vector from a list of scalar values. Produces a sequence
/// which exploits values reused across lanes, and arranges the inserts		/// which exploits values reused across lanes, and arranges the inserts
/// for ease of later optimization.		/// for ease of later optimization.
Value *createBuildVector(ArrayRef<Value *> VL);		Value *createBuildVector(const TreeEntry *E);

/// \returns the scalarization cost for this type. Scalarization in this		/// \returns the scalarization cost for this type. Scalarization in this
/// context means the creation of vectors from a group of scalars. If \p		/// context means the creation of vectors from a group of scalars. If \p
/// NeedToShuffle is true, need to add a cost of reshuffling some of the		/// NeedToShuffle is true, need to add a cost of reshuffling some of the
/// vector elements.		/// vector elements.
InstructionCost getGatherCost(FixedVectorType *Ty,		InstructionCost getGatherCost(FixedVectorType *Ty,
const APInt &ShuffledIndices,		const APInt &ShuffledIndices,
bool NeedToShuffle) const;		bool NeedToShuffle) const;
... 86 lines not shown ...	bool isSame(ArrayRef<Value *> VL) const {
::addMask(Mask, ReuseShuffleIndices);		::addMask(Mask, ReuseShuffleIndices);
return IsSame(Scalars, Mask);		return IsSame(Scalars, Mask);
}		}
return false;		return false;
}		}
return IsSame(Scalars, ReuseShuffleIndices);		return IsSame(Scalars, ReuseShuffleIndices);
}		}

		bool isOperandGatherNode(const EdgeInfo &UserEI) const {
		return State == TreeEntry::NeedToGather &&
		UserTreeIndices.front().EdgeIdx == UserEI.EdgeIdx &&
		UserTreeIndices.front().UserTE == UserEI.UserTE;
		}

/// \returns true if current entry has same operands as \p TE.		/// \returns true if current entry has same operands as \p TE.
bool hasEqualOperands(const TreeEntry &TE) const {		bool hasEqualOperands(const TreeEntry &TE) const {
if (TE.getNumOperands() != getNumOperands())		if (TE.getNumOperands() != getNumOperands())
return false;		return false;
SmallBitVector Used(getNumOperands());		SmallBitVector Used(getNumOperands());
for (unsigned I = 0, E = getNumOperands(); I < E; ++I) {		for (unsigned I = 0, E = getNumOperands(); I < E; ++I) {
unsigned PrevCount = Used.count();		unsigned PrevCount = Used.count();
for (unsigned K = 0; K < E; ++K) {		for (unsigned K = 0; K < E; ++K) {
... 1,509 lines not shown ...

/// Checks if the given mask is a "clustered" mask with the same clusters of		/// Checks if the given mask is a "clustered" mask with the same clusters of
/// size \p Sz, which are not identity submasks.		/// size \p Sz, which are not identity submasks.
static bool isRepeatedNonIdentityClusteredMask(ArrayRef<int> Mask,		static bool isRepeatedNonIdentityClusteredMask(ArrayRef<int> Mask,
unsigned Sz) {		unsigned Sz) {
ArrayRef<int> FirstCluster = Mask.slice(0, Sz);		ArrayRef<int> FirstCluster = Mask.slice(0, Sz);
if (ShuffleVectorInst::isIdentityMask(FirstCluster))		if (ShuffleVectorInst::isIdentityMask(FirstCluster))
return false;		return false;
for (unsigned I = Sz, E = Mask.size(); I < E; I += Sz) {		for (unsigned I = Sz, E = Mask.size(); I < E; I += Sz) {
		RKSimon: Independent change?
ArrayRef<int> Cluster = Mask.slice(I, Sz);		ArrayRef<int> Cluster = Mask.slice(I, Sz);
if (Cluster != FirstCluster)		if (Cluster != FirstCluster)
return false;		return false;
}		}
return true;		return true;
}		}

void BoUpSLP::reorderNodeWithReuses(TreeEntry &TE, ArrayRef<int> Mask) const {		void BoUpSLP::reorderNodeWithReuses(TreeEntry &TE, ArrayRef<int> Mask) const {
// For vectorized and non-clustered reused - just reorder reuses mask.		// Reorder reuses mask.
		reorderReuses(TE.ReuseShuffleIndices, Mask);
		RKSimon: Split + update comments
		ABataev: The problem is that this part does not affect current vectorization, it works only with the redesigned version. Originally we do not use gather nodes for the vectorization, just the list of scalars to produce the buildvector. That's the reason I think this must be the part of this change.
		RKSimon: Sorry, I meant that the comment should be updated/split as the reuses mask is always reordered now, not just for the early out.
		ABataev: I see, thanks
const unsigned Sz = TE.Scalars.size();		const unsigned Sz = TE.Scalars.size();
if (TE.State != TreeEntry::NeedToGather || !TE.ReorderIndices.empty() ||		// For vectorized and non-clustered reused no need to do anything else.
		if (TE.State != TreeEntry::NeedToGather ||
!ShuffleVectorInst::isOneUseSingleSourceMask(TE.ReuseShuffleIndices,		!ShuffleVectorInst::isOneUseSingleSourceMask(TE.ReuseShuffleIndices,
Sz) ||		Sz) ||
!isRepeatedNonIdentityClusteredMask(TE.ReuseShuffleIndices, Sz)) {		!isRepeatedNonIdentityClusteredMask(TE.ReuseShuffleIndices, Sz))
reorderReuses(TE.ReuseShuffleIndices, Mask);
return;		return;
}		SmallVector<int> NewMask;
		inversePermutation(TE.ReorderIndices, NewMask);
		addMask(NewMask, TE.ReuseShuffleIndices);
		// Clear reorder since it is going to be applied to the new mask.
		TE.ReorderIndices.clear();
// Try to improve gathered nodes with clustered reuses, if possible.		// Try to improve gathered nodes with clustered reuses, if possible.
reorderScalars(TE.Scalars, makeArrayRef(TE.ReuseShuffleIndices).slice(0, Sz));		reorderScalars(TE.Scalars, makeArrayRef(NewMask).slice(0, Sz));
// Fill the reuses mask with the identity submasks.		// Fill the reuses mask with the identity submasks.
for (auto *It = TE.ReuseShuffleIndices.begin(),		for (auto *It = TE.ReuseShuffleIndices.begin(),
*End = TE.ReuseShuffleIndices.end();		*End = TE.ReuseShuffleIndices.end();
		RKSimon: Independent change?
It != End; std::advance(It, Sz))		It != End; std::advance(It, Sz))
std::iota(It, std::next(It, Sz), 0);		std::iota(It, std::next(It, Sz), 0);
		RKSimon: Independent change?
}		}

void BoUpSLP::reorderTopToBottom() {		void BoUpSLP::reorderTopToBottom() {
// Maps VF to the graph nodes.		// Maps VF to the graph nodes.
DenseMap<unsigned, SetVector<TreeEntry *>> VFToOrderedEntries;		DenseMap<unsigned, SetVector<TreeEntry *>> VFToOrderedEntries;
// ExtractElement gather nodes which can be vectorized and need to handle		// ExtractElement gather nodes which can be vectorized and need to handle
// their ordering.		// their ordering.
DenseMap<const TreeEntry *, OrdersType> GathersToOrders;		DenseMap<const TreeEntry *, OrdersType> GathersToOrders;
... 4,094 lines not shown ...	public:

~ShuffleInstructionBuilder() {		~ShuffleInstructionBuilder() {
assert((IsFinalized || Mask.empty()) &&		assert((IsFinalized || Mask.empty()) &&
"Shuffle construction must be finalized.");		"Shuffle construction must be finalized.");
}		}
};		};
} // namespace		} // namespace

Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL) {		Value *BoUpSLP::vectorizeOperand(TreeEntry *E, unsigned NodeIdx) {
		ArrayRef<Value *> VL = E->getOperand(NodeIdx);
const unsigned VF = VL.size();		const unsigned VF = VL.size();
InstructionsState S = getSameOpcode(VL, *TLI);		InstructionsState S = getSameOpcode(VL, *TLI);
// Special processing for GEPs bundle, which may include non-gep values.		// Special processing for GEPs bundle, which may include non-gep values.
if (!S.getOpcode() && VL.front()->getType()->isPointerTy()) {		if (!S.getOpcode() && VL.front()->getType()->isPointerTy()) {
const auto *It =		const auto *It =
find_if(VL, [](Value *V) { return isa<GetElementPtrInst>(V); });		find_if(VL, [](Value *V) { return isa<GetElementPtrInst>(V); });
if (It != VL.end())		if (It != VL.end())
S = getSameOpcode(*It, *TLI);		S = getSameOpcode(*It, *TLI);
}		}
if (S.getOpcode()) {		if (S.getOpcode()) {
if (TreeEntry *E = getTreeEntry(S.OpValue))		if (TreeEntry *VE = getTreeEntry(S.OpValue); VE && VE->isSame(VL)) {
if (E->isSame(VL)) {		assert((any_of(VE->UserTreeIndices,
Value *V = vectorizeTree(E);		[E, NodeIdx](const EdgeInfo &EI) {
		return EI.EdgeIdx == NodeIdx && EI.UserTE == E;
		}) ||
		any_of(VectorizableTree,
		[E, NodeIdx, VE](const std::unique_ptr<TreeEntry> &TE) {
		return TE->isOperandGatherNode({E, NodeIdx}) &&
		VE->isSame(TE->Scalars);
		})) &&
		RKSimon: We do this matching in a couple of places now - worth adding as a TreeEntry helper method?
		ABataev: Will try to do it in a separate patch.
		"Expected same vectorizable node.");
		Value *V = vectorizeTree(VE);
if (VF != cast<FixedVectorType>(V->getType())->getNumElements()) {		if (VF != cast<FixedVectorType>(V->getType())->getNumElements()) {
if (!E->ReuseShuffleIndices.empty()) {		if (!VE->ReuseShuffleIndices.empty()) {
// Reshuffle to get only unique values.		// Reshuffle to get only unique values.
// If some of the scalars are duplicated in the vectorization tree		// If some of the scalars are duplicated in the vectorization
// entry, we do not vectorize them but instead generate a mask for		// tree entry, we do not vectorize them but instead generate a
// the reuses. But if there are several users of the same entry,		// mask for the reuses. But if there are several users of the
// they may have different vectorization factors. This is especially		// same entry, they may have different vectorization factors.
// important for PHI nodes. In this case, we need to adapt the		// This is especially important for PHI nodes. In this case, we
// resulting instruction for the user vectorization factor and have		// need to adapt the resulting instruction for the user
// to reshuffle it again to take only unique elements of the vector.		// vectorization factor and have to reshuffle it again to take
// Without this code the function incorrectly returns reduced vector		// only unique elements of the vector. Without this code the
// instruction with the same elements, not with the unique ones.		// function incorrectly returns reduced vector instruction with
		// the same elements, not with the unique ones.

// block:		// block:
// %phi = phi <2 x > { .., %entry} {%shuffle, %block}		// %phi = phi <2 x > { .., %entry} {%shuffle, %block}
// %2 = shuffle <2 x > %phi, poison, <4 x > <1, 1, 0, 0>		// %2 = shuffle <2 x > %phi, poison, <4 x > <1, 1, 0, 0>
// ... (use %2)		// ... (use %2)
// %shuffle = shuffle <2 x> %2, poison, <2 x> {2, 0}		// %shuffle = shuffle <2 x> %2, poison, <2 x> {2, 0}
// br %block		// br %block
SmallVector<int> UniqueIdxs(VF, UndefMaskElem);		SmallVector<int> UniqueIdxs(VF, UndefMaskElem);
SmallSet<int, 4> UsedIdxs;		SmallSet<int, 4> UsedIdxs;
int Pos = 0;		int Pos = 0;
int Sz = VL.size();		for (int Idx : VE->ReuseShuffleIndices) {
for (int Idx : E->ReuseShuffleIndices) {		if (Idx != static_cast<int>(VF) && Idx != UndefMaskElem &&
if (Idx != Sz && Idx != UndefMaskElem &&
UsedIdxs.insert(Idx).second)		UsedIdxs.insert(Idx).second)
		RKSimon: Idx shadow variable?
UniqueIdxs[Idx] = Pos;		UniqueIdxs[Idx] = Pos;
++Pos;		++Pos;
}		}
assert(VF >= UsedIdxs.size() && "Expected vectorization factor "		assert(VF >= UsedIdxs.size() && "Expected vectorization factor "
"less than original vector size.");		"less than original vector size.");
UniqueIdxs.append(VF - UsedIdxs.size(), UndefMaskElem);		UniqueIdxs.append(VF - UsedIdxs.size(), UndefMaskElem);
V = Builder.CreateShuffleVector(V, UniqueIdxs, "shrink.shuffle");		V = Builder.CreateShuffleVector(V, UniqueIdxs, "shrink.shuffle");
} else {		} else {
assert(VF < cast<FixedVectorType>(V->getType())->getNumElements() &&		assert(VF < cast<FixedVectorType>(V->getType())->getNumElements() &&
"Expected vectorization factor less "		"Expected vectorization factor less "
"than original vector size.");		"than original vector size.");
SmallVector<int> UniformMask(VF, 0);		SmallVector<int> UniformMask(VF, 0);
std::iota(UniformMask.begin(), UniformMask.end(), 0);		std::iota(UniformMask.begin(), UniformMask.end(), 0);
V = Builder.CreateShuffleVector(V, UniformMask, "shrink.shuffle");		V = Builder.CreateShuffleVector(V, UniformMask, "shrink.shuffle");
}		}
if (auto *I = dyn_cast<Instruction>(V)) {		if (auto *I = dyn_cast<Instruction>(V)) {
GatherShuffleExtractSeq.insert(I);		GatherShuffleExtractSeq.insert(I);
CSEBlocks.insert(I->getParent());		CSEBlocks.insert(I->getParent());
}		}
}		}
return V;		return V;
}		}
}		}

// Can't vectorize this, so simply build a new vector with each lane		// Find the corresponding gather entry and vectorize it.
// corresponding to the requested value.		// Allows to be more accurate with tree/graph transformations, checks for the
return createBuildVector(VL);		// correctness of the transformations in many cases.
}		auto *I = find_if(VectorizableTree,
Value *BoUpSLP::createBuildVector(ArrayRef<Value *> VL) {		[E, NodeIdx](const std::unique_ptr<TreeEntry> &TE) {
assert(any_of(VectorizableTree,		return TE->isOperandGatherNode({E, NodeIdx});
[VL](const std::unique_ptr<TreeEntry> &TE) {		});
return TE->State == TreeEntry::NeedToGather && TE->isSame(VL);		assert(I != VectorizableTree.end() && "Gather node is not in the graph.");
}) &&		assert(I->get()->UserTreeIndices.size() == 1 &&
"Non-matching gather node.");		"Expected only single user for the gather node.");
unsigned VF = VL.size();		assert(I->get()->isSame(VL) && "Expected same list of scalars.");
// Exploit possible reuse of values across lanes.		return vectorizeTree(I->get());
SmallVector<int> ReuseShuffleIndicies;		}
SmallVector<Value *> UniqueValues;
if (VL.size() > 2) {		Value *BoUpSLP::createBuildVector(const TreeEntry *E) {
		RKSimon: Add some comments describing what's happening in this method.
		ABataev: Will do
		assert(E->State == TreeEntry::NeedToGather && "Expected gather node.");
		unsigned VF = E->getVectorFactor();

		ShuffleInstructionBuilder ShuffleBuilder(Builder, VF, GatherShuffleExtractSeq,
		CSEBlocks);
		SmallVector<Value *> Gathered(
		VF, PoisonValue::get(E->Scalars.front()->getType()));
		bool NeedFreeze = false;
		SmallVector<Value *> VL(E->Scalars.begin(), E->Scalars.end());
		// Build a mask out of the redorder indices and reorder scalars per this mask.
		SmallVector<int> ReorderMask;
		inversePermutation(E->ReorderIndices, ReorderMask);
		if (!ReorderMask.empty())
		reorderScalars(VL, ReorderMask);
		if (!allConstant(VL)) {
		// For splats with can emit broadcasts instead of gathers, so try to find
		// such sequences.
		bool IsSplat = isSplat(VL) && (VL.size() > 2 || VL.front() == VL.back());
		SmallVector<int> ReuseMask(VF, UndefMaskElem);
		SmallVector<int> UndefPos;
DenseMap<Value *, unsigned> UniquePositions;		DenseMap<Value *, unsigned> UniquePositions;
unsigned NumValues =		// Gather unique non-const values and all constant values.
std::distance(VL.begin(), find_if(reverse(VL), [](Value *V) {		// For repeated values, just shuffle them.
return !isa<UndefValue>(V);		for (auto [I, V] : enumerate(VL)) {
}).base());
VF = std::max<unsigned>(VF, PowerOf2Ceil(NumValues));
int UniqueVals = 0;
for (Value *V : VL.drop_back(VL.size() - VF)) {
if (isa<UndefValue>(V)) {		if (isa<UndefValue>(V)) {
ReuseShuffleIndicies.emplace_back(UndefMaskElem);		if (!isa<PoisonValue>(V)) {
		Gathered[I] = V;
		ReuseMask[I] = I;
		UndefPos.push_back(I);
		}
continue;		continue;
}		}
if (isConstant(V)) {		if (isConstant(V)) {
ReuseShuffleIndicies.emplace_back(UniqueValues.size());		Gathered[I] = V;
UniqueValues.emplace_back(V);		ReuseMask[I] = I;
continue;		continue;
}		}
auto Res = UniquePositions.try_emplace(V, UniqueValues.size());		if (IsSplat) {
ReuseShuffleIndicies.emplace_back(Res.first->second);		Gathered.front() = V;
if (Res.second) {		ReuseMask[I] = 0;
UniqueValues.emplace_back(V);		} else {
++UniqueVals;		const auto Res = UniquePositions.try_emplace(V, I);
}		Gathered[Res.first->second] = V;
}		ReuseMask[I] = Res.first->second;
if (UniqueVals == 1 && UniqueValues.size() == 1) {		}
// Emit pure splat vector.		}
ReuseShuffleIndicies.append(VF - ReuseShuffleIndicies.size(),		if (!UndefPos.empty() && IsSplat) {
UndefMaskElem);		// For undef values, try to replace them with the simple broadcast.
} else if (UniqueValues.size() >= VF - 1 || UniqueValues.size() <= 1) {		// We can do it if the broadcasted value is guaranteed to be
if (UniqueValues.empty()) {		// non-poisonous, or by freezing the incoming scalar value first.
assert(all_of(VL, UndefValue::classof) && "Expected list of undefs.");		auto *It = find_if(Gathered, [this, E](Value *V) {
NumValues = VF;		return !isa<UndefValue>(V) &&
		(getTreeEntry(V) \|\| isGuaranteedNotToBePoison(V) \|\|
		any_of(V->uses(), [E](const Use &U) {
		// Check if the value already used in the same operation in
		// one of the nodes already.
		return E->UserTreeIndices.size() == 1 &&
		is_contained(
		E->UserTreeIndices.front().UserTE->Scalars,
		U.getUser()) &&
		E->UserTreeIndices.front().EdgeIdx != U.getOperandNo();
		}));
		});
		if (It != Gathered.end()) {
		// Replace undefs by the non-poisoned scalars and emit broadcast.
		int Pos = std::distance(Gathered.begin(), It);
		for_each(UndefPos, [&](int I) {
		// Set the undef position to the non-poisoned scalar.
		ReuseMask[I] = Pos;
		// Replace the undef by the poison, in the mask it is replaced by non-poisoned scalar already.
		RKSimon: Please can you explain what this poisonvalue is for and why we don't need freeze for it.
		ABataev: It is to build the shuffle with the poison only. On line 8271 the corresponding shuffle position is set to the non-poisoned element from the buildvector. The freeze is not required because we already checked that the scalar in Pos position is non-poisoned.
		RKSimon: OK - please add a comment explaining that.
		if (I != Pos)
		Gathered[I] = PoisonValue::get(Gathered[I]->getType());
		});
		} else {
		// Replace undefs by the poisons, emit broadcast and then emit
		// freeze.
		for_each(UndefPos, [&](int I) {
		ReuseMask[I] = UndefMaskElem;
		if (isa<UndefValue>(Gathered[I]))
		Gathered[I] = PoisonValue::get(Gathered[I]->getType());
		});
		NeedFreeze = true;
}		}
ReuseShuffleIndicies.clear();
UniqueValues.clear();
UniqueValues.append(VL.begin(), std::next(VL.begin(), NumValues));
}		}
UniqueValues.append(VF - UniqueValues.size(),		ShuffleBuilder.addMask(ReuseMask);
PoisonValue::get(VL[0]->getType()));		} else {
VL = UniqueValues;		copy(VL, Gathered.begin());
}		}
		// Gather unique scalars and all constants.
ShuffleInstructionBuilder ShuffleBuilder(Builder, VF, GatherShuffleExtractSeq,		Value *Vec = gather(Gathered);
CSEBlocks);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
Value *Vec = gather(VL);
if (!ReuseShuffleIndicies.empty()) {
ShuffleBuilder.addMask(ReuseShuffleIndicies);
Vec = ShuffleBuilder.finalize(Vec);		Vec = ShuffleBuilder.finalize(Vec);
}		if (NeedFreeze)
		Vec = Builder.CreateFreeze(Vec);
return Vec;		return Vec;
}		}
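
The review exchange above about poison and freeze can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `Lane`, `Kind`, `BuildVector`, `replaceUndefs`, `applyMask` and `UndefLane` are hypothetical names invented for illustration and are not LLVM APIs, and the freeze decision is simplified to "set whenever an undef lane had to stay unspecified". It shows why redirecting the mask to a lane already known to be non-poison means the poison operand is never read, while the fallback path leaves a poison lane visible and therefore asks for a freeze.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-ins for IR values; these are NOT LLVM types.
enum class Kind { Scalar, Undef, Poison };

struct Lane {
  Kind K;
  int V; // Payload, meaningful only when K == Kind::Scalar.
};

// Plays the role of an "unspecified lane" mask element (hypothetical name).
constexpr int UndefLane = -1;

struct BuildVector {
  std::vector<Lane> Gathered; // Operands of the buildvector.
  std::vector<int> ReuseMask; // Single-source shuffle mask applied on top.
  bool NeedFreeze = false;
};

// Pos is the index of a lane assumed to be proven non-poison, or -1 if none.
BuildVector replaceUndefs(std::vector<Lane> Scalars, int Pos) {
  BuildVector BV;
  BV.Gathered = std::move(Scalars);
  BV.ReuseMask.resize(BV.Gathered.size());
  for (std::size_t I = 0; I < BV.Gathered.size(); ++I) {
    BV.ReuseMask[I] = static_cast<int>(I);
    if (BV.Gathered[I].K != Kind::Undef)
      continue;
    // The buildvector operand becomes poison in both cases.
    BV.Gathered[I] = {Kind::Poison, 0};
    if (Pos >= 0) {
      // The mask reads lane Pos instead, so the poison operand is never seen.
      BV.ReuseMask[I] = Pos;
    } else {
      // No safe lane to borrow from: the lane stays poison, so freeze later.
      BV.ReuseMask[I] = UndefLane;
      BV.NeedFreeze = true;
    }
  }
  return BV;
}

// Applies the mask the way a single-source shuffle would in this toy model.
std::vector<Lane> applyMask(const BuildVector &BV) {
  std::vector<Lane> Out;
  for (int M : BV.ReuseMask) {
    if (M == UndefLane) {
      Out.push_back({Kind::Poison, 0});
      continue;
    }
    assert(M >= 0 && static_cast<std::size_t>(M) < BV.Gathered.size());
    Out.push_back(BV.Gathered[static_cast<std::size_t>(M)]);
  }
  return Out;
}
```

In this model, calling `replaceUndefs` with a valid `Pos` yields a shuffled vector with no poison lanes and `NeedFreeze == false`; calling it with `Pos == -1` leaves a poison lane in the result and sets `NeedFreeze`.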

Value *BoUpSLP::vectorizeTree(TreeEntry *E) {		Value *BoUpSLP::vectorizeTree(TreeEntry *E) {
IRBuilder<>::InsertPointGuard Guard(Builder);		IRBuilder<>::InsertPointGuard Guard(Builder);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *E->Scalars[0] << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();		bool NeedToShuffleReuses = !E->ReuseShuffleIndices.empty();
unsigned VF = E->getVectorFactor();		unsigned VF = E->getVectorFactor();
ShuffleInstructionBuilder ShuffleBuilder(Builder, VF, GatherShuffleExtractSeq,		ShuffleInstructionBuilder ShuffleBuilder(Builder, VF, GatherShuffleExtractSeq,
CSEBlocks);		CSEBlocks);
if (E->State == TreeEntry::NeedToGather) {		if (E->State == TreeEntry::NeedToGather) {
		if (E->Idx > 0) {
		// We are in the middle of a vectorizable chain. We need to gather the
		// scalars from the users.
		Value *Vec = createBuildVector(E);
		E->VectorizedValue = Vec;
		return Vec;
		}
if (E->getMainOp())		if (E->getMainOp())
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
Value *Vec;		Value *Vec;
SmallVector<int> Mask;		SmallVector<int> Mask;
SmallVector<const TreeEntry *> Entries;		SmallVector<const TreeEntry *> Entries;
Optional<TargetTransformInfo::ShuffleKind> Shuffle =		Optional<TargetTransformInfo::ShuffleKind> Shuffle =
isGatherShuffledEntry(E, Mask, Entries);		isGatherShuffledEntry(E, Mask, Entries);
if (Shuffle) {		if (Shuffle) {
[... 60 lines elided ...]	case Instruction::PHI: {

if (!VisitedBBs.insert(IBB).second) {		if (!VisitedBBs.insert(IBB).second) {
NewPhi->addIncoming(NewPhi->getIncomingValueForBlock(IBB), IBB);		NewPhi->addIncoming(NewPhi->getIncomingValueForBlock(IBB), IBB);
continue;		continue;
}		}

Builder.SetInsertPoint(IBB->getTerminator());		Builder.SetInsertPoint(IBB->getTerminator());
Builder.SetCurrentDebugLocation(PH->getDebugLoc());		Builder.SetCurrentDebugLocation(PH->getDebugLoc());
Value *Vec = vectorizeTree(E->getOperand(i));		Value *Vec = vectorizeOperand(E, i);
NewPhi->addIncoming(Vec, IBB);		NewPhi->addIncoming(Vec, IBB);
}		}

assert(NewPhi->getNumIncomingValues() == PH->getNumIncomingValues() &&		assert(NewPhi->getNumIncomingValues() == PH->getNumIncomingValues() &&
"Invalid number of incoming values");		"Invalid number of incoming values");
return V;		return V;
}		}

[... 17 lines elided ...]	case Instruction::ExtractValue: {
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
NewV = ShuffleBuilder.finalize(NewV);		NewV = ShuffleBuilder.finalize(NewV);
E->VectorizedValue = NewV;		E->VectorizedValue = NewV;
return NewV;		return NewV;
}		}
case Instruction::InsertElement: {		case Instruction::InsertElement: {
assert(E->ReuseShuffleIndices.empty() && "All inserts should be unique");		assert(E->ReuseShuffleIndices.empty() && "All inserts should be unique");
Builder.SetInsertPoint(cast<Instruction>(E->Scalars.back()));		Builder.SetInsertPoint(cast<Instruction>(E->Scalars.back()));
Value *V = vectorizeTree(E->getOperand(1));		Value *V = vectorizeOperand(E, 1);

// Create InsertVector shuffle if necessary		// Create InsertVector shuffle if necessary
auto FirstInsert = cast<Instruction>(find_if(E->Scalars, [E](Value *V) {		auto FirstInsert = cast<Instruction>(find_if(E->Scalars, [E](Value *V) {
return !is_contained(E->Scalars, cast<Instruction>(V)->getOperand(0));		return !is_contained(E->Scalars, cast<Instruction>(V)->getOperand(0));
}));		}));
const unsigned NumElts =		const unsigned NumElts =
cast<FixedVectorType>(FirstInsert->getType())->getNumElements();		cast<FixedVectorType>(FirstInsert->getType())->getNumElements();
const unsigned NumScalars = E->Scalars.size();		const unsigned NumScalars = E->Scalars.size();
[... 89 lines elided ...]	switch (ShuffleOrOp) {
case Instruction::IntToPtr:		case Instruction::IntToPtr:
case Instruction::SIToFP:		case Instruction::SIToFP:
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *InVec = vectorizeTree(E->getOperand(0));		Value *InVec = vectorizeOperand(E, 0);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

auto *CI = cast<CastInst>(VL0);		auto *CI = cast<CastInst>(VL0);
Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);		Value *V = Builder.CreateCast(CI->getOpcode(), InVec, VecTy);
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::FCmp:		case Instruction::FCmp:
case Instruction::ICmp: {		case Instruction::ICmp: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *L = vectorizeTree(E->getOperand(0));		Value *L = vectorizeOperand(E, 0);
Value *R = vectorizeTree(E->getOperand(1));		Value *R = vectorizeOperand(E, 1);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();		CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();
Value *V = Builder.CreateCmp(P0, L, R);		Value *V = Builder.CreateCmp(P0, L, R);
propagateIRFlags(V, E->Scalars, VL0);		propagateIRFlags(V, E->Scalars, VL0);
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::Select: {		case Instruction::Select: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Cond = vectorizeTree(E->getOperand(0));		Value *Cond = vectorizeOperand(E, 0);
Value *True = vectorizeTree(E->getOperand(1));		Value *True = vectorizeOperand(E, 1);
Value *False = vectorizeTree(E->getOperand(2));		Value *False = vectorizeOperand(E, 2);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateSelect(Cond, True, False);		Value *V = Builder.CreateSelect(Cond, True, False);
ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);

E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::FNeg: {		case Instruction::FNeg: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Op = vectorizeTree(E->getOperand(0));		Value *Op = vectorizeOperand(E, 0);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateUnOp(		Value *V = Builder.CreateUnOp(
static_cast<Instruction::UnaryOps>(E->getOpcode()), Op);		static_cast<Instruction::UnaryOps>(E->getOpcode()), Op);
[... 25 lines elided ...]	switch (ShuffleOrOp) {
case Instruction::Shl:		case Instruction::Shl:
case Instruction::LShr:		case Instruction::LShr:
case Instruction::AShr:		case Instruction::AShr:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor: {		case Instruction::Xor: {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *LHS = vectorizeTree(E->getOperand(0));		Value *LHS = vectorizeOperand(E, 0);
Value *RHS = vectorizeTree(E->getOperand(1));		Value *RHS = vectorizeOperand(E, 1);

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V = Builder.CreateBinOp(		Value *V = Builder.CreateBinOp(
static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS,		static_cast<Instruction::BinaryOps>(E->getOpcode()), LHS,
[... 30 lines elided ...]	case Instruction::Load: {
if (TreeEntry *Entry = getTreeEntry(PO)) {		if (TreeEntry *Entry = getTreeEntry(PO)) {
// Find which lane we need to extract.		// Find which lane we need to extract.
unsigned FoundLane = Entry->findLaneForValue(PO);		unsigned FoundLane = Entry->findLaneForValue(PO);
ExternalUses.emplace_back(		ExternalUses.emplace_back(
PO, PO != VecPtr ? cast<User>(VecPtr) : NewLI, FoundLane);		PO, PO != VecPtr ? cast<User>(VecPtr) : NewLI, FoundLane);
}		}
} else {		} else {
assert(E->State == TreeEntry::ScatterVectorize && "Unhandled state");		assert(E->State == TreeEntry::ScatterVectorize && "Unhandled state");
Value *VecPtr = vectorizeTree(E->getOperand(0));		Value *VecPtr = vectorizeOperand(E, 0);
// Use the minimum alignment of the gathered loads.		// Use the minimum alignment of the gathered loads.
Align CommonAlignment = LI->getAlign();		Align CommonAlignment = LI->getAlign();
for (Value *V : E->Scalars)		for (Value *V : E->Scalars)
CommonAlignment =		CommonAlignment =
std::min(CommonAlignment, cast<LoadInst>(V)->getAlign());		std::min(CommonAlignment, cast<LoadInst>(V)->getAlign());
NewLI = Builder.CreateMaskedGather(VecTy, VecPtr, CommonAlignment);		NewLI = Builder.CreateMaskedGather(VecTy, VecPtr, CommonAlignment);
}		}
Value *V = propagateMetadata(NewLI, E->Scalars);		Value *V = propagateMetadata(NewLI, E->Scalars);

ShuffleBuilder.addInversedMask(E->ReorderIndices);		ShuffleBuilder.addInversedMask(E->ReorderIndices);
ShuffleBuilder.addMask(E->ReuseShuffleIndices);		ShuffleBuilder.addMask(E->ReuseShuffleIndices);
V = ShuffleBuilder.finalize(V);		V = ShuffleBuilder.finalize(V);
E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::Store: {		case Instruction::Store: {
auto *SI = cast<StoreInst>(VL0);		auto *SI = cast<StoreInst>(VL0);
unsigned AS = SI->getPointerAddressSpace();		unsigned AS = SI->getPointerAddressSpace();

setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *VecValue = vectorizeTree(E->getOperand(0));		Value *VecValue = vectorizeOperand(E, 0);
ShuffleBuilder.addMask(E->ReorderIndices);		ShuffleBuilder.addMask(E->ReorderIndices);
VecValue = ShuffleBuilder.finalize(VecValue);		VecValue = ShuffleBuilder.finalize(VecValue);

Value *ScalarPtr = SI->getPointerOperand();		Value *ScalarPtr = SI->getPointerOperand();
Value *VecPtr = Builder.CreateBitCast(		Value *VecPtr = Builder.CreateBitCast(
ScalarPtr, VecValue->getType()->getPointerTo(AS));		ScalarPtr, VecValue->getType()->getPointerTo(AS));
StoreInst *ST =		StoreInst *ST =
Builder.CreateAlignedStore(VecValue, VecPtr, SI->getAlign());		Builder.CreateAlignedStore(VecValue, VecPtr, SI->getAlign());
[... 14 lines elided ...]	case Instruction::Store: {
E->VectorizedValue = V;		E->VectorizedValue = V;
++NumVectorInstructions;		++NumVectorInstructions;
return V;		return V;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
auto *GEP0 = cast<GetElementPtrInst>(VL0);		auto *GEP0 = cast<GetElementPtrInst>(VL0);
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);

Value *Op0 = vectorizeTree(E->getOperand(0));		Value *Op0 = vectorizeOperand(E, 0);

SmallVector<Value *> OpVecs;		SmallVector<Value *> OpVecs;
for (int J = 1, N = GEP0->getNumOperands(); J < N; ++J) {		for (int J = 1, N = GEP0->getNumOperands(); J < N; ++J) {
Value *OpVec = vectorizeTree(E->getOperand(J));		Value *OpVec = vectorizeOperand(E, J);
OpVecs.push_back(OpVec);		OpVecs.push_back(OpVec);
}		}

Value *V = Builder.CreateGEP(GEP0->getSourceElementType(), Op0, OpVecs);		Value *V = Builder.CreateGEP(GEP0->getSourceElementType(), Op0, OpVecs);
if (Instruction *I = dyn_cast<GetElementPtrInst>(V)) {		if (Instruction *I = dyn_cast<GetElementPtrInst>(V)) {
SmallVector<Value *> GEPs;		SmallVector<Value *> GEPs;
for (Value *V : E->Scalars) {		for (Value *V : E->Scalars) {
if (isa<GetElementPtrInst>(V))		if (isa<GetElementPtrInst>(V))
[... 37 lines elided ...]	case Instruction::Call: {
CallInst *CEI = cast<CallInst>(VL0);		CallInst *CEI = cast<CallInst>(VL0);
ScalarArg = CEI->getArgOperand(j);		ScalarArg = CEI->getArgOperand(j);
OpVecs.push_back(CEI->getArgOperand(j));		OpVecs.push_back(CEI->getArgOperand(j));
if (isVectorIntrinsicWithOverloadTypeAtArg(IID, j))		if (isVectorIntrinsicWithOverloadTypeAtArg(IID, j))
TysForDecl.push_back(ScalarArg->getType());		TysForDecl.push_back(ScalarArg->getType());
continue;		continue;
}		}

Value *OpVec = vectorizeTree(E->getOperand(j));		Value *OpVec = vectorizeOperand(E, j);
LLVM_DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");		LLVM_DEBUG(dbgs() << "SLP: OpVec[" << j << "]: " << *OpVec << "\n");
OpVecs.push_back(OpVec);		OpVecs.push_back(OpVec);
if (isVectorIntrinsicWithOverloadTypeAtArg(IID, j))		if (isVectorIntrinsicWithOverloadTypeAtArg(IID, j))
TysForDecl.push_back(OpVec->getType());		TysForDecl.push_back(OpVec->getType());
}		}

Function *CF;		Function *CF;
if (!UseIntrinsic) {		if (!UseIntrinsic) {
[... 38 lines elided ...]	case Instruction::ShuffleVector: {
(Instruction::isCast(E->getOpcode()) &&		(Instruction::isCast(E->getOpcode()) &&
Instruction::isCast(E->getAltOpcode())) ||		Instruction::isCast(E->getAltOpcode())) ||
(isa<CmpInst>(VL0) && isa<CmpInst>(E->getAltOp()))) &&		(isa<CmpInst>(VL0) && isa<CmpInst>(E->getAltOp()))) &&
"Invalid Shuffle Vector Operand");		"Invalid Shuffle Vector Operand");

Value *LHS = nullptr, *RHS = nullptr;		Value *LHS = nullptr, *RHS = nullptr;
if (Instruction::isBinaryOp(E->getOpcode()) || isa<CmpInst>(VL0)) {		if (Instruction::isBinaryOp(E->getOpcode()) || isa<CmpInst>(VL0)) {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
LHS = vectorizeTree(E->getOperand(0));		LHS = vectorizeOperand(E, 0);
RHS = vectorizeTree(E->getOperand(1));		RHS = vectorizeOperand(E, 1);
} else {		} else {
setInsertPointAfterBundle(E);		setInsertPointAfterBundle(E);
LHS = vectorizeTree(E->getOperand(0));		LHS = vectorizeOperand(E, 0);
}		}

if (E->VectorizedValue) {		if (E->VectorizedValue) {
LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Diamond merged for " << *VL0 << ".\n");
return E->VectorizedValue;		return E->VectorizedValue;
}		}

Value *V0, *V1;		Value *V0, *V1;
[... 4,254 lines elided ...]

llvm/test/Transforms/SLPVectorizer/AArch64/matmul.ll

	[... 19 lines elided ...]
	; CHECK-NEXT: [[ARRAYIDX30_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 1, i64 2			; CHECK-NEXT: [[ARRAYIDX30_I:%.*]] = getelementptr inbounds [4 x double], [4 x double]* [[B]], i64 1, i64 2
	; CHECK-NEXT: [[ARRAYIDX47_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A]], i64 1, i64 0			; CHECK-NEXT: [[ARRAYIDX47_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A]], i64 1, i64 0
	; CHECK-NEXT: [[TEMP10:%.*]] = load double, double* [[ARRAYIDX47_I]], align 8			; CHECK-NEXT: [[TEMP10:%.*]] = load double, double* [[ARRAYIDX47_I]], align 8
	; CHECK-NEXT: [[ARRAYIDX52_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A]], i64 1, i64 1			; CHECK-NEXT: [[ARRAYIDX52_I:%.*]] = getelementptr inbounds [2 x double], [2 x double]* [[A]], i64 1, i64 1
	; CHECK-NEXT: [[TEMP11:%.*]] = load double, double* [[ARRAYIDX52_I]], align 8			; CHECK-NEXT: [[TEMP11:%.*]] = load double, double* [[ARRAYIDX52_I]], align 8
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[ARRAYIDX3_I]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[ARRAYIDX3_I]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[TEMP]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[TEMP]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[TEMP]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[SHUFFLE]], [[TMP2]]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[ARRAYIDX7_I]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX7_I]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[TEMP2]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[TEMP2]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[TEMP2]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x double> [[TMP7]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP9]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[SHUFFLE1]], [[TMP6]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP5]], [[TMP10]]			; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP4]], [[TMP8]]
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[OUT:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[OUT:%.*]] to <2 x double>*
	; CHECK-NEXT: [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 2			; CHECK-NEXT: [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 2
	; CHECK-NEXT: [[TMP13:%.*]] = bitcast double* [[ARRAYIDX25_I]] to <2 x double>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[ARRAYIDX25_I]] to <2 x double>*
	; CHECK-NEXT: [[TMP14:%.*]] = load <2 x double>, <2 x double>* [[TMP13]], align 8			; CHECK-NEXT: [[TMP12:%.*]] = load <2 x double>, <2 x double>* [[TMP11]], align 8
	; CHECK-NEXT: [[TMP15:%.*]] = fmul <2 x double> [[TMP4]], [[TMP14]]			; CHECK-NEXT: [[TMP13:%.*]] = fmul <2 x double> [[SHUFFLE]], [[TMP12]]
	; CHECK-NEXT: [[TMP16:%.*]] = bitcast double* [[ARRAYIDX30_I]] to <2 x double>*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast double* [[ARRAYIDX30_I]] to <2 x double>*
	; CHECK-NEXT: [[TMP17:%.*]] = load <2 x double>, <2 x double>* [[TMP16]], align 8			; CHECK-NEXT: [[TMP15:%.*]] = load <2 x double>, <2 x double>* [[TMP14]], align 8
	; CHECK-NEXT: [[TMP18:%.*]] = fmul <2 x double> [[TMP9]], [[TMP17]]			; CHECK-NEXT: [[TMP16:%.*]] = fmul <2 x double> [[SHUFFLE1]], [[TMP15]]
	; CHECK-NEXT: [[TMP19:%.*]] = fadd <2 x double> [[TMP15]], [[TMP18]]			; CHECK-NEXT: [[TMP17:%.*]] = fadd <2 x double> [[TMP13]], [[TMP16]]
	; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP12]], align 8			; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
	; CHECK-NEXT: [[TMP20:%.*]] = bitcast double* [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4]] to <2 x double>*			; CHECK-NEXT: [[TMP18:%.*]] = bitcast double* [[RES_I_SROA_5_0_OUT2_I_SROA_IDX4]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP19]], <2 x double>* [[TMP20]], align 8			; CHECK-NEXT: store <2 x double> [[TMP17]], <2 x double>* [[TMP18]], align 8
	; CHECK-NEXT: [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 4			; CHECK-NEXT: [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 4
	; CHECK-NEXT: [[TMP21:%.*]] = insertelement <2 x double> poison, double [[TEMP10]], i32 0			; CHECK-NEXT: [[TMP19:%.*]] = insertelement <2 x double> poison, double [[TEMP10]], i32 0
	; CHECK-NEXT: [[TMP22:%.*]] = insertelement <2 x double> [[TMP21]], double [[TEMP10]], i32 1			; CHECK-NEXT: [[SHUFFLE4:%.*]] = shufflevector <2 x double> [[TMP19]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP23:%.*]] = fmul <2 x double> [[TMP2]], [[TMP22]]			; CHECK-NEXT: [[TMP20:%.*]] = fmul <2 x double> [[TMP2]], [[SHUFFLE4]]
	; CHECK-NEXT: [[TMP24:%.*]] = insertelement <2 x double> poison, double [[TEMP11]], i32 0			; CHECK-NEXT: [[TMP21:%.*]] = insertelement <2 x double> poison, double [[TEMP11]], i32 0
	; CHECK-NEXT: [[TMP25:%.*]] = insertelement <2 x double> [[TMP24]], double [[TEMP11]], i32 1			; CHECK-NEXT: [[SHUFFLE5:%.*]] = shufflevector <2 x double> [[TMP21]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP26:%.*]] = fmul <2 x double> [[TMP7]], [[TMP25]]			; CHECK-NEXT: [[TMP22:%.*]] = fmul <2 x double> [[TMP6]], [[SHUFFLE5]]
	; CHECK-NEXT: [[TMP27:%.*]] = fadd <2 x double> [[TMP23]], [[TMP26]]			; CHECK-NEXT: [[TMP23:%.*]] = fadd <2 x double> [[TMP20]], [[TMP22]]
	; CHECK-NEXT: [[TMP28:%.*]] = bitcast double* [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8]] to <2 x double>*			; CHECK-NEXT: [[TMP24:%.*]] = bitcast double* [[RES_I_SROA_7_0_OUT2_I_SROA_IDX8]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP27]], <2 x double>* [[TMP28]], align 8			; CHECK-NEXT: store <2 x double> [[TMP23]], <2 x double>* [[TMP24]], align 8
	; CHECK-NEXT: [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 6			; CHECK-NEXT: [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12:%.*]] = getelementptr inbounds double, double* [[OUT]], i64 6
	; CHECK-NEXT: [[TMP29:%.*]] = fmul <2 x double> [[TMP14]], [[TMP22]]			; CHECK-NEXT: [[TMP25:%.*]] = fmul <2 x double> [[TMP12]], [[SHUFFLE4]]
	; CHECK-NEXT: [[TMP30:%.*]] = fmul <2 x double> [[TMP17]], [[TMP25]]			; CHECK-NEXT: [[TMP26:%.*]] = fmul <2 x double> [[TMP15]], [[SHUFFLE5]]
	; CHECK-NEXT: [[TMP31:%.*]] = fadd <2 x double> [[TMP29]], [[TMP30]]			; CHECK-NEXT: [[TMP27:%.*]] = fadd <2 x double> [[TMP25]], [[TMP26]]
	; CHECK-NEXT: [[TMP32:%.*]] = bitcast double* [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12]] to <2 x double>*			; CHECK-NEXT: [[TMP28:%.*]] = bitcast double* [[RES_I_SROA_9_0_OUT2_I_SROA_IDX12]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP31]], <2 x double>* [[TMP32]], align 8			; CHECK-NEXT: store <2 x double> [[TMP27]], <2 x double>* [[TMP28]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%arrayidx1.i = getelementptr inbounds [2 x double], [2 x double]* %A, i64 0, i64 0			%arrayidx1.i = getelementptr inbounds [2 x double], [2 x double]* %A, i64 0, i64 0
	%temp = load double, double* %arrayidx1.i, align 8			%temp = load double, double* %arrayidx1.i, align 8
	%arrayidx3.i = getelementptr inbounds [4 x double], [4 x double]* %B, i64 0, i64 0			%arrayidx3.i = getelementptr inbounds [4 x double], [4 x double]* %B, i64 0, i64 0
	%temp1 = load double, double* %arrayidx3.i, align 8			%temp1 = load double, double* %arrayidx3.i, align 8
	%mul.i = fmul double %temp, %temp1			%mul.i = fmul double %temp, %temp1
	%arrayidx5.i = getelementptr inbounds [2 x double], [2 x double]* %A, i64 0, i64 1			%arrayidx5.i = getelementptr inbounds [2 x double], [2 x double]* %A, i64 0, i64 1
	[... 60 lines elided ...]

llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -mtriple=arm64-apple-ios -S %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -mtriple=arm64-apple-ios -S %s | FileCheck %s

	; Test case where not vectorizing is more profitable because multiple			; Test case where not vectorizing is more profitable because multiple
	; fmul/{fadd,fsub} pairs can be lowered to fma instructions.			; fmul/{fadd,fsub} pairs can be lowered to fma instructions.
	define void @slp_not_profitable_with_fast_fmf(ptr %A, ptr %B) {			define void @slp_not_profitable_with_fast_fmf(ptr %A, ptr %B) {
	; CHECK-LABEL: @slp_not_profitable_with_fast_fmf(			; CHECK-LABEL: @slp_not_profitable_with_fast_fmf(
	; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1			; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1
	; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4			; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4
	; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4			; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[B_0]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <2 x float> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[SHUFFLE1]], [[TMP1]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[A_0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <2 x float> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x float> [[TMP7]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP6:%.*]] = fsub fast <2 x float> [[TMP5]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x float> [[TMP7]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <2 x float> [[TMP5]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> [[TMP7]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[A]], align 4			; CHECK-NEXT: store <2 x float> [[TMP8]], ptr [[A]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
	; CHECK-NEXT: store float [[TMP11]], ptr [[B]], align 4			; CHECK-NEXT: store float [[TMP9]], ptr [[B]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1			%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1
	%A.0 = load float, ptr %A, align 4			%A.0 = load float, ptr %A, align 4
	%B.1 = load float, ptr %gep.B.1, align 4			%B.1 = load float, ptr %gep.B.1, align 4
	%mul.0 = fmul fast float %B.1, %A.0			%mul.0 = fmul fast float %B.1, %A.0
	%B.0 = load float, ptr %B, align 4			%B.0 = load float, ptr %B, align 4
	%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2			%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2
	[... 12 lines elided ...]

	define void @slp_not_profitable_with_reassoc_fmf(ptr %A, ptr %B) {			define void @slp_not_profitable_with_reassoc_fmf(ptr %A, ptr %B) {
	; CHECK-LABEL: @slp_not_profitable_with_reassoc_fmf(			; CHECK-LABEL: @slp_not_profitable_with_reassoc_fmf(
	; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1			; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1
	; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4			; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4
	; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4			; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[B_0]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x float> [[SHUFFLE1]], [[TMP1]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[A_0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fmul reassoc <2 x float> [[TMP1]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul reassoc <2 x float> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP8:%.*]] = fsub reassoc <2 x float> [[TMP7]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP6:%.*]] = fsub reassoc <2 x float> [[TMP5]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd reassoc <2 x float> [[TMP7]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd reassoc <2 x float> [[TMP5]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> [[TMP7]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[A]], align 4			; CHECK-NEXT: store <2 x float> [[TMP8]], ptr [[A]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
	; CHECK-NEXT: store float [[TMP11]], ptr [[B]], align 4			; CHECK-NEXT: store float [[TMP9]], ptr [[B]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1			%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1
	%A.0 = load float, ptr %A, align 4			%A.0 = load float, ptr %A, align 4
	%B.1 = load float, ptr %gep.B.1, align 4			%B.1 = load float, ptr %gep.B.1, align 4
	%mul.0 = fmul reassoc float %B.1, %A.0			%mul.0 = fmul reassoc float %B.1, %A.0
	%B.0 = load float, ptr %B, align 4			%B.0 = load float, ptr %B, align 4
	%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2			%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2
	; FMA cannot be used due to missing fast-math flags, so SLP should kick in.			; FMA cannot be used due to missing fast-math flags, so SLP should kick in.
	define void @slp_profitable_missing_fmf_on_fadd_fsub(ptr %A, ptr %B) {			define void @slp_profitable_missing_fmf_on_fadd_fsub(ptr %A, ptr %B) {
	; CHECK-LABEL: @slp_profitable_missing_fmf_on_fadd_fsub(			; CHECK-LABEL: @slp_profitable_missing_fmf_on_fadd_fsub(
	; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1			; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1
	; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4			; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4
	; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4			; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[B_0]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <2 x float> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[SHUFFLE1]], [[TMP1]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[A_0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <2 x float> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP8:%.*]] = fsub <2 x float> [[TMP7]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP6:%.*]] = fsub <2 x float> [[TMP5]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[TMP7]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x float> [[TMP5]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> [[TMP7]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[A]], align 4			; CHECK-NEXT: store <2 x float> [[TMP8]], ptr [[A]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
	; CHECK-NEXT: store float [[TMP11]], ptr [[B]], align 4			; CHECK-NEXT: store float [[TMP9]], ptr [[B]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1			%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1
	%A.0 = load float, ptr %A, align 4			%A.0 = load float, ptr %A, align 4
	%B.1 = load float, ptr %gep.B.1, align 4			%B.1 = load float, ptr %gep.B.1, align 4
	%mul.0 = fmul fast float %B.1, %A.0			%mul.0 = fmul fast float %B.1, %A.0
	%B.0 = load float, ptr %B, align 4			%B.0 = load float, ptr %B, align 4
	%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2			%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2
	; FMA cannot be used due to missing fast-math flags, so SLP should kick in.			; FMA cannot be used due to missing fast-math flags, so SLP should kick in.
	define void @slp_profitable_missing_fmf_on_fmul_fadd_fsub(ptr %A, ptr %B) {			define void @slp_profitable_missing_fmf_on_fmul_fadd_fsub(ptr %A, ptr %B) {
	; CHECK-LABEL: @slp_profitable_missing_fmf_on_fmul_fadd_fsub(			; CHECK-LABEL: @slp_profitable_missing_fmf_on_fmul_fadd_fsub(
	; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1			; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1
	; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4			; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4
	; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4			; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[B_0]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x float> [[SHUFFLE1]], [[TMP1]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[A_0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x float> [[TMP1]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP8:%.*]] = fsub <2 x float> [[TMP7]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP6:%.*]] = fsub <2 x float> [[TMP5]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x float> [[TMP7]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x float> [[TMP5]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> [[TMP7]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[A]], align 4			; CHECK-NEXT: store <2 x float> [[TMP8]], ptr [[A]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
	; CHECK-NEXT: store float [[TMP11]], ptr [[B]], align 4			; CHECK-NEXT: store float [[TMP9]], ptr [[B]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1			%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1
	%A.0 = load float, ptr %A, align 4			%A.0 = load float, ptr %A, align 4
	%B.1 = load float, ptr %gep.B.1, align 4			%B.1 = load float, ptr %gep.B.1, align 4
	%mul.0 = fmul float %B.1, %A.0			%mul.0 = fmul float %B.1, %A.0
	%B.0 = load float, ptr %B, align 4			%B.0 = load float, ptr %B, align 4
	%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2			%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2
	; FMA cannot be used due to missing fast-math flags, so SLP should kick in.			; FMA cannot be used due to missing fast-math flags, so SLP should kick in.
	define void @slp_profitable_missing_fmf_nnans_only(ptr %A, ptr %B) {			define void @slp_profitable_missing_fmf_nnans_only(ptr %A, ptr %B) {
	; CHECK-LABEL: @slp_profitable_missing_fmf_nnans_only(			; CHECK-LABEL: @slp_profitable_missing_fmf_nnans_only(
	; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1			; CHECK-NEXT: [[GEP_B_1:%.*]] = getelementptr inbounds float, ptr [[B:%.*]], i64 1
	; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4			; CHECK-NEXT: [[A_0:%.*]] = load float, ptr [[A:%.*]], align 4
	; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4			; CHECK-NEXT: [[B_0:%.*]] = load float, ptr [[B]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[GEP_B_1]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[B_0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[B_0]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul nnan <2 x float> [[TMP3]], [[TMP1]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul nnan <2 x float> [[SHUFFLE1]], [[TMP1]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[A_0]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[A_0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fmul nnan <2 x float> [[TMP1]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul nnan <2 x float> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP8:%.*]] = fsub nnan <2 x float> [[TMP7]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP6:%.*]] = fsub nnan <2 x float> [[TMP5]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd nnan <2 x float> [[TMP7]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd nnan <2 x float> [[TMP5]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> [[TMP7]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[A]], align 4			; CHECK-NEXT: store <2 x float> [[TMP8]], ptr [[A]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP1]], i32 1
	; CHECK-NEXT: store float [[TMP11]], ptr [[B]], align 4			; CHECK-NEXT: store float [[TMP9]], ptr [[B]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1			%gep.B.1 = getelementptr inbounds float, ptr %B, i64 1
	%A.0 = load float, ptr %A, align 4			%A.0 = load float, ptr %A, align 4
	%B.1 = load float, ptr %gep.B.1, align 4			%B.1 = load float, ptr %gep.B.1, align 4
	%mul.0 = fmul nnan float %B.1, %A.0			%mul.0 = fmul nnan float %B.1, %A.0
	%B.0 = load float, ptr %B, align 4			%B.0 = load float, ptr %B, align 4
	%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2			%gep.B.2 = getelementptr inbounds float, ptr %B, i64 2
	}			}

	define void @slp_profitable(ptr %A, ptr %B, float %0) {			define void @slp_profitable(ptr %A, ptr %B, float %0) {
	; CHECK-LABEL: @slp_profitable(			; CHECK-LABEL: @slp_profitable(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[SUB_I1096:%.*]] = fsub fast float 1.000000e+00, [[TMP0:%.*]]			; CHECK-NEXT: [[SUB_I1096:%.*]] = fsub fast float 1.000000e+00, [[TMP0:%.*]]
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[A:%.*]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, ptr [[A:%.*]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul fast <2 x float> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[SUB_I1096]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[SUB_I1096]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[SUB_I1096]], i32 1			; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x float> [[TMP1]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <2 x float> [[TMP1]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP8:%.*]] = fadd fast <2 x float> [[SHUFFLE]], [[TMP7]]			; CHECK-NEXT: [[TMP6:%.*]] = fadd fast <2 x float> [[SHUFFLE1]], [[TMP5]]
	; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x float> [[SHUFFLE]], [[TMP7]]			; CHECK-NEXT: [[TMP7:%.*]] = fsub fast <2 x float> [[SHUFFLE1]], [[TMP5]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> [[TMP7]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: store <2 x float> [[TMP10]], ptr [[B:%.*]], align 4			; CHECK-NEXT: store <2 x float> [[TMP8]], ptr [[B:%.*]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep.A.1 = getelementptr inbounds float, ptr %A, i64 1			%gep.A.1 = getelementptr inbounds float, ptr %A, i64 1
	%sub.i1096 = fsub fast float 1.000000e+00, %0			%sub.i1096 = fsub fast float 1.000000e+00, %0
	%1 = load float, ptr %A, align 4			%1 = load float, ptr %A, align 4
	%mul.i1100 = fmul fast float %1, %sub.i1096			%mul.i1100 = fmul fast float %1, %sub.i1096
	%2 = load float, ptr %gep.A.1, align 4			%2 = load float, ptr %gep.A.1, align 4

llvm/test/Transforms/SLPVectorizer/AArch64/splat-loads.ll

	; CHECK-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds double, double* [[ARRAY1:%.*]], i64 0			; CHECK-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds double, double* [[ARRAY1:%.*]], i64 0
	; CHECK-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds double, double* [[ARRAY2:%.*]], i64 0			; CHECK-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds double, double* [[ARRAY2:%.*]], i64 0
	; CHECK-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds double, double* [[ARRAY2]], i64 1			; CHECK-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds double, double* [[ARRAY2]], i64 1
	; CHECK-NEXT: [[LD_2_0:%.*]] = load double, double* [[GEP_2_0]], align 8			; CHECK-NEXT: [[LD_2_0:%.*]] = load double, double* [[GEP_2_0]], align 8
	; CHECK-NEXT: [[LD_2_1:%.*]] = load double, double* [[GEP_2_1]], align 8			; CHECK-NEXT: [[LD_2_1:%.*]] = load double, double* [[GEP_2_1]], align 8
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[GEP_1_0]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[GEP_1_0]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[LD_2_0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[LD_2_0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[LD_2_0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[LD_2_1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[LD_2_1]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[LD_2_1]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x double> [[TMP1]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[GEP_1_0]] to <2 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[GEP_1_0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP8]], <2 x double>* [[TMP9]], align 8			; CHECK-NEXT: store <2 x double> [[TMP6]], <2 x double>* [[TMP7]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep_1_0 = getelementptr inbounds double, double* %array1, i64 0			%gep_1_0 = getelementptr inbounds double, double* %array1, i64 0
	%gep_1_1 = getelementptr inbounds double, double* %array1, i64 1			%gep_1_1 = getelementptr inbounds double, double* %array1, i64 1
	%ld_1_0 = load double, double* %gep_1_0, align 8			%ld_1_0 = load double, double* %gep_1_0, align 8
	%ld_1_1 = load double, double* %gep_1_1, align 8			%ld_1_1 = load double, double* %gep_1_1, align 8

	; CHECK-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds float, float* [[ARRAY1:%.*]], i64 0			; CHECK-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds float, float* [[ARRAY1:%.*]], i64 0
	; CHECK-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds float, float* [[ARRAY2:%.*]], i64 0			; CHECK-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds float, float* [[ARRAY2:%.*]], i64 0
	; CHECK-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds float, float* [[ARRAY2]], i64 1			; CHECK-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds float, float* [[ARRAY2]], i64 1
	; CHECK-NEXT: [[LD_2_0:%.*]] = load float, float* [[GEP_2_0]], align 8			; CHECK-NEXT: [[LD_2_0:%.*]] = load float, float* [[GEP_2_0]], align 8
	; CHECK-NEXT: [[LD_2_1:%.*]] = load float, float* [[GEP_2_1]], align 8			; CHECK-NEXT: [[LD_2_1:%.*]] = load float, float* [[GEP_2_1]], align 8
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[GEP_1_0]] to <2 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[GEP_1_0]] to <2 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[LD_2_0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[LD_2_0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[LD_2_0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x float> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[LD_2_1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[LD_2_1]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[LD_2_1]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fmul <2 x float> [[TMP1]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP1]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x float> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast float* [[GEP_1_0]] to <2 x float>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast float* [[GEP_1_0]] to <2 x float>*
	; CHECK-NEXT: store <2 x float> [[TMP8]], <2 x float>* [[TMP9]], align 4			; CHECK-NEXT: store <2 x float> [[TMP6]], <2 x float>* [[TMP7]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep_1_0 = getelementptr inbounds float, float* %array1, i64 0			%gep_1_0 = getelementptr inbounds float, float* %array1, i64 0
	%gep_1_1 = getelementptr inbounds float, float* %array1, i64 1			%gep_1_1 = getelementptr inbounds float, float* %array1, i64 1
	%ld_1_0 = load float, float* %gep_1_0, align 8			%ld_1_0 = load float, float* %gep_1_0, align 8
	%ld_1_1 = load float, float* %gep_1_1, align 8			%ld_1_1 = load float, float* %gep_1_1, align 8

	; CHECK-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds i64, i64* [[ARRAY1:%.*]], i64 0			; CHECK-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds i64, i64* [[ARRAY1:%.*]], i64 0
	; CHECK-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds i64, i64* [[ARRAY2:%.*]], i64 0			; CHECK-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds i64, i64* [[ARRAY2:%.*]], i64 0
	; CHECK-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds i64, i64* [[ARRAY2]], i64 1			; CHECK-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds i64, i64* [[ARRAY2]], i64 1
	; CHECK-NEXT: [[LD_2_0:%.*]] = load i64, i64* [[GEP_2_0]], align 8			; CHECK-NEXT: [[LD_2_0:%.*]] = load i64, i64* [[GEP_2_0]], align 8
	; CHECK-NEXT: [[LD_2_1:%.*]] = load i64, i64* [[GEP_2_1]], align 8			; CHECK-NEXT: [[LD_2_1:%.*]] = load i64, i64* [[GEP_2_1]], align 8
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[GEP_1_0]] to <2 x i64>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[GEP_1_0]] to <2 x i64>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> poison, i64 [[LD_2_0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64> poison, i64 [[LD_2_0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i64> [[TMP2]], i64 [[LD_2_0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = or <2 x i64> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = or <2 x i64> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i64> poison, i64 [[LD_2_1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> poison, i64 [[LD_2_1]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP5]], i64 [[LD_2_1]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i64> [[TMP4]], <2 x i64> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = or <2 x i64> [[TMP1]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = or <2 x i64> [[TMP1]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i64> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i64> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast i64* [[GEP_1_0]] to <2 x i64>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i64* [[GEP_1_0]] to <2 x i64>*
	; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64>* [[TMP9]], align 4			; CHECK-NEXT: store <2 x i64> [[TMP6]], <2 x i64>* [[TMP7]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep_1_0 = getelementptr inbounds i64, i64* %array1, i64 0			%gep_1_0 = getelementptr inbounds i64, i64* %array1, i64 0
	%gep_1_1 = getelementptr inbounds i64, i64* %array1, i64 1			%gep_1_1 = getelementptr inbounds i64, i64* %array1, i64 1
	%ld_1_0 = load i64, i64* %gep_1_0, align 8			%ld_1_0 = load i64, i64* %gep_1_0, align 8
	%ld_1_1 = load i64, i64* %gep_1_1, align 8			%ld_1_1 = load i64, i64* %gep_1_1, align 8

	; CHECK-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds i32, i32* [[ARRAY1:%.*]], i64 0			; CHECK-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds i32, i32* [[ARRAY1:%.*]], i64 0
	; CHECK-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds i32, i32* [[ARRAY2:%.*]], i64 0			; CHECK-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds i32, i32* [[ARRAY2:%.*]], i64 0
	; CHECK-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds i32, i32* [[ARRAY2]], i64 1			; CHECK-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds i32, i32* [[ARRAY2]], i64 1
	; CHECK-NEXT: [[LD_2_0:%.*]] = load i32, i32* [[GEP_2_0]], align 8			; CHECK-NEXT: [[LD_2_0:%.*]] = load i32, i32* [[GEP_2_0]], align 8
	; CHECK-NEXT: [[LD_2_1:%.*]] = load i32, i32* [[GEP_2_1]], align 8			; CHECK-NEXT: [[LD_2_1:%.*]] = load i32, i32* [[GEP_2_1]], align 8
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[GEP_1_0]] to <2 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[GEP_1_0]] to <2 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> poison, i32 [[LD_2_0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> poison, i32 [[LD_2_0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> [[TMP2]], i32 [[LD_2_0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = or <2 x i32> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = or <2 x i32> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> poison, i32 [[LD_2_1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[LD_2_1]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> [[TMP5]], i32 [[LD_2_1]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = or <2 x i32> [[TMP1]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = or <2 x i32> [[TMP1]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i32> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP6:%.*]] = add <2 x i32> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[GEP_1_0]] to <2 x i32>*			; CHECK-NEXT: [[TMP7:%.*]] = bitcast i32* [[GEP_1_0]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[TMP8]], <2 x i32>* [[TMP9]], align 4			; CHECK-NEXT: store <2 x i32> [[TMP6]], <2 x i32>* [[TMP7]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%gep_1_0 = getelementptr inbounds i32, i32* %array1, i64 0			%gep_1_0 = getelementptr inbounds i32, i32* %array1, i64 0
	%gep_1_1 = getelementptr inbounds i32, i32* %array1, i64 1			%gep_1_1 = getelementptr inbounds i32, i32* %array1, i64 1
	%ld_1_0 = load i32, i32* %gep_1_0, align 8			%ld_1_0 = load i32, i32* %gep_1_0, align 8
	%ld_1_1 = load i32, i32* %gep_1_1, align 8			%ld_1_1 = load i32, i32* %gep_1_1, align 8


llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll

	; contiguous. The score estimation needs to be corrected, so that these 4 loads			; contiguous. The score estimation needs to be corrected, so that these 4 loads
	; are not selected for vectorization. Instead we should vectorize with			; are not selected for vectorization. Instead we should vectorize with
	; contiguous loads, from %a plus offsets 0 to 3, or offsets 1 to 4.			; contiguous loads, from %a plus offsets 0 to 3, or offsets 1 to 4.

	define void @s116_modified(float* %a) {			define void @s116_modified(float* %a) {
	; CHECK-LABEL: @s116_modified(			; CHECK-LABEL: @s116_modified(
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 0			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 0
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[A]], i64 2			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[A]], i64 3
	; CHECK-NEXT: [[GEP4:%.*]] = getelementptr inbounds float, float* [[A]], i64 4
	; CHECK-NEXT: [[LD1:%.*]] = load float, float* [[GEP1]], align 4
	; CHECK-NEXT: [[LD0:%.*]] = load float, float* [[GEP0]], align 4			; CHECK-NEXT: [[LD0:%.*]] = load float, float* [[GEP0]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP2]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP1]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[LD4:%.*]] = load float, float* [[GEP4]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[GEP3]] to <2 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> poison, float [[LD0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> poison, float [[LD0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP5]], float [[LD4]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 5, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> poison, float [[LD1]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP7]], float [[LD1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP7]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP4]], <4 x i32> <i32 0, i32 undef, i32 1, i32 2>
	; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <4 x float> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP10]], <4 x float> poison, <4 x i32> <i32 0, i32 0, i32 2, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP11:%.*]] = fmul fast <4 x float> [[TMP9]], [[SHUFFLE]]
	; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP11]], align 4			; CHECK-NEXT: [[TMP12:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
				; CHECK-NEXT: store <4 x float> [[TMP11]], <4 x float>* [[TMP12]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%gep0 = getelementptr inbounds float, float* %a, i64 0			%gep0 = getelementptr inbounds float, float* %a, i64 0
	%gep1 = getelementptr inbounds float, float* %a, i64 1			%gep1 = getelementptr inbounds float, float* %a, i64 1
	%gep2 = getelementptr inbounds float, float* %a, i64 2			%gep2 = getelementptr inbounds float, float* %a, i64 2
	%gep3 = getelementptr inbounds float, float* %a, i64 3			%gep3 = getelementptr inbounds float, float* %a, i64 3
	%gep4 = getelementptr inbounds float, float* %a, i64 4			%gep4 = getelementptr inbounds float, float* %a, i64 4
	%ld0 = load float, float* %gep0			%ld0 = load float, float* %gep0

llvm/test/Transforms/SLPVectorizer/AArch64/vectorizable-selects-uniform-cmps.ll


	define void @select_uniform_eq_2xi32(i32* %ptr, i32 %x) {			define void @select_uniform_eq_2xi32(i32* %ptr, i32 %x) {
	; CHECK-LABEL: @select_uniform_eq_2xi32(			; CHECK-LABEL: @select_uniform_eq_2xi32(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[PTR:%.*]] to <2 x i32>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[PTR:%.*]] to <2 x i32>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <2 x i32> [[TMP1]], <i32 16383, i32 16383>			; CHECK-NEXT: [[TMP2:%.*]] = icmp eq <2 x i32> [[TMP1]], <i32 16383, i32 16383>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[X:%.*]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[X:%.*]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[X]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = select <2 x i1> [[TMP2]], <2 x i32> [[TMP1]], <2 x i32> [[TMP4]]			; CHECK-NEXT: [[TMP4:%.*]] = select <2 x i1> [[TMP2]], <2 x i32> [[TMP1]], <2 x i32> [[SHUFFLE]]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[PTR]] to <2 x i32>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[PTR]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[TMP5]], <2 x i32>* [[TMP6]], align 2			; CHECK-NEXT: store <2 x i32> [[TMP4]], <2 x i32>* [[TMP5]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%l.0 = load i32, i32* %ptr			%l.0 = load i32, i32* %ptr
	%cmp.0 = icmp eq i32 %l.0, 16383			%cmp.0 = icmp eq i32 %l.0, 16383
	%s.0 = select i1 %cmp.0, i32 %l.0, i32 %x			%s.0 = select i1 %cmp.0, i32 %l.0, i32 %x
	store i32 %s.0, i32* %ptr, align 2			store i32 %s.0, i32* %ptr, align 2


	define void @select_uniform_ne_2xi64(i64* %ptr, i64 %x) {			define void @select_uniform_ne_2xi64(i64* %ptr, i64 %x) {
	; CHECK-LABEL: @select_uniform_ne_2xi64(			; CHECK-LABEL: @select_uniform_ne_2xi64(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[PTR:%.*]] to <2 x i64>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[PTR:%.*]] to <2 x i64>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ne <2 x i64> [[TMP1]], <i64 16383, i64 16383>			; CHECK-NEXT: [[TMP2:%.*]] = icmp ne <2 x i64> [[TMP1]], <i64 16383, i64 16383>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i64> poison, i64 [[X:%.*]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i64> poison, i64 [[X:%.*]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i64> [[TMP3]], i64 [[X]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64> [[TMP3]], <2 x i64> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = select <2 x i1> [[TMP2]], <2 x i64> [[TMP1]], <2 x i64> [[TMP4]]			; CHECK-NEXT: [[TMP4:%.*]] = select <2 x i1> [[TMP2]], <2 x i64> [[TMP1]], <2 x i64> [[SHUFFLE]]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[PTR]] to <2 x i64>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[PTR]] to <2 x i64>*
	; CHECK-NEXT: store <2 x i64> [[TMP5]], <2 x i64>* [[TMP6]], align 2			; CHECK-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP5]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%l.0 = load i64, i64* %ptr			%l.0 = load i64, i64* %ptr
	%cmp.0 = icmp ne i64 %l.0, 16383			%cmp.0 = icmp ne i64 %l.0, 16383
	%s.0 = select i1 %cmp.0, i64 %l.0, i64 %x			%s.0 = select i1 %cmp.0, i64 %l.0, i64 %x
	store i64 %s.0, i64* %ptr, align 2			store i64 %s.0, i64* %ptr, align 2

	%gep.1 = getelementptr inbounds i64, i64* %ptr, i64 1			%gep.1 = getelementptr inbounds i64, i64* %ptr, i64 1
	%l.1 = load i64, i64* %gep.1			%l.1 = load i64, i64* %gep.1
	%cmp.1 = icmp ne i64 %l.1, 16383			%cmp.1 = icmp ne i64 %l.1, 16383
	%s.1 = select i1 %cmp.1, i64 %l.1, i64 %x			%s.1 = select i1 %cmp.1, i64 %l.1, i64 %x
	store i64 %s.1, i64* %gep.1, align 2			store i64 %s.1, i64* %gep.1, align 2

	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <2 x double> [[V_1]], i32 0			; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <2 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V_3:%.*]] = load <2 x double>, <2 x double>* [[PTR_3:%.*]], align 8			; CHECK-NEXT: [[V_3:%.*]] = load <2 x double>, <2 x double>* [[PTR_3:%.*]], align 8
	; CHECK-NEXT: [[V3_LANE_1:%.*]] = extractelement <2 x double> [[V_3]], i32 1			; CHECK-NEXT: [[V3_LANE_1:%.*]] = extractelement <2 x double> [[V_3]], i32 1
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_0]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V3_LANE_1]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V3_LANE_1]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[V2_LANE_2]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[V1_LANE_0]])
	; CHECK-NEXT: call void @use(double [[V3_LANE_1]])			; CHECK-NEXT: call void @use(double [[V3_LANE_1]])
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8			%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <2 x double> %v.1, i32 0			%v1.lane.0 = extractelement <2 x double> %v.1, i32 0
	%v.3 = load <2 x double>, <2 x double>* %ptr.3, align 8			%v.3 = load <2 x double>, <2 x double>* %ptr.3, align 8
	%v3.lane.1 = extractelement <2 x double> %v.3, i32 1			%v3.lane.1 = extractelement <2 x double> %v.3, i32 1

	; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <4 x double> [[V_1]], i32 3			; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <4 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_2]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1_LANE_3]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1_LANE_3]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[V2_LANE_2]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])			; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <4 x double> [[TMP4]], <4 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8			%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8
	%v1.lane.2 = extractelement <4 x double> %v.1, i32 2			%v1.lane.2 = extractelement <4 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <4 x double> %v.1, i32 3			%v1.lane.3 = extractelement <4 x double> %v.1, i32 3

	%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16			%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16
	; directly in a vector register on AArch64.			; directly in a vector register on AArch64.
	define void @extract_reverse_order(<2 x double>* %ptr.1, <4 x double>* %ptr.2) {			define void @extract_reverse_order(<2 x double>* %ptr.1, <4 x double>* %ptr.2) {
	; CHECK-LABEL: @extract_reverse_order(			; CHECK-LABEL: @extract_reverse_order(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <2 x double>, <2 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V2_LANE_2]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[V_1]], [[TMP1]]			; CHECK-NEXT: [[TMP1:%.*]] = fmul <2 x double> [[V_1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[V_1]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x double> [[V_1]], i32 0
				; CHECK-NEXT: call void @use(double [[TMP3]])
				; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[V_1]], i32 1
	; CHECK-NEXT: call void @use(double [[TMP4]])			; CHECK-NEXT: call void @use(double [[TMP4]])
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[V_1]], i32 1			; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: call void @use(double [[TMP5]])
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8			%v.1 = load <2 x double>, <2 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <2 x double> %v.1, i32 0			%v1.lane.0 = extractelement <2 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <2 x double> %v.1, i32 1			%v1.lane.1 = extractelement <2 x double> %v.1, i32 1

	%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16			%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16
	; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8			; CHECK-NEXT: [[V_1:%.*]] = load <4 x double>, <4 x double>* [[PTR_1:%.*]], align 8
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <4 x double> [[V_1]], i32 1			; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <4 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <4 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_1]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V1_LANE_1]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1_LANE_2]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[V1_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[V2_LANE_2]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: call void @use(double [[V1_LANE_1]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: store <4 x double> [[TMP5]], <4 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <4 x double> [[TMP4]], <4 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8			%v.1 = load <4 x double>, <4 x double>* %ptr.1, align 8
	%v1.lane.1 = extractelement <4 x double> %v.1, i32 1			%v1.lane.1 = extractelement <4 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <4 x double> %v.1, i32 2			%v1.lane.2 = extractelement <4 x double> %v.1, i32 2

	%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16			%v.2 = load <4 x double>, <4 x double>* %ptr.2, align 16
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x double> poison, double [[V1_LANE_0]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x double> poison, double [[V1_LANE_0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[V1_LANE_2]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[V1_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[V1_LANE_1]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[V1_LANE_1]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[V1_LANE_3]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[V1_LANE_3]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> [[TMP4]], double [[V2_LANE_1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> [[TMP4]], double [[V2_LANE_1]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x double> [[TMP5]], double [[V2_LANE_2]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x double> [[TMP5]], double [[V2_LANE_0]], i32 3
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x double> [[TMP6]], double [[V2_LANE_0]], i32 3			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[TMP6]], <4 x double> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 3>
	; CHECK-NEXT: [[TMP8:%.*]] = fmul <4 x double> [[TMP3]], [[TMP7]]			; CHECK-NEXT: [[TMP7:%.*]] = fmul <4 x double> [[TMP3]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x double> [[TMP8]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x double> [[TMP7]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[V1_LANE_0]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: call void @use(double [[V1_LANE_1]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])			; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <9 x double> [[TMP9]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[TMP8]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <9 x double> %v.1, i32 3			%v1.lane.3 = extractelement <9 x double> %v.1, i32 3

llvm/test/Transforms/SLPVectorizer/AMDGPU/packed-math.ll

Show First 20 Lines • Show All 109 Lines • ▼ Show 20 Lines	;
ret void		ret void
}		}

define amdgpu_kernel void @mul_scalar_v2f16(half addrspace(3)* %a, half %scalar, half addrspace(3)* %c) {		define amdgpu_kernel void @mul_scalar_v2f16(half addrspace(3)* %a, half %scalar, half addrspace(3)* %c) {
; GCN-LABEL: @mul_scalar_v2f16(		; GCN-LABEL: @mul_scalar_v2f16(
; GCN-NEXT: [[TMP1:%.*]] = bitcast half addrspace(3)* [[A:%.*]] to <2 x half> addrspace(3)*		; GCN-NEXT: [[TMP1:%.*]] = bitcast half addrspace(3)* [[A:%.*]] to <2 x half> addrspace(3)*
; GCN-NEXT: [[TMP2:%.*]] = load <2 x half>, <2 x half> addrspace(3)* [[TMP1]], align 2		; GCN-NEXT: [[TMP2:%.*]] = load <2 x half>, <2 x half> addrspace(3)* [[TMP1]], align 2
; GCN-NEXT: [[TMP3:%.*]] = insertelement <2 x half> poison, half [[SCALAR:%.*]], i32 0		; GCN-NEXT: [[TMP3:%.*]] = insertelement <2 x half> poison, half [[SCALAR:%.*]], i32 0
; GCN-NEXT: [[TMP4:%.*]] = insertelement <2 x half> [[TMP3]], half [[SCALAR]], i32 1		; GCN-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x half> [[TMP3]], <2 x half> poison, <2 x i32> zeroinitializer
; GCN-NEXT: [[TMP5:%.*]] = fmul <2 x half> [[TMP2]], [[TMP4]]		; GCN-NEXT: [[TMP4:%.*]] = fmul <2 x half> [[TMP2]], [[SHUFFLE]]
; GCN-NEXT: [[TMP6:%.*]] = bitcast half addrspace(3)* [[C:%.*]] to <2 x half> addrspace(3)*		; GCN-NEXT: [[TMP5:%.*]] = bitcast half addrspace(3)* [[C:%.*]] to <2 x half> addrspace(3)*
; GCN-NEXT: store <2 x half> [[TMP5]], <2 x half> addrspace(3)* [[TMP6]], align 2		; GCN-NEXT: store <2 x half> [[TMP4]], <2 x half> addrspace(3)* [[TMP5]], align 2
; GCN-NEXT: ret void		; GCN-NEXT: ret void
;		;
%i0 = load half, half addrspace(3)* %a, align 2		%i0 = load half, half addrspace(3)* %a, align 2
%mul = fmul half %i0, %scalar		%mul = fmul half %i0, %scalar
%arrayidx3 = getelementptr inbounds half, half addrspace(3)* %a, i64 1		%arrayidx3 = getelementptr inbounds half, half addrspace(3)* %a, i64 1
%i3 = load half, half addrspace(3)* %arrayidx3, align 2		%i3 = load half, half addrspace(3)* %arrayidx3, align 2
%mul5 = fmul half %i3, %scalar		%mul5 = fmul half %i3, %scalar
store half %mul, half addrspace(3)* %c, align 2		store half %mul, half addrspace(3)* %c, align 2

llvm/test/Transforms/SLPVectorizer/X86/PR35777.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -verify -slp-vectorizer -o - -S -mtriple=x86_64-apple-macosx10.13.0 | FileCheck %s			; RUN: opt < %s -verify -slp-vectorizer -o - -S -mtriple=x86_64-apple-macosx10.13.0 | FileCheck %s

	@global = local_unnamed_addr global [6 x double] zeroinitializer, align 16			@global = local_unnamed_addr global [6 x double] zeroinitializer, align 16

	define { i64, i64 } @patatino(double %arg) {			define { i64, i64 } @patatino(double %arg) {
	; CHECK-LABEL: @patatino(			; CHECK-LABEL: @patatino(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, <2 x double>* bitcast ([6 x double]* @global to <2 x double>*), align 16			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x double>, <2 x double>* bitcast ([6 x double]* @global to <2 x double>*), align 16
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([6 x double], [6 x double]* @global, i64 0, i64 2) to <2 x double>*), align 16			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([6 x double], [6 x double]* @global, i64 0, i64 2) to <2 x double>*), align 16
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[ARG:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[ARG:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[ARG]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP0]], [[TMP4]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP0]], [[TMP3]]
	; CHECK-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([6 x double], [6 x double]* @global, i64 0, i64 4) to <2 x double>*), align 16			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* bitcast (double* getelementptr inbounds ([6 x double], [6 x double]* @global, i64 0, i64 4) to <2 x double>*), align 16
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP6]], [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP5]], [[TMP4]]
	; CHECK-NEXT: [[TMP8:%.*]] = fptosi <2 x double> [[TMP7]] to <2 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = fptosi <2 x double> [[TMP6]] to <2 x i32>
	; CHECK-NEXT: [[TMP9:%.*]] = sext <2 x i32> [[TMP8]] to <2 x i64>			; CHECK-NEXT: [[TMP8:%.*]] = sext <2 x i32> [[TMP7]] to <2 x i64>
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i64> [[TMP8]], i32 0
	; CHECK-NEXT: [[T16:%.*]] = insertvalue { i64, i64 } undef, i64 [[TMP10]], 0			; CHECK-NEXT: [[T16:%.*]] = insertvalue { i64, i64 } undef, i64 [[TMP9]], 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP8]], i32 1
	; CHECK-NEXT: [[T17:%.*]] = insertvalue { i64, i64 } [[T16]], i64 [[TMP11]], 1			; CHECK-NEXT: [[T17:%.*]] = insertvalue { i64, i64 } [[T16]], i64 [[TMP10]], 1
	; CHECK-NEXT: ret { i64, i64 } [[T17]]			; CHECK-NEXT: ret { i64, i64 } [[T17]]
	;			;
	bb:			bb:
	%t = load double, double* getelementptr inbounds ([6 x double], [6 x double]* @global, i64 0, i64 0), align 16			%t = load double, double* getelementptr inbounds ([6 x double], [6 x double]* @global, i64 0, i64 0), align 16
	%t1 = load double, double* getelementptr inbounds ([6 x double], [6 x double]* @global, i64 0, i64 2), align 16			%t1 = load double, double* getelementptr inbounds ([6 x double], [6 x double]* @global, i64 0, i64 2), align 16
	%t2 = fmul double %t1, %arg			%t2 = fmul double %t1, %arg
	%t3 = fadd double %t, %t2			%t3 = fadd double %t, %t2
	%t4 = load double, double* getelementptr inbounds ([6 x double], [6 x double]* @global, i64 0, i64 4), align 16			%t4 = load double, double* getelementptr inbounds ([6 x double], [6 x double]* @global, i64 0, i64 4), align 16

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-4 | FileCheck %s --check-prefix=CHECK			; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-4 | FileCheck %s --check-prefix=CHECK
	; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 -slp-min-tree-size=5 | FileCheck %s --check-prefix=FORCE_REDUCTION			; RUN: opt -passes=slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 -slp-min-tree-size=5 | FileCheck %s --check-prefix=FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[TMP0:%.*]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[TMP0:%.*]], i32 0
	; CHECK-NEXT: [[SHUFFLE7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE8:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0
	; CHECK-NEXT: [[SHUFFLE6:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> poison, <16 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE7:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> poison, <16 x i32> zeroinitializer
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x i32> [ [[TMP14:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x i32> [ [[TMP13:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <8 x i32> [[SHUFFLE]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>			; CHECK-NEXT: [[TMP5:%.*]] = add <8 x i32> [[SHUFFLE]], <i32 0, i32 55, i32 285, i32 1240, i32 1496, i32 8555, i32 12529, i32 13685>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[SHUFFLE6]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[SHUFFLE7]])
	; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[SHUFFLE7]])			; CHECK-NEXT: [[TMP7:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[SHUFFLE8]])
	; CHECK-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP6]], [[TMP7]]			; CHECK-NEXT: [[OP_RDX:%.*]] = and i32 [[TMP6]], [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP8:%.*]] = call i32 @llvm.vector.reduce.and.v8i32(<8 x i32> [[TMP5]])
	; CHECK-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP8]]			; CHECK-NEXT: [[OP_RDX1:%.*]] = and i32 [[OP_RDX]], [[TMP8]]
	; CHECK-NEXT: [[OP_RDX2:%.*]] = and i32 [[OP_RDX1]], [[TMP0]]			; CHECK-NEXT: [[OP_RDX2:%.*]] = and i32 [[OP_RDX1]], [[TMP0]]
	; CHECK-NEXT: [[OP_RDX3:%.*]] = and i32 [[TMP0]], [[TMP0]]			; CHECK-NEXT: [[OP_RDX3:%.*]] = and i32 [[TMP0]], [[TMP0]]
	; CHECK-NEXT: [[OP_RDX4:%.*]] = and i32 [[OP_RDX2]], [[OP_RDX3]]			; CHECK-NEXT: [[OP_RDX4:%.*]] = and i32 [[OP_RDX2]], [[OP_RDX3]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_RDX4]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> <i32 poison, i32 14910>, i32 [[OP_RDX4]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> poison, i32 [[TMP4]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x i32> poison, i32 [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x i32> [[TMP10]], i32 [[TMP4]], i32 1			; CHECK-NEXT: [[SHUFFLE6:%.*]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP12:%.*]] = and <2 x i32> [[TMP9]], [[TMP11]]			; CHECK-NEXT: [[TMP11:%.*]] = and <2 x i32> [[TMP9]], [[SHUFFLE6]]
	; CHECK-NEXT: [[TMP13:%.*]] = add <2 x i32> [[TMP9]], [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = add <2 x i32> [[TMP9]], [[SHUFFLE6]]
	; CHECK-NEXT: [[TMP14]] = shufflevector <2 x i32> [[TMP12]], <2 x i32> [[TMP13]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP13]] = shufflevector <2 x i32> [[TMP11]], <2 x i32> [[TMP12]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	; FORCE_REDUCTION-LABEL: @Test(			; FORCE_REDUCTION-LABEL: @Test(
	; FORCE_REDUCTION-NEXT: entry:			; FORCE_REDUCTION-NEXT: entry:
	; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[TMP0:%.*]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> poison, i32 [[TMP0:%.*]], i32 0
	; FORCE_REDUCTION-NEXT: [[SHUFFLE7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer			; FORCE_REDUCTION-NEXT: [[SHUFFLE7:%.*]] = shufflevector <8 x i32> [[TMP1]], <8 x i32> poison, <8 x i32> zeroinitializer
	; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0			; FORCE_REDUCTION-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> poison, i32 [[TMP0]], i32 0
	; FORCE_REDUCTION-NEXT: [[SHUFFLE6:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> poison, <16 x i32> zeroinitializer			; FORCE_REDUCTION-NEXT: [[SHUFFLE6:%.*]] = shufflevector <16 x i32> [[TMP2]], <16 x i32> poison, <16 x i32> zeroinitializer

llvm/test/Transforms/SLPVectorizer/X86/alternate-cmp-swapped-pred.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -passes=slp-vectorizer -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -passes=slp-vectorizer -S | FileCheck %s

	define i16 @test(i16 %call37) {			define i16 @test(i16 %call37) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CALL:%.*]] = load i16, i16* undef, align 2			; CHECK-NEXT: [[CALL:%.*]] = load i16, i16* undef, align 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i16> <i16 poison, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 poison, i16 poison>, i16 [[CALL37:%.*]], i32 3			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i16> <i16 poison, i16 0, i16 0, i16 poison, i16 poison, i16 0, i16 poison, i16 0>, i16 [[CALL37:%.*]], i32 3
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i16> [[TMP0]], i16 [[CALL]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i16> [[TMP0]], i16 [[CALL]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[TMP1]], <8 x i16> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 3, i32 4, i32 3, i32 5>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[TMP1]], <8 x i16> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 3, i32 5, i32 3, i32 7>
	; CHECK-NEXT: [[TMP2:%.*]] = icmp slt <8 x i16> [[SHUFFLE]], zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = icmp slt <8 x i16> [[SHUFFLE]], zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <8 x i16> [[SHUFFLE]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = icmp sgt <8 x i16> [[SHUFFLE]], zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 5, i32 6, i32 7>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <8 x i1> [[TMP2]], <8 x i1> [[TMP3]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 12, i32 5, i32 6, i32 7>
	; CHECK-NEXT: [[TMP5:%.*]] = zext <8 x i1> [[TMP4]] to <8 x i16>			; CHECK-NEXT: [[TMP5:%.*]] = zext <8 x i1> [[TMP4]] to <8 x i16>
	; CHECK-NEXT: [[TMP6:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[TMP5]])
	; CHECK-NEXT: [[OP_RDX:%.*]] = add i16 [[TMP6]], 0			; CHECK-NEXT: [[OP_RDX:%.*]] = add i16 [[TMP6]], 0
	; CHECK-NEXT: ret i16 [[OP_RDX]]			; CHECK-NEXT: ret i16 [[OP_RDX]]
	;			;

llvm/test/Transforms/SLPVectorizer/X86/broadcast_long.ll


	define void @bcast_long(i32 %A, i32 %S) {			define void @bcast_long(i32 %A, i32 %S) {
	; CHECK-LABEL: @bcast_long(			; CHECK-LABEL: @bcast_long(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[A0:%.*]] = load i32, i32* [[A:%.*]], align 8			; CHECK-NEXT: [[A0:%.*]] = load i32, i32* [[A:%.*]], align 8
	; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds i32, i32* [[S:%.*]], i64 0			; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds i32, i32* [[S:%.*]], i64 0
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[A0]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[A0]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> <i32 0, i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> <i32 0, i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0>
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IDXS0]] to <8 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = freeze <8 x i32> [[SHUFFLE]]
	; CHECK-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* [[TMP1]], align 8			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[IDXS0]] to <8 x i32>*
				; CHECK-NEXT: store <8 x i32> [[TMP1]], <8 x i32>* [[TMP2]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%A0 = load i32, i32 *%A, align 8			%A0 = load i32, i32 *%A, align 8

	%idxS0 = getelementptr inbounds i32, i32* %S, i64 0			%idxS0 = getelementptr inbounds i32, i32* %S, i64 0
	%idxS1 = getelementptr inbounds i32, i32* %S, i64 1			%idxS1 = getelementptr inbounds i32, i32* %S, i64 1
	%idxS2 = getelementptr inbounds i32, i32* %S, i64 2			%idxS2 = getelementptr inbounds i32, i32* %S, i64 2

llvm/test/Transforms/SLPVectorizer/X86/buildvector-shuffle.ll

	}			}

	declare float @llvm.fmuladd.f32(float, float, float)			declare float @llvm.fmuladd.f32(float, float, float)

	define void @test(float %a) {			define void @test(float %a) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x float> poison, float [[A:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> [[TMP0]], float [[A]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP0]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x float> zeroinitializer, [[TMP1]]			; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x float> zeroinitializer, [[SHUFFLE]]
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	;			;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%add.i157 = fadd float 0.000000e+00, %a			%add.i157 = fadd float 0.000000e+00, %a
	%add23.i = fadd float 0.000000e+00, %a			%add23.i = fadd float 0.000000e+00, %a

llvm/test/Transforms/SLPVectorizer/X86/c-ray.ll

	; CHECK-NEXT: [[FNEG87:%.*]] = fneg double [[TMP12]]			; CHECK-NEXT: [[FNEG87:%.*]] = fneg double [[TMP12]]
	; CHECK-NEXT: [[MUL88:%.*]] = fmul double [[TMP4]], 2.000000e+00			; CHECK-NEXT: [[MUL88:%.*]] = fmul double [[TMP4]], 2.000000e+00
	; CHECK-NEXT: [[TMP26:%.*]] = insertelement <2 x double> poison, double [[FNEG87]], i32 0			; CHECK-NEXT: [[TMP26:%.*]] = insertelement <2 x double> poison, double [[FNEG87]], i32 0
	; CHECK-NEXT: [[TMP27:%.*]] = insertelement <2 x double> [[TMP26]], double [[CALL]], i32 1			; CHECK-NEXT: [[TMP27:%.*]] = insertelement <2 x double> [[TMP26]], double [[CALL]], i32 1
	; CHECK-NEXT: [[TMP28:%.*]] = insertelement <2 x double> poison, double [[CALL]], i32 0			; CHECK-NEXT: [[TMP28:%.*]] = insertelement <2 x double> poison, double [[CALL]], i32 0
	; CHECK-NEXT: [[TMP29:%.*]] = insertelement <2 x double> [[TMP28]], double [[TMP12]], i32 1			; CHECK-NEXT: [[TMP29:%.*]] = insertelement <2 x double> [[TMP28]], double [[TMP12]], i32 1
	; CHECK-NEXT: [[TMP30:%.*]] = fsub <2 x double> [[TMP27]], [[TMP29]]			; CHECK-NEXT: [[TMP30:%.*]] = fsub <2 x double> [[TMP27]], [[TMP29]]
	; CHECK-NEXT: [[TMP31:%.*]] = insertelement <2 x double> poison, double [[MUL88]], i32 0			; CHECK-NEXT: [[TMP31:%.*]] = insertelement <2 x double> poison, double [[MUL88]], i32 0
	; CHECK-NEXT: [[TMP32:%.*]] = insertelement <2 x double> [[TMP31]], double [[MUL88]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP31]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP33:%.*]] = fdiv <2 x double> [[TMP30]], [[TMP32]]			; CHECK-NEXT: [[TMP32:%.*]] = fdiv <2 x double> [[TMP30]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP34:%.*]] = extractelement <2 x double> [[TMP33]], i32 1			; CHECK-NEXT: [[TMP33:%.*]] = extractelement <2 x double> [[TMP32]], i32 1
	; CHECK-NEXT: [[CMP93:%.*]] = fcmp olt double [[TMP34]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[CMP93:%.*]] = fcmp olt double [[TMP33]], 0x3EB0C6F7A0B5ED8D
	; CHECK-NEXT: [[TMP35:%.*]] = extractelement <2 x double> [[TMP33]], i32 0			; CHECK-NEXT: [[TMP34:%.*]] = extractelement <2 x double> [[TMP32]], i32 0
	; CHECK-NEXT: [[CMP94:%.*]] = fcmp olt double [[TMP35]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[CMP94:%.*]] = fcmp olt double [[TMP34]], 0x3EB0C6F7A0B5ED8D
	; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP93]], i1 [[CMP94]], i1 false			; CHECK-NEXT: [[OR_COND:%.*]] = select i1 [[CMP93]], i1 [[CMP94]], i1 false
	; CHECK-NEXT: br i1 [[OR_COND]], label [[CLEANUP]], label [[LOR_LHS_FALSE:%.*]]			; CHECK-NEXT: br i1 [[OR_COND]], label [[CLEANUP]], label [[LOR_LHS_FALSE:%.*]]
	; CHECK: lor.lhs.false:			; CHECK: lor.lhs.false:
	; CHECK-NEXT: [[TMP36:%.*]] = fcmp ule <2 x double> [[TMP33]], <double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP35:%.*]] = fcmp ule <2 x double> [[TMP32]], <double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP37:%.*]] = extractelement <2 x i1> [[TMP36]], i32 0			; CHECK-NEXT: [[TMP36:%.*]] = extractelement <2 x i1> [[TMP35]], i32 0
	; CHECK-NEXT: [[TMP38:%.*]] = extractelement <2 x i1> [[TMP36]], i32 1			; CHECK-NEXT: [[TMP37:%.*]] = extractelement <2 x i1> [[TMP35]], i32 1
	; CHECK-NEXT: [[OR_COND106:%.*]] = select i1 [[TMP38]], i1 true, i1 [[TMP37]]			; CHECK-NEXT: [[OR_COND106:%.*]] = select i1 [[TMP37]], i1 true, i1 [[TMP36]]
	; CHECK-NEXT: [[SPEC_SELECT:%.*]] = zext i1 [[OR_COND106]] to i32			; CHECK-NEXT: [[SPEC_SELECT:%.*]] = zext i1 [[OR_COND106]] to i32
	; CHECK-NEXT: br label [[CLEANUP]]			; CHECK-NEXT: br label [[CLEANUP]]
	; CHECK: cleanup:			; CHECK: cleanup:
	; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 0, [[IF_END]] ], [ [[SPEC_SELECT]], [[LOR_LHS_FALSE]] ]			; CHECK-NEXT: [[RETVAL_0:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 0, [[IF_END]] ], [ [[SPEC_SELECT]], [[LOR_LHS_FALSE]] ]
	; CHECK-NEXT: ret i32 [[RETVAL_0]]			; CHECK-NEXT: ret i32 [[RETVAL_0]]
	;			;
	entry:			entry:
	%dir = getelementptr inbounds %struct.ray, ptr %ray, i64 0, i32 1			%dir = getelementptr inbounds %struct.ray, ptr %ray, i64 0, i32 1

llvm/test/Transforms/SLPVectorizer/X86/cmp_sel.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s

	; int foo(double * restrict A, double * restrict B, double G) {			; int foo(double * restrict A, double * restrict B, double G) {
	; A[0] = (B[10] ? G : 1);			; A[0] = (B[10] ? G : 1);
	; A[1] = (B[11] ? G : 1);			; A[1] = (B[11] ? G : 1);
	; }			; }

	define i32 @foo(double* noalias nocapture %A, double* noalias nocapture %B, double %G) {			define i32 @foo(double* noalias nocapture %A, double* noalias nocapture %B, double %G) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 10			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 10
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = fcmp une <2 x double> [[TMP1]], zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = fcmp une <2 x double> [[TMP1]], zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[G:%.*]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[G:%.*]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[G]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = select <2 x i1> [[TMP2]], <2 x double> [[TMP4]], <2 x double> <double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP4:%.*]] = select <2 x i1> [[TMP2]], <2 x double> [[SHUFFLE]], <2 x double> <double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds double, double* %B, i64 10			%arrayidx = getelementptr inbounds double, double* %B, i64 10
	%0 = load double, double* %arrayidx, align 8			%0 = load double, double* %arrayidx, align 8
	%tobool = fcmp une double %0, 0.000000e+00			%tobool = fcmp une double %0, 0.000000e+00
	%cond = select i1 %tobool, double %G, double 1.000000e+00			%cond = select i1 %tobool, double %G, double 1.000000e+00
	store double %cond, double* %A, align 8			store double %cond, double* %A, align 8

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

	; AVX-LABEL: @same_opcode_on_one_side(			; AVX-LABEL: @same_opcode_on_one_side(
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0			; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0			; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0
	; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer			; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer
	; AVX-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; AVX-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[B:%.*]], i32 1			; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[B:%.*]], i32 1
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[C]], i32 2			; AVX-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[C]], i32 2
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[A]], i32 3			; AVX-NEXT: [[SHUFFLE2:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 0>
	; AVX-NEXT: [[TMP7:%.*]] = xor <4 x i32> [[TMP3]], [[TMP6]]			; AVX-NEXT: [[TMP6:%.*]] = xor <4 x i32> [[TMP3]], [[SHUFFLE2]]
	; AVX-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16			; AVX-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%add1 = add i32 %c, %a			%add1 = add i32 %c, %a
	%add2 = add i32 %c, %a			%add2 = add i32 %c, %a
	%add3 = add i32 %a, %c			%add3 = add i32 %a, %c
	%add4 = add i32 %c, %a			%add4 = add i32 %c, %a
	%1 = xor i32 %add1, %a			%1 = xor i32 %add1, %a
	store i32 %1, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 0), align 16			store i32 %1, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 0), align 16
	%2 = xor i32 %b, %add2			%2 = xor i32 %b, %add2
	store i32 %2, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 1)			store i32 %2, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 1)
	%3 = xor i32 %c, %add3			%3 = xor i32 %c, %add3
	store i32 %3, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 2)			store i32 %3, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 2)
	%4 = xor i32 %a, %add4			%4 = xor i32 %a, %add4
	store i32 %4, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 3)			store i32 %4, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 3)
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/compare-reduce.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.7.0"			target triple = "x86_64-apple-macosx10.7.0"

	@.str = private unnamed_addr constant [6 x i8] c"bingo\00", align 1			@.str = private unnamed_addr constant [6 x i8] c"bingo\00", align 1

	define void @reduce_compare(double* nocapture %A, i32 %n) {			define void @reduce_compare(double* nocapture %A, i32 %n) {
	; CHECK-LABEL: @reduce_compare(			; CHECK-LABEL: @reduce_compare(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[N:%.*]] to double			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[N:%.*]] to double
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[CONV]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[CONV]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[CONV]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = shl nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[TMP1:%.*]] = shl nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 [[TMP2]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP1]], [[TMP4]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[SHUFFLE]], [[TMP3]]
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <2 x double> [[TMP5]], <double 7.000000e+00, double 4.000000e+00>			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], <double 7.000000e+00, double 4.000000e+00>
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP6]], <double 5.000000e+00, double 9.000000e+00>			; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP5]], <double 5.000000e+00, double 9.000000e+00>
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 1
	; CHECK-NEXT: [[CMP11:%.*]] = fcmp ogt double [[TMP8]], [[TMP9]]			; CHECK-NEXT: [[CMP11:%.*]] = fcmp ogt double [[TMP7]], [[TMP8]]
	; CHECK-NEXT: br i1 [[CMP11]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; CHECK-NEXT: br i1 [[CMP11]], label [[IF_THEN:%.*]], label [[FOR_INC]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: [[CALL:%.*]] = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str, i64 0, i64 0))			; CHECK-NEXT: [[CALL:%.*]] = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str, i64 0, i64 0))
	; CHECK-NEXT: br label [[FOR_INC]]			; CHECK-NEXT: br label [[FOR_INC]]
	; CHECK: for.inc:			; CHECK: for.inc:
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], 100			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], 100

llvm/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S | FileCheck %s
	; RUN: opt < %s -passes=slp-vectorizer -S -mattr=+avx | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mattr=+avx | FileCheck %s

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.10.0"			target triple = "x86_64-apple-macosx10.10.0"

	define void @testfunc(float* nocapture %dest, float* nocapture readonly %src) {			define void @testfunc(float* nocapture %dest, float* nocapture readonly %src) {
	; CHECK-LABEL: @testfunc(			; CHECK-LABEL: @testfunc(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ACC1_056:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[ADD13:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[ACC1_056:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[ADD13:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP19:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP18:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[DEST:%.*]], i64 [[INDVARS_IV]]			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[DEST:%.*]], i64 [[INDVARS_IV]]
	; CHECK-NEXT: store float [[ACC1_056]], float* [[ARRAYIDX2]], align 4			; CHECK-NEXT: store float [[ACC1_056]], float* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x float> [[TMP0]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x float> [[TMP0]], [[SHUFFLE]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x float> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP0]], zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x float> [[TMP5]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x float> [[TMP4]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP7:%.*]] = fcmp olt <2 x float> [[TMP6]], <float 1.000000e+00, float 1.000000e+00>			; CHECK-NEXT: [[TMP6:%.*]] = fcmp olt <2 x float> [[TMP5]], <float 1.000000e+00, float 1.000000e+00>
	; CHECK-NEXT: [[TMP8:%.*]] = select <2 x i1> [[TMP7]], <2 x float> [[TMP6]], <2 x float> <float 1.000000e+00, float 1.000000e+00>			; CHECK-NEXT: [[TMP7:%.*]] = select <2 x i1> [[TMP6]], <2 x float> [[TMP5]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
	; CHECK-NEXT: [[TMP9:%.*]] = fcmp olt <2 x float> [[TMP8]], <float -1.000000e+00, float -1.000000e+00>			; CHECK-NEXT: [[TMP8:%.*]] = fcmp olt <2 x float> [[TMP7]], <float -1.000000e+00, float -1.000000e+00>
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x float> [[TMP8]], zeroinitializer			; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x float> [[TMP7]], zeroinitializer
	; CHECK-NEXT: [[TMP11:%.*]] = select <2 x i1> [[TMP9]], <2 x float> <float -0.000000e+00, float -0.000000e+00>, <2 x float> [[TMP10]]			; CHECK-NEXT: [[TMP10:%.*]] = select <2 x i1> [[TMP8]], <2 x float> <float -0.000000e+00, float -0.000000e+00>, <2 x float> [[TMP9]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x float> [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP10]], i32 1
	; CHECK-NEXT: [[ADD13]] = fadd float [[TMP12]], [[TMP13]]			; CHECK-NEXT: [[ADD13]] = fadd float [[TMP11]], [[TMP12]]
	; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x float> [[TMP11]], <2 x float> poison, <2 x i32> <i32 1, i32 undef>			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> poison, <2 x i32> <i32 1, i32 undef>
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[ADD13]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> [[TMP13]], float [[ADD13]], i32 1
	; CHECK-NEXT: [[TMP16:%.*]] = fcmp olt <2 x float> [[TMP15]], <float 1.000000e+00, float 1.000000e+00>			; CHECK-NEXT: [[TMP15:%.*]] = fcmp olt <2 x float> [[TMP14]], <float 1.000000e+00, float 1.000000e+00>
	; CHECK-NEXT: [[TMP17:%.*]] = select <2 x i1> [[TMP16]], <2 x float> [[TMP15]], <2 x float> <float 1.000000e+00, float 1.000000e+00>			; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP15]], <2 x float> [[TMP14]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
	; CHECK-NEXT: [[TMP18:%.*]] = fcmp olt <2 x float> [[TMP17]], <float -1.000000e+00, float -1.000000e+00>			; CHECK-NEXT: [[TMP17:%.*]] = fcmp olt <2 x float> [[TMP16]], <float -1.000000e+00, float -1.000000e+00>
	; CHECK-NEXT: [[TMP19]] = select <2 x i1> [[TMP18]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP17]]			; CHECK-NEXT: [[TMP18]] = select <2 x i1> [[TMP17]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP16]]
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body


llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -slp-min-tree-size=2 -slp-threshold=-1000 -slp-max-look-ahead-depth=1 -slp-schedule-budget=27 -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -slp-min-tree-size=2 -slp-threshold=-1000 -slp-max-look-ahead-depth=1 -slp-schedule-budget=27 -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s

	define void @exceed(double %0, double %1) {			define void @exceed(double %0, double %1) {
	; CHECK-LABEL: @exceed(			; CHECK-LABEL: @exceed(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[TMP0:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[TMP0:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[TMP1:%.*]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[TMP1:%.*]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[TMP1]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = fdiv fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP4:%.*]] = fdiv fast <2 x double> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP6]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
	; CHECK-NEXT: [[IX:%.*]] = fmul double [[TMP7]], undef			; CHECK-NEXT: [[IX:%.*]] = fmul double [[TMP5]], undef
	; CHECK-NEXT: [[IXX0:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX0:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX1:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX1:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX2:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX2:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX3:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX3:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX4:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX4:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX5:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX5:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IX1:%.*]] = fmul double [[TMP7]], undef			; CHECK-NEXT: [[IX1:%.*]] = fmul double [[TMP5]], undef
	; CHECK-NEXT: [[IXX10:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX10:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX11:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX11:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX12:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX12:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX13:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX13:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
	; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]			; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP6]], [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP7:%.*]] = fadd fast <2 x double> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP10]]			; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP8]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP11]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <2 x double> [[TMP9]], [[TMP7]]
	; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <2 x double> [[TMP13]], <2 x double> [[TMP6]], <2 x i32> <i32 3, i32 1>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> [[TMP4]], <2 x i32> <i32 3, i32 1>
	; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP14]], undef			; CHECK-NEXT: [[TMP13:%.*]] = fmul fast <2 x double> [[TMP12]], undef
	; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [			; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [
	; CHECK-NEXT: i32 0, label [[BB2:%.*]]			; CHECK-NEXT: i32 0, label [[BB2:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: br label [[LABEL:%.*]]			; CHECK-NEXT: br label [[LABEL:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: br label [[LABEL]]			; CHECK-NEXT: br label [[LABEL]]
	; CHECK: label:			; CHECK: label:
	; CHECK-NEXT: [[TMP16:%.*]] = phi <2 x double> [ [[TMP12]], [[BB1]] ], [ [[TMP15]], [[BB2]] ]			; CHECK-NEXT: [[TMP14:%.*]] = phi <2 x double> [ [[TMP10]], [[BB1]] ], [ [[TMP13]], [[BB2]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%i10 = fdiv fast double %0, %1			%i10 = fdiv fast double %0, %1
	%ix = fmul double %i10, undef			%ix = fmul double %i10, undef
	%ixx0 = fsub double undef, undef			%ixx0 = fsub double undef, undef
	%ixx1 = fsub double undef, undef			%ixx1 = fsub double undef, undef
	%ixx2 = fsub double undef, undef			%ixx2 = fsub double undef, undef

llvm/test/Transforms/SLPVectorizer/X86/cse.ll


define i32 @partial_mrg(double* nocapture %A, i32 %n) {		define i32 @partial_mrg(double* nocapture %A, i32 %n) {
; CHECK-LABEL: @partial_mrg(		; CHECK-LABEL: @partial_mrg(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[N:%.*]] to double		; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[N:%.*]] to double
; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[CONV]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[CONV]], i32 0
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[CONV]], i32 1		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], [[TMP1]]		; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[SHUFFLE]], [[TMP1]]
; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[A]] to <2 x double>*		; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[A]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8		; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[N]], 4		; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[N]], 4
; CHECK-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[IF_END:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[RETURN:%.*]], label [[IF_END:%.*]]
; CHECK: if.end:		; CHECK: if.end:
; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds double, double* [[A]], i64 2		; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds double, double* [[A]], i64 2
; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[N]], 4		; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[N]], 4
; CHECK-NEXT: [[CONV12:%.*]] = sitofp i32 [[ADD]] to double		; CHECK-NEXT: [[CONV12:%.*]] = sitofp i32 [[ADD]] to double
; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[ARRAYIDX7]] to <2 x double>*		; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX7]] to <2 x double>*
; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8		; CHECK-NEXT: [[TMP6:%.*]] = load <2 x double>, <2 x double>* [[TMP5]], align 8
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP2]], double [[CONV12]], i32 1		; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP2]], double [[CONV12]], i32 1
; CHECK-NEXT: [[TMP9:%.*]] = fmul <2 x double> [[TMP8]], [[TMP7]]		; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP7]], [[TMP6]]
; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[ARRAYIDX7]] to <2 x double>*		; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[ARRAYIDX7]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8		; CHECK-NEXT: store <2 x double> [[TMP8]], <2 x double>* [[TMP9]], align 8
; CHECK-NEXT: br label [[RETURN]]		; CHECK-NEXT: br label [[RETURN]]
; CHECK: return:		; CHECK: return:
; CHECK-NEXT: ret i32 0		; CHECK-NEXT: ret i32 0
;		;
entry:		entry:
%0 = load double, double* %A, align 8		%0 = load double, double* %A, align 8
%conv = sitofp i32 %n to double		%conv = sitofp i32 %n to double
%mul = fmul double %conv, %0		%mul = fmul double %conv, %0
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	if.end13: ; preds = %if.then12, %sw.epilog7, %entry
%b.0 = phi double [ %3, %if.then12 ], [ %add10, %sw.epilog7 ], [ undef, %entry], [ undef, %entry ]		%b.0 = phi double [ %3, %if.then12 ], [ %add10, %sw.epilog7 ], [ undef, %entry], [ undef, %entry ]
unreachable		unreachable
}		}

define void @cse_for_hoisted_instructions_in_preheader(i32* %dst, i32 %a, i1 %c) {		define void @cse_for_hoisted_instructions_in_preheader(i32* %dst, i32 %a, i1 %c) {
; CHECK-LABEL: @cse_for_hoisted_instructions_in_preheader(		; CHECK-LABEL: @cse_for_hoisted_instructions_in_preheader(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[A:%.*]], i32 0		; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> poison, i32 [[A:%.*]], i32 0
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[A]], i32 1		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP0]], <2 x i32> poison, <2 x i32> zeroinitializer
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[TMP2:%.*]] = or <2 x i32> <i32 22, i32 22>, [[TMP1]]		; CHECK-NEXT: [[TMP1:%.*]] = or <2 x i32> <i32 22, i32 22>, [[SHUFFLE]]
; CHECK-NEXT: [[GEP_0:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 0		; CHECK-NEXT: [[GEP_0:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 0
; CHECK-NEXT: [[TMP3:%.*]] = or <2 x i32> [[TMP2]], <i32 3, i32 3>		; CHECK-NEXT: [[TMP2:%.*]] = or <2 x i32> [[TMP1]], <i32 3, i32 3>
; CHECK-NEXT: [[TMP4:%.*]] = bitcast i32* [[GEP_0]] to <2 x i32>*		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[GEP_0]] to <2 x i32>*
; CHECK-NEXT: store <2 x i32> [[TMP3]], <2 x i32>* [[TMP4]], align 4		; CHECK-NEXT: store <2 x i32> [[TMP2]], <2 x i32>* [[TMP3]], align 4
; CHECK-NEXT: [[TMP5:%.*]] = or <2 x i32> [[TMP1]], <i32 3, i32 3>		; CHECK-NEXT: [[TMP4:%.*]] = or <2 x i32> [[SHUFFLE]], <i32 3, i32 3>
; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 10		; CHECK-NEXT: [[GEP_2:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 10
; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[GEP_2]] to <2 x i32>*		; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[GEP_2]] to <2 x i32>*
; CHECK-NEXT: store <2 x i32> [[TMP5]], <2 x i32>* [[TMP6]], align 4		; CHECK-NEXT: store <2 x i32> [[TMP4]], <2 x i32>* [[TMP5]], align 4
; CHECK-NEXT: br i1 [[C:%.*]], label [[LOOP]], label [[EXIT:%.*]]		; CHECK-NEXT: br i1 [[C:%.*]], label [[LOOP]], label [[EXIT:%.*]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
br label %loop		br label %loop

loop:		loop:

llvm/test/Transforms/SLPVectorizer/X86/extract-scalar-from-undef.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-apple-macosx -mattr=+avx2 < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-apple-macosx -mattr=+avx2 < %s | FileCheck %s

	define i64 @foo(i32 %tmp7) {			define i64 @foo(i32 %tmp7) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 poison, i32 0>, i32 [[TMP7:%.*]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 poison, i32 0>, i32 [[TMP7:%.*]], i32 2
	; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[TMP0]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 2, i32 3, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 undef, i32 4			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 undef, i32 6
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 2, i32 3, i32 undef, i32 4, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[SHUFFLE]]			; CHECK-NEXT: [[TMP5:%.*]] = add nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = add nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[SHUFFLE]]
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> [[TMP5]], <8 x i32> <i32 0, i32 9, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> [[TMP5]], <8 x i32> <i32 0, i32 9, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>
	; CHECK-NEXT: [[TMP7:%.*]] = add <8 x i32> zeroinitializer, [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = add <8 x i32> zeroinitializer, [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = xor <8 x i32> [[TMP7]], zeroinitializer			; CHECK-NEXT: [[TMP8:%.*]] = xor <8 x i32> [[TMP7]], zeroinitializer
	; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP8]])			; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP8]])
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> zeroinitializer)
	; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP9]], [[TMP10]]			; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP9]], [[TMP10]]
	; CHECK-NEXT: [[TMP64:%.*]] = zext i32 [[OP_RDX]] to i64			; CHECK-NEXT: [[TMP64:%.*]] = zext i32 [[OP_RDX]] to i64
	; CHECK-NEXT: ret i64 [[TMP64]]			; CHECK-NEXT: ret i64 [[TMP64]]

llvm/test/Transforms/SLPVectorizer/X86/extract_in_tree_user.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=i386-apple-macosx10.9.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=i386-apple-macosx10.9.0 -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	@a = common global i64* null, align 8			@a = common global i64* null, align 8

	; Function Attrs: nounwind ssp uwtable			; Function Attrs: nounwind ssp uwtable
	define i32 @fn1() {			define i32 @fn1() {
	; CHECK-LABEL: @fn1(			; CHECK-LABEL: @fn1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** @a, align 8			; CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** @a, align 8
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64*> poison, i64* [[TMP0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64*> poison, i64* [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64*> [[TMP1]], i64* [[TMP0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64*> [[TMP1]], <2 x i64*> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, <2 x i64*> [[TMP2]], <2 x i64> <i64 11, i64 56>			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i64, <2 x i64*> [[SHUFFLE]], <2 x i64> <i64 11, i64 56>
	; CHECK-NEXT: [[TMP4:%.*]] = ptrtoint <2 x i64*> [[TMP3]] to <2 x i64>			; CHECK-NEXT: [[TMP3:%.*]] = ptrtoint <2 x i64*> [[TMP2]] to <2 x i64>
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i64*> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64*> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[TMP4]] to <2 x i64>*
	; CHECK-NEXT: store <2 x i64> [[TMP4]], <2 x i64>* [[TMP6]], align 8			; CHECK-NEXT: store <2 x i64> [[TMP3]], <2 x i64>* [[TMP5]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%0 = load i64*, i64** @a, align 8			%0 = load i64*, i64** @a, align 8
	%add.ptr = getelementptr inbounds i64, i64* %0, i64 11			%add.ptr = getelementptr inbounds i64, i64* %0, i64 11
	%1 = ptrtoint i64* %add.ptr to i64			%1 = ptrtoint i64* %add.ptr to i64
	store i64 %1, i64* %add.ptr, align 8			store i64 %1, i64* %add.ptr, align 8
	%add.ptr1 = getelementptr inbounds i64, i64* %0, i64 56			%add.ptr1 = getelementptr inbounds i64, i64* %0, i64 56
	▲ Show 20 Lines • Show All 61 Lines • ▼ Show 20 Lines

	}			}

	define void @externally_used_ptrs() {			define void @externally_used_ptrs() {
	; CHECK-LABEL: @externally_used_ptrs(			; CHECK-LABEL: @externally_used_ptrs(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** @a, align 8			; CHECK-NEXT: [[TMP0:%.*]] = load i64*, i64** @a, align 8
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64*> poison, i64* [[TMP0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i64*> poison, i64* [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i64*> [[TMP1]], i64* [[TMP0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i64*> [[TMP1]], <2 x i64*> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, <2 x i64*> [[TMP2]], <2 x i64> <i64 56, i64 11>			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i64, <2 x i64*> [[SHUFFLE]], <2 x i64> <i64 56, i64 11>
	; CHECK-NEXT: [[TMP4:%.*]] = ptrtoint <2 x i64*> [[TMP3]] to <2 x i64>			; CHECK-NEXT: [[TMP3:%.*]] = ptrtoint <2 x i64*> [[TMP2]] to <2 x i64>
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i64*> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64*> [[TMP2]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[TMP4]] to <2 x i64>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* [[TMP5]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = add <2 x i64> [[TMP4]], [[TMP7]]			; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[TMP3]], [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast i64* [[TMP5]] to <2 x i64>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast i64* [[TMP4]] to <2 x i64>*
	; CHECK-NEXT: store <2 x i64> [[TMP8]], <2 x i64>* [[TMP9]], align 8			; CHECK-NEXT: store <2 x i64> [[TMP7]], <2 x i64>* [[TMP8]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i64*, i64** @a, align 8			%0 = load i64*, i64** @a, align 8
	%add.ptr = getelementptr inbounds i64, i64* %0, i64 11			%add.ptr = getelementptr inbounds i64, i64* %0, i64 11
	%1 = ptrtoint i64* %add.ptr to i64			%1 = ptrtoint i64* %add.ptr to i64
	%add.ptr1 = getelementptr inbounds i64, i64* %0, i64 56			%add.ptr1 = getelementptr inbounds i64, i64* %0, i64 56
	%2 = ptrtoint i64* %add.ptr1 to i64			%2 = ptrtoint i64* %add.ptr1 to i64

llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll

	; YAML: - Cost: '-1'			; YAML: - Cost: '-1'
	; YAML: - String: ' and with tree size '			; YAML: - String: ' and with tree size '
	; YAML: - TreeSize: '3'			; YAML: - TreeSize: '3'

	define float @multi_uses(<2 x float> %x, <2 x float> %y) {			define float @multi_uses(<2 x float> %x, <2 x float> %y) {
	; CHECK-LABEL: @multi_uses(			; CHECK-LABEL: @multi_uses(
	; CHECK-NEXT: [[Y1:%.*]] = extractelement <2 x float> [[Y:%.*]], i32 1			; CHECK-NEXT: [[Y1:%.*]] = extractelement <2 x float> [[Y:%.*]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[Y1]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x float> poison, float [[Y1]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x float> [[TMP1]], float [[Y1]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP1]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x float> [[X:%.*]], [[TMP2]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x float> [[X:%.*]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP4]], [[TMP5]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[TMP3]], [[TMP4]]
	; CHECK-NEXT: ret float [[ADD]]			; CHECK-NEXT: ret float [[ADD]]
	;			;
	%x0 = extractelement <2 x float> %x, i32 0			%x0 = extractelement <2 x float> %x, i32 0
	%x1 = extractelement <2 x float> %x, i32 1			%x1 = extractelement <2 x float> %x, i32 1
	%y1 = extractelement <2 x float> %y, i32 1			%y1 = extractelement <2 x float> %y, i32 1
	%x0x0 = fmul float %x0, %y1			%x0x0 = fmul float %x0, %y1
	%x1x1 = fmul float %x1, %y1			%x1x1 = fmul float %x1, %y1
	%add = fadd float %x0x0, %x1x1			%add = fadd float %x0x0, %x1x1
	ret float %add			ret float %add
	}			}

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

	; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[X1]]			; CHECK-NEXT: [[X0X0:%.*]] = fmul float [[X0]], [[X1]]
	; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]			; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]
	; CHECK-NEXT: ret float [[ADD]]			; CHECK-NEXT: ret float [[ADD]]
	;			;
	; THRESH1-LABEL: @f_used_twice_in_tree(			; THRESH1-LABEL: @f_used_twice_in_tree(
	; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1
	; THRESH1-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH1-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0
	; THRESH1-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH1-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <2 x i32> zeroinitializer
	; THRESH1-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]			; THRESH1-NEXT: [[TMP3:%.*]] = fmul <2 x float> [[SHUFFLE]], [[X]]
	; THRESH1-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH1-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
	; THRESH1-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1			; THRESH1-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
	; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]			; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP4]], [[TMP5]]
	; THRESH1-NEXT: ret float [[ADD]]			; THRESH1-NEXT: ret float [[ADD]]
	;			;
	; THRESH2-LABEL: @f_used_twice_in_tree(			; THRESH2-LABEL: @f_used_twice_in_tree(
	; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1
	; THRESH2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0
	; THRESH2-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH2-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <2 x i32> zeroinitializer
	; THRESH2-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]			; THRESH2-NEXT: [[TMP3:%.*]] = fmul <2 x float> [[SHUFFLE]], [[X]]
	; THRESH2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH2-NEXT: [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
	; THRESH2-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1			; THRESH2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP3]], i32 1
	; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]			; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP4]], [[TMP5]]
	; THRESH2-NEXT: ret float [[ADD]]			; THRESH2-NEXT: ret float [[ADD]]
	;			;
	%x0 = extractelement <2 x float> %x, i32 0			%x0 = extractelement <2 x float> %x, i32 0
	%x1 = extractelement <2 x float> %x, i32 1			%x1 = extractelement <2 x float> %x, i32 1
	%x0x0 = fmul float %x0, %x1			%x0x0 = fmul float %x0, %x1
	%x1x1 = fmul float %x1, %x1			%x1x1 = fmul float %x1, %x1
	%add = fadd float %x0x0, %x1x1			%add = fadd float %x0x0, %x1x1
	ret float %add			ret float %add
	}			}

llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

	; THRESHOLD-NEXT: entry:			; THRESHOLD-NEXT: entry:
	; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]			; THRESHOLD-NEXT: [[MUL:%.*]] = mul nsw i32 [[B:%.*]], [[A:%.*]]
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X:%.*]] to <8 x float>*			; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X:%.*]] to <8 x float>*
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])			; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])
	; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+00>, float [[TMP2]], i32 0			; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <2 x float> <float poison, float 3.000000e+00>, float [[TMP2]], i32 0
	; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[CONV]], i32 0			; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[CONV]], i32 0
	; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[CONV]], i32 1			; THRESHOLD-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <2 x i32> zeroinitializer
	; THRESHOLD-NEXT: [[TMP6:%.*]] = fadd fast <2 x float> [[TMP3]], [[TMP5]]			; THRESHOLD-NEXT: [[TMP5:%.*]] = fadd fast <2 x float> [[TMP3]], [[SHUFFLE]]
	; THRESHOLD-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP6]], i32 0			; THRESHOLD-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP5]], i32 0
	; THRESHOLD-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP6]], i32 1			; THRESHOLD-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP5]], i32 1
	; THRESHOLD-NEXT: [[OP_RDX2:%.*]] = fadd fast float [[TMP7]], [[TMP8]]			; THRESHOLD-NEXT: [[OP_RDX2:%.*]] = fadd fast float [[TMP6]], [[TMP7]]
	; THRESHOLD-NEXT: ret float [[OP_RDX2]]			; THRESHOLD-NEXT: ret float [[OP_RDX2]]
	;			;
	entry:			entry:
	%mul = mul nsw i32 %b, %a			%mul = mul nsw i32 %b, %a
	%conv = sitofp i32 %mul to float			%conv = sitofp i32 %mul to float
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%add = fadd fast float %conv, 3.000000e+00			%add = fadd fast float %conv, 3.000000e+00
	%add1 = fadd fast float %0, %add			%add1 = fadd fast float %0, %add
	▲ Show 20 Lines • Show All 107 Lines • ▼ Show 20 Lines
	; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float			; THRESHOLD-NEXT: [[CONV:%.*]] = sitofp i32 [[MUL]] to float
	; THRESHOLD-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float			; THRESHOLD-NEXT: [[CONVC:%.*]] = sitofp i32 [[C:%.*]] to float
	; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X:%.*]] to <8 x float>*			; THRESHOLD-NEXT: [[TMP0:%.*]] = bitcast float* [[X:%.*]] to <8 x float>*
	; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4			; THRESHOLD-NEXT: [[TMP1:%.*]] = load <8 x float>, <8 x float>* [[TMP0]], align 4
	; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])			; THRESHOLD-NEXT: [[TMP2:%.*]] = call fast float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> [[TMP1]])
	; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[TMP2]], i32 0			; THRESHOLD-NEXT: [[TMP3:%.*]] = insertelement <2 x float> poison, float [[TMP2]], i32 0
	; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[CONVC]], i32 1			; THRESHOLD-NEXT: [[TMP4:%.*]] = insertelement <2 x float> [[TMP3]], float [[CONVC]], i32 1
	; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[CONV]], i32 0			; THRESHOLD-NEXT: [[TMP5:%.*]] = insertelement <2 x float> poison, float [[CONV]], i32 0
	; THRESHOLD-NEXT: [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float [[CONV]], i32 1			; THRESHOLD-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <2 x i32> zeroinitializer
	; THRESHOLD-NEXT: [[TMP7:%.*]] = fadd fast <2 x float> [[TMP4]], [[TMP6]]			; THRESHOLD-NEXT: [[TMP6:%.*]] = fadd fast <2 x float> [[TMP4]], [[SHUFFLE]]
	; THRESHOLD-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0			; THRESHOLD-NEXT: [[TMP7:%.*]] = extractelement <2 x float> [[TMP6]], i32 0
	; THRESHOLD-NEXT: [[TMP9:%.*]] = extractelement <2 x float> [[TMP7]], i32 1			; THRESHOLD-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP6]], i32 1
	; THRESHOLD-NEXT: [[OP_RDX2:%.*]] = fadd fast float [[TMP8]], [[TMP9]]			; THRESHOLD-NEXT: [[OP_RDX2:%.*]] = fadd fast float [[TMP7]], [[TMP8]]
	; THRESHOLD-NEXT: [[OP_RDX3:%.*]] = fadd fast float [[OP_RDX2]], 3.000000e+00			; THRESHOLD-NEXT: [[OP_RDX3:%.*]] = fadd fast float [[OP_RDX2]], 3.000000e+00
	; THRESHOLD-NEXT: ret float [[OP_RDX3]]			; THRESHOLD-NEXT: ret float [[OP_RDX3]]
	;			;
	entry:			entry:
	%mul = mul nsw i32 %b, %a			%mul = mul nsw i32 %b, %a
	%conv = sitofp i32 %mul to float			%conv = sitofp i32 %mul to float
	%0 = load float, float* %x, align 4			%0 = load float, float* %x, align 4
	%convc = sitofp i32 %c to float			%convc = sitofp i32 %c to float

llvm/test/Transforms/SLPVectorizer/X86/in-tree-user.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.7.0"			target triple = "x86_64-apple-macosx10.7.0"

	@.str = private unnamed_addr constant [6 x i8] c"bingo\00", align 1			@.str = private unnamed_addr constant [6 x i8] c"bingo\00", align 1

	; Uses inside the tree must be scheduled after the corresponding tree bundle.			; Uses inside the tree must be scheduled after the corresponding tree bundle.
	define void @in_tree_user(double* nocapture %A, i32 %n) {			define void @in_tree_user(double* nocapture %A, i32 %n) {
	; CHECK-LABEL: @in_tree_user(			; CHECK-LABEL: @in_tree_user(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[N:%.*]] to double			; CHECK-NEXT: [[CONV:%.*]] = sitofp i32 [[N:%.*]] to double
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[CONV]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[CONV]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[CONV]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_INC:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = shl nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[TMP1:%.*]] = shl nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 [[TMP2]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP1]], [[TMP4]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[SHUFFLE]], [[TMP3]]
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <2 x double> [[TMP5]], <double 7.000000e+00, double 4.000000e+00>			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], <double 7.000000e+00, double 4.000000e+00>
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP6]], <double 5.000000e+00, double 9.000000e+00>			; CHECK-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP5]], <double 5.000000e+00, double 9.000000e+00>
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; CHECK-NEXT: [[INTREEUSER:%.*]] = fadd double [[TMP8]], [[TMP8]]			; CHECK-NEXT: [[INTREEUSER:%.*]] = fadd double [[TMP7]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 1
	; CHECK-NEXT: [[CMP11:%.*]] = fcmp ogt double [[TMP8]], [[TMP9]]			; CHECK-NEXT: [[CMP11:%.*]] = fcmp ogt double [[TMP7]], [[TMP8]]
	; CHECK-NEXT: br i1 [[CMP11]], label [[IF_THEN:%.*]], label [[FOR_INC]]			; CHECK-NEXT: br i1 [[CMP11]], label [[IF_THEN:%.*]], label [[FOR_INC]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: [[CALL:%.*]] = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str, i64 0, i64 0))			; CHECK-NEXT: [[CALL:%.*]] = tail call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str, i64 0, i64 0))
	; CHECK-NEXT: br label [[FOR_INC]]			; CHECK-NEXT: br label [[FOR_INC]]
	; CHECK: for.inc:			; CHECK: for.inc:
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[LFTR_WIDEIV:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], 100			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[LFTR_WIDEIV]], 100

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	%struct.sw = type { float, float, float, float }			%struct.sw = type { float, float, float, float }

	define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {			define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0			; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP1:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[X]] to <2 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[X]] to <2 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 16			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 16
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 1>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> <float undef, float poison, float poison, float undef>, float [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP1]], i32 2
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[TMP5]]
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[TMP6]], undef			; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[TMP6]], undef
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], undef			; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], undef
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], undef			; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], undef
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP10]], 0			; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP10]], 0
	; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP11]], 1			; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP11]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s

	@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4
	@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4

	define i32 @fn1() {			define i32 @fn1() {
	; CHECK-LABEL: @fn1(			; CHECK-LABEL: @fn1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4
	; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> <i32 8, i32 poison, i32 ptrtoint (i32 ()* @fn1 to i32), i32 ptrtoint (i32 ()* @fn1 to i32)>, <4 x i32> [[TMP0]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> <i32 8, i32 poison, i32 ptrtoint (i32 ()* @fn1 to i32), i32 poison>, <4 x i32> [[TMP0]], <4 x i32> <i32 0, i32 5, i32 2, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>			; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[SHUFFLE]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>
	; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
				; CHECK-NEXT: store <4 x i32> [[SHUFFLE1]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4			%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4
	%cmp = icmp sgt i32 %0, 0			%cmp = icmp sgt i32 %0, 0
	%cond = select i1 %cmp, i32 8, i32 0			%cond = select i1 %cmp, i32 8, i32 0
	store i32 %cond, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i64 0, i32 3), align 4			store i32 %cond, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i64 0, i32 3), align 4
	%1 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 1), align 4			%1 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 1), align 4

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

	; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0			; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0
	; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1			; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1
	; CHECK-NEXT: [[LOADA0:%.*]] = load double, double* [[IDX0]], align 4			; CHECK-NEXT: [[LOADA0:%.*]] = load double, double* [[IDX0]], align 4
	; CHECK-NEXT: [[LOADA1:%.*]] = load double, double* [[IDX1]], align 4			; CHECK-NEXT: [[LOADA1:%.*]] = load double, double* [[IDX1]], align 4
	; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4			; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4
	; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4			; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4
	; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0			; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[LOADA0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[LOADA0]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[LOADA0]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[LOADVEC]], [[TMP2]]			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[LOADVEC]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[LOADA1]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[LOADA1]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[LOADA1]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <2 x double> [[LOADVEC2]], [[TMP5]]			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[LOADVEC2]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP3]], [[TMP6]]			; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8			; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1
	%loadA0 = load double, double* %idx0, align 4			%loadA0 = load double, double* %idx0, align 4
	%loadA1 = load double, double* %idx1, align 4			%loadA1 = load double, double* %idx1, align 4

	%loadVec = load <2 x double>, <2 x double>* %vecPtr1, align 4			%loadVec = load <2 x double>, <2 x double>* %vecPtr1, align 4
	; AVX-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4			; AVX-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4
	; AVX-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4			; AVX-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4
	; AVX-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0			; AVX-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0
	; AVX-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1			; AVX-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1
	; AVX-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0			; AVX-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0			; AVX-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[EXTRA1]], i32 1			; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[EXTRA1]], i32 1
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[LOADA0]], i32 0			; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[LOADA0]], i32 0
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[LOADA0]], i32 1			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <2 x i32> zeroinitializer
	; AVX-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP2]], [[TMP4]]			; AVX-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP2]], [[SHUFFLE]]
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[EXTRB0]], i32 0			; AVX-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[EXTRB0]], i32 0
	; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[EXTRB1]], i32 1			; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[EXTRB1]], i32 1
	; AVX-NEXT: [[TMP8:%.*]] = insertelement <2 x double> poison, double [[LOADA1]], i32 0			; AVX-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[LOADA1]], i32 0
	; AVX-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP8]], double [[LOADA1]], i32 1			; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x double> [[TMP7]], <2 x double> poison, <2 x i32> zeroinitializer
	; AVX-NEXT: [[TMP10:%.*]] = fmul <2 x double> [[TMP7]], [[TMP9]]			; AVX-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP6]], [[SHUFFLE1]]
	; AVX-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP5]], [[TMP10]]			; AVX-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP4]], [[TMP8]]
	; AVX-NEXT: [[TMP12:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*			; AVX-NEXT: [[TMP10:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
	; AVX-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP12]], align 8			; AVX-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1
	%loadA0 = load double, double* %idx0, align 4			%loadA0 = load double, double* %idx0, align 4
	%loadA1 = load double, double* %idx1, align 4			%loadA1 = load double, double* %idx1, align 4

	%loadVec = load <2 x double>, <2 x double>* %vecPtr1, align 4			%loadVec = load <2 x double>, <2 x double>* %vecPtr1, align 4
	; AVX-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds double, double* [[ARRAY1:%.*]], i64 0			; AVX-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds double, double* [[ARRAY1:%.*]], i64 0
	; AVX-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds double, double* [[ARRAY2:%.*]], i64 0			; AVX-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds double, double* [[ARRAY2:%.*]], i64 0
	; AVX-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds double, double* [[ARRAY2]], i64 1			; AVX-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds double, double* [[ARRAY2]], i64 1
	; AVX-NEXT: [[LD_2_0:%.*]] = load double, double* [[GEP_2_0]], align 8			; AVX-NEXT: [[LD_2_0:%.*]] = load double, double* [[GEP_2_0]], align 8
	; AVX-NEXT: [[LD_2_1:%.*]] = load double, double* [[GEP_2_1]], align 8			; AVX-NEXT: [[LD_2_1:%.*]] = load double, double* [[GEP_2_1]], align 8
	; AVX-NEXT: [[TMP0:%.*]] = bitcast double* [[GEP_1_0]] to <2 x double>*			; AVX-NEXT: [[TMP0:%.*]] = bitcast double* [[GEP_1_0]] to <2 x double>*
	; AVX-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; AVX-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[LD_2_0]], i32 0			; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[LD_2_0]], i32 0
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[LD_2_0]], i32 1			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer
	; AVX-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]			; AVX-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE]]
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[LD_2_1]], i32 0			; AVX-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[LD_2_1]], i32 0
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[LD_2_1]], i32 1			; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <2 x i32> zeroinitializer
	; AVX-NEXT: [[TMP7:%.*]] = fmul <2 x double> [[TMP1]], [[TMP6]]			; AVX-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE1]]
	; AVX-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP4]], [[TMP7]]			; AVX-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP3]], [[TMP5]]
	; AVX-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP8]], i32 0			; AVX-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; AVX-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP8]], i32 1			; AVX-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 1
	; AVX-NEXT: [[ADD3:%.*]] = fadd double [[TMP9]], [[TMP10]]			; AVX-NEXT: [[ADD3:%.*]] = fadd double [[TMP7]], [[TMP8]]
	; AVX-NEXT: ret double [[ADD3]]			; AVX-NEXT: ret double [[ADD3]]
	;			;
	entry:			entry:
	%gep_1_0 = getelementptr inbounds double, double* %array1, i64 0			%gep_1_0 = getelementptr inbounds double, double* %array1, i64 0
	%gep_1_1 = getelementptr inbounds double, double* %array1, i64 1			%gep_1_1 = getelementptr inbounds double, double* %array1, i64 1
	%ld_1_0 = load double, double* %gep_1_0, align 8			%ld_1_0 = load double, double* %gep_1_0, align 8
	%ld_1_1 = load double, double* %gep_1_1, align 8			%ld_1_1 = load double, double* %gep_1_1, align 8

	; SSE-NEXT: [[TMP0:%.*]] = bitcast double* [[GEP_1_0]] to <2 x double>*			; SSE-NEXT: [[TMP0:%.*]] = bitcast double* [[GEP_1_0]] to <2 x double>*
	; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; SSE-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; SSE-NEXT: [[TMP2:%.*]] = bitcast double* [[GEP_2_0]] to <2 x double>*			; SSE-NEXT: [[TMP2:%.*]] = bitcast double* [[GEP_2_0]] to <2 x double>*
	; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; SSE-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; SSE-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE]]			; SSE-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE]]
	; SSE-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]			; SSE-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]
	; SSE-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP4]], [[TMP5]]			; SSE-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP4]], [[TMP5]]
	; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <2 x i32> zeroinitializer			; SSE-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <2 x i32> zeroinitializer
	; SSE-NEXT: [[TMP8:%.*]] = fsub <2 x double> [[TMP6]], [[TMP7]]			; SSE-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP6]], [[SHUFFLE1]]
	; SSE-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP8]], i32 0			; SSE-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 0
	; SSE-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP8]], i32 1			; SSE-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 1
	; SSE-NEXT: [[RES:%.*]] = fadd double [[TMP9]], [[TMP10]]			; SSE-NEXT: [[RES:%.*]] = fadd double [[TMP8]], [[TMP9]]
	; SSE-NEXT: ret double [[RES]]			; SSE-NEXT: ret double [[RES]]
	;			;
	; AVX-LABEL: @splat_loads_with_internal_uses(			; AVX-LABEL: @splat_loads_with_internal_uses(
	; AVX-NEXT: entry:			; AVX-NEXT: entry:
	; AVX-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds double, double* [[ARRAY1:%.*]], i64 0			; AVX-NEXT: [[GEP_1_0:%.*]] = getelementptr inbounds double, double* [[ARRAY1:%.*]], i64 0
	; AVX-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds double, double* [[ARRAY2:%.*]], i64 0			; AVX-NEXT: [[GEP_2_0:%.*]] = getelementptr inbounds double, double* [[ARRAY2:%.*]], i64 0
	; AVX-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds double, double* [[ARRAY2]], i64 1			; AVX-NEXT: [[GEP_2_1:%.*]] = getelementptr inbounds double, double* [[ARRAY2]], i64 1
	; AVX-NEXT: [[LD_2_0:%.*]] = load double, double* [[GEP_2_0]], align 8			; AVX-NEXT: [[LD_2_0:%.*]] = load double, double* [[GEP_2_0]], align 8
	; AVX-NEXT: [[LD_2_1:%.*]] = load double, double* [[GEP_2_1]], align 8			; AVX-NEXT: [[LD_2_1:%.*]] = load double, double* [[GEP_2_1]], align 8
	; AVX-NEXT: [[TMP0:%.*]] = bitcast double* [[GEP_1_0]] to <2 x double>*			; AVX-NEXT: [[TMP0:%.*]] = bitcast double* [[GEP_1_0]] to <2 x double>*
	; AVX-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; AVX-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[LD_2_0]], i32 0			; AVX-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[LD_2_0]], i32 0
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[LD_2_0]], i32 1			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer
	; AVX-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP1]], [[TMP3]]			; AVX-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE]]
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[LD_2_1]], i32 0			; AVX-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[LD_2_1]], i32 0
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[LD_2_1]], i32 1			; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <2 x i32> zeroinitializer
	; AVX-NEXT: [[TMP7:%.*]] = fmul <2 x double> [[TMP1]], [[TMP6]]			; AVX-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP1]], [[SHUFFLE1]]
	; AVX-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP4]], [[TMP7]]			; AVX-NEXT: [[TMP6:%.*]] = fadd <2 x double> [[TMP3]], [[TMP5]]
	; AVX-NEXT: [[TMP9:%.*]] = fsub <2 x double> [[TMP8]], [[TMP3]]			; AVX-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP6]], [[SHUFFLE]]
	; AVX-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP9]], i32 0			; AVX-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 0
	; AVX-NEXT: [[TMP11:%.*]] = extractelement <2 x double> [[TMP9]], i32 1			; AVX-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 1
	; AVX-NEXT: [[RES:%.*]] = fadd double [[TMP10]], [[TMP11]]			; AVX-NEXT: [[RES:%.*]] = fadd double [[TMP8]], [[TMP9]]
	; AVX-NEXT: ret double [[RES]]			; AVX-NEXT: ret double [[RES]]
	;			;
	entry:			entry:
	%gep_1_0 = getelementptr inbounds double, double* %array1, i64 0			%gep_1_0 = getelementptr inbounds double, double* %array1, i64 0
	%gep_1_1 = getelementptr inbounds double, double* %array1, i64 1			%gep_1_1 = getelementptr inbounds double, double* %array1, i64 1
	%ld_1_0 = load double, double* %gep_1_0, align 8			%ld_1_0 = load double, double* %gep_1_0, align 8
	%ld_1_1 = load double, double* %gep_1_1, align 8			%ld_1_1 = load double, double* %gep_1_1, align 8


llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 -slp-threshold=50 -slp-recursion-max-depth=6 < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu -mattr=+avx2 -slp-threshold=50 -slp-recursion-max-depth=6 < %s | FileCheck %s

	define i32 @bar() local_unnamed_addr {			define i32 @bar() local_unnamed_addr {
	; CHECK-LABEL: @bar(			; CHECK-LABEL: @bar(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ADD78_1:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD78_1:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB86_1:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB86_1:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[ADD94_1:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD94_1:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[SUB102_1]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 poison, i32 poison, i32 poison, i32 poison, i32 undef, i32 poison, i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>, i32 [[SUB102_1]], i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[ADD94_1]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[ADD94_1]], i32 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD78_1]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD78_1]], i32 6
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[SUB86_1]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[SUB86_1]], i32 7
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[ADD78_2]], i32 4			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[ADD78_2]], i32 9
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP4]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 4, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP4]], <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 9, i32 11, i32 12, i32 13, i32 14, i32 15>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> poison, i32 [[SUB86_1]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 poison, i32 poison, i32 poison, i32 poison, i32 undef, i32 undef, i32 undef, i32 undef, i32 poison, i32 undef, i32 undef, i32 poison>, i32 [[SUB86_1]], i32 4
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> [[TMP5]], i32 [[ADD78_1]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> [[TMP5]], i32 [[ADD78_1]], i32 5
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[ADD94_1]], i32 2			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[ADD94_1]], i32 6
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[SUB102_1]], i32 3			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[SUB102_1]], i32 7
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[SUB102_3]], i32 4			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[SUB102_3]], i32 12
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i32> [[TMP9]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 4, i32 undef, i32 undef, i32 4>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i32> [[TMP9]], <16 x i32> poison, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 12>
	; CHECK-NEXT: [[TMP10:%.*]] = add nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP10:%.*]] = add nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP11:%.*]] = sub nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP11:%.*]] = sub nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <16 x i32> [[TMP10]], <16 x i32> [[TMP11]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <16 x i32> [[TMP10]], <16 x i32> [[TMP11]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>
	; CHECK-NEXT: [[TMP13:%.*]] = lshr <16 x i32> [[TMP12]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: [[TMP13:%.*]] = lshr <16 x i32> [[TMP12]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	; CHECK-NEXT: [[TMP14:%.*]] = and <16 x i32> [[TMP13]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>			; CHECK-NEXT: [[TMP14:%.*]] = and <16 x i32> [[TMP13]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
	; CHECK-NEXT: [[TMP15:%.*]] = mul nuw <16 x i32> [[TMP14]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>			; CHECK-NEXT: [[TMP15:%.*]] = mul nuw <16 x i32> [[TMP14]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
	; CHECK-NEXT: [[TMP16:%.*]] = add <16 x i32> [[TMP15]], [[TMP12]]			; CHECK-NEXT: [[TMP16:%.*]] = add <16 x i32> [[TMP15]], [[TMP12]]
	; CHECK-NEXT: [[TMP17:%.*]] = xor <16 x i32> [[TMP16]], [[TMP15]]			; CHECK-NEXT: [[TMP17:%.*]] = xor <16 x i32> [[TMP16]], [[TMP15]]

llvm/test/Transforms/SLPVectorizer/X86/ordering-bug.ll

	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* bitcast (%struct.a* @a to <2 x i64>*), align 8			; CHECK-NEXT: [[TMP0:%.*]] = load <2 x i64>, <2 x i64>* bitcast (%struct.a* @a to <2 x i64>*), align 8
	; CHECK-NEXT: br i1 [[X:%.*]], label [[WHILE_BODY_LR_PH:%.*]], label [[WHILE_END:%.*]]			; CHECK-NEXT: br i1 [[X:%.*]], label [[WHILE_BODY_LR_PH:%.*]], label [[WHILE_END:%.*]]
	; CHECK: while.body.lr.ph:			; CHECK: while.body.lr.ph:
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i64> [[TMP0]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i64> [[TMP0]], i32 1
	; CHECK-NEXT: [[ICMP_A1:%.*]] = icmp eq i64 [[TMP1]], 0			; CHECK-NEXT: [[ICMP_A1:%.*]] = icmp eq i64 [[TMP1]], 0
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (%struct.a* @b to <2 x i64>*), align 8			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x i64>, <2 x i64>* bitcast (%struct.a* @b to <2 x i64>*), align 8
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i1> poison, i1 [[ICMP_A1]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i1> poison, i1 [[ICMP_A1]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i1> [[TMP3]], i1 [[ICMP_A1]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i1> [[TMP3]], <2 x i1> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = select <2 x i1> [[TMP4]], <2 x i64> [[TMP2]], <2 x i64> [[TMP0]]			; CHECK-NEXT: [[TMP4:%.*]] = select <2 x i1> [[SHUFFLE]], <2 x i64> [[TMP2]], <2 x i64> [[TMP0]]
	; CHECK-NEXT: br label [[WHILE_END]]			; CHECK-NEXT: br label [[WHILE_END]]
	; CHECK: while.end:			; CHECK: while.end:
	; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x i64> [ [[TMP0]], [[ENTRY:%.*]] ], [ [[TMP5]], [[WHILE_BODY_LR_PH]] ]			; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x i64> [ [[TMP0]], [[ENTRY:%.*]] ], [ [[TMP4]], [[WHILE_BODY_LR_PH]] ]
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x i64>, <2 x i64>* bitcast (%struct.a* @c to <2 x i64>*), align 8			; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i64>, <2 x i64>* bitcast (%struct.a* @c to <2 x i64>*), align 8
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0
	; CHECK-NEXT: [[ICMP_D0:%.*]] = icmp eq i64 [[TMP8]], 0			; CHECK-NEXT: [[ICMP_D0:%.*]] = icmp eq i64 [[TMP7]], 0
	; CHECK-NEXT: br i1 [[ICMP_D0]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]			; CHECK-NEXT: br i1 [[ICMP_D0]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: [[AND0_TMP:%.*]] = and i64 [[TMP8]], 8			; CHECK-NEXT: [[AND0_TMP:%.*]] = and i64 [[TMP7]], 8
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> poison, i64 [[AND0_TMP]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> poison, i64 [[AND0_TMP]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x i64> [[TMP9]], <2 x i64> [[TMP6]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i64> [[TMP8]], <2 x i64> [[TMP5]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = and <2 x i64> [[TMP10]], [[TMP7]]			; CHECK-NEXT: [[TMP10:%.*]] = and <2 x i64> [[TMP9]], [[TMP6]]
	; CHECK-NEXT: store <2 x i64> [[TMP11]], <2 x i64>* bitcast (%struct.a* @a to <2 x i64>*), align 8			; CHECK-NEXT: store <2 x i64> [[TMP10]], <2 x i64>* bitcast (%struct.a* @a to <2 x i64>*), align 8
	; CHECK-NEXT: br label [[IF_END]]			; CHECK-NEXT: br label [[IF_END]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%a0 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 0), align 8			%a0 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 0), align 8
	%a1 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 1), align 8			%a1 = load i64, i64* getelementptr inbounds (%struct.a, %struct.a* @a, i32 0, i32 0, i32 1), align 8
	br i1 %x, label %while.body.lr.ph, label %while.end			br i1 %x, label %while.body.lr.ph, label %while.end

llvm/test/Transforms/SLPVectorizer/X86/partail.ll

	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef			; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef
	; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2			; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[SHR15]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[SHR15]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[SUB14]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[SUB14]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], <i32 0, i32 -1, i32 -5, i32 -9>			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], <i32 0, i32 -1, i32 -5, i32 -9>
	; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[TMP0]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = freeze <4 x i32> [[TMP0]]
	; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[TMP3]], undef			; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> [[TMP3]], <4 x i32> undef			; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], undef
	; CHECK-NEXT: [[TMP6:%.*]] = sext <4 x i32> [[TMP5]] to <4 x i64>			; CHECK-NEXT: [[TMP6:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> [[TMP4]], <4 x i32> undef
	; CHECK-NEXT: [[TMP7:%.*]] = trunc <4 x i64> [[TMP6]] to <4 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = sext <4 x i32> [[TMP6]] to <4 x i64>
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = trunc <4 x i64> [[TMP7]] to <4 x i32>
	; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 0
	; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = sext i32 [[TMP9]] to i64
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[TMP7]], i32 1			; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP10]]
	; CHECK-NEXT: [[TMP11:%.*]] = sext i32 [[TMP10]] to i64			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i32> [[TMP8]], i32 1
	; CHECK-NEXT: [[ARRAYIDX31_1:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = sext i32 [[TMP11]] to i64
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[TMP7]], i32 2			; CHECK-NEXT: [[ARRAYIDX31_1:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP12]]
	; CHECK-NEXT: [[TMP13:%.*]] = sext i32 [[TMP12]] to i64			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x i32> [[TMP8]], i32 2
	; CHECK-NEXT: [[ARRAYIDX31_2:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP13]]			; CHECK-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP7]], i32 3			; CHECK-NEXT: [[ARRAYIDX31_2:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP14]]
	; CHECK-NEXT: [[TMP15:%.*]] = sext i32 [[TMP14]] to i64			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3
	; CHECK-NEXT: [[ARRAYIDX31_3:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP15]]			; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64
				; CHECK-NEXT: [[ARRAYIDX31_3:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP16]]
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	br label %land.lhs.true			br label %land.lhs.true

	land.lhs.true: ; preds = %entry			land.lhs.true: ; preds = %entry
	br i1 undef, label %if.then, label %if.end			br i1 undef, label %if.then, label %if.end


llvm/test/Transforms/SLPVectorizer/X86/phi-undef-input.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -slp-threshold=-1000 -mtriple=x86_64 -S | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -slp-threshold=-1000 -mtriple=x86_64 -S | FileCheck %s

	; The inputs to vector phi should remain undef.			; The inputs to vector phi should remain undef.

	define i32 @phi3UndefInput(i1 %cond, i8 %arg0, i8 %arg1, i8 %arg2, i8 %arg3) {			define i32 @phi3UndefInput(i1 %cond, i8 %arg0, i8 %arg1, i8 %arg2, i8 %arg3) {
	; CHECK-LABEL: @phi3UndefInput(			; CHECK-LABEL: @phi3UndefInput(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 poison, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 undef, i8 undef, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 undef, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 0, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 0, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG0:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG0:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 poison, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 poison, i8 poison, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG3:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG3:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG0:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG0:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG2:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG2:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll

	; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]			; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[FNEG]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[MUL]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP4]], [[TMP6]]			; CHECK-NEXT: [[TMP6:%.*]] = fdiv <2 x double> [[TMP4]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP6]], i32 1
	; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[CMP:%.*]] = fcmp olt double [[TMP7]], 0x3EB0C6F7A0B5ED8D
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; CHECK-NEXT: [[CMP4:%.*]] = fcmp olt double [[TMP9]], 0x3EB0C6F7A0B5ED8D			; CHECK-NEXT: [[CMP4:%.*]] = fcmp olt double [[TMP8]], 0x3EB0C6F7A0B5ED8D
	; CHECK-NEXT: [[OR_COND:%.*]] = and i1 [[CMP]], [[CMP4]]			; CHECK-NEXT: [[OR_COND:%.*]] = and i1 [[CMP]], [[CMP4]]
	; CHECK-NEXT: br i1 [[OR_COND]], label [[CLEANUP:%.*]], label [[LOR_LHS_FALSE:%.*]]			; CHECK-NEXT: br i1 [[OR_COND]], label [[CLEANUP:%.*]], label [[LOR_LHS_FALSE:%.*]]
	; CHECK: lor.lhs.false:			; CHECK: lor.lhs.false:
	; CHECK-NEXT: [[TMP10:%.*]] = fcmp ule <2 x double> [[TMP7]], <double 1.000000e+00, double 1.000000e+00>			; CHECK-NEXT: [[TMP9:%.*]] = fcmp ule <2 x double> [[TMP6]], <double 1.000000e+00, double 1.000000e+00>
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP10]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP9]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i1> [[TMP10]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP9]], i32 1
	; CHECK-NEXT: [[NOT_OR_COND9:%.*]] = or i1 [[TMP11]], [[TMP12]]			; CHECK-NEXT: [[NOT_OR_COND9:%.*]] = or i1 [[TMP10]], [[TMP11]]
	; CHECK-NEXT: ret i1 [[NOT_OR_COND9]]			; CHECK-NEXT: ret i1 [[NOT_OR_COND9]]
	; CHECK: cleanup:			; CHECK: cleanup:
	; CHECK-NEXT: ret i1 false			; CHECK-NEXT: ret i1 false
	;			;
	entry:			entry:
	%fneg = fneg double %b			%fneg = fneg double %b
	%add = fsub double %c, %b			%add = fsub double %c, %b
	%mul = fmul double %a, 2.000000e+00			%mul = fmul double %a, 2.000000e+00
	; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]			; CHECK-NEXT: [[FNEG:%.*]] = fneg double [[B:%.*]]
	; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul double [[A:%.*]], 2.000000e+00
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[C:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[FNEG]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[FNEG]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[C]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[B]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = fsub <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[MUL]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[MUL]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP8:%.*]] = fdiv <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP7:%.*]] = fdiv <2 x double> [[TMP5]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP9:%.*]] = fcmp uge <2 x double> [[TMP8]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>			; CHECK-NEXT: [[TMP8:%.*]] = fcmp uge <2 x double> [[TMP7]], <double 0x3EB0C6F7A0B5ED8D, double 0x3EB0C6F7A0B5ED8D>
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i1> [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i1> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i1> [[TMP8]], i32 1
	; CHECK-NEXT: [[NOT_OR_COND:%.*]] = or i1 [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[NOT_OR_COND:%.*]] = or i1 [[TMP9]], [[TMP10]]
	; CHECK-NEXT: ret i1 [[NOT_OR_COND]]			; CHECK-NEXT: ret i1 [[NOT_OR_COND]]
	;			;
	%fneg = fneg double %b			%fneg = fneg double %b
	%add = fsub double %c, %b			%add = fsub double %c, %b
	%mul = fmul double %a, 2.000000e+00			%mul = fmul double %a, 2.000000e+00
	%div = fdiv double %add, %mul			%div = fdiv double %add, %mul
	%sub = fsub double %fneg, %c			%sub = fsub double %fneg, %c
	%div3 = fdiv double %sub, %mul			%div3 = fdiv double %sub, %mul
	%cmp = fcmp uge double %div, 0x3EB0C6F7A0B5ED8D			%cmp = fcmp uge double %div, 0x3EB0C6F7A0B5ED8D
	%cmp4 = fcmp uge double %div3, 0x3EB0C6F7A0B5ED8D			%cmp4 = fcmp uge double %div3, 0x3EB0C6F7A0B5ED8D
	%not.or.cond = or i1 %cmp4, %cmp			%not.or.cond = or i1 %cmp4, %cmp
	ret i1 %not.or.cond			ret i1 %not.or.cond
	}			}

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -mattr=sse2 -passes=slp-vectorizer -pass-remarks-output=%t < %s -slp-threshold=-2 | FileCheck %s			; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -mattr=sse2 -passes=slp-vectorizer -pass-remarks-output=%t < %s -slp-threshold=-2 | FileCheck %s
	; RUN: FileCheck --input-file=%t --check-prefix=YAML %s			; RUN: FileCheck --input-file=%t --check-prefix=YAML %s

	define void @fextr(i16* %ptr) {			define void @fextr(i16* %ptr) {
	; CHECK-LABEL: @fextr(			; CHECK-LABEL: @fextr(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load <8 x i16>, <8 x i16>* undef, align 16			; CHECK-NEXT: [[LD:%.*]] = load <8 x i16>, <8 x i16>* undef, align 16
	; CHECK-NEXT: br label [[T:%.*]]			; CHECK-NEXT: br label [[T:%.*]]
	; CHECK: t:			; CHECK: t:
	; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 0			; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[LD]], <8 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[LD]], <8 x i16> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP0:%.*]] = add <8 x i16> [[LD]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP0:%.*]] = add <8 x i16> [[LD]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
	; CHECK-NEXT: store <8 x i16> [[TMP0]], <8 x i16>* [[TMP1]], align 2			; CHECK-NEXT: store <8 x i16> [[TMP0]], <8 x i16>* [[TMP1]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; YAML: Pass: slp-vectorizer			; YAML: Pass: slp-vectorizer
	; YAML-NEXT: Name: StoresVectorized			; YAML-NEXT: Name: StoresVectorized
	; YAML-NEXT: Function: fextr			; YAML-NEXT: Function: fextr

llvm/test/Transforms/SLPVectorizer/X86/reorder_phi.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown -mcpu=corei7-avx | FileCheck %s

	%struct.complex = type { float, float }			%struct.complex = type { float, float }

	define void @foo (%struct.complex* %A, %struct.complex* %B, %struct.complex* %Result) {			define void @foo (%struct.complex* %A, %struct.complex* %B, %struct.complex* %Result) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = add i64 256, 0			; CHECK-NEXT: [[TMP0:%.*]] = add i64 256, 0
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP20:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP18:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP19:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP17:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX:%.*]], %struct.complex* [[A:%.*]], i64 [[TMP1]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX:%.*]], %struct.complex* [[A:%.*]], i64 [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B:%.*]], i64 [[TMP1]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B:%.*]], i64 [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4			; CHECK-NEXT: [[TMP5:%.*]] = load float, float* [[TMP4]], align 4
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B]], i64 [[TMP1]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B]], i64 [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[TMP6]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP3]] to <2 x float>*			; CHECK-NEXT: [[TMP8:%.*]] = bitcast float* [[TMP3]] to <2 x float>*
	; CHECK-NEXT: [[TMP9:%.*]] = load <2 x float>, <2 x float>* [[TMP8]], align 4			; CHECK-NEXT: [[TMP9:%.*]] = load <2 x float>, <2 x float>* [[TMP8]], align 4
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x float> poison, float [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <2 x float> [[TMP10]], float [[TMP5]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP12:%.*]] = fmul <2 x float> [[TMP9]], [[TMP11]]			; CHECK-NEXT: [[TMP11:%.*]] = fmul <2 x float> [[TMP9]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> [[TMP13]], float [[TMP7]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP12]], <2 x float> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP15:%.*]] = fmul <2 x float> [[TMP9]], [[TMP14]]			; CHECK-NEXT: [[TMP13:%.*]] = fmul <2 x float> [[TMP9]], [[SHUFFLE1]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP15]], <2 x float> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE2:%.*]] = shufflevector <2 x float> [[TMP13]], <2 x float> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP16:%.*]] = fsub <2 x float> [[TMP12]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP14:%.*]] = fsub <2 x float> [[TMP11]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP17:%.*]] = fadd <2 x float> [[TMP12]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP15:%.*]] = fadd <2 x float> [[TMP11]], [[SHUFFLE2]]
	; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> [[TMP17]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <2 x float> [[TMP14]], <2 x float> [[TMP15]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP19]] = fadd <2 x float> [[TMP2]], [[TMP18]]			; CHECK-NEXT: [[TMP17]] = fadd <2 x float> [[TMP2]], [[TMP16]]
	; CHECK-NEXT: [[TMP20]] = add nuw nsw i64 [[TMP1]], 1			; CHECK-NEXT: [[TMP18]] = add nuw nsw i64 [[TMP1]], 1
	; CHECK-NEXT: [[TMP21:%.*]] = icmp eq i64 [[TMP20]], [[TMP0]]			; CHECK-NEXT: [[TMP19:%.*]] = icmp eq i64 [[TMP18]], [[TMP0]]
	; CHECK-NEXT: br i1 [[TMP21]], label [[EXIT:%.*]], label [[LOOP]]			; CHECK-NEXT: br i1 [[TMP19]], label [[EXIT:%.*]], label [[LOOP]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT:%.*]], i32 0, i32 0			; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT:%.*]], i32 0, i32 0
	; CHECK-NEXT: [[TMP23:%.*]] = bitcast float* [[TMP22]] to <2 x float>*			; CHECK-NEXT: [[TMP21:%.*]] = bitcast float* [[TMP20]] to <2 x float>*
	; CHECK-NEXT: store <2 x float> [[TMP19]], <2 x float>* [[TMP23]], align 4			; CHECK-NEXT: store <2 x float> [[TMP17]], <2 x float>* [[TMP21]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = add i64 256, 0			%0 = add i64 256, 0
	br label %loop			br label %loop

	loop:			loop:
	%1 = phi i64 [ 0, %entry ], [ %20, %loop ]			%1 = phi i64 [ 0, %entry ], [ %20, %loop ]
	[...]

llvm/test/Transforms/SLPVectorizer/X86/reorder_with_external_users.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7-avx | FileCheck %s

	; Make sure that we rotate the graph to help avoid the shuffle to			; Make sure that we rotate the graph to help avoid the shuffle to
	; the external vectorizable stores.			; the external vectorizable stores.
	;			;
	; SLP starts vectorizing from the operands of the `fcmp` in bb2, then crosses			; SLP starts vectorizing from the operands of the `fcmp` in bb2, then crosses
	; into bb1, vectorizing all the way to the broadcast load at the top.			; into bb1, vectorizing all the way to the broadcast load at the top.
	; The stores in bb1 are external to this tree, but they are vectorizable and are			; The stores in bb1 are external to this tree, but they are vectorizable and are
	; in reverse order.			; in reverse order.
	define void @rotate_with_external_users(double *%A, double *%ptr) {			define void @rotate_with_external_users(double *%A, double *%ptr) {
	; CHECK-LABEL: @rotate_with_external_users(			; CHECK-LABEL: @rotate_with_external_users(
	; CHECK-NEXT: bb1:			; CHECK-NEXT: bb1:
	; CHECK-NEXT: [[LD:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[LD:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[LD]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[LD]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 2.200000e+00, double 1.100000e+00>			; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> [[SHUFFLE]], <double 2.200000e+00, double 1.100000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], <double 2.200000e+00, double 1.100000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 2.200000e+00, double 1.100000e+00>
	; CHECK-NEXT: [[PTRA1:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0			; CHECK-NEXT: [[PTRA1:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[PTRA1]] to <2 x double>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[PTRA1]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: store <2 x double> [[TMP2]], <2 x double>* [[TMP3]], align 8
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP3]], <double 4.400000e+00, double 3.300000e+00>			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP2]], <double 4.400000e+00, double 3.300000e+00>
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP5]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
	; CHECK-NEXT: [[SEED:%.*]] = fcmp ogt double [[TMP7]], [[TMP6]]			; CHECK-NEXT: [[SEED:%.*]] = fcmp ogt double [[TMP6]], [[TMP5]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb1:			bb1:
	%ld = load double, double* undef			%ld = load double, double* undef

	%add1 = fadd double %ld, 1.1			%add1 = fadd double %ld, 1.1
	%add2 = fadd double %ld, 2.2			%add2 = fadd double %ld, 2.2

	[...]

	; We have to be careful when the tree contains add/sub patterns that could be			; We have to be careful when the tree contains add/sub patterns that could be
	; combined into a single addsub instruction. Reordering can block the pattern.			; combined into a single addsub instruction. Reordering can block the pattern.
	define void @addsub_and_external_users(double *%A, double *%ptr) {			define void @addsub_and_external_users(double *%A, double *%ptr) {
	; CHECK-LABEL: @addsub_and_external_users(			; CHECK-LABEL: @addsub_and_external_users(
	; CHECK-NEXT: bb1:			; CHECK-NEXT: bb1:
	; CHECK-NEXT: [[LD:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[LD:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[LD]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[LD]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> [[TMP1]], <double 1.100000e+00, double 1.200000e+00>			; CHECK-NEXT: [[TMP1:%.*]] = fsub <2 x double> [[SHUFFLE]], <double 1.100000e+00, double 1.200000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], <double 1.100000e+00, double 1.200000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[SHUFFLE]], <double 1.100000e+00, double 1.200000e+00>
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP3]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> [[TMP2]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP5:%.*]] = fdiv <2 x double> [[TMP4]], <double 2.100000e+00, double 2.200000e+00>			; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], <double 2.100000e+00, double 2.200000e+00>
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <2 x double> [[TMP5]], <double 3.100000e+00, double 3.200000e+00>			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], <double 3.100000e+00, double 3.200000e+00>
	; CHECK-NEXT: [[PTRA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0			; CHECK-NEXT: [[PTRA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[PTRA0]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[PTRA0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[SHUFFLE]], <2 x double>* [[TMP7]], align 8			; CHECK-NEXT: store <2 x double> [[SHUFFLE1]], <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP6]], <double 4.100000e+00, double 4.200000e+00>			; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP5]], <double 4.100000e+00, double 4.200000e+00>
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP8]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP8]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 1
	; CHECK-NEXT: [[SEED:%.*]] = fcmp ogt double [[TMP9]], [[TMP10]]			; CHECK-NEXT: [[SEED:%.*]] = fcmp ogt double [[TMP8]], [[TMP9]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb1:			bb1:
	%ld = load double, double* undef			%ld = load double, double* undef

	%sub1 = fsub double %ld, 1.1			%sub1 = fsub double %ld, 1.1
	%add2 = fadd double %ld, 1.2			%add2 = fadd double %ld, 1.2

	[...]
	}			}

	; This contains a sub/add bundle, reordering it will make it better.			; This contains a sub/add bundle, reordering it will make it better.
	define void @subadd_and_external_users(double *%A, double *%ptr) {			define void @subadd_and_external_users(double *%A, double *%ptr) {
	; CHECK-LABEL: @subadd_and_external_users(			; CHECK-LABEL: @subadd_and_external_users(
	; CHECK-NEXT: bb1:			; CHECK-NEXT: bb1:
	; CHECK-NEXT: [[LD:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[LD:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[LD]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[LD]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[LD]], i32 1			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 1.200000e+00, double 1.100000e+00>			; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> [[SHUFFLE]], <double 1.200000e+00, double 1.100000e+00>
	; CHECK-NEXT: [[TMP3:%.*]] = fsub <2 x double> [[TMP1]], <double 1.200000e+00, double 1.100000e+00>			; CHECK-NEXT: [[TMP2:%.*]] = fsub <2 x double> [[SHUFFLE]], <double 1.200000e+00, double 1.100000e+00>
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> [[TMP3]], <2 x i32> <i32 2, i32 1>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> [[TMP2]], <2 x i32> <i32 2, i32 1>
	; CHECK-NEXT: [[TMP5:%.*]] = fdiv <2 x double> [[TMP4]], <double 2.200000e+00, double 2.100000e+00>			; CHECK-NEXT: [[TMP4:%.*]] = fdiv <2 x double> [[TMP3]], <double 2.200000e+00, double 2.100000e+00>
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <2 x double> [[TMP5]], <double 3.200000e+00, double 3.100000e+00>			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], <double 3.200000e+00, double 3.100000e+00>
	; CHECK-NEXT: [[PTRA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0			; CHECK-NEXT: [[PTRA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast double* [[PTRA0]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[PTRA0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP6]], <2 x double>* [[TMP7]], align 8			; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP6]], <double 4.200000e+00, double 4.100000e+00>			; CHECK-NEXT: [[TMP7:%.*]] = fadd <2 x double> [[TMP5]], <double 4.200000e+00, double 4.100000e+00>
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP8]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP8]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 1
	; CHECK-NEXT: [[SEED:%.*]] = fcmp ogt double [[TMP10]], [[TMP9]]			; CHECK-NEXT: [[SEED:%.*]] = fcmp ogt double [[TMP9]], [[TMP8]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb1:			bb1:
	%ld = load double, double* undef			%ld = load double, double* undef

	%add1 = fadd double %ld, 1.1			%add1 = fadd double %ld, 1.1
	%sub2 = fsub double %ld, 1.2			%sub2 = fsub double %ld, 1.2

	[...]
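
A note on reading the CHECK-line changes in this file: the old broadcast of `[[LD]]` used two `insertelement` instructions, while the new output uses one `insertelement` followed by a `shufflevector` with a `zeroinitializer` mask. The add/sub tests also blend two vectors with the mask `<i32 0, i32 3>`. The following is a minimal Python sketch of those shuffle semantics, not code from the patch. The helper names `insertelement` and `shufflevector` are hypothetical stand-ins modelled on the IR instructions, and the value `7.0` is an arbitrary placeholder for the loaded scalar.

```python
# Illustrative model only: Python lists stand in for LLVM vectors.
POISON = "poison"  # placeholder for a poison lane


def insertelement(vec, val, idx):
    """Return a copy of vec with lane idx replaced by val."""
    out = list(vec)
    out[idx] = val
    return out


def shufflevector(v1, v2, mask):
    """Pick lanes from the concatenation v1 + v2; None models an undef mask lane."""
    both = list(v1) + list(v2)
    return [POISON if m is None else both[m] for m in mask]


ld = 7.0  # arbitrary stand-in for the scalar loaded as [[LD]]

# Old form: build the splat with two insertelement instructions.
old = insertelement(insertelement([POISON, POISON], ld, 0), ld, 1)

# New form: insert lane 0 once, then broadcast it with an all-zero mask.
tmp0 = insertelement([POISON, POISON], ld, 0)
new = shufflevector(tmp0, [POISON, POISON], [0, 0])

# Blend as in @addsub_and_external_users: lane 0 from the fsub vector,
# lane 1 from the fadd vector (mask index 3 is lane 1 of the second operand).
sub = [ld - 1.1, ld - 1.2]
add = [ld + 1.1, ld + 1.2]
blend = shufflevector(sub, add, [0, 3])
```

Under this model both forms produce the same two-lane splat, which is why only the instruction sequence and the numbering of the temporaries differ between the two columns.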

llvm/test/Transforms/SLPVectorizer/X86/reused-undefs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -slp-threshold=-1000 < %s | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -slp-threshold=-1000 < %s | FileCheck %s

	define i32 @main(i32 %0) {			define i32 @main(i32 %0) {
	; CHECK-LABEL: @main(			; CHECK-LABEL: @main(
	; CHECK-NEXT: for.cond.preheader:			; CHECK-NEXT: for.cond.preheader:
	; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[FOR_INC_PREHEADER:%.*]]			; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[FOR_INC_PREHEADER:%.*]]
	; CHECK: for.inc.preheader:			; CHECK: for.inc.preheader:
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 poison, i32 poison>, i32 [[TMP0:%.*]], i32 6			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 poison, i32 undef>, i32 [[TMP0:%.*]], i32 6
	; CHECK-NEXT: br i1 false, label [[FOR_END]], label [[L1_PREHEADER:%.*]]			; CHECK-NEXT: br i1 false, label [[FOR_END]], label [[L1_PREHEADER:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[DOTPR:%.*]] = phi i32 [ 0, [[FOR_INC_PREHEADER]] ], [ 0, [[FOR_COND_PREHEADER:%.*]] ]			; CHECK-NEXT: [[DOTPR:%.*]] = phi i32 [ 0, [[FOR_INC_PREHEADER]] ], [ 0, [[FOR_COND_PREHEADER:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[DOTPR]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[DOTPR]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: br label [[L1_PREHEADER]]			; CHECK-NEXT: br label [[L1_PREHEADER]]
	; CHECK: L1.preheader:			; CHECK: L1.preheader:
	; CHECK-NEXT: [[TMP3:%.*]] = phi <8 x i32> [ [[SHUFFLE]], [[FOR_END]] ], [ [[TMP1]], [[FOR_INC_PREHEADER]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <8 x i32> [ [[SHUFFLE]], [[FOR_END]] ], [ [[TMP1]], [[FOR_INC_PREHEADER]] ]
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	for.cond.preheader:			for.cond.preheader:
	br i1 false, label %for.end, label %for.inc.preheader			br i1 false, label %for.end, label %for.inc.preheader

	[...]

llvm/test/Transforms/SLPVectorizer/X86/scatter-vectorize-reused-pointer.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -passes=slp-vectorizer < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-12 | FileCheck %s			; RUN: opt -S -passes=slp-vectorizer < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-12 | FileCheck %s

	define void @test(i1 %c, ptr %arg) {			define void @test(i1 %c, ptr %arg) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]			; CHECK-NEXT: br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
	; CHECK: if:			; CHECK: if:
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x ptr> poison, ptr [[ARG:%.*]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x ptr> poison, ptr [[ARG:%.*]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x ptr> [[TMP1]], <4 x ptr> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x ptr> [[TMP1]], <4 x ptr> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i8, <4 x ptr> [[SHUFFLE]], <4 x i64> <i64 32, i64 24, i64 8, i64 0>			; CHECK-NEXT: [[TMP2:%.*]] = getelementptr i8, <4 x ptr> [[SHUFFLE]], <4 x i64> <i64 32, i64 24, i64 8, i64 0>
	; CHECK-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.masked.gather.v4i64.v4p0(<4 x ptr> [[TMP2]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i64> poison)			; CHECK-NEXT: [[TMP3:%.*]] = call <4 x i64> @llvm.masked.gather.v4i64.v4p0(<4 x ptr> [[TMP2]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i64> poison)
	; CHECK-NEXT: br label [[JOIN:%.*]]			; CHECK-NEXT: br label [[JOIN:%.*]]
	; CHECK: else:			; CHECK: else:
	; CHECK-NEXT: [[ARG_1:%.*]] = getelementptr inbounds i8, ptr [[ARG]], i64 8			; CHECK-NEXT: [[ARG_1:%.*]] = getelementptr inbounds i8, ptr [[ARG]], i64 8
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x ptr> poison, ptr [[ARG]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x ptr> poison, ptr [[ARG]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x ptr> [[TMP4]], ptr [[ARG]], i32 1			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x ptr> [[TMP4]], <2 x ptr> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i8, <2 x ptr> [[TMP5]], <2 x i64> <i64 32, i64 24>			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i8, <2 x ptr> [[SHUFFLE1]], <2 x i64> <i64 32, i64 24>
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x ptr> poison, ptr [[ARG]], i32 3			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x ptr> poison, ptr [[ARG]], i32 3
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x ptr> [[TMP6]], <2 x ptr> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x ptr> [[TMP5]], <2 x ptr> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x ptr> [[TMP7]], <4 x ptr> [[TMP8]], <4 x i32> <i32 4, i32 5, i32 undef, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x ptr> [[TMP6]], <4 x ptr> [[TMP7]], <4 x i32> <i32 4, i32 5, i32 undef, i32 3>
	; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x ptr> [[TMP9]], ptr [[ARG_1]], i32 2			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x ptr> [[TMP8]], ptr [[ARG_1]], i32 2
	; CHECK-NEXT: [[TMP11:%.*]] = call <4 x i64> @llvm.masked.gather.v4i64.v4p0(<4 x ptr> [[TMP10]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i64> poison)			; CHECK-NEXT: [[TMP10:%.*]] = call <4 x i64> @llvm.masked.gather.v4i64.v4p0(<4 x ptr> [[TMP9]], i32 8, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i64> poison)
	; CHECK-NEXT: br label [[JOIN]]			; CHECK-NEXT: br label [[JOIN]]
	; CHECK: join:			; CHECK: join:
	; CHECK-NEXT: [[TMP12:%.*]] = phi <4 x i64> [ [[TMP3]], [[IF]] ], [ [[TMP11]], [[ELSE]] ]			; CHECK-NEXT: [[TMP11:%.*]] = phi <4 x i64> [ [[TMP3]], [[IF]] ], [ [[TMP10]], [[ELSE]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	br i1 %c, label %if, label %else			br i1 %c, label %if, label %else

	if:			if:
	%i2.0 = load i64, ptr %arg, align 8			%i2.0 = load i64, ptr %arg, align 8
	%arg2.1 = getelementptr inbounds i8, ptr %arg, i64 8			%arg2.1 = getelementptr inbounds i8, ptr %arg, i64 8
	%i2.1 = load i64, ptr %arg2.1, align 8			%i2.1 = load i64, ptr %arg2.1, align 8
	[...]

llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -passes=slp-vectorizer -S -mcpu=cascadelake -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -passes=slp-vectorizer -S -mcpu=cascadelake -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	define void @foo() {			define void @foo() {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CONV:%.*]] = uitofp i16 undef to float			; CHECK-NEXT: [[CONV:%.*]] = uitofp i16 undef to float
	; CHECK-NEXT: [[SUB:%.*]] = fsub float 6.553500e+04, undef			; CHECK-NEXT: [[SUB:%.*]] = fsub float 6.553500e+04, undef
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> poison, float [[SUB]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> <float poison, float poison, float undef, float undef>, float [[SUB]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP14:%.*]], [[BB3:%.*]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP14:%.*]], [[BB3:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>			; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>
	[...]

Revision Contents

Diff 473656

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/AArch64/matmul.ll

llvm/test/Transforms/SLPVectorizer/AArch64/slp-fma-loss.ll

llvm/test/Transforms/SLPVectorizer/AArch64/splat-loads.ll

llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll

llvm/test/Transforms/SLPVectorizer/AArch64/vectorizable-selects-uniform-cmps.ll

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

llvm/test/Transforms/SLPVectorizer/AMDGPU/packed-math.ll

llvm/test/Transforms/SLPVectorizer/X86/PR35777.ll

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

llvm/test/Transforms/SLPVectorizer/X86/alternate-cmp-swapped-pred.ll

llvm/test/Transforms/SLPVectorizer/X86/broadcast_long.ll

llvm/test/Transforms/SLPVectorizer/X86/buildvector-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/c-ray.ll

llvm/test/Transforms/SLPVectorizer/X86/cmp_sel.ll

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

llvm/test/Transforms/SLPVectorizer/X86/compare-reduce.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

llvm/test/Transforms/SLPVectorizer/X86/cse.ll

llvm/test/Transforms/SLPVectorizer/X86/extract-scalar-from-undef.ll

llvm/test/Transforms/SLPVectorizer/X86/extract_in_tree_user.ll

llvm/test/Transforms/SLPVectorizer/X86/extractelement-multiple-uses.ll

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

llvm/test/Transforms/SLPVectorizer/X86/horizontal-list.ll

llvm/test/Transforms/SLPVectorizer/X86/in-tree-user.ll

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll

llvm/test/Transforms/SLPVectorizer/X86/ordering-bug.ll

llvm/test/Transforms/SLPVectorizer/X86/partail.ll

llvm/test/Transforms/SLPVectorizer/X86/phi-undef-input.ll

llvm/test/Transforms/SLPVectorizer/X86/reduction2.ll

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

llvm/test/Transforms/SLPVectorizer/X86/reorder_phi.ll

llvm/test/Transforms/SLPVectorizer/X86/reorder_with_external_users.ll

llvm/test/Transforms/SLPVectorizer/X86/reused-undefs.ll

llvm/test/Transforms/SLPVectorizer/X86/scatter-vectorize-reused-pointer.ll

llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll