This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

- llvm/lib/Transforms/Vectorize/
  - SLPVectorizer.cpp
- llvm/test/Transforms/SLPVectorizer/
  - AArch64/
    - gather-reduce.ll
    - gather-root.ll
    - spillcost-di.ll
    - trunc-insertion.ll
  - X86/
    - PR35628_2.ll
    - PR40310.ll
    - barriercall.ll
    - consecutive-access.ll
    - crash_cmpop.ll
    - crash_exceed_scheduling.ll
    - cross_block_slp.ll
    - cycle_dup.ll
    - external_user.ll
    - geps-non-pow-2.ll
    - multi_block.ll
    - opaque-ptr.ll
    - phi.ll
    - pr47629-inseltpoison.ll
    - pr47629.ll
    - pr47642.ll
    - rgb_phi.ll
    - shrink_after_reorder2.ll
    - sitofp-inseltpoison.ll
    - sitofp.ll
    - stores-non-ordered.ll
    - vectorize-widest-phis.ll
  - slp-max-phi-size.ll

Differential D121121

[SLP]Do not schedule instructions with constants/argument/phi operands and external users.
Closed · Public

Authored by ABataev on Mar 7 2022, 7:42 AM.


Details

Reviewers

RKSimon
dtemirbulatov
anton-afanasyev
vporpo

Commits

rGd65cc8597792: [SLP]Do not schedule instructions with constants/argument/phi operands and…
rG1eeb2bfe7273: [SLP]Do not schedule instructions with constants/argument/phi operands and…

Summary

No need to schedule entry nodes where none of the instructions read or
write memory, and either all their operands are constants, arguments,
phis, or instructions from other blocks, or all their users are phis or
instructions from other blocks.
The resulting vector instructions can be placed at
the beginning of the basic block without scheduling (if the operands do
not need to be scheduled) or at the end of the block (if the users are
outside of the block).
It may save some compile time and scheduling resources.
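The rule in the summary can be modeled outside LLVM. The following minimal sketch uses a hypothetical toy `Node` type, not LLVM's `Value`/`Instruction` classes. It has per-value checks loosely mirroring `areAllOperandsNonInsts()` and `isUsedOutsideBlock()` from the diff, plus a list-level check mirroring `doesNotNeedToSchedule(VL)`. The sketch leaves out the users limit the patch adds, and it models every user as an instruction.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy IR node: NOT LLVM's Value/Instruction hierarchy.
struct Node {
  bool IsInstruction = true;  // false models constants and arguments
  bool IsPhi = false;
  bool TouchesMemory = false; // models mayReadOrWriteMemory()
  int Block = 0;              // id of the parent basic block
  std::vector<const Node *> Operands;
  std::vector<const Node *> Users; // all users are modeled as instructions
};

// True if N is not an instruction, or it does not touch memory and every
// operand is a non-instruction, a phi, or defined in another block.
bool allOperandsNonLocal(const Node &N) {
  if (!N.IsInstruction)
    return true;
  return !N.TouchesMemory &&
         std::all_of(N.Operands.begin(), N.Operands.end(),
                     [&N](const Node *Op) {
                       return !Op->IsInstruction || Op->IsPhi ||
                              Op->Block != N.Block;
                     });
}

// True if N is not an instruction, or it does not touch memory and every
// user is a phi or lives in another block.
bool allUsersNonLocal(const Node &N) {
  if (!N.IsInstruction)
    return true;
  return !N.TouchesMemory &&
         std::all_of(N.Users.begin(), N.Users.end(), [&N](const Node *U) {
           return U->IsPhi || U->Block != N.Block;
         });
}

// List-level check: the bundle can skip scheduling only if every lane
// satisfies the same criterion (all users non-local, or all operands
// non-local).
bool bundleNeedsNoScheduling(const std::vector<const Node *> &VL) {
  if (VL.empty())
    return false;
  auto UsersOk = [](const Node *N) { return allUsersNonLocal(*N); };
  auto OperandsOk = [](const Node *N) { return allOperandsNonLocal(*N); };
  return std::all_of(VL.begin(), VL.end(), UsersOk) ||
         std::all_of(VL.begin(), VL.end(), OperandsOk);
}
```

Consider a bundle mixing a lane that qualifies only by its operands with a lane that qualifies by neither criterion. That bundle still needs scheduling, because the two `all_of` checks are applied lane-wide rather than per lane.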

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Mar 7 2022, 7:42 AM

Herald added a project: Restricted Project. Mar 7 2022, 7:42 AM

Herald added subscribers: hiraditya, qcolombet.

ABataev requested review of this revision. Mar 7 2022, 7:42 AM

Herald added a project: Restricted Project. Mar 7 2022, 7:42 AM

Harbormaster completed remote builds in B152934: Diff 413482. Mar 7 2022, 8:39 AM

Ping!

I think this needs some text in the code to describe the high-level design changes in the scheduler introduced by this patch.
There are a couple of things to explain like:

When an instruction is not scheduled (which is also explained in the comments of isUsedOutsideBlock() and areAllOperandsNonInsts()).
Changes in the scheduler's design and data structures: If I understand correctly the instructions that are skipped don't get a ScheduleData object assigned to them.

Regarding the last point, couldn't we assign a ScheduleData object even to the instructions that are not scheduled and still save compilation time? I would assume that most of the compile-time overhead comes from calculateDependencies(), so as long as we don't calculate dependencies beyond the instructions that are marked to be skipped, then we should still save compilation time. I think this would make this change a bit less intrusive as it would not change the design of the scheduler much. What do you think?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
803	We may need to limit the number of users to a small integer because in some pathological cases we may have thousands of them.
2366	Why are we removing `Lane`?
2412–2413	This is a bit hard to read. I think it could be simplified by iterating across lanes in the for loop `for (unsigned Lane = 0, Lanes = VL.size(); Lane != Lanes; ++Lane)` and then setting `BundleMember` inside the for loop.
2415	I think we need a function like `mustBeScheduled(Value *V)` that calls `areAllOperandsNonInsts()` and `isUsedOutsideBlock()`. Also, these checks show up in more than one place. Do you think we could check once and cache the outcome of the check in a map? Perhaps add a `NeedsScheduling` field in `ScheduleData`?
2603	Do we need the checks `NextInBundle != nullptr || FirstInBundle != this`? Shouldn't the check `TE != nullptr` be sufficient?
3933	Perhaps rename it to `NeedsScheduling`?
4137	I don't understand why `needToScheduleSingleInstruction(VL)` needs to be here. If I understand correctly, this assertion checks that if we have failed to schedule VL, then cancelScheduling has worked correctly and `VL0` (and probably the rest of the values?) are not marked as part of a bundle. So `needToScheduleSingleInstruction(VL)` executes if `VL0` is part of a bundle.
6490–6491	Perhaps move these checks into a method `TreeEntry::doesNotNeedScheduling()`?
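The lane-indexed loop suggested in the 2412–2413 comment can be combined with skipping values that need no scheduling. It could look roughly like the following sketch. `SchedRecord`, `TreeEntryStub`, and `attachBundle` are hypothetical stand-ins, not SLPVectorizer types. The patch itself only asserts that the bundle is not exhausted. This version goes further: it checks that each record describes the expected lane value, and it reports a mismatch by returning false.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins; these are not SLPVectorizer types.
struct TreeEntryStub {};

struct SchedRecord {
  int Value = -1;                      // scalar this record describes
  SchedRecord *NextInBundle = nullptr; // next schedulable member
  TreeEntryStub *TE = nullptr;         // owning tree entry, once known
};

// Iterate over lanes, skip values that got no record, and attach TE to the
// remaining bundle members in order. Returns false if Bundle and VL are out
// of sync (a member is missing, mismatched, or left over).
bool attachBundle(const std::vector<int> &VL, SchedRecord *Bundle,
                  TreeEntryStub *TE,
                  const std::function<bool(int)> &NeedsNoScheduling) {
  SchedRecord *Member = Bundle;
  for (std::size_t Lane = 0, Lanes = VL.size(); Lane != Lanes; ++Lane) {
    if (NeedsNoScheduling(VL[Lane]))
      continue; // no record was allocated for this lane
    if (!Member || Member->Value != VL[Lane])
      return false;
    Member->TE = TE;
    Member = Member->NextInBundle;
  }
  return Member == nullptr; // every bundle member was consumed
}
```

Because `Lane` drives the loop, the bundle pointer advances only for lanes that actually own a record. This keeps the two sequences aligned without a separate counter.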

In D121121#3374247, @vporpo wrote:

I think this needs some text in the code to describe the high-level design changes in the scheduler introduced by this patch.
There are a couple of things to explain like:

When an instruction is not scheduled (which is also explained in the comments of isUsedOutsideBlock() and areAllOperandsNonInsts()).

Changes in the scheduler's design and data structures: If I understand correctly the instructions that are skipped don't get a ScheduleData object assigned to them.

Where do you want to see this info?

Regarding the last point, couldn't we assign a ScheduleData object even to the instructions that are not scheduled and still save compilation time? I would assume that most of the compile-time overhead comes from calculateDependencies(), so as long as we don't calculate dependencies beyond the instructions that are marked to be skipped, then we should still save compilation time. I think this would make this change a bit less intrusive as it would not change the design of the scheduler much. What do you think?

I would like to save some memory too, not only compile time. Why do we need to allocate ScheduleData for the instructions that should not be scheduled?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
803	Good point, will do this.
2366	It is not needed anymore (actually, it is used in only one place). Plus, it gets invalidated after graph reordering and would have to be recalculated. It is better to find the corresponding instruction directly rather than keep this Lane and recalculate it after graph reordering. It also saves some memory.
2415	I rather doubt we need a cache. The checks are pretty simple and should not take much time to perform.
2603	Yes, we still need these checks since this member function is used before we actually assign TE.
4137	isPartOfBundle is not powerful enough to check whether we have just a single instruction that requires scheduling. We don't have TE assigned yet and need an extra check here for a single schedulable instruction.
6490–6491	doesNotNeedScheduling works with a list of values, not with TreeEntry. And in some cases we need to call it for just a list. This check is needed only here, other cases do not require anything like this.
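The 2366 answer argues for locating a value's position on demand rather than caching a `Lane` field that reordering invalidates. The following minimal sketch illustrates that trade-off with hypothetical `findLane` and `reorder` helpers over a plain `std::vector`. `reorder` follows the mask convention of `reorderScalars()` shown in the diff, where element `I` moves to `Mask[I]`. It assumes the mask is a full permutation with no undef elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: compute a value's lane on demand by searching the
// entry's scalar list, instead of caching it in a per-record Lane field.
// Returns -1 if V is not present.
template <typename T>
int findLane(const std::vector<T> &Scalars, const T &V) {
  auto It = std::find(Scalars.begin(), Scalars.end(), V);
  if (It == Scalars.end())
    return -1;
  return static_cast<int>(std::distance(Scalars.begin(), It));
}

// Hypothetical helper: apply a reorder mask with the convention of
// reorderScalars() in the diff (element I moves to position Mask[I]).
// Assumes Mask is a full permutation of [0, Scalars.size()).
template <typename T>
void reorder(std::vector<T> &Scalars, const std::vector<int> &Mask) {
  assert(Mask.size() == Scalars.size() && "mask/scalars size mismatch");
  const std::vector<T> Prev(Scalars);
  for (std::size_t I = 0; I < Prev.size(); ++I)
    Scalars[Mask[I]] = Prev[I];
}
```

Consider scalars `{'a', 'b', 'c'}` with mask `{2, 0, 1}`. A lane of 2 cached for `'c'` becomes stale after the reorder, while the search returns the new position, 1. The price is a linear search per query, which is presumably acceptable because bundles are short.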

Address comments

Harbormaster completed remote builds in B153773: Diff 414661. Mar 11 2022, 8:33 AM

Where do you want to see this info?

Perhaps right above struct BlockScheduling ?

I would like to save some memory too, not only compile time. Why do we need to allocate ScheduleData for the instructions that should not be scheduled?

Makes sense. Please mention this in the design description text too.
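The memory argument can be shown in a minimal sketch. Records are allocated only for instructions that need scheduling, and lookups must not create entries for the ones that were skipped. The diff makes the corresponding change in `getScheduleData()`, replacing `ScheduleDataMap[I]` with `ScheduleDataMap.lookup(I)`. LLVM's `DenseMap::operator[]` default-inserts a missing key, while `lookup()` returns a default value without inserting. The sketch uses `std::unordered_map::find` for the same effect; `LazySchedule` and `Record` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

struct Record {
  int Inst;
};

// Hypothetical miniature of lazily allocated scheduling records.
class LazySchedule {
public:
  // Allocate a record only when the instruction needs scheduling.
  void init(int Inst, bool NeedsScheduling) {
    if (!NeedsScheduling)
      return;
    Storage.push_back(std::make_unique<Record>(Record{Inst}));
    Map[Inst] = Storage.back().get();
  }

  // Read-only query: find() never inserts, so asking about a skipped
  // instruction leaves the map unchanged. operator[] would instead insert
  // a null entry for every miss, as DenseMap::operator[] does.
  Record *get(int Inst) const {
    auto It = Map.find(Inst);
    return It == Map.end() ? nullptr : It->second;
  }

  std::size_t mapSize() const { return Map.size(); }

private:
  std::vector<std::unique_ptr<Record>> Storage;
  std::unordered_map<int, Record *> Map;
};
```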

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
800	Nit: Please add a one line comment saying that this limit is to save compilation time when instrs have too many uses.
4137	I am probably missing something about the design here. If the check is false for `!BS.getScheduleData(VL0) || !BS.getScheduleData(VL0)->isPartOfBundle()` then it means that `VL0` is part of a real bundle with more than one instruction in the bundle. In other words, how can we end up in such a situation where `VL0` is in a multi-instruction bundle and still have a single instruction in `VL` that needs scheduling? Let me explain. If we had only a single instruction that needs scheduling, then this could be either: (i) `VL0`, in which case it should be a single-instruction bundle so `isPartOfBundle()` would be false, or (ii) some other instruction in `VL`, in which case `VL0` should not require scheduling so `getScheduleData(VL0)` should be false because we no longer assign `ScheduleData` objects to instructions that don't need scheduling.
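The use limit discussed in the 800 and 803 comments works because the count can stop early. The patch sets `UsesLimit = 8` and tests `!I->hasNUsesOrMore(UsesLimit)`. `hasNUsesOrMore(N)` only needs to walk up to N uses, so an instruction with thousands of users fails the users-based check after a handful of steps. The following minimal sketch assumes a hypothetical singly linked `Use` chain in place of LLVM's use list. `hasAtLeast`, `fewEnoughUsers`, and `linkUses` are illustrative names, not LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical singly linked use list standing in for an LLVM use list.
struct Use {
  Use *Next = nullptr;
};

// True if the list starting at Head has at least N entries. The loop runs
// at most N times, regardless of how long the list is.
bool hasAtLeast(const Use *Head, unsigned N) {
  for (; N != 0; --N, Head = Head->Next)
    if (!Head)
      return false;
  return true;
}

// Value used by the patch (constexpr int UsesLimit = 8).
constexpr unsigned UsesLimit = 8;

// Users-based pre-filter: with UsesLimit or more users the value fails
// the users check outright, before any individual user is inspected.
bool fewEnoughUsers(const Use *Head) { return !hasAtLeast(Head, UsesLimit); }

// Test helper: chain a contiguous array of uses into a list.
void linkUses(std::vector<Use> &Uses) {
  for (std::size_t I = 0; I + 1 < Uses.size(); ++I)
    Uses[I].Next = &Uses[I + 1];
}
```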

ABataev added inline comments. Mar 11 2022, 12:53 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4137	We allocate ScheduleData not only for schedulable instructions, but also for all instructions in between, but since they don't have next elements, they are not considered to be part of a bundle. Function `isPartOfBundle()` is not enough here: if we have only a single schedulable instruction, `isPartOfBundle()` will still return false, because this instruction does not have a next element in the bundle and TE is not set yet. Here we have a corner case where only a single instruction from the VL must be scheduled and `isPartOfBundle()` still returns false.
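The corner case can be reproduced with a hypothetical miniature of the bundle linkage. `SD` keeps only the three fields the predicate reads. `isPartOfBundleOld()` is the condition before the patch, and `isPartOfBundleNew()` adds the `TE` check from the diff. A lone schedulable instruction has no bundle links, so neither predicate fires until a tree entry is attached. After that, only the new predicate reports it as part of a bundle.

```cpp
#include <cassert>

// Hypothetical stand-in for a tree entry.
struct TreeEntryStub {};

// Miniature of the bundle-linkage fields of ScheduleData.
struct SD {
  SD *FirstInBundle = this;    // head of the bundle (itself if alone)
  SD *NextInBundle = nullptr;  // next member, if any
  TreeEntryStub *TE = nullptr; // set once the bundle joins the tree

  // Predicate before the patch: only bundle links count.
  bool isPartOfBundleOld() const {
    return NextInBundle != nullptr || FirstInBundle != this;
  }

  // Predicate from the diff: an attached tree entry also counts, which
  // covers a bundle with a single schedulable member.
  bool isPartOfBundleNew() const {
    return NextInBundle != nullptr || FirstInBundle != this || TE != nullptr;
  }
};
```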

Address comments

Harbormaster completed remote builds in B153829: Diff 414734. Mar 11 2022, 2:17 PM

vporpo added inline comments. Mar 11 2022, 3:23 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2725–2731	Some of this text repeats multiple times in the patch, it is probably better to avoid too much repetition. What would also be nice to have here is some brief explanation about the design, like when exactly a ScheduleData entry is assigned to an Instruction.
4137	Re: "We allocate ScheduleData not only for schedulable instruction, but also for all instruction in between": I see, `initScheduleData()` won't allocate ScheduleData unless it is schedulable, but `extendSchedulingRegion()` will do so without checking. Why is this done this way, and not, for example, always skip ScheduleData? Re: "Here we have a corner case, where only single instruction from the VL must be scheduled and isPartOfBundle() still returns false": I don't follow. If `isPartOfBundle()` returns false, then `!BS.getScheduleData(VL0) || !BS.getScheduleData(VL0)->isPartOfBundle()` is true, and we don't need `needToScheduleSingleInstruction(VL)`. The assertion would pass anyway.
7991–7994	move this under `if (doesNotNeedToSchedule(I))` ?

ABataev marked an inline comment as done. Mar 14 2022, 7:00 AM

ABataev added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4137	Re: "Why is this done this way, and not for example always skip ScheduleData?": Actually, the check in `extendSchedulingRegion()` is not required; there is an assert instead. Plus, `extendSchedulingRegion()` calls `initScheduleData()`, which then checks whether the given instruction actually needs to be scheduled. Re: "The assertion would pass anyway.": Ah, sorry, I did not check the logic correctly. Actually, it is safe to remove this check; it was part of the development process and I forgot to remove it.
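The following is a hypothetical model of the arrangement described in this reply. Extending the region only initializes the newly covered instructions, and the initialization step itself decides whether a record is needed, so the extension code does not have to repeat the check. `Region`, `extendTo`, and `init` are illustrative names. The real `extendSchedulingRegion()`/`initScheduleData()` walk `Instruction` pointers rather than indices and maintain much more state.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of a scheduling region over one basic block.
// Instructions are identified by their index in the block, the region is
// the half-open range [Start, End), and Allocated[I] records whether
// instruction I received a scheduling record.
struct Region {
  std::size_t Start = 0, End = 0;
  std::vector<bool> Allocated;
  std::function<bool(std::size_t)> NeedsNoScheduling;

  Region(std::size_t BlockSize, std::function<bool(std::size_t)> Pred)
      : Allocated(BlockSize, false), NeedsNoScheduling(std::move(Pred)) {}

  // Analogue of initScheduleData(): the only place that decides whether
  // an instruction in [From, To) gets a record.
  void init(std::size_t From, std::size_t To) {
    for (std::size_t I = From; I < To; ++I)
      if (!NeedsNoScheduling(I))
        Allocated[I] = true;
  }

  // Analogue of extendSchedulingRegion(): grow the region to cover I and
  // initialize only the newly covered instructions, without its own check.
  void extendTo(std::size_t I) {
    assert(I < Allocated.size() && "instruction outside the block");
    if (Start == End) { // empty region: start a new one at I
      Start = I;
      End = I + 1;
      init(Start, End);
    } else if (I < Start) {
      init(I, Start);
      Start = I;
    } else if (I >= End) {
      init(End, I + 1);
      End = I + 1;
    }
  }
};
```

In a usage run where odd-numbered indices stand in for instructions that need no scheduling, extending to 4, then 7, then 2 covers [2, 8). Only the even indices in that range receive records.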

Address comments + rebase

Harbormaster completed remote builds in B154099: Diff 415089. Mar 14 2022, 9:08 AM

vporpo accepted this revision. Mar 14 2022, 12:00 PM

vporpo added inline comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
817	nit: perhaps rename it to `doesNotNeedToBeScheduled`? If it was a member function of the scheduler I think it would sound fine being called like `scheduler.doesNotNeedToSchedule(V)`.

This revision is now accepted and ready to land. Mar 14 2022, 12:00 PM

This revision was landed with ongoing or failed builds. Mar 16 2022, 6:07 AM

Closed by commit rG1eeb2bfe7273: [SLP]Do not schedule instructions with constants/argument/phi operands and… (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rG1eeb2bfe7273: [SLP]Do not schedule instructions with constants/argument/phi operands and….

Hi Alexey,
with this patch, I noticed an assert while building one of our runtime files; the .c test case I produced is around 24000 lines.
Would you like it as is, or reduced?

Instruction does not dominate all uses!

%39 = call <8 x i16> @llvm.fshl.v8i16(<8 x i16> %18, <8 x i16> %38, <8 x i16> <i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8, i16 8>)
%23 = shufflevector <8 x i16> %22, <8 x i16> %39, <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14>

Instruction does not dominate all uses!

%67 = call <8 x i16> @llvm.fshl.v8i16(<8 x i16> %44, <8 x i16> %66, <8 x i16> <i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15, i16 15>)
%51 = shufflevector <8 x i16> %50, <8 x i16> %67, <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14>

in function eshift
fatal error: error in backend: Broken function found, compilation aborted!

In reply to D121121#3387320 (@ronlieb):

Hi Ron, it would be good if you can provide reduced case. I'll revert the patch meanwhile.

ABataev added a reverting change: rG150ea7654312: Revert "[SLP]Do not schedule instructions with constants/argument/phi operands…. Mar 16 2022, 1:56 PM

In reply to D121121#3387334 (@ABataev):

Hi, we are seeing Clang front-end errors:

Instruction does not dominate all uses!
  %shuffle419 = shufflevector <2 x i64> %359, <2 x i64> poison, <2 x i32> <i32 1, i32 0>
  %385 = xor <2 x i64> %shuffle419, %360, !dbg !1753
Instruction does not dominate all uses!
  %shuffle418 = shufflevector <2 x i64> %357, <2 x i64> poison, <2 x i32> <i32 1, i32 0>
  %386 = xor <2 x i64> %shuffle418, %358, !dbg !1753
Instruction does not dominate all uses!
  %shuffle417 = shufflevector <2 x i64> %355, <2 x i64> poison, <2 x i32> <i32 1, i32 0>
  %387 = xor <2 x i64> %shuffle417, %356, !dbg !1753
Instruction does not dominate all uses!
  %shuffle = shufflevector <2 x i64> %353, <2 x i64> poison, <2 x i32> <i32 1, i32 0>
  %388 = xor <2 x i64> %shuffle, %354, !dbg !1753
Instruction does not dominate all uses!
  %463 = call <2 x i64> @llvm.fshl.v2i64(<2 x i64> %416, <2 x i64> %462, <2 x i64> <i64 63, i64 63>), !dbg !1845
  %460 = shufflevector <2 x i64> %432, <2 x i64> %463, <2 x i32> <i32 0, i32 3>
in function HRSS_generate_key

This happens when building BoringSSL; we identified this patch through git bisect. We have a reproducer available. I will file a GitHub issue and post the reproducer there.

Filed https://github.com/llvm/llvm-project/issues/54407 with reproducer attached.

ABataev added a commit: rGd65cc8597792: [SLP]Do not schedule instructions with constants/argument/phi operands and…. Mar 17 2022, 11:04 AM

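Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	175 lines
llvm/test/Transforms/SLPVectorizer/AArch64/gather-reduce.ll	8 lines
llvm/test/Transforms/SLPVectorizer/AArch64/gather-root.ll	67 lines
llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-di.ll	8 lines
llvm/test/Transforms/SLPVectorizer/AArch64/trunc-insertion.ll	4 lines
llvm/test/Transforms/SLPVectorizer/X86/PR35628_2.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/PR40310.ll	8 lines
llvm/test/Transforms/SLPVectorizer/X86/barriercall.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll	22 lines
llvm/test/Transforms/SLPVectorizer/X86/cross_block_slp.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/cycle_dup.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/external_user.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/geps-non-pow-2.ll	14 lines
llvm/test/Transforms/SLPVectorizer/X86/multi_block.ll	8 lines
llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll	4 lines
llvm/test/Transforms/SLPVectorizer/X86/phi.ll	22 lines
llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll	92 lines
llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll	92 lines
llvm/test/Transforms/SLPVectorizer/X86/pr47642.ll	8 lines
llvm/test/Transforms/SLPVectorizer/X86/rgb_phi.ll	20 lines
llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder2.ll	8 lines
llvm/test/Transforms/SLPVectorizer/X86/sitofp-inseltpoison.ll	12 lines
llvm/test/Transforms/SLPVectorizer/X86/sitofp.ll	12 lines
llvm/test/Transforms/SLPVectorizer/X86/stores-non-ordered.ll	20 lines
llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll	2 lines
llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll	80 lines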

Diff 415799

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp


[…]	static void reorderScalars(SmallVectorImpl<Value *> &Scalars,
SmallVector<Value *> Prev(Scalars.size(),		SmallVector<Value *> Prev(Scalars.size(),
UndefValue::get(Scalars.front()->getType()));		UndefValue::get(Scalars.front()->getType()));
Prev.swap(Scalars);		Prev.swap(Scalars);
for (unsigned I = 0, E = Prev.size(); I < E; ++I)		for (unsigned I = 0, E = Prev.size(); I < E; ++I)
if (Mask[I] != UndefMaskElem)		if (Mask[I] != UndefMaskElem)
Scalars[Mask[I]] = Prev[I];		Scalars[Mask[I]] = Prev[I];
}		}

		/// Checks if the provided value does not require scheduling. It does not
		/// require scheduling if this is not an instruction or it is an instruction
		/// that does not read/write memory and all operands are either not instructions
		/// or phi nodes or instructions from different blocks.
		static bool areAllOperandsNonInsts(Value *V) {
		auto *I = dyn_cast<Instruction>(V);
		if (!I)
		return true;
		return !I->mayReadOrWriteMemory() && all_of(I->operands(), [I](Value *V) {
		auto *IO = dyn_cast<Instruction>(V);
		if (!IO)
		return true;
		return isa<PHINode>(IO) || IO->getParent() != I->getParent();
		});
		}

		/// Checks if the provided value does not require scheduling. It does not
		/// require scheduling if this is not an instruction or it is an instruction
		/// that does not read/write memory and all users are phi nodes or instructions
		/// from different blocks.
		static bool isUsedOutsideBlock(Value *V) {
		auto *I = dyn_cast<Instruction>(V);
		if (!I)
		return true;
		// Limits the number of uses to save compile time.
		constexpr int UsesLimit = 8;
		return !I->mayReadOrWriteMemory() && !I->hasNUsesOrMore(UsesLimit) &&
		all_of(I->users(), [I](User *U) {
		auto *IU = dyn_cast<Instruction>(U);
		if (!IU)
		return true;
		return IU->getParent() != I->getParent() || isa<PHINode>(IU);
		});
		}

		/// Checks if the specified value does not require scheduling. It does not
		/// require scheduling if all operands and all users do not need to be scheduled
		/// in the current basic block.
		static bool doesNotNeedToBeScheduled(Value *V) {
		return areAllOperandsNonInsts(V) && isUsedOutsideBlock(V);
		}

		/// Checks if the specified array of instructions does not require scheduling.
		/// It is so if either all instructions have operands that do not require
		/// scheduling or all their users do not require scheduling since they are
		/// phis or in other basic blocks.
		static bool doesNotNeedToSchedule(ArrayRef<Value *> VL) {
		return !VL.empty() &&
		(all_of(VL, isUsedOutsideBlock) || all_of(VL, areAllOperandsNonInsts));
		}

namespace slpvectorizer {		namespace slpvectorizer {

/// Bottom Up SLP Vectorizer.		/// Bottom Up SLP Vectorizer.
class BoUpSLP {		class BoUpSLP {
struct TreeEntry;		struct TreeEntry;
struct ScheduleData;		struct ScheduleData;

public:		public:
[…]	if (ReorderIndices.empty()) {
Last->setOperations(S);		Last->setOperations(S);
Last->ReorderIndices.append(ReorderIndices.begin(), ReorderIndices.end());		Last->ReorderIndices.append(ReorderIndices.begin(), ReorderIndices.end());
}		}
if (Last->State != TreeEntry::NeedToGather) {		if (Last->State != TreeEntry::NeedToGather) {
for (Value *V : VL) {		for (Value *V : VL) {
assert(!getTreeEntry(V) && "Scalar already in tree!");		assert(!getTreeEntry(V) && "Scalar already in tree!");
ScalarToTreeEntry[V] = Last;		ScalarToTreeEntry[V] = Last;
}		}
// Update the scheduler bundle to point to this TreeEntry.		// Update the scheduler bundle to point to this TreeEntry.
unsigned Lane = 0;		ScheduleData *BundleMember = Bundle.getValue();
for (ScheduleData *BundleMember = Bundle.getValue(); BundleMember;		assert((BundleMember || isa<PHINode>(S.MainOp) ||
BundleMember = BundleMember->NextInBundle) {		isVectorLikeInstWithConstOps(S.MainOp) ||
		doesNotNeedToSchedule(VL)) &&
		"Bundle and VL out of sync");
		if (BundleMember) {
		for (Value *V : VL) {
		if (doesNotNeedToBeScheduled(V))
		continue;
		assert(BundleMember && "Unexpected end of bundle.");
BundleMember->TE = Last;		BundleMember->TE = Last;
BundleMember->Lane = Lane;		BundleMember = BundleMember->NextInBundle;
++Lane;
}		}
assert((!Bundle.getValue() || Lane == VL.size()) &&		}
"Bundle and VL out of sync");		assert(!BundleMember && "Bundle and VL out of sync");
} else {		} else {
MustGather.insert(VL.begin(), VL.end());		MustGather.insert(VL.begin(), VL.end());
}		}

if (UserTreeIdx.UserTE)		if (UserTreeIdx.UserTE)
Last->UserTreeIndices.push_back(UserTreeIdx);		Last->UserTreeIndices.push_back(UserTreeIdx);

return Last;		return Last;
[…]	void init(int BlockSchedulingRegionID, Value *OpVal) {
FirstInBundle = this;		FirstInBundle = this;
NextInBundle = nullptr;		NextInBundle = nullptr;
NextLoadStore = nullptr;		NextLoadStore = nullptr;
IsScheduled = false;		IsScheduled = false;
SchedulingRegionID = BlockSchedulingRegionID;		SchedulingRegionID = BlockSchedulingRegionID;
clearDependencies();		clearDependencies();
OpValue = OpVal;		OpValue = OpVal;
TE = nullptr;		TE = nullptr;
Lane = -1;
}		}

/// Verify basic self consistency properties		/// Verify basic self consistency properties
void verify() {		void verify() {
if (hasValidDependencies()) {		if (hasValidDependencies()) {
assert(UnscheduledDeps <= Dependencies && "invariant");		assert(UnscheduledDeps <= Dependencies && "invariant");
} else {		} else {
assert(UnscheduledDeps == Dependencies && "invariant");		assert(UnscheduledDeps == Dependencies && "invariant");
[…]	struct ScheduleData {

/// Returns true for single instructions and for bundle representatives		/// Returns true for single instructions and for bundle representatives
/// (= the head of a bundle).		/// (= the head of a bundle).
bool isSchedulingEntity() const { return FirstInBundle == this; }		bool isSchedulingEntity() const { return FirstInBundle == this; }

/// Returns true if it represents an instruction bundle and not only a		/// Returns true if it represents an instruction bundle and not only a
/// single instruction.		/// single instruction.
bool isPartOfBundle() const {		bool isPartOfBundle() const {
return NextInBundle != nullptr || FirstInBundle != this;		return NextInBundle != nullptr || FirstInBundle != this || TE;
}		}

/// Returns true if it is ready for scheduling, i.e. it has no more		/// Returns true if it is ready for scheduling, i.e. it has no more
/// unscheduled depending instructions/bundles.		/// unscheduled depending instructions/bundles.
bool isReady() const {		bool isReady() const {
assert(isSchedulingEntity() &&		assert(isSchedulingEntity() &&
"can't consider non-scheduling entity for ready list");		"can't consider non-scheduling entity for ready list");
return unscheduledDepsInBundle() == 0 && !IsScheduled;		return unscheduledDepsInBundle() == 0 && !IsScheduled;
[…]	struct ScheduleData {
int Dependencies = InvalidDeps;		int Dependencies = InvalidDeps;

/// The number of dependencies minus the number of dependencies of scheduled		/// The number of dependencies minus the number of dependencies of scheduled
/// instructions. As soon as this is zero, the instruction/bundle gets ready		/// instructions. As soon as this is zero, the instruction/bundle gets ready
/// for scheduling.		/// for scheduling.
/// Note that this is negative as long as Dependencies is not calculated.		/// Note that this is negative as long as Dependencies is not calculated.
int UnscheduledDeps = InvalidDeps;		int UnscheduledDeps = InvalidDeps;

/// The lane of this node in the TreeEntry.
int Lane = -1;

/// True if this instruction is scheduled (or considered as scheduled in the		/// True if this instruction is scheduled (or considered as scheduled in the
/// dry-run).		/// dry-run).
bool IsScheduled = false;		bool IsScheduled = false;
};		};

#ifndef NDEBUG		#ifndef NDEBUG
friend inline raw_ostream &operator<<(raw_ostream &os,		friend inline raw_ostream &operator<<(raw_ostream &os,
const BoUpSLP::ScheduleData &SD) {		const BoUpSLP::ScheduleData &SD) {
SD.dump(os);		SD.dump(os);
return os;		return os;
}		}
#endif		#endif

friend struct GraphTraits<BoUpSLP *>;		friend struct GraphTraits<BoUpSLP *>;
friend struct DOTGraphTraits<BoUpSLP *>;		friend struct DOTGraphTraits<BoUpSLP *>;

/// Contains all scheduling data for a basic block.		/// Contains all scheduling data for a basic block.
		/// It does not schedule instructions that do not read or write memory and
		/// whose operands are all constants, arguments, phis, or instructions from
		/// other blocks, or whose users are all phis or instructions from other
		/// blocks. The resulting vector instructions can be placed at the beginning
		/// of the basic block without scheduling (if the operands do not need to be
		/// scheduled) or at the end of the block (if the users are outside of the
		/// block). This saves some compile time and memory used by the compiler.
		/// ScheduleData is assigned to each instruction between the boundaries of
		/// the tree entry, even to those that are not part of the graph. This is
		/// required to correctly follow the dependencies between the instructions
		/// and to schedule them correctly. ScheduleData is not allocated for
		/// instructions that do not require scheduling, such as phis, nodes with
		/// only extractelements/insertelements, or nodes whose instructions have
		/// uses/operands outside of the block.
struct BlockScheduling {		struct BlockScheduling {
BlockScheduling(BasicBlock *BB)		BlockScheduling(BasicBlock *BB)
: BB(BB), ChunkSize(BB->size()), ChunkPos(ChunkSize) {}		: BB(BB), ChunkSize(BB->size()), ChunkPos(ChunkSize) {}

void clear() {		void clear() {
ReadyInsts.clear();		ReadyInsts.clear();
ScheduleStart = nullptr;		ScheduleStart = nullptr;
ScheduleEnd = nullptr;		ScheduleEnd = nullptr;
[…]	void clear() {
// in the new region yet.		// in the new region yet.
++SchedulingRegionID;		++SchedulingRegionID;
}		}

ScheduleData *getScheduleData(Instruction *I) {		ScheduleData *getScheduleData(Instruction *I) {
if (BB != I->getParent())		if (BB != I->getParent())
// Avoid lookup if can't possibly be in map.		// Avoid lookup if can't possibly be in map.
return nullptr;		return nullptr;
ScheduleData *SD = ScheduleDataMap[I];		ScheduleData *SD = ScheduleDataMap.lookup(I);
if (SD && isInSchedulingRegion(SD))		if (SD && isInSchedulingRegion(SD))
return SD;		return SD;
return nullptr;		return nullptr;
}		}

ScheduleData *getScheduleData(Value *V) {		ScheduleData *getScheduleData(Value *V) {
if (auto *I = dyn_cast<Instruction>(V))		if (auto *I = dyn_cast<Instruction>(V))
return getScheduleData(I);		return getScheduleData(I);
return nullptr;		return nullptr;
}		}

ScheduleData *getScheduleData(Value *V, Value *Key) {		ScheduleData *getScheduleData(Value *V, Value *Key) {
if (V == Key)		if (V == Key)
return getScheduleData(V);		return getScheduleData(V);
auto I = ExtraScheduleDataMap.find(V);		auto I = ExtraScheduleDataMap.find(V);
if (I != ExtraScheduleDataMap.end()) {		if (I != ExtraScheduleDataMap.end()) {
ScheduleData *SD = I->second[Key];		ScheduleData *SD = I->second.lookup(Key);
if (SD && isInSchedulingRegion(SD))		if (SD && isInSchedulingRegion(SD))
return SD;		return SD;
}		}
return nullptr;		return nullptr;
}		}

bool isInSchedulingRegion(ScheduleData *SD) const {		bool isInSchedulingRegion(ScheduleData *SD) const {
return SD->SchedulingRegionID == SchedulingRegionID;		return SD->SchedulingRegionID == SchedulingRegionID;
}		}
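isInSchedulingRegion() compares an entry's SchedulingRegionID with the block's current ID, and clear() (shown earlier) increments that ID. This acts like an epoch counter: starting a new region invalidates every existing entry in O(1) without walking the map. A simplified sketch of the idea, with hypothetical `Entry` and `Scheduler` types standing in for ScheduleData and BlockScheduling:

```cpp
#include <cassert>

// Hypothetical, stripped-down version of the region-ID (epoch) check.
struct Entry {
  int RegionID = 0; // region in which this entry was last initialized
};

struct Scheduler {
  int CurrentRegionID = 1;

  // Stamp an entry as belonging to the current region.
  void init(Entry &E) const { E.RegionID = CurrentRegionID; }

  bool isInRegion(const Entry &E) const {
    return E.RegionID == CurrentRegionID;
  }

  // Starting a new region invalidates all entries at once.
  void clear() { ++CurrentRegionID; }
};
```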

/// Marks an instruction as scheduled and puts all dependent ready		/// Marks an instruction as scheduled and puts all dependent ready
/// instructions into the ready-list.		/// instructions into the ready-list.
template <typename ReadyListType>		template <typename ReadyListType>
void schedule(ScheduleData *SD, ReadyListType &ReadyList) {		void schedule(ScheduleData *SD, ReadyListType &ReadyList) {
SD->IsScheduled = true;		SD->IsScheduled = true;
LLVM_DEBUG(dbgs() << "SLP: schedule " << *SD << "\n");		LLVM_DEBUG(dbgs() << "SLP: schedule " << *SD << "\n");

for (ScheduleData *BundleMember = SD; BundleMember;		for (ScheduleData *BundleMember = SD; BundleMember;
BundleMember = BundleMember->NextInBundle) {		BundleMember = BundleMember->NextInBundle) {
if (BundleMember->Inst != BundleMember->OpValue)		if (BundleMember->Inst != BundleMember->OpValue)
continue;		continue;

// Handle the def-use chain dependencies.		// Handle the def-use chain dependencies.

// Decrement the unscheduled counter and insert to ready list if ready.		// Decrement the unscheduled counter and insert to ready list if ready.
auto &&DecrUnsched = [this, &ReadyList](Instruction *I) {		auto &&DecrUnsched = [this, &ReadyList](Instruction *I) {
doForAllOpcodes(I, [&ReadyList](ScheduleData *OpDef) {		doForAllOpcodes(I, [&ReadyList](ScheduleData *OpDef) {
if (OpDef && OpDef->hasValidDependencies() &&		if (OpDef && OpDef->hasValidDependencies() &&
OpDef->incrementUnscheduledDeps(-1) == 0) {		OpDef->incrementUnscheduledDeps(-1) == 0) {
// There are no more unscheduled dependencies after		// There are no more unscheduled dependencies after
// decrementing, so we can put the dependent instruction		// decrementing, so we can put the dependent instruction
// into the ready list.		// into the ready list.
ScheduleData *DepBundle = OpDef->FirstInBundle;		ScheduleData *DepBundle = OpDef->FirstInBundle;
assert(!DepBundle->IsScheduled &&		assert(!DepBundle->IsScheduled &&
"already scheduled bundle gets ready");		"already scheduled bundle gets ready");
ReadyList.insert(DepBundle);		ReadyList.insert(DepBundle);
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "SLP: gets ready (def): " << *DepBundle << "\n");		<< "SLP: gets ready (def): " << *DepBundle << "\n");
}		}
});		});
};		};

// If BundleMember is a vector bundle, its operands may have been		// If BundleMember is a vector bundle, its operands may have been
// reordered during buildTree(). We therefore need to get its operands		// reordered during buildTree(). We therefore need to get its operands
// through the TreeEntry.		// through the TreeEntry.
if (TreeEntry *TE = BundleMember->TE) {		if (TreeEntry *TE = BundleMember->TE) {
int Lane = BundleMember->Lane;		// Need to search for the lane since the tree entry can be reordered.
		int Lane = std::distance(TE->Scalars.begin(),
		find(TE->Scalars, BundleMember->Inst));
assert(Lane >= 0 && "Lane not set");		assert(Lane >= 0 && "Lane not set");

// Since vectorization tree is being built recursively this assertion		// Since vectorization tree is being built recursively this assertion
// ensures that the tree entry has all operands set before reaching		// ensures that the tree entry has all operands set before reaching
// this code. Couple of exceptions known at the moment are extracts		// this code. Couple of exceptions known at the moment are extracts
// where their second (immediate) operand is not added. Since		// where their second (immediate) operand is not added. Since
// immediates do not affect scheduler behavior this is considered		// immediates do not affect scheduler behavior this is considered
// okay.		// okay.
auto *In = TE->getMainOp();		auto *In = BundleMember->Inst;
assert(In &&		assert(In &&
(isa<ExtractValueInst>(In) || isa<ExtractElementInst>(In) ||		(isa<ExtractValueInst>(In) || isa<ExtractElementInst>(In) ||
In->getNumOperands() == TE->getNumOperands()) &&		In->getNumOperands() == TE->getNumOperands()) &&
"Missed TreeEntry operands?");		"Missed TreeEntry operands?");
(void)In; // fake use to avoid build failure when assertions disabled		(void)In; // fake use to avoid build failure when assertions disabled

for (unsigned OpIdx = 0, NumOperands = TE->getNumOperands();		for (unsigned OpIdx = 0, NumOperands = TE->getNumOperands();
OpIdx != NumOperands; ++OpIdx)		OpIdx != NumOperands; ++OpIdx)
[28 lines not shown]	void verify() {
return;		return;

assert(ScheduleStart->getParent() == ScheduleEnd->getParent() &&		assert(ScheduleStart->getParent() == ScheduleEnd->getParent() &&
ScheduleStart->comesBefore(ScheduleEnd) &&		ScheduleStart->comesBefore(ScheduleEnd) &&
"Not a valid scheduling region?");		"Not a valid scheduling region?");

for (auto *I = ScheduleStart; I != ScheduleEnd; I = I->getNextNode()) {		for (auto *I = ScheduleStart; I != ScheduleEnd; I = I->getNextNode()) {
auto *SD = getScheduleData(I);		auto *SD = getScheduleData(I);
assert(SD && "primary scheduledata must exist in window");		if (!SD)
		continue;
assert(isInSchedulingRegion(SD) &&		assert(isInSchedulingRegion(SD) &&
"primary schedule data not in window?");		"primary schedule data not in window?");
assert(isInSchedulingRegion(SD->FirstInBundle) &&		assert(isInSchedulingRegion(SD->FirstInBundle) &&
"entire bundle in window!");		"entire bundle in window!");
(void)SD;		(void)SD;
doForAllOpcodes(I, [](ScheduleData *SD) { SD->verify(); });		doForAllOpcodes(I, [](ScheduleData *SD) { SD->verify(); });
}		}
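In schedule(), the patch stops reading a cached `BundleMember->Lane` and instead searches `TE->Scalars` for the instruction, because the tree entry may have been reordered after the lane was recorded. The sketch below isolates that lookup, using integer IDs as hypothetical stand-ins for instructions. One observation: in the diff, a missing instruction would make `std::distance` return `Scalars.size()`, which still satisfies `Lane >= 0`, so that assertion does not catch the case; the sketch returns -1 explicitly instead.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical stand-in for the lane search in schedule(): Scalars plays
// the role of TE->Scalars (possibly reordered) and Inst the role of
// BundleMember->Inst. Returns -1 if Inst is not in the entry.
int findLane(const std::vector<int> &Scalars, int Inst) {
  auto It = std::find(Scalars.begin(), Scalars.end(), Inst);
  if (It == Scalars.end())
    return -1;
  return static_cast<int>(std::distance(Scalars.begin(), It));
}
```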

[1,025 lines not shown]	if (llvm::sortPtrAccesses(PointerOps, ScalarTy, DL, SE, Order)) {
if (TTI.isLegalMaskedGather(FixedVectorType::get(ScalarTy, VL.size()),		if (TTI.isLegalMaskedGather(FixedVectorType::get(ScalarTy, VL.size()),
CommonAlignment))		CommonAlignment))
return LoadsState::ScatterVectorize;		return LoadsState::ScatterVectorize;
}		}

return LoadsState::Gather;		return LoadsState::Gather;
}		}

		/// \return true if the specified list of values has only one instruction that
		/// requires scheduling, false otherwise.
		static bool needToScheduleSingleInstruction(ArrayRef<Value *> VL) {
		Value *NeedsScheduling = nullptr;
		vporpo: perhaps rename it to `NeedsScheduling`?
		for (Value *V : VL) {
		if (doesNotNeedToBeScheduled(V))
		continue;
		if (!NeedsScheduling) {
		NeedsScheduling = V;
		continue;
		}
		return false;
		}
		return NeedsScheduling;
		}
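needToScheduleSingleInstruction() returns true only when exactly one value in VL requires scheduling, and it stops scanning at the second such value. A generic sketch of the same scan, with a hypothetical predicate `DoesNotNeed` standing in for doesNotNeedToBeScheduled() and plain `int`s standing in for values:

```cpp
#include <cassert>
#include <vector>

// Hypothetical generic version of needToScheduleSingleInstruction():
// returns true iff exactly one element of VL fails the predicate
// DoesNotNeed (a stand-in for doesNotNeedToBeScheduled()).
template <typename T, typename Pred>
bool exactlyOneNeedsScheduling(const std::vector<T> &VL, Pred DoesNotNeed) {
  const T *NeedsScheduling = nullptr;
  for (const T &V : VL) {
    if (DoesNotNeed(V))
      continue;
    if (NeedsScheduling)
      return false; // found a second value that needs scheduling
    NeedsScheduling = &V;
  }
  return NeedsScheduling != nullptr;
}
```

The sketch compares the pointer with `nullptr` explicitly; the patch's `return NeedsScheduling;` relies on the implicit pointer-to-bool conversion, which behaves the same.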

void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,		void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
const EdgeInfo &UserTreeIdx) {		const EdgeInfo &UserTreeIdx) {
assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");		assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");

SmallVector<int> ReuseShuffleIndicies;		SmallVector<int> ReuseShuffleIndicies;
SmallVector<Value *> UniqueValues;		SmallVector<Value *> UniqueValues;
auto &&TryToFindDuplicates = [&VL, &ReuseShuffleIndicies, &UniqueValues,		auto &&TryToFindDuplicates = [&VL, &ReuseShuffleIndicies, &UniqueValues,
&UserTreeIdx,		&UserTreeIdx,
[175 lines not shown]
#ifdef EXPENSIVE_CHECKS		#ifdef EXPENSIVE_CHECKS
// Make sure we didn't break any internal invariants		// Make sure we didn't break any internal invariants
BS.verify();		BS.verify();
#endif		#endif
if (!Bundle) {		if (!Bundle) {
LLVM_DEBUG(dbgs() << "SLP: We are not able to schedule this bundle!\n");		LLVM_DEBUG(dbgs() << "SLP: We are not able to schedule this bundle!\n");
assert((!BS.getScheduleData(VL0) ||		assert((!BS.getScheduleData(VL0) ||
!BS.getScheduleData(VL0)->isPartOfBundle()) &&		!BS.getScheduleData(VL0)->isPartOfBundle()) &&
"tryScheduleBundle should cancelScheduling on failure");		"tryScheduleBundle should cancelScheduling on failure");
		vporpo: I don't understand why `needToScheduleSingleInstruction(VL)` needs to be here. If I understand correctly this assertion checks that if we have failed to schedule VL, then cancelScheduling has worked correctly and `VL0` (and probably the rest of the values?) are not marked as part of a bundle. So `needToScheduleSingleInstruction(VL)` executes if `VL0` is part of a bundle.
		ABataev: isPartOfBundle is not powerful enough to check if we have just a single instruction that requires scheduling. We don't have assigned TE yet and need an extra check here for a single schedulable instruction.
		vporpo: I am probably missing something about the design here. If the check is false for `!BS.getScheduleData(VL0) || !BS.getScheduleData(VL0)->isPartOfBundle()` then it means that `VL0` is part of a real bundle with more than one instruction in the bundle. In other words, how can we end up in such a situation where `VL0` is in a multi-instruction bundle and still have a single instruction in `VL` that needs scheduling? Let me explain. If we had only a single instruction that needs scheduling, then this could be either: (i) `VL0`, in which case it should be a single-instruction bundle so `isPartOfBundle()` would be false, or (ii) some other instruction in `VL`, in which case `VL0` should not require scheduling so `getScheduleData(VL0)` should be false because we no longer assign `ScheduleData` objects to instructions that don't need scheduling.
		ABataev: We allocate ScheduleData not only for schedulable instruction, but also for all instruction in between, but since they don't have next elements, they are not considered to be part of bundle. Function `isPartOfBundle()` is not enough here, because if we have only single schedulable instruction, `isPartOfBundle()` will still return false here, because this instruction does not have next element in bundle and TE is not set yet. Here we have a corner case, where only single instruction from the VL must be scheduled and `isPartOfBundle()` still returns false.
		vporpo:
		> We allocate ScheduleData not only for schedulable instruction, but also for all instruction in between,
		I see, `initScheduleData()` won't allocate ScheduleData unless it is schedulable, but `extendSchedulingRegion()` will do so without checking. Why is this done this way, and not for example always skip ScheduleData?
		> Here we have a corner case, where only single instruction from the VL must be scheduled and isPartOfBundle() still returns false.
		I don't follow. If `isPartOfBundle()` returns false then `!BS.getScheduleData(VL0) || !BS.getScheduleData(VL0)->isPartOfBundle()` is true, and we don't need `needToScheduleSingleInstruction(VL)`. The assertion would pass anyway.
		ABataev:
		> > We allocate ScheduleData not only for schedulable instruction, but also for all instruction in between,
		> I see, `initScheduleData()` won't allocate ScheduleData unless it is schedulable, but `extendSchedulingRegion()` will do so without checking. Why is this done this way, and not for example always skip ScheduleData?
		Actually, the check in `extendSchedulingRegion()` is not required, there is an assert instead. Plus, `extendSchedulingRegion()` calls `initScheduleData()`, which then checks if actually need to schedule the given instruction.
		> > Here we have a corner case, where only single instruction from the VL must be scheduled and isPartOfBundle() still returns false.
		> I don't follow. If `isPartOfBundle()` returns false then `!BS.getScheduleData(VL0) || !BS.getScheduleData(VL0)->isPartOfBundle()` is true, and we don't need `needToScheduleSingleInstruction(VL)`. The assertion would pass anyway.
		Ah, sorry, did not check logic correctly. Actually, it is safe to remove this check, was part of the development process, forgot to remove.
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
return;		return;
}		}
LLVM_DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");		LLVM_DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");

unsigned ShuffleOrOp = S.isAltShuffle() ?		unsigned ShuffleOrOp = S.isAltShuffle() ?
(unsigned) Instruction::ShuffleVector : S.getOpcode();		(unsigned) Instruction::ShuffleVector : S.getOpcode();
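The review thread above comes down to a boolean identity: appending `|| needToScheduleSingleInstruction(VL)` to `!BS.getScheduleData(VL0) || !BS.getScheduleData(VL0)->isPartOfBundle()` can only change the result when VL0 has ScheduleData and is part of a bundle, which is exactly the situation vporpo argues cannot coincide with a single schedulable instruction. A brute-force check over all eight inputs (the function names are hypothetical):

```cpp
#include <cassert>

// Assertion condition as discussed: "no ScheduleData for VL0, or VL0 is
// not part of a bundle".
bool baseCondition(bool HasSD, bool PartOfBundle) {
  return !HasSD || !PartOfBundle;
}

// Same condition with the extra disjunct that was later removed.
bool extendedCondition(bool HasSD, bool PartOfBundle, bool SingleSched) {
  return baseCondition(HasSD, PartOfBundle) || SingleSched;
}

// Returns true if the two conditions differ only for inputs where VL0
// has ScheduleData and is part of a bundle.
bool extraDisjunctOnlyMattersForBundledVL0() {
  for (int Mask = 0; Mask < 8; ++Mask) {
    bool HasSD = Mask & 1, Part = Mask & 2, Single = Mask & 4;
    bool Differs =
        baseCondition(HasSD, Part) != extendedCondition(HasSD, Part, Single);
    if (Differs && !(HasSD && Part))
      return false;
  }
  return true;
}
```

By contrast, in cancelScheduling() the same call relaxes a required `isPartOfBundle()` inside a conjunction, so there it can change the outcome.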
[2,334 lines not shown]	void BoUpSLP::setInsertPointAfterBundle(const TreeEntry *E) {
// should be in this block.		// should be in this block.
auto *Front = E->getMainOp();		auto *Front = E->getMainOp();
auto *BB = Front->getParent();		auto *BB = Front->getParent();
assert(llvm::all_of(E->Scalars, [=](Value *V) -> bool {		assert(llvm::all_of(E->Scalars, [=](Value *V) -> bool {
auto *I = cast<Instruction>(V);		auto *I = cast<Instruction>(V);
return !E->isOpcodeOrAlt(I) || I->getParent() == BB;		return !E->isOpcodeOrAlt(I) || I->getParent() == BB;
}));		}));

		// Set the insert point to the beginning of the basic block if the entry
		// should not be scheduled.
		if (E->State != TreeEntry::NeedToGather &&
		doesNotNeedToSchedule(E->Scalars)) {
		vporpo: Perhaps move these checks into a method `TreeEntry::doesNotNeedScheduling()`?
		ABataev: doesNotNeedScheduling works with a list of values, not with TreeEntry. And in some cases we need to call it for just a list. This check is needed only here, other cases do not require anything like this.
		BasicBlock::iterator InsertPt;
		if (all_of(E->Scalars, isUsedOutsideBlock))
		InsertPt = BB->getTerminator()->getIterator();
		else
		InsertPt = BB->getFirstInsertionPt();
		Builder.SetInsertPoint(BB, InsertPt);
		Builder.SetCurrentDebugLocation(Front->getDebugLoc());
		return;
		}
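For non-gather entries for which doesNotNeedToSchedule(E->Scalars) holds, setInsertPointAfterBundle() no longer looks at schedule data: if every scalar is used outside the block, the vector code goes right before the terminator; otherwise it goes at the block's first insertion point. A simplified model of that choice, assuming a block is just a sequence of positions with leading phis (EH pads are ignored) and `Scalar::UsedOutsideBlock` stands in for isUsedOutsideBlock(); all names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a scalar in the tree entry.
struct Scalar {
  bool UsedOutsideBlock; // stands in for isUsedOutsideBlock(V)
};

// Hypothetical model of a basic block: NumPhis leading phi nodes,
// followed by other instructions, ending with a terminator.
struct Block {
  std::size_t NumPhis;
  std::size_t NumInsts; // total, including phis and the terminator
  std::size_t firstInsertionPt() const { return NumPhis; }
  std::size_t terminator() const { return NumInsts - 1; }
};

// Position before which the vector code would be inserted for an entry
// that does not need scheduling.
std::size_t insertPointForUnscheduled(const Block &BB,
                                      const std::vector<Scalar> &Scalars) {
  bool AllOutside =
      std::all_of(Scalars.begin(), Scalars.end(),
                  [](const Scalar &S) { return S.UsedOutsideBlock; });
  return AllOutside ? BB.terminator() : BB.firstInsertionPt();
}
```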

// The last instruction in the bundle in program order.		// The last instruction in the bundle in program order.
Instruction *LastInst = nullptr;		Instruction *LastInst = nullptr;

// Find the last instruction. The common case should be that BB has been		// Find the last instruction. The common case should be that BB has been
// scheduled, and the last instruction is VL.back(). So we start with		// scheduled, and the last instruction is VL.back(). So we start with
// VL.back() and iterate over schedule data until we reach the end of the		// VL.back() and iterate over schedule data until we reach the end of the
// bundle. The end of the bundle is marked by null ScheduleData.		// bundle. The end of the bundle is marked by null ScheduleData.
if (BlocksSchedules.count(BB)) {		if (BlocksSchedules.count(BB)) {
auto *Bundle =		Value *V = E->isOneOf(E->Scalars.back());
BlocksSchedules[BB]->getScheduleData(E->isOneOf(E->Scalars.back()));		if (doesNotNeedToBeScheduled(V))
		V = *find_if_not(E->Scalars, doesNotNeedToBeScheduled);
		auto *Bundle = BlocksSchedules[BB]->getScheduleData(V);
if (Bundle && Bundle->isPartOfBundle())		if (Bundle && Bundle->isPartOfBundle())
for (; Bundle; Bundle = Bundle->NextInBundle)		for (; Bundle; Bundle = Bundle->NextInBundle)
if (Bundle->OpValue == Bundle->Inst)		if (Bundle->OpValue == Bundle->Inst)
LastInst = Bundle->Inst;		LastInst = Bundle->Inst;
}		}
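If the last scalar of the entry does not need scheduling, the code above falls back to the first scalar that does via `find_if_not`, and dereferences the result without checking it; cancelScheduling() further down uses the same pattern. The unchecked dereference presumably relies on the earlier doesNotNeedToSchedule() early return. A sketch of the same selection with an explicit fallback, using `int`s and a hypothetical predicate in place of values and doesNotNeedToBeScheduled() (it also skips the isOneOf() mapping the real code applies to the last scalar):

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper: picks the value whose ScheduleData is used to
// locate the bundle. Prefers the last scalar; if that one does not need
// scheduling, takes the first scalar that does. Returns std::nullopt if
// no scalar needs scheduling.
template <typename Pred>
std::optional<int> pickBundleKey(const std::vector<int> &Scalars,
                                 Pred DoesNotNeed) {
  if (Scalars.empty())
    return std::nullopt;
  if (!DoesNotNeed(Scalars.back()))
    return Scalars.back();
  auto It = std::find_if_not(Scalars.begin(), Scalars.end(), DoesNotNeed);
  if (It == Scalars.end())
    return std::nullopt;
  return *It;
}
```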

// LastInst can still be null at this point if there's either not an entry		// LastInst can still be null at this point if there's either not an entry
// for BB in BlocksSchedules or there's no ScheduleData available for		// for BB in BlocksSchedules or there's no ScheduleData available for
[1,209 lines not shown]	for (auto I = CSEWorkList.begin(), E = CSEWorkList.end(); I != E; ++I) {
}		}
}		}
CSEBlocks.clear();		CSEBlocks.clear();
GatherShuffleSeq.clear();		GatherShuffleSeq.clear();
}		}

BoUpSLP::ScheduleData *		BoUpSLP::ScheduleData *
BoUpSLP::BlockScheduling::buildBundle(ArrayRef<Value *> VL) {		BoUpSLP::BlockScheduling::buildBundle(ArrayRef<Value *> VL) {
ScheduleData *Bundle = nullptr;		ScheduleData *Bundle = nullptr;
ScheduleData *PrevInBundle = nullptr;		ScheduleData *PrevInBundle = nullptr;
for (Value *V : VL) {		for (Value *V : VL) {
		if (doesNotNeedToBeScheduled(V))
		continue;
ScheduleData *BundleMember = getScheduleData(V);		ScheduleData *BundleMember = getScheduleData(V);
assert(BundleMember &&		assert(BundleMember &&
"no ScheduleData for bundle member "		"no ScheduleData for bundle member "
"(maybe not in same basic block)");		"(maybe not in same basic block)");
assert(BundleMember->isSchedulingEntity() &&		assert(BundleMember->isSchedulingEntity() &&
"bundle member already part of other bundle");		"bundle member already part of other bundle");
if (PrevInBundle) {		if (PrevInBundle) {
PrevInBundle->NextInBundle = BundleMember;		PrevInBundle->NextInBundle = BundleMember;
[11 lines not shown]

// Groups the instructions to a bundle (which is then a single scheduling entity)		// Groups the instructions to a bundle (which is then a single scheduling entity)
// and schedules instructions until the bundle gets ready.		// and schedules instructions until the bundle gets ready.
Optional<BoUpSLP::ScheduleData *>		Optional<BoUpSLP::ScheduleData *>
BoUpSLP::BlockScheduling::tryScheduleBundle(ArrayRef<Value *> VL, BoUpSLP *SLP,		BoUpSLP::BlockScheduling::tryScheduleBundle(ArrayRef<Value *> VL, BoUpSLP *SLP,
const InstructionsState &S) {		const InstructionsState &S) {
// No need to schedule PHIs, insertelement, extractelement and extractvalue		// No need to schedule PHIs, insertelement, extractelement and extractvalue
// instructions.		// instructions.
if (isa<PHINode>(S.OpValue) || isVectorLikeInstWithConstOps(S.OpValue))		if (isa<PHINode>(S.OpValue) || isVectorLikeInstWithConstOps(S.OpValue) ||
		doesNotNeedToSchedule(VL))
return nullptr;		return nullptr;

// Initialize the instruction bundle.		// Initialize the instruction bundle.
Instruction *OldScheduleEnd = ScheduleEnd;		Instruction *OldScheduleEnd = ScheduleEnd;
LLVM_DEBUG(dbgs() << "SLP: bundle: " << *S.OpValue << "\n");		LLVM_DEBUG(dbgs() << "SLP: bundle: " << *S.OpValue << "\n");

auto TryScheduleBundleImpl = [this, OldScheduleEnd, SLP](bool ReSchedule,		auto TryScheduleBundleImpl = [this, OldScheduleEnd, SLP](bool ReSchedule,
ScheduleData *Bundle) {		ScheduleData *Bundle) {
[29 lines not shown]	while (((!Bundle && ReSchedule) || (Bundle && !Bundle->isReady())) &&
"must be ready to schedule");		"must be ready to schedule");
schedule(Picked, ReadyInsts);		schedule(Picked, ReadyInsts);
}		}
};		};

// Make sure that the scheduling region contains all		// Make sure that the scheduling region contains all
// instructions of the bundle.		// instructions of the bundle.
for (Value *V : VL) {		for (Value *V : VL) {
		if (doesNotNeedToBeScheduled(V))
		continue;
if (!extendSchedulingRegion(V, S)) {		if (!extendSchedulingRegion(V, S)) {
// If the scheduling region got new instructions at the lower end (or it		// If the scheduling region got new instructions at the lower end (or it
// is a new region for the first bundle). This makes it necessary to		// is a new region for the first bundle). This makes it necessary to
// recalculate all dependencies.		// recalculate all dependencies.
// Otherwise the compiler may crash trying to incorrectly calculate		// Otherwise the compiler may crash trying to incorrectly calculate
// dependencies and emit instruction in the wrong order at the actual		// dependencies and emit instruction in the wrong order at the actual
// scheduling.		// scheduling.
TryScheduleBundleImpl(/*ReSchedule=*/false, nullptr);		TryScheduleBundleImpl(/*ReSchedule=*/false, nullptr);
return None;		return None;
}		}
}		}

bool ReSchedule = false;		bool ReSchedule = false;
for (Value *V : VL) {		for (Value *V : VL) {
		if (doesNotNeedToBeScheduled(V))
		continue;
ScheduleData *BundleMember = getScheduleData(V);		ScheduleData *BundleMember = getScheduleData(V);
assert(BundleMember &&		assert(BundleMember &&
"no ScheduleData for bundle member (maybe not in same basic block)");		"no ScheduleData for bundle member (maybe not in same basic block)");

// Make sure we don't leave the pieces of the bundle in the ready list when		// Make sure we don't leave the pieces of the bundle in the ready list when
// whole bundle might not be ready.		// whole bundle might not be ready.
ReadyInsts.remove(BundleMember);		ReadyInsts.remove(BundleMember);

[13 lines not shown]	if (!Bundle->isReady()) {
cancelScheduling(VL, S.OpValue);		cancelScheduling(VL, S.OpValue);
return None;		return None;
}		}
return Bundle;		return Bundle;
}		}
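tryScheduleBundle() now has three distinct outcomes: `nullptr` when nothing needs scheduling (phis, vector-like instructions, or lists for which doesNotNeedToSchedule() holds), `None` when the bundle could not be scheduled, and a bundle pointer on success. Because llvm::Optional converts to `false` only when it is empty, the `if (!Bundle)` check in buildTree_rec() treats a `nullptr` result as success. A sketch of that classification, using `std::optional` in place of llvm::Optional and hypothetical `FakeBundle` and `ScheduleOutcome` types:

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for ScheduleData.
struct FakeBundle {};

// Hypothetical classification of tryScheduleBundle()'s result.
enum class ScheduleOutcome { Failed, NoSchedulingNeeded, Scheduled };

ScheduleOutcome classify(const std::optional<FakeBundle *> &Result) {
  if (!Result) // None: the bundle could not be scheduled
    return ScheduleOutcome::Failed;
  if (*Result == nullptr) // nullptr: nothing in VL needs scheduling
    return ScheduleOutcome::NoSchedulingNeeded;
  return ScheduleOutcome::Scheduled;
}
```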

void BoUpSLP::BlockScheduling::cancelScheduling(ArrayRef<Value *> VL,		void BoUpSLP::BlockScheduling::cancelScheduling(ArrayRef<Value *> VL,
Value *OpValue) {		Value *OpValue) {
if (isa<PHINode>(OpValue) || isVectorLikeInstWithConstOps(OpValue))		if (isa<PHINode>(OpValue) || isVectorLikeInstWithConstOps(OpValue) ||
		doesNotNeedToSchedule(VL))
return;		return;

		if (doesNotNeedToBeScheduled(OpValue))
		OpValue = *find_if_not(VL, doesNotNeedToBeScheduled);
ScheduleData *Bundle = getScheduleData(OpValue);		ScheduleData *Bundle = getScheduleData(OpValue);
LLVM_DEBUG(dbgs() << "SLP: cancel scheduling of " << *Bundle << "\n");		LLVM_DEBUG(dbgs() << "SLP: cancel scheduling of " << *Bundle << "\n");
assert(!Bundle->IsScheduled &&		assert(!Bundle->IsScheduled &&
"Can't cancel bundle which is already scheduled");		"Can't cancel bundle which is already scheduled");
assert(Bundle->isSchedulingEntity() && Bundle->isPartOfBundle() &&		assert(Bundle->isSchedulingEntity() &&
		(Bundle->isPartOfBundle() || needToScheduleSingleInstruction(VL)) &&
"tried to unbundle something which is not a bundle");		"tried to unbundle something which is not a bundle");

// Remove the bundle from the ready list.		// Remove the bundle from the ready list.
if (Bundle->isReady())		if (Bundle->isReady())
ReadyInsts.remove(Bundle);		ReadyInsts.remove(Bundle);

// Un-bundle: make single instructions out of the bundle.		// Un-bundle: make single instructions out of the bundle.
ScheduleData *BundleMember = Bundle;		ScheduleData *BundleMember = Bundle;
while (BundleMember) {		while (BundleMember) {
assert(BundleMember->FirstInBundle == Bundle && "corrupt bundle links");		assert(BundleMember->FirstInBundle == Bundle && "corrupt bundle links");
BundleMember->FirstInBundle = BundleMember;		BundleMember->FirstInBundle = BundleMember;
ScheduleData *Next = BundleMember->NextInBundle;		ScheduleData *Next = BundleMember->NextInBundle;
BundleMember->NextInBundle = nullptr;		BundleMember->NextInBundle = nullptr;
		BundleMember->TE = nullptr;
if (BundleMember->unscheduledDepsInBundle() == 0) {		if (BundleMember->unscheduledDepsInBundle() == 0) {
ReadyInsts.insert(BundleMember);		ReadyInsts.insert(BundleMember);
}		}
BundleMember = Next;		BundleMember = Next;
}		}
}		}
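The un-bundle loop in cancelScheduling() turns each member back into its own single-instruction bundle; the patch additionally resets the member's `TE` pointer, presumably so that a cancelled member is no longer reported as belonging to a tree entry. A stripped-down model with a hypothetical `Node` type holding only the bundle links (the ready-list update is omitted):

```cpp
#include <cassert>

// Hypothetical, stripped-down ScheduleData with only the bundle links.
struct Node {
  Node *FirstInBundle = this;
  Node *NextInBundle = nullptr;
  const void *TE = nullptr; // stands in for the TreeEntry pointer
};

// Links Nodes[0..N) into one bundle headed by Nodes[0].
void makeBundle(Node *Nodes, int N, const void *TE) {
  for (int I = 0; I < N; ++I) {
    Nodes[I].FirstInBundle = &Nodes[0];
    Nodes[I].NextInBundle = I + 1 < N ? &Nodes[I + 1] : nullptr;
    Nodes[I].TE = TE;
  }
}

// Mirrors the un-bundle loop: every member becomes its own bundle head,
// loses its successor link, and drops its tree entry.
void unbundle(Node *Bundle) {
  Node *Member = Bundle;
  while (Member) {
    assert(Member->FirstInBundle == Bundle && "corrupt bundle links");
    Node *Next = Member->NextInBundle;
    Member->FirstInBundle = Member;
    Member->NextInBundle = nullptr;
    Member->TE = nullptr;
    Member = Next;
  }
}
```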

BoUpSLP::ScheduleData *BoUpSLP::BlockScheduling::allocateScheduleDataChunks() {		BoUpSLP::ScheduleData *BoUpSLP::BlockScheduling::allocateScheduleDataChunks() {
// Allocate a new ScheduleData for the instruction.		// Allocate a new ScheduleData for the instruction.
if (ChunkPos >= ChunkSize) {		if (ChunkPos >= ChunkSize) {
ScheduleDataChunks.push_back(std::make_unique<ScheduleData[]>(ChunkSize));		ScheduleDataChunks.push_back(std::make_unique<ScheduleData[]>(ChunkSize));
ChunkPos = 0;		ChunkPos = 0;
}		}
return &(ScheduleDataChunks.back()[ChunkPos++]);		return &(ScheduleDataChunks.back()[ChunkPos++]);
}		}
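allocateScheduleDataChunks() is a bump allocator over fixed-size arrays: a new chunk of ChunkSize entries is allocated only when the current one is exhausted, and existing entries never move, so pointers stored in ScheduleDataMap stay valid. Since the patch skips non-schedulable instructions, fewer of these slots should be consumed. A generic sketch of the pattern (the `ChunkPool` class is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical chunked pool: hands out objects from fixed-size arrays and
// allocates a new array only when the current one is full. Addresses stay
// stable because chunks are never reallocated.
template <typename T> class ChunkPool {
public:
  explicit ChunkPool(std::size_t ChunkSize)
      : ChunkSize(ChunkSize), ChunkPos(ChunkSize) {
    assert(ChunkSize > 0 && "chunk size must be positive");
  }

  T *allocate() {
    if (ChunkPos >= ChunkSize) { // current chunk exhausted (or none yet)
      Chunks.push_back(std::make_unique<T[]>(ChunkSize));
      ChunkPos = 0;
    }
    return &Chunks.back()[ChunkPos++];
  }

  std::size_t numChunks() const { return Chunks.size(); }

private:
  std::size_t ChunkSize;
  std::size_t ChunkPos;
  std::vector<std::unique_ptr<T[]>> Chunks;
};
```

As in BlockScheduling's constructor, ChunkPos starts at ChunkSize, so the first allocation creates the first chunk.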

bool BoUpSLP::BlockScheduling::extendSchedulingRegion(Value *V,		bool BoUpSLP::BlockScheduling::extendSchedulingRegion(Value *V,
const InstructionsState &S) {		const InstructionsState &S) {
if (getScheduleData(V, isOneOf(S, V)))		if (getScheduleData(V, isOneOf(S, V)))
return true;		return true;
Instruction *I = dyn_cast<Instruction>(V);		Instruction *I = dyn_cast<Instruction>(V);
assert(I && "bundle member must be an instruction");		assert(I && "bundle member must be an instruction");
assert(!isa<PHINode>(I) && !isVectorLikeInstWithConstOps(I) &&		assert(!isa<PHINode>(I) && !isVectorLikeInstWithConstOps(I) &&
		!doesNotNeedToBeScheduled(I) &&
"phi nodes/insertelements/extractelements/extractvalues don't need to "		"phi nodes/insertelements/extractelements/extractvalues don't need to "
"be scheduled");		"be scheduled");
auto &&CheckScheduleForI = [this, &S](Instruction *I) -> bool {		auto &&CheckScheduleForI = [this, &S](Instruction *I) -> bool {
ScheduleData *ISD = getScheduleData(I);		ScheduleData *ISD = getScheduleData(I);
if (!ISD)		if (!ISD)
return false;		return false;
assert(isInSchedulingRegion(ISD) &&		assert(isInSchedulingRegion(ISD) &&
"ScheduleData not in scheduling region");		"ScheduleData not in scheduling region");
[60 lines not shown]
}		}
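In the initScheduleData() hunk that follows, instructions for which doesNotNeedToBeScheduled() holds are skipped before any map lookup or allocation, as suggested in the review. A hypothetical model of that loop, with `int` IDs as instructions, a predicate in place of doesNotNeedToBeScheduled(), and a `std::unordered_map` in place of ScheduleDataMap; it returns how many entries were newly created:

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for ScheduleData.
struct Data {
  int RegionID = 0;
};

// Hypothetical model of initScheduleData(): walks Insts[From, To) and
// creates or re-initializes an entry only for instructions that need
// scheduling. Returns the number of newly created entries.
template <typename Pred>
std::size_t initRange(const std::vector<int> &Insts, std::size_t From,
                      std::size_t To, int RegionID, Pred DoesNotNeed,
                      std::unordered_map<int, Data> &Map) {
  std::size_t Created = 0;
  for (std::size_t Idx = From; Idx != To; ++Idx) {
    int I = Insts[Idx];
    if (DoesNotNeed(I))
      continue; // no entry at all for non-schedulable instructions
    auto [It, Inserted] = Map.try_emplace(I);
    if (Inserted)
      ++Created;
    It->second.RegionID = RegionID;
  }
  return Created;
}
```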

void BoUpSLP::BlockScheduling::initScheduleData(Instruction *FromI,		void BoUpSLP::BlockScheduling::initScheduleData(Instruction *FromI,
Instruction *ToI,		Instruction *ToI,
ScheduleData *PrevLoadStore,		ScheduleData *PrevLoadStore,
ScheduleData *NextLoadStore) {		ScheduleData *NextLoadStore) {
ScheduleData *CurrentLoadStore = PrevLoadStore;		ScheduleData *CurrentLoadStore = PrevLoadStore;
for (Instruction *I = FromI; I != ToI; I = I->getNextNode()) {		for (Instruction *I = FromI; I != ToI; I = I->getNextNode()) {
ScheduleData *SD = ScheduleDataMap[I];		// No need to allocate data for non-schedulable instructions.
		if (doesNotNeedToBeScheduled(I))
		continue;
		ScheduleData *SD = ScheduleDataMap.lookup(I);
		vporpo: move this under `if (doesNotNeedToSchedule(I))` ?
if (!SD) {		if (!SD) {
SD = allocateScheduleDataChunks();		SD = allocateScheduleDataChunks();
ScheduleDataMap[I] = SD;		ScheduleDataMap[I] = SD;
SD->Inst = I;		SD->Inst = I;
}		}
assert(!isInSchedulingRegion(SD) &&		assert(!isInSchedulingRegion(SD) &&
"new ScheduleData already in scheduling region");		"new ScheduleData already in scheduling region");
SD->init(SchedulingRegionID, I);		SD->init(SchedulingRegionID, I);
[167 lines not shown]	void BoUpSLP::scheduleBlock(BlockScheduling *BS) {

// Ensure that all dependency data is updated and fill the ready-list with		// Ensure that all dependency data is updated and fill the ready-list with
// initial instructions.		// initial instructions.
int Idx = 0;		int Idx = 0;
int NumToSchedule = 0;		int NumToSchedule = 0;
for (auto *I = BS->ScheduleStart; I != BS->ScheduleEnd;		for (auto *I = BS->ScheduleStart; I != BS->ScheduleEnd;
I = I->getNextNode()) {		I = I->getNextNode()) {
BS->doForAllOpcodes(I, [this, &Idx, &NumToSchedule, BS](ScheduleData *SD) {		BS->doForAllOpcodes(I, [this, &Idx, &NumToSchedule, BS](ScheduleData *SD) {
		TreeEntry *SDTE = getTreeEntry(SD->Inst);
assert((isVectorLikeInstWithConstOps(SD->Inst) ||		assert((isVectorLikeInstWithConstOps(SD->Inst) ||
SD->isPartOfBundle() == (getTreeEntry(SD->Inst) != nullptr)) &&		SD->isPartOfBundle() ==
		(SDTE && !doesNotNeedToSchedule(SDTE->Scalars))) &&
"scheduler and vectorizer bundle mismatch");		"scheduler and vectorizer bundle mismatch");
SD->FirstInBundle->SchedulingPriority = Idx++;		SD->FirstInBundle->SchedulingPriority = Idx++;
if (SD->isSchedulingEntity()) {		if (SD->isSchedulingEntity()) {
BS->calculateDependencies(SD, false, this);		BS->calculateDependencies(SD, false, this);
NumToSchedule++;		NumToSchedule++;
}		}
});		});
}		}
[2,648 lines not shown]

llvm/test/Transforms/SLPVectorizer/AArch64/gather-reduce.ll

	[30 lines not shown]
	; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]			; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]
	; GENERIC: for.cond.cleanup:			; GENERIC: for.cond.cleanup:
	; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]			; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]
	; GENERIC: for.body:			; GENERIC: for.body:
	; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
				; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; GENERIC-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*			; GENERIC-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*
	; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2			; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
	; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
	; GENERIC-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*			; GENERIC-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*
	; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2			; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2
	; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
	; GENERIC-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; GENERIC-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]
	; GENERIC-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; GENERIC-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
	[33 lines not shown]
	; GENERIC-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32			; GENERIC-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32
	; GENERIC-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]			; GENERIC-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
	; GENERIC-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; GENERIC-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6
	; GENERIC-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64			; GENERIC-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64
	; GENERIC-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]			; GENERIC-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]
	; GENERIC-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2			; GENERIC-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2
	; GENERIC-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32			; GENERIC-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32
	; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7
	; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
	; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]			; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
	; GENERIC-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2			; GENERIC-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
	; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
	; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
	; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]			; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]			; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]
	; KRYO: for.cond.cleanup:			; KRYO: for.cond.cleanup:
	; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]			; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]
	; KRYO: for.body:			; KRYO: for.body:
	; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
				; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*			; KRYO-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*
	; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2			; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
	; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
	; KRYO-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*			; KRYO-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*
	; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2			; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2
	; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
	; KRYO-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; KRYO-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]
	; KRYO-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; KRYO-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
	; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32			; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32
	; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]			; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
	; KRYO-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; KRYO-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6
	; KRYO-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64			; KRYO-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64
	; KRYO-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]			; KRYO-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]
	; KRYO-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2			; KRYO-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2
	; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32			; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32
	; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7
	; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
	; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]			; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
	; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2			; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
	; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
	; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
	; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]			; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]			; GENERIC-NEXT: br label [[FOR_COND_CLEANUP]]
	; GENERIC: for.cond.cleanup:			; GENERIC: for.cond.cleanup:
	; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; GENERIC-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]			; GENERIC-NEXT: ret i32 [[SUM_0_LCSSA]]
	; GENERIC: for.body:			; GENERIC: for.body:
	; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; GENERIC-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
				; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; GENERIC-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*			; GENERIC-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*
	; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2			; GENERIC-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
	; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; GENERIC-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
	; GENERIC-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*			; GENERIC-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*
	; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2			; GENERIC-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2
	; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; GENERIC-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
	; GENERIC-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; GENERIC-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]
	; GENERIC-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; GENERIC-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
	; GENERIC-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32			; GENERIC-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32
	; GENERIC-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]			; GENERIC-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
	; GENERIC-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; GENERIC-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6
	; GENERIC-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64			; GENERIC-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64
	; GENERIC-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]			; GENERIC-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]
	; GENERIC-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2			; GENERIC-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2
	; GENERIC-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32			; GENERIC-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32
	; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; GENERIC-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; GENERIC-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; GENERIC-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7
	; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; GENERIC-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
	; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]			; GENERIC-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
	; GENERIC-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2			; GENERIC-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
	; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; GENERIC-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
	; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; GENERIC-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
	; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; GENERIC-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]			; GENERIC-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]
	; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]			; KRYO-NEXT: br label [[FOR_COND_CLEANUP]]
	; KRYO: for.cond.cleanup:			; KRYO: for.cond.cleanup:
	; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]			; KRYO-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[ADD66:%.*]], [[FOR_COND_CLEANUP_LOOPEXIT:%.*]] ]
	; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]			; KRYO-NEXT: ret i32 [[SUM_0_LCSSA]]
	; KRYO: for.body:			; KRYO: for.body:
	; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[I_0103:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[SUM_0102:%.*]] = phi i32 [ [[ADD66]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]
	; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]			; KRYO-NEXT: [[A_ADDR_0101:%.*]] = phi i16* [ [[INCDEC_PTR58:%.*]], [[FOR_BODY]] ], [ [[A:%.*]], [[FOR_BODY_PREHEADER]] ]
				; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*			; KRYO-NEXT: [[TMP0:%.*]] = bitcast i16* [[A_ADDR_0101]] to <8 x i16>*
	; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2			; KRYO-NEXT: [[TMP1:%.*]] = load <8 x i16>, <8 x i16>* [[TMP0]], align 2
	; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>			; KRYO-NEXT: [[TMP2:%.*]] = zext <8 x i16> [[TMP1]] to <8 x i32>
	; KRYO-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*			; KRYO-NEXT: [[TMP3:%.*]] = bitcast i16* [[B:%.*]] to <8 x i16>*
	; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2			; KRYO-NEXT: [[TMP4:%.*]] = load <8 x i16>, <8 x i16>* [[TMP3]], align 2
	; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>			; KRYO-NEXT: [[TMP5:%.*]] = zext <8 x i16> [[TMP4]] to <8 x i32>
	; KRYO-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]			; KRYO-NEXT: [[TMP6:%.*]] = sub nsw <8 x i32> [[TMP2]], [[TMP5]]
	; KRYO-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0			; KRYO-NEXT: [[TMP7:%.*]] = extractelement <8 x i32> [[TMP6]], i64 0
	; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32			; KRYO-NEXT: [[CONV47:%.*]] = zext i16 [[TMP24]] to i32
	; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]			; KRYO-NEXT: [[ADD48:%.*]] = add nsw i32 [[ADD39]], [[CONV47]]
	; KRYO-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6			; KRYO-NEXT: [[TMP25:%.*]] = extractelement <8 x i32> [[TMP6]], i64 6
	; KRYO-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64			; KRYO-NEXT: [[TMP26:%.*]] = sext i32 [[TMP25]] to i64
	; KRYO-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]			; KRYO-NEXT: [[ARRAYIDX55:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP26]]
	; KRYO-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2			; KRYO-NEXT: [[TMP27:%.*]] = load i16, i16* [[ARRAYIDX55]], align 2
	; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32			; KRYO-NEXT: [[CONV56:%.*]] = zext i16 [[TMP27]] to i32
	; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]			; KRYO-NEXT: [[ADD57:%.*]] = add nsw i32 [[ADD48]], [[CONV56]]
	; KRYO-NEXT: [[INCDEC_PTR58]] = getelementptr inbounds i16, i16* [[A_ADDR_0101]], i64 8
	; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7			; KRYO-NEXT: [[TMP28:%.*]] = extractelement <8 x i32> [[TMP6]], i64 7
	; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64			; KRYO-NEXT: [[TMP29:%.*]] = sext i32 [[TMP28]] to i64
	; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]			; KRYO-NEXT: [[ARRAYIDX64:%.*]] = getelementptr inbounds i16, i16* [[G]], i64 [[TMP29]]
	; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2			; KRYO-NEXT: [[TMP30:%.*]] = load i16, i16* [[ARRAYIDX64]], align 2
	; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32			; KRYO-NEXT: [[CONV65:%.*]] = zext i16 [[TMP30]] to i32
	; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]			; KRYO-NEXT: [[ADD66]] = add nsw i32 [[ADD57]], [[CONV65]]
	; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1			; KRYO-NEXT: [[INC]] = add nuw nsw i32 [[I_0103]], 1
	; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]			; KRYO-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], [[N]]

llvm/test/Transforms/SLPVectorizer/AArch64/gather-root.ll

	; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; GATHER-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>			; GATHER-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; GATHER-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])			; GATHER-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[P17]]			; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[P17]]
	; GATHER-NEXT: br label [[FOR_BODY]]			; GATHER-NEXT: br label [[FOR_BODY]]
	;			;
	; MAX-COST-LABEL: @PR28330(			; MAX-COST-LABEL: @PR28330(
	; MAX-COST-NEXT: entry:			; MAX-COST-NEXT: entry:
	; MAX-COST-NEXT: [[P0:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			; MAX-COST-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1
	; MAX-COST-NEXT: [[P1:%.*]] = icmp eq i8 [[P0]], 0			; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer
	; MAX-COST-NEXT: [[P2:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	; MAX-COST-NEXT: [[P3:%.*]] = icmp eq i8 [[P2]], 0
	; MAX-COST-NEXT: [[P4:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1
	; MAX-COST-NEXT: [[P5:%.*]] = icmp eq i8 [[P4]], 0
	; MAX-COST-NEXT: [[P6:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 4), align 4
	; MAX-COST-NEXT: [[P7:%.*]] = icmp eq i8 [[P6]], 0
	; MAX-COST-NEXT: [[P8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1
	; MAX-COST-NEXT: [[P9:%.*]] = icmp eq i8 [[P8]], 0
	; MAX-COST-NEXT: [[P10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2
	; MAX-COST-NEXT: [[P11:%.*]] = icmp eq i8 [[P10]], 0
	; MAX-COST-NEXT: [[P12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1
	; MAX-COST-NEXT: [[P13:%.*]] = icmp eq i8 [[P12]], 0
	; MAX-COST-NEXT: [[P14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8
	; MAX-COST-NEXT: [[P15:%.*]] = icmp eq i8 [[P14]], 0
	; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]			; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]
	; MAX-COST: for.body:			; MAX-COST: for.body:
	; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[P34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; MAX-COST-NEXT: [[P19:%.*]] = select i1 [[P1]], i32 -720, i32 -80			; MAX-COST-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; MAX-COST-NEXT: [[P20:%.*]] = add i32 [[P17]], [[P19]]			; MAX-COST-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; MAX-COST-NEXT: [[P21:%.*]] = select i1 [[P3]], i32 -720, i32 -80			; MAX-COST-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], [[P17]]
	; MAX-COST-NEXT: [[P22:%.*]] = add i32 [[P20]], [[P21]]
	; MAX-COST-NEXT: [[P23:%.*]] = select i1 [[P5]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P24:%.*]] = add i32 [[P22]], [[P23]]
	; MAX-COST-NEXT: [[P25:%.*]] = select i1 [[P7]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P26:%.*]] = add i32 [[P24]], [[P25]]
	; MAX-COST-NEXT: [[P27:%.*]] = select i1 [[P9]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P28:%.*]] = add i32 [[P26]], [[P27]]
	; MAX-COST-NEXT: [[P29:%.*]] = select i1 [[P11]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P30:%.*]] = add i32 [[P28]], [[P29]]
	; MAX-COST-NEXT: [[P31:%.*]] = select i1 [[P13]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P32:%.*]] = add i32 [[P30]], [[P31]]
	; MAX-COST-NEXT: [[P33:%.*]] = select i1 [[P15]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P34]] = add i32 [[P32]], [[P33]]
	; MAX-COST-NEXT: br label [[FOR_BODY]]			; MAX-COST-NEXT: br label [[FOR_BODY]]
	;			;
	entry:			entry:
	%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1
	%p1 = icmp eq i8 %p0, 0			%p1 = icmp eq i8 %p0, 0
	%p2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2			%p2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	%p3 = icmp eq i8 %p2, 0			%p3 = icmp eq i8 %p2, 0
	%p4 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1			%p4 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1
	; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; GATHER-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; GATHER-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>			; GATHER-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; GATHER-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])			; GATHER-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], -5			; GATHER-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], -5
	; GATHER-NEXT: br label [[FOR_BODY]]			; GATHER-NEXT: br label [[FOR_BODY]]
	;			;
	; MAX-COST-LABEL: @PR32038(			; MAX-COST-LABEL: @PR32038(
	; MAX-COST-NEXT: entry:			; MAX-COST-NEXT: entry:
	; MAX-COST-NEXT: [[TMP0:%.*]] = load <4 x i8>, <4 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <4 x i8>*), align 1			; MAX-COST-NEXT: [[TMP0:%.*]] = load <8 x i8>, <8 x i8>* bitcast (i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1) to <8 x i8>*), align 1
	; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq <4 x i8> [[TMP0]], zeroinitializer			; MAX-COST-NEXT: [[TMP1:%.*]] = icmp eq <8 x i8> [[TMP0]], zeroinitializer
	; MAX-COST-NEXT: [[P8:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 5), align 1
	; MAX-COST-NEXT: [[P9:%.*]] = icmp eq i8 [[P8]], 0
	; MAX-COST-NEXT: [[P10:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 6), align 2
	; MAX-COST-NEXT: [[P11:%.*]] = icmp eq i8 [[P10]], 0
	; MAX-COST-NEXT: [[P12:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 7), align 1
	; MAX-COST-NEXT: [[P13:%.*]] = icmp eq i8 [[P12]], 0
	; MAX-COST-NEXT: [[P14:%.*]] = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 8), align 8
	; MAX-COST-NEXT: [[P15:%.*]] = icmp eq i8 [[P14]], 0
	; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]			; MAX-COST-NEXT: br label [[FOR_BODY:%.*]]
	; MAX-COST: for.body:			; MAX-COST: for.body:
	; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[P34:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; MAX-COST-NEXT: [[P17:%.*]] = phi i32 [ [[OP_EXTRA:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; MAX-COST-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> <i32 -720, i32 -720, i32 -720, i32 -720>, <4 x i32> <i32 -80, i32 -80, i32 -80, i32 -80>			; MAX-COST-NEXT: [[TMP2:%.*]] = select <8 x i1> [[TMP1]], <8 x i32> <i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720, i32 -720>, <8 x i32> <i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80, i32 -80>
	; MAX-COST-NEXT: [[P27:%.*]] = select i1 [[P9]], i32 -720, i32 -80			; MAX-COST-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP2]])
	; MAX-COST-NEXT: [[P29:%.*]] = select i1 [[P11]], i32 -720, i32 -80			; MAX-COST-NEXT: [[OP_EXTRA]] = add i32 [[TMP3]], -5
	; MAX-COST-NEXT: [[TMP3:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP2]])
	; MAX-COST-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], [[P27]]
	; MAX-COST-NEXT: [[TMP5:%.*]] = add i32 [[TMP4]], [[P29]]
	; MAX-COST-NEXT: [[OP_EXTRA:%.*]] = add i32 [[TMP5]], -5
	; MAX-COST-NEXT: [[P31:%.*]] = select i1 [[P13]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P32:%.*]] = add i32 [[OP_EXTRA]], [[P31]]
	; MAX-COST-NEXT: [[P33:%.*]] = select i1 [[P15]], i32 -720, i32 -80
	; MAX-COST-NEXT: [[P34]] = add i32 [[P32]], [[P33]]
	; MAX-COST-NEXT: br label [[FOR_BODY]]			; MAX-COST-NEXT: br label [[FOR_BODY]]
	;			;
	entry:			entry:
	%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1			%p0 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 1), align 1
	%p1 = icmp eq i8 %p0, 0			%p1 = icmp eq i8 %p0, 0
	%p2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2			%p2 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 2), align 2
	%p3 = icmp eq i8 %p2, 0			%p3 = icmp eq i8 %p2, 0
	%p4 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1			%p4 = load i8, i8* getelementptr inbounds ([80 x i8], [80 x i8]* @a, i64 0, i64 3), align 1

llvm/test/Transforms/SLPVectorizer/AArch64/spillcost-di.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; Debug informations shouldn't affect spill cost.			; Debug informations shouldn't affect spill cost.
	; RUN: opt -S -slp-vectorizer %s -o - | FileCheck %s			; RUN: opt -S -slp-vectorizer %s -o - | FileCheck %s

	target triple = "aarch64"			target triple = "aarch64"

	%struct.S = type { i64, i64 }			%struct.S = type { i64, i64 }

	define void @patatino(i64 %n, i64 %i, %struct.S* %p) !dbg !7 {			define void @patatino(i64 %n, i64 %i, %struct.S* %p) !dbg !7 {
	; CHECK-LABEL: @patatino(			; CHECK-LABEL: @patatino(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[N:%.*]], metadata [[META18:![0-9]+]], metadata !DIExpression()), !dbg [[DBG23:![0-9]+]]			; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[N:%.*]], metadata [[META18:![0-9]+]], metadata !DIExpression()), !dbg [[DBG23:![0-9]+]]
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[I:%.*]], metadata [[META19:![0-9]+]], metadata !DIExpression()), !dbg [[DBG24:![0-9]+]]			; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 [[I:%.*]], metadata [[META19:![0-9]+]], metadata !DIExpression()), !dbg [[DBG24:![0-9]+]]
	; CHECK-NEXT: call void @llvm.dbg.value(metadata %struct.S* [[P:%.*]], metadata [[META20:![0-9]+]], metadata !DIExpression()), !dbg [[DBG25:![0-9]+]]			; CHECK-NEXT: call void @llvm.dbg.value(metadata %struct.S* [[P:%.*]], metadata [[META20:![0-9]+]], metadata !DIExpression()), !dbg [[DBG25:![0-9]+]]
	; CHECK-NEXT: [[X1:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P]], i64 [[N]], i32 0, !dbg [[DBG26:![0-9]+]]			; CHECK-NEXT: [[X1:%.*]] = getelementptr inbounds [[STRUCT_S:%.*]], %struct.S* [[P]], i64 [[N]], i32 0, !dbg [[DBG26:![0-9]+]]
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 undef, metadata [[META21:![0-9]+]], metadata !DIExpression()), !dbg [[DBG27:![0-9]+]]			; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 undef, metadata [[META21:![0-9]+]], metadata !DIExpression()), !dbg [[DBG27:![0-9]+]]
	; CHECK-NEXT: [[Y3:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[N]], i32 1, !dbg [[DBG28:![0-9]+]]			; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 undef, metadata [[META22:![0-9]+]], metadata !DIExpression()), !dbg [[DBG28:![0-9]+]]
				; CHECK-NEXT: [[Y3:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[N]], i32 1, !dbg [[DBG29:![0-9]+]]
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[X1]] to <2 x i64>*, !dbg [[DBG26]]			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[X1]] to <2 x i64>*, !dbg [[DBG26]]
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8, !dbg [[DBG26]], !tbaa [[TBAA29:![0-9]+]]			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* [[TMP0]], align 8, !dbg [[DBG26]], !tbaa [[TBAA30:![0-9]+]]
	; CHECK-NEXT: call void @llvm.dbg.value(metadata i64 undef, metadata [[META22:![0-9]+]], metadata !DIExpression()), !dbg [[DBG33:![0-9]+]]
	; CHECK-NEXT: [[X5:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 0, !dbg [[DBG34:![0-9]+]]			; CHECK-NEXT: [[X5:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 0, !dbg [[DBG34:![0-9]+]]
	; CHECK-NEXT: [[Y7:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 1, !dbg [[DBG35:![0-9]+]]			; CHECK-NEXT: [[Y7:%.*]] = getelementptr inbounds [[STRUCT_S]], %struct.S* [[P]], i64 [[I]], i32 1, !dbg [[DBG35:![0-9]+]]
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[X5]] to <2 x i64>*, !dbg [[DBG36:![0-9]+]]			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[X5]] to <2 x i64>*, !dbg [[DBG36:![0-9]+]]
	; CHECK-NEXT: store <2 x i64> [[TMP1]], <2 x i64>* [[TMP2]], align 8, !dbg [[DBG36]], !tbaa [[TBAA29]]			; CHECK-NEXT: store <2 x i64> [[TMP1]], <2 x i64>* [[TMP2]], align 8, !dbg [[DBG36]], !tbaa [[TBAA30]]
	; CHECK-NEXT: ret void, !dbg [[DBG37:![0-9]+]]			; CHECK-NEXT: ret void, !dbg [[DBG37:![0-9]+]]
	;			;
	entry:			entry:
	call void @llvm.dbg.value(metadata i64 %n, metadata !18, metadata !DIExpression()), !dbg !23			call void @llvm.dbg.value(metadata i64 %n, metadata !18, metadata !DIExpression()), !dbg !23
	call void @llvm.dbg.value(metadata i64 %i, metadata !19, metadata !DIExpression()), !dbg !24			call void @llvm.dbg.value(metadata i64 %i, metadata !19, metadata !DIExpression()), !dbg !24
	call void @llvm.dbg.value(metadata %struct.S* %p, metadata !20, metadata !DIExpression()), !dbg !25			call void @llvm.dbg.value(metadata %struct.S* %p, metadata !20, metadata !DIExpression()), !dbg !25
	%x1 = getelementptr inbounds %struct.S, %struct.S* %p, i64 %n, i32 0, !dbg !26			%x1 = getelementptr inbounds %struct.S, %struct.S* %p, i64 %n, i32 0, !dbg !26
	%0 = load i64, i64* %x1, align 8, !dbg !26, !tbaa !27			%0 = load i64, i64* %x1, align 8, !dbg !26, !tbaa !27

llvm/test/Transforms/SLPVectorizer/AArch64/trunc-insertion.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S | FileCheck %s
	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"
	@d = internal unnamed_addr global i32 5, align 4			@d = internal unnamed_addr global i32 5, align 4

	define dso_local void @l() local_unnamed_addr {			define dso_local void @l() local_unnamed_addr {
	; CHECK-LABEL: @l(			; CHECK-LABEL: @l(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i16> [ undef, [[BB:%.*]] ], [ [[TMP11:%.*]], [[BB25:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i16> [ undef, [[BB:%.*]] ], [ [[TMP11:%.*]], [[BB25:%.*]] ]
	; CHECK-NEXT: br i1 undef, label [[BB3:%.*]], label [[BB11:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB3:%.*]], label [[BB11:%.*]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[I4:%.*]] = zext i1 undef to i32
	; CHECK-NEXT: [[TMP1:%.*]] = xor <2 x i16> [[TMP0]], undef			; CHECK-NEXT: [[TMP1:%.*]] = xor <2 x i16> [[TMP0]], undef
				; CHECK-NEXT: [[I4:%.*]] = zext i1 undef to i32
	; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt <2 x i16> [[TMP1]], <i16 8, i16 8>			; CHECK-NEXT: [[TMP2:%.*]] = icmp ugt <2 x i16> [[TMP1]], <i16 8, i16 8>
	; CHECK-NEXT: [[TMP3:%.*]] = zext <2 x i1> [[TMP2]] to <2 x i32>			; CHECK-NEXT: [[TMP3:%.*]] = zext <2 x i1> [[TMP2]] to <2 x i32>
	; CHECK-NEXT: br label [[BB25]]			; CHECK-NEXT: br label [[BB25]]
	; CHECK: bb11:			; CHECK: bb11:
	; CHECK-NEXT: [[I12:%.*]] = zext i1 undef to i32
	; CHECK-NEXT: [[TMP4:%.*]] = xor <2 x i16> [[TMP0]], undef			; CHECK-NEXT: [[TMP4:%.*]] = xor <2 x i16> [[TMP0]], undef
				; CHECK-NEXT: [[I12:%.*]] = zext i1 undef to i32
	; CHECK-NEXT: [[TMP5:%.*]] = sext <2 x i16> [[TMP4]] to <2 x i64>			; CHECK-NEXT: [[TMP5:%.*]] = sext <2 x i16> [[TMP4]] to <2 x i64>
	; CHECK-NEXT: [[TMP6:%.*]] = icmp ule <2 x i64> undef, [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = icmp ule <2 x i64> undef, [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = zext <2 x i1> [[TMP6]] to <2 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = zext <2 x i1> [[TMP6]] to <2 x i32>
	; CHECK-NEXT: [[TMP8:%.*]] = icmp ult <2 x i32> undef, [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = icmp ult <2 x i32> undef, [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = zext <2 x i1> [[TMP8]] to <2 x i32>			; CHECK-NEXT: [[TMP9:%.*]] = zext <2 x i1> [[TMP8]] to <2 x i32>
	; CHECK-NEXT: br label [[BB25]]			; CHECK-NEXT: br label [[BB25]]
	; CHECK: bb25:			; CHECK: bb25:
	; CHECK-NEXT: [[I28:%.*]] = phi i32 [ [[I12]], [[BB11]] ], [ [[I4]], [[BB3]] ]			; CHECK-NEXT: [[I28:%.*]] = phi i32 [ [[I12]], [[BB11]] ], [ [[I4]], [[BB3]] ]
	(71 unchanged lines not shown)
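The reordering in trunc-insertion.ll follows from the rule in the summary: in bb3 the vector `xor` reads only the phi from bb1 and `undef`, so it no longer needs scheduling and is emitted at the top of the block, ahead of the scalar `zext`. The C++ sketch below models that classification under simplifying assumptions. The `Value`/`Kind` types and every function name are hypothetical stand-ins, not LLVM's API, and "may read or write memory" is reduced to a single flag. The second helper covers the opposite case that appears in cross_block_slp.ll, where the operand is a same-block load but the only user lives in another block.

```cpp
#include <vector>

// Hypothetical value model for illustration only: these types are NOT
// LLVM's Value/Instruction classes.
enum class Kind { Constant, Argument, Phi, Instruction };

struct Value {
  Kind kind;
  int block = -1;             // owning basic block id; -1 for constants/arguments
  bool touchesMemory = false; // stands in for "may read or write memory"
  std::vector<const Value *> operands;
  std::vector<const Value *> users;
};

// True if every operand is a constant, an argument, a phi, or an
// instruction from another block (i.e. no non-phi same-block operand).
bool operandsNeedNoScheduling(const Value &v) {
  for (const Value *op : v.operands)
    if (op->kind == Kind::Instruction && op->block == v.block)
      return false;
  return true;
}

// True if every user is a phi or lives in another block.
bool usersNeedNoScheduling(const Value &v) {
  for (const Value *u : v.users)
    if (u->kind != Kind::Phi && u->block == v.block)
      return false;
  return true;
}

// A bundle (the scalar lanes of one tree entry) may skip scheduling if no
// lane touches memory and each lane meets one of the two conditions above.
bool bundleNeedsNoScheduling(const std::vector<const Value *> &bundle) {
  if (bundle.empty())
    return false;
  for (const Value *v : bundle)
    if (v->touchesMemory ||
        !(operandsNeedNoScheduling(*v) || usersNeedNoScheduling(*v)))
      return false;
  return true;
}

// Modeled on bb3 of trunc-insertion.ll: each xor lane reads a phi from bb1
// and undef; its user (an icmp) is in bb3 itself.
bool truncInsertionXorSkipsScheduling() {
  Value phi{Kind::Phi, /*block=*/1};
  Value undef{Kind::Constant};
  Value icmp{Kind::Instruction, /*block=*/3};
  Value lane0{Kind::Instruction, /*block=*/3};
  Value lane1{Kind::Instruction, /*block=*/3};
  lane0.operands = lane1.operands = {&phi, &undef};
  lane0.users = lane1.users = {&icmp};
  return bundleNeedsNoScheduling({&lane0, &lane1});
}

// Modeled on cross_block_slp.ll: the fadd reads a load from its own block
// (entry), but its only user (the fpext) is in if.end.
bool crossBlockFaddSkipsScheduling() {
  Value load{Kind::Instruction, /*block=*/0, /*touchesMemory=*/true};
  Value fpext{Kind::Instruction, /*block=*/2};
  Value fadd{Kind::Instruction, /*block=*/0};
  fadd.operands = {&load};
  fadd.users = {&fpext};
  return bundleNeedsNoScheduling({&fadd});
}
```

Note that the load itself still needs scheduling in this model, because it touches memory; only the arithmetic built on top of it is exempt.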

llvm/test/Transforms/SLPVectorizer/X86/PR35628_2.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell | FileCheck %s			; RUN: opt -slp-vectorizer -slp-vectorize-hor -slp-vectorize-hor-store -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=haswell | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128-ni:1"

	define void @test() #0 {			define void @test() #0 {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[DUMMY_PHI:%.*]] = phi i64 [ 1, [[ENTRY:%.*]] ], [ [[OP_EXTRA1:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[DUMMY_PHI:%.*]] = phi i64 [ 1, [[ENTRY:%.*]] ], [ [[OP_EXTRA1:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[TMP3:%.*]], [[LOOP]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi i64 [ 2, [[ENTRY]] ], [ [[TMP3:%.*]], [[LOOP]] ]
	; CHECK-NEXT: [[DUMMY_ADD:%.*]] = add i16 0, 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i64> poison, i64 [[TMP0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i64> poison, i64 [[TMP0]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i64> [[TMP1]], <4 x i64> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i64> [[TMP1]], <4 x i64> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i64> [[SHUFFLE]], <i64 3, i64 2, i64 1, i64 0>			; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i64> [[SHUFFLE]], <i64 3, i64 2, i64 1, i64 0>
	; CHECK-NEXT: [[TMP3]] = extractelement <4 x i64> [[TMP2]], i32 3			; CHECK-NEXT: [[TMP3]] = extractelement <4 x i64> [[TMP2]], i32 3
				; CHECK-NEXT: [[DUMMY_ADD:%.*]] = add i16 0, 0
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP2]], i32 0
	; CHECK-NEXT: [[DUMMY_SHL:%.*]] = shl i64 [[TMP4]], 32			; CHECK-NEXT: [[DUMMY_SHL:%.*]] = shl i64 [[TMP4]], 32
	; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i64> <i64 1, i64 1, i64 1, i64 1>, [[TMP2]]			; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i64> <i64 1, i64 1, i64 1, i64 1>, [[TMP2]]
	; CHECK-NEXT: [[TMP6:%.*]] = ashr exact <4 x i64> [[TMP5]], <i64 32, i64 32, i64 32, i64 32>			; CHECK-NEXT: [[TMP6:%.*]] = ashr exact <4 x i64> [[TMP5]], <i64 32, i64 32, i64 32, i64 32>
	; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> [[TMP6]])			; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vector.reduce.add.v4i64(<4 x i64> [[TMP6]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i64 [[TMP7]], 0			; CHECK-NEXT: [[OP_EXTRA:%.*]] = add i64 [[TMP7]], 0
	; CHECK-NEXT: [[OP_EXTRA1]] = add i64 [[OP_EXTRA]], [[TMP3]]			; CHECK-NEXT: [[OP_EXTRA1]] = add i64 [[OP_EXTRA]], [[TMP3]]
	; CHECK-NEXT: br label [[LOOP]]			; CHECK-NEXT: br label [[LOOP]]
	(29 unchanged lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/PR40310.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake < %s | FileCheck %s

	define void @mainTest(i32 %param, i32 * %vals, i32 %len) {			define void @mainTest(i32 %param, i32 * %vals, i32 %len) {
	; CHECK-LABEL: @mainTest(			; CHECK-LABEL: @mainTest(
	; CHECK-NEXT: bci_15.preheader:			; CHECK-NEXT: bci_15.preheader:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 31, i32 poison>, i32 [[PARAM:%.*]], i32 1			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 31, i32 poison>, i32 [[PARAM:%.*]], i32 1
	; CHECK-NEXT: br label [[BCI_15:%.*]]			; CHECK-NEXT: br label [[BCI_15:%.*]]
	; CHECK: bci_15:			; CHECK: bci_15:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP7:%.*]], [[BCI_15]] ], [ [[TMP0]], [[BCI_15_PREHEADER:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP7:%.*]], [[BCI_15]] ], [ [[TMP0]], [[BCI_15_PREHEADER:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 15			; CHECK-NEXT: [[TMP3:%.*]] = add <16 x i32> [[SHUFFLE]], <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 -1>
	; CHECK-NEXT: store atomic i32 [[TMP3]], i32* [[VALS:%.*]] unordered, align 4			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <16 x i32> [[SHUFFLE]], i32 15
	; CHECK-NEXT: [[TMP4:%.*]] = add <16 x i32> [[SHUFFLE]], <i32 15, i32 14, i32 13, i32 12, i32 11, i32 10, i32 9, i32 8, i32 7, i32 6, i32 5, i32 4, i32 3, i32 2, i32 1, i32 -1>			; CHECK-NEXT: store atomic i32 [[TMP4]], i32* [[VALS:%.*]] unordered, align 4
	; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = call i32 @llvm.vector.reduce.and.v16i32(<16 x i32> [[TMP3]])
	; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP5]], [[TMP2]]			; CHECK-NEXT: [[OP_EXTRA:%.*]] = and i32 [[TMP5]], [[TMP2]]
	; CHECK-NEXT: [[V44:%.*]] = add i32 [[TMP2]], 16			; CHECK-NEXT: [[V44:%.*]] = add i32 [[TMP2]], 16
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[V44]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[V44]], i32 0
	; CHECK-NEXT: [[TMP7]] = insertelement <2 x i32> [[TMP6]], i32 [[OP_EXTRA]], i32 1			; CHECK-NEXT: [[TMP7]] = insertelement <2 x i32> [[TMP6]], i32 [[OP_EXTRA]], i32 1
	; CHECK-NEXT: br i1 true, label [[BCI_15]], label [[LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 true, label [[BCI_15]], label [[LOOPEXIT:%.*]]
	; CHECK: loopexit:			; CHECK: loopexit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	(45 unchanged lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/barriercall.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7-avx | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	define i32 @foo(i32* nocapture %A, i32 %n) {			define i32 @foo(i32* nocapture %A, i32 %n) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CALL:%.*]] = tail call i32 (...) @bar()
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[N:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[N:%.*]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP0]], <4 x i32> poison, <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = mul nsw <4 x i32> [[SHUFFLE]], <i32 5, i32 9, i32 3, i32 10>			; CHECK-NEXT: [[TMP1:%.*]] = mul nsw <4 x i32> [[SHUFFLE]], <i32 5, i32 9, i32 3, i32 10>
	; CHECK-NEXT: [[TMP2:%.*]] = shl <4 x i32> [[SHUFFLE]], <i32 5, i32 9, i32 3, i32 10>			; CHECK-NEXT: [[TMP2:%.*]] = shl <4 x i32> [[SHUFFLE]], <i32 5, i32 9, i32 3, i32 10>
	; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
				; CHECK-NEXT: [[CALL:%.*]] = tail call i32 (...) @bar()
	; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], <i32 9, i32 9, i32 9, i32 9>			; CHECK-NEXT: [[TMP4:%.*]] = add nsw <4 x i32> [[TMP3]], <i32 9, i32 9, i32 9, i32 9>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[A:%.*]] to <4 x i32>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[A:%.*]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP4]], <4 x i32>* [[TMP5]], align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%call = tail call i32 (...) @bar() #2			%call = tail call i32 (...) @bar() #2
	%mul = mul nsw i32 %n, 5			%mul = mul nsw i32 %n, 5
	(20 unchanged lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll

	(505 unchanged lines not shown)
	; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x double> [ [[TMP6]], [[FOR_BODY]] ], [ zeroinitializer, [[ENTRY]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x double> [ [[TMP6]], [[FOR_BODY]] ], [ zeroinitializer, [[ENTRY]] ]
	; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[I_018]] to i64			; CHECK-NEXT: [[IDXPROM:%.*]] = zext i32 [[I_018]] to i64
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 [[IDXPROM]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 [[IDXPROM]]
	; CHECK-NEXT: [[ADD1:%.*]] = or i32 [[I_018]], 1			; CHECK-NEXT: [[ADD1:%.*]] = or i32 [[I_018]], 1
	; CHECK-NEXT: [[IDXPROM2:%.*]] = zext i32 [[ADD1]] to i64			; CHECK-NEXT: [[IDXPROM2:%.*]] = zext i32 [[ADD1]] to i64
	; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[IDXPROM2]]			; CHECK-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds double, double* [[A]], i64 [[IDXPROM2]]
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP6]] = fadd <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[ADD5]] = add i32 [[I_018]], 2			; CHECK-NEXT: [[ADD5]] = add i32 [[I_018]], 2
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[ADD5]], [[N]]			; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[ADD5]], [[N]]
				; CHECK-NEXT: [[TMP6]] = fadd <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_COND_CLEANUP]]
	;			;
	entry:			entry:
	%cmp15 = icmp eq i32 %n, 0			%cmp15 = icmp eq i32 %n, 0
	br i1 %cmp15, label %for.cond.cleanup, label %for.body			br i1 %cmp15, label %for.cond.cleanup, label %for.body

	for.cond.cleanup: ; preds = %for.body, %entry			for.cond.cleanup: ; preds = %for.body, %entry
	%x.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body ]			%x.0.lcssa = phi double [ 0.000000e+00, %entry ], [ %add, %for.body ]
	(59 unchanged lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll

	(74 unchanged lines not shown)
	; AVX-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP11]], i32 0			; AVX-NEXT: [[TMP12:%.*]] = extractelement <2 x float> [[TMP11]], i32 0
	; AVX-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP11]], i32 1			; AVX-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP11]], i32 1
	; AVX-NEXT: [[ADD13]] = fadd float [[TMP12]], [[TMP13]]			; AVX-NEXT: [[ADD13]] = fadd float [[TMP12]], [[TMP13]]
	; AVX-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[TMP13]], i32 0			; AVX-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[TMP13]], i32 0
	; AVX-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[ADD13]], i32 1			; AVX-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[ADD13]], i32 1
	; AVX-NEXT: [[TMP16:%.*]] = fcmp olt <2 x float> [[TMP15]], <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP16:%.*]] = fcmp olt <2 x float> [[TMP15]], <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP17:%.*]] = select <2 x i1> [[TMP16]], <2 x float> [[TMP15]], <2 x float> <float 1.000000e+00, float 1.000000e+00>			; AVX-NEXT: [[TMP17:%.*]] = select <2 x i1> [[TMP16]], <2 x float> [[TMP15]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
	; AVX-NEXT: [[TMP18:%.*]] = fcmp olt <2 x float> [[TMP17]], <float -1.000000e+00, float -1.000000e+00>			; AVX-NEXT: [[TMP18:%.*]] = fcmp olt <2 x float> [[TMP17]], <float -1.000000e+00, float -1.000000e+00>
	; AVX-NEXT: [[TMP19]] = select <2 x i1> [[TMP18]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP17]]
	; AVX-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32			; AVX-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32
				; AVX-NEXT: [[TMP19]] = select <2 x i1> [[TMP18]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP17]]
	; AVX-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; AVX-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; AVX: for.end:			; AVX: for.end:
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	(46 unchanged lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -slp-min-tree-size=2 -slp-threshold=-1000 -slp-max-look-ahead-depth=1 -slp-schedule-budget=27 -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s			; RUN: opt < %s -slp-vectorizer -slp-min-tree-size=2 -slp-threshold=-1000 -slp-max-look-ahead-depth=1 -slp-schedule-budget=27 -S -mtriple=x86_64-unknown-linux-gnu | FileCheck %s

	define void @exceed(double %0, double %1) {			define void @exceed(double %0, double %1) {
	; CHECK-LABEL: @exceed(			; CHECK-LABEL: @exceed(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[TMP0:%.*]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[TMP0:%.*]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP0]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[TMP1:%.*]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[TMP1:%.*]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = fdiv fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP6]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = fdiv fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[IX:%.*]] = fmul double [[TMP7]], undef			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP7]], i32 1
				; CHECK-NEXT: [[IX:%.*]] = fmul double [[TMP8]], undef
	; CHECK-NEXT: [[IXX0:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX0:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX1:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX1:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX2:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX2:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX3:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX3:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX4:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX4:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX5:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX5:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IX1:%.*]] = fmul double [[TMP7]], undef			; CHECK-NEXT: [[IX1:%.*]] = fmul double [[TMP8]], undef
	; CHECK-NEXT: [[IXX10:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX10:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX11:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX11:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX12:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX12:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX13:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX13:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX14:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX15:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX20:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX21:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x double> [[TMP7]], i32 0
	; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]			; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP9]], [[TMP9]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP10:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP7]], [[TMP10]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef
				; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP11]], [[TMP6]]
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[TMP7]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[TMP8]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP14]], undef			; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP14]], undef
	; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [			; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [
	; CHECK-NEXT: i32 0, label [[BB2:%.*]]			; CHECK-NEXT: i32 0, label [[BB2:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: br label [[LABEL:%.*]]			; CHECK-NEXT: br label [[LABEL:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: br label [[LABEL]]			; CHECK-NEXT: br label [[LABEL]]
	(49 unchanged lines not shown)
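crash_exceed_scheduling.ll pins a deliberately tiny `-slp-schedule-budget=27`, so its output is sensitive to how much scheduling work the pass does. The summary only says that skipping these bundles "may save some compile time and scheduling resources". The toy C++ model below illustrates that idea under an assumption of ours, not the patch's: only instructions that still need scheduling are charged against the budget. It does not reproduce SLPVectorizer's real scheduling-region logic, and `fitsInScheduleBudget` is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Toy model (hypothetical): needsScheduling[i] says whether the i-th scalar
// instruction in a candidate region still has to be scheduled. Only those
// are charged against `budget`, which stands in for -slp-schedule-budget.
bool fitsInScheduleBudget(const std::vector<bool> &needsScheduling,
                          std::size_t budget) {
  std::size_t charged = 0;
  for (bool n : needsScheduling)
    if (n && ++charged > budget)
      return false; // region would exceed the budget
  return true;
}
```

Under this model, a 30-instruction region exceeds a budget of 27 if every instruction must be scheduled, but fits comfortably once most of them are exempt.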

llvm/test/Transforms/SLPVectorizer/X86/cross_block_slp.ll

	(16 unchanged lines not shown)
	; }			; }


	define i32 @foo(double* nocapture %A, float* nocapture %B, i32 %g) {			define i32 @foo(double* nocapture %A, float* nocapture %B, i32 %g) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[B:%.*]] to <2 x float>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast float* [[B:%.*]] to <2 x float>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x float>, <2 x float>* [[TMP0]], align 4
	; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x float> [[TMP1]], <float 5.000000e+00, float 8.000000e+00>
	; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[G:%.*]], 0			; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[G:%.*]], 0
				; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x float> [[TMP1]], <float 5.000000e+00, float 8.000000e+00>
	; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]			; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_END:%.*]], label [[IF_THEN:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: [[CALL:%.*]] = tail call i32 (...) @bar()			; CHECK-NEXT: [[CALL:%.*]] = tail call i32 (...) @bar()
	; CHECK-NEXT: br label [[IF_END]]			; CHECK-NEXT: br label [[IF_END]]
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[TMP3:%.*]] = fpext <2 x float> [[TMP2]] to <2 x double>			; CHECK-NEXT: [[TMP3:%.*]] = fpext <2 x float> [[TMP2]] to <2 x double>
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8
	(32 unchanged lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/cycle_dup.ll

	(18 unchanged lines not shown)
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i32>, <4 x i32>* [[TMP0]], align 4
	; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 13			; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 13
	; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX4]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX4]], align 4
	; CHECK-NEXT: [[CMP24:%.*]] = icmp sgt i32 [[TMP2]], 0			; CHECK-NEXT: [[CMP24:%.*]] = icmp sgt i32 [[TMP2]], 0
	; CHECK-NEXT: br i1 [[CMP24]], label [[FOR_BODY:%.*]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP24]], label [[FOR_BODY:%.*]], label [[FOR_END:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_029:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[I_029:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = phi <4 x i32> [ [[TMP4:%.*]], [[FOR_BODY]] ], [ [[TMP1]], [[ENTRY]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <4 x i32> [ [[TMP4:%.*]], [[FOR_BODY]] ], [ [[TMP1]], [[ENTRY]] ]
	; CHECK-NEXT: [[TMP4]] = mul nsw <4 x i32> [[TMP3]], <i32 18, i32 19, i32 12, i32 9>
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I_029]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I_029]], 1
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[TMP2]]			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[INC]], [[TMP2]]
				; CHECK-NEXT: [[TMP4]] = mul nsw <4 x i32> [[TMP3]], <i32 18, i32 19, i32 12, i32 9>
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP5:%.*]] = phi <4 x i32> [ [[TMP1]], [[ENTRY]] ], [ [[TMP4]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP5:%.*]] = phi <4 x i32> [ [[TMP1]], [[ENTRY]] ], [ [[TMP4]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[A]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP5]], <4 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	(39 unchanged lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/external_user.ll

	(28 unchanged lines not shown)
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_020:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[I_020:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x double> [ [[TMP1]], [[ENTRY]] ], [ [[TMP5:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x double> [ [[TMP1]], [[ENTRY]] ], [ [[TMP5:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+01, double 1.000000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+01, double 1.000000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], <double 4.000000e+00, double 4.000000e+00>			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], <double 4.000000e+00, double 4.000000e+00>
	; CHECK-NEXT: [[TMP5]] = fadd <2 x double> [[TMP4]], <double 4.000000e+00, double 4.000000e+00>
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I_020]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I_020]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 100			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 100
				; CHECK-NEXT: [[TMP5]] = fadd <2 x double> [[TMP4]], <double 4.000000e+00, double 4.000000e+00>
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP4]], i32 0			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
	; CHECK-NEXT: ret double [[TMP7]]			; CHECK-NEXT: ret double [[TMP7]]
	;			;
	entry:			entry:
	(85 unchanged lines not shown)

llvm/test/Transforms/SLPVectorizer/X86/geps-non-pow-2.ll

	(19 unchanged lines not shown)
	; CHECK-NEXT: switch i32 [[TMP3]], label [[WHILE_BODY_BACKEDGE]] [			; CHECK-NEXT: switch i32 [[TMP3]], label [[WHILE_BODY_BACKEDGE]] [
	; CHECK-NEXT: i32 2, label [[SW_BB:%.*]]			; CHECK-NEXT: i32 2, label [[SW_BB:%.*]]
	; CHECK-NEXT: i32 4, label [[SW_BB6:%.*]]			; CHECK-NEXT: i32 4, label [[SW_BB6:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: sw.bb:			; CHECK: sw.bb:
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = ptrtoint i32* [[TMP5]] to i64			; CHECK-NEXT: [[TMP6:%.*]] = ptrtoint i32* [[TMP5]] to i64
	; CHECK-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32			; CHECK-NEXT: [[TMP7:%.*]] = trunc i64 [[TMP6]] to i32
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 2, i64 2>			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 1			; CHECK-NEXT: store i32 [[TMP7]], i32* [[TMP8]], align 4
	; CHECK-NEXT: store i32 [[TMP7]], i32* [[TMP9]], align 4
	; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2			; CHECK-NEXT: [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2
				; CHECK-NEXT: [[TMP9:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 2, i64 2>
	; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]			; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]
	; CHECK: sw.bb6:			; CHECK: sw.bb6:
	; CHECK-NEXT: [[INCDEC_PTR8:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2			; CHECK-NEXT: [[INCDEC_PTR8:%.*]] = getelementptr inbounds i32, i32* [[C_022]], i64 2
	; CHECK-NEXT: [[TMP10:%.*]] = ptrtoint i32* [[INCDEC_PTR]] to i64			; CHECK-NEXT: [[TMP10:%.*]] = ptrtoint i32* [[INCDEC_PTR]] to i64
	; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[TMP10]] to i32			; CHECK-NEXT: [[TMP11:%.*]] = trunc i64 [[TMP10]] to i32
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 2, i64 2>			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x i32*> [[TMP4]], i32 0			; CHECK-NEXT: store i32 [[TMP11]], i32* [[TMP12]], align 4
	; CHECK-NEXT: store i32 [[TMP11]], i32* [[TMP13]], align 4			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr i32, <2 x i32*> [[TMP1]], <2 x i64> <i64 2, i64 2>
	; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]			; CHECK-NEXT: br label [[WHILE_BODY_BACKEDGE]]
	; CHECK: while.body.backedge:			; CHECK: while.body.backedge:
	; CHECK-NEXT: [[C_022_BE]] = phi i32* [ [[INCDEC_PTR]], [[WHILE_BODY]] ], [ [[INCDEC_PTR8]], [[SW_BB6]] ], [ [[INCDEC_PTR5]], [[SW_BB]] ]			; CHECK-NEXT: [[C_022_BE]] = phi i32* [ [[INCDEC_PTR]], [[WHILE_BODY]] ], [ [[INCDEC_PTR8]], [[SW_BB6]] ], [ [[INCDEC_PTR5]], [[SW_BB]] ]
	; CHECK-NEXT: [[TMP14]] = phi <2 x i32*> [ [[TMP4]], [[WHILE_BODY]] ], [ [[TMP12]], [[SW_BB6]] ], [ [[TMP8]], [[SW_BB]] ]			; CHECK-NEXT: [[TMP14]] = phi <2 x i32*> [ [[TMP4]], [[WHILE_BODY]] ], [ [[TMP13]], [[SW_BB6]] ], [ [[TMP9]], [[SW_BB]] ]
	; CHECK-NEXT: br label [[WHILE_BODY]]			; CHECK-NEXT: br label [[WHILE_BODY]]
	; CHECK: while.end:			; CHECK: while.end:
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	entry:			entry:
	%0 = load i32, i32* @e, align 4			%0 = load i32, i32* @e, align 4
	%tobool.not19 = icmp eq i32 %0, 0			%tobool.not19 = icmp eq i32 %0, 0
	br i1 %tobool.not19, label %while.end, label %while.body			br i1 %tobool.not19, label %while.end, label %while.body
	(43 unchanged lines not shown)
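Taken together, the CHECK changes show two placements. Bundles whose operands need no scheduling move to the top of their block (the `xor` in trunc-insertion.ll, the vector code built from `%n` ahead of the call to `@bar` in barriercall.ll), while bundles whose users are phis or live in other blocks move to just before the terminator (the `fadd` in consecutive-access.ll and external_user.ll, the `mul` in cycle_dup.ll, the `select` in crash_cmpop.ll). In geps-non-pow-2.ll both conditions appear to hold, and the vector GEP ends up before the `br`. The C++ sketch below models only that placement decision. The block representation and the names `Block`, `Placement`, `choosePlacement` and `place` are hypothetical, and the users-first ordering is inferred from these test outputs rather than taken from SLPVectorizer.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a basic block is a list of opcode names; phis come
// first and the last entry is the terminator.
using Block = std::vector<std::string>;

enum class Placement { BlockStart, BlockEnd, NeedsScheduling };

// Users are checked first. This ordering is an assumption chosen to match
// geps-non-pow-2.ll, where both conditions appear to hold and the vector
// GEP lands just before the branch.
Placement choosePlacement(bool usersNeedNoScheduling,
                          bool operandsNeedNoScheduling) {
  if (usersNeedNoScheduling)
    return Placement::BlockEnd;
  if (operandsNeedNoScheduling)
    return Placement::BlockStart;
  return Placement::NeedsScheduling;
}

// Inserts `inst` after the leading phis (BlockStart) or immediately before
// the terminator (BlockEnd). Returns false for NeedsScheduling, where the
// scheduler, not this helper, decides the position.
bool place(Block &bb, Placement p, const std::string &inst) {
  if (p == Placement::NeedsScheduling)
    return false;
  std::size_t pos = 0;
  if (p == Placement::BlockEnd) {
    pos = bb.empty() ? 0 : bb.size() - 1;
  } else {
    while (pos < bb.size() && bb[pos] == "phi")
      ++pos;
  }
  bb.insert(bb.begin() + static_cast<std::ptrdiff_t>(pos), inst);
  return true;
}
```

For example, modeling external_user.ll's loop body as `{"phi", "phi", "fadd", "fmul", "add", "icmp", "br"}` and placing the last `fadd` with `choosePlacement(true, false)` puts it directly before `br`, matching the new CHECK lines.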

llvm/test/Transforms/SLPVectorizer/X86/multi_block.ll

	(15 unchanged lines not shown)
	; A[9] = 5.0 + F1;			; A[9] = 5.0 + F1;
	; }			; }


	define i32 @bar(double* nocapture %A, i32 %d) {			define i32 @bar(double* nocapture %A, i32 %d) {
	; CHECK-LABEL: @bar(			; CHECK-LABEL: @bar(
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 8
	; CHECK-NEXT: [[TMP3:%.*]] = fptrunc <2 x double> [[TMP2]] to <2 x float>			; CHECK-NEXT: [[TMP3:%.*]] = icmp eq i32 [[D:%.*]], 0
	; CHECK-NEXT: [[TMP4:%.*]] = icmp eq i32 [[D:%.*]], 0			; CHECK-NEXT: [[TMP4:%.*]] = fptrunc <2 x double> [[TMP2]] to <2 x float>
	; CHECK-NEXT: br i1 [[TMP4]], label [[TMP7:%.*]], label [[TMP5:%.*]]			; CHECK-NEXT: br i1 [[TMP3]], label [[TMP7:%.*]], label [[TMP5:%.*]]
	; CHECK: 5:			; CHECK: 5:
	; CHECK-NEXT: [[TMP6:%.*]] = tail call i32 (...) @foo()			; CHECK-NEXT: [[TMP6:%.*]] = tail call i32 (...) @foo()
	; CHECK-NEXT: br label [[TMP7]]			; CHECK-NEXT: br label [[TMP7]]
	; CHECK: 7:			; CHECK: 7:
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP3]], <float 4.000000e+00, float 5.000000e+00>			; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x float> [[TMP4]], <float 4.000000e+00, float 5.000000e+00>
	; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[A]], i64 8			; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds double, double* [[A]], i64 8
	; CHECK-NEXT: [[TMP10:%.*]] = fpext <2 x float> [[TMP8]] to <2 x double>			; CHECK-NEXT: [[TMP10:%.*]] = fpext <2 x float> [[TMP8]] to <2 x double>
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP10]], <double 9.000000e+00, double 5.000000e+00>			; CHECK-NEXT: [[TMP11:%.*]] = fadd <2 x double> [[TMP10]], <double 9.000000e+00, double 5.000000e+00>
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP9]] to <2 x double>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast double* [[TMP9]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP12]], align 8			; CHECK-NEXT: store <2 x double> [[TMP11]], <2 x double>* [[TMP12]], align 8
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%1 = load double, double* %A, align 8			%1 = load double, double* %A, align 8
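In `@bar` above, the vector `fptrunc` now comes after the scalar `icmp`. Its operand is the vector load in the same block, but its only user (the `fadd` in block `7`) lives in another block, so by the rule in the summary it no longer needs to be scheduled. The toy C++ model below sketches that rule as we read it from the summary. A bundle (tree entry) skips scheduling when none of its instructions read or write memory and either all of its operands or all of its users need no scheduling. The types `Kind`, `Value`, `Inst` and the function `bundleDoesNotNeedScheduling` are hypothetical and do not correspond to the real code in `SLPVectorizer.cpp`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified IR model (not LLVM's classes).
enum class Kind { Constant, Argument, Phi, Instruction };

struct Value {
  Kind kind;
  int block;  // owning basic block id; ignored for constants and arguments
};

struct Inst {
  bool mayReadOrWriteMemory;
  std::vector<Value> operands;
  std::vector<Value> users;  // users of this instruction's result
};

// An operand never forces scheduling if it is a constant, an argument,
// a phi, or an instruction defined in a different block.
bool operandIsFree(const Value &v, int block) {
  return v.kind != Kind::Instruction || v.block != block;
}

// A user never forces scheduling if it is a phi or lives in another block.
bool userIsFree(const Value &u, int block) {
  return u.kind == Kind::Phi || u.block != block;
}

// Sketch of the summary's rule: the bundle skips the scheduler if no
// instruction touches memory and either every operand or every user of
// every instruction in the bundle is "free".
bool bundleDoesNotNeedScheduling(const std::vector<Inst> &bundle, int block) {
  bool allOperandsFree = true;
  bool allUsersFree = true;
  for (const Inst &i : bundle) {
    if (i.mayReadOrWriteMemory)
      return false;
    for (const Value &op : i.operands)
      allOperandsFree = allOperandsFree && operandIsFree(op, block);
    for (const Value &u : i.users)
      allUsersFree = allUsersFree && userIsFree(u, block);
  }
  return allOperandsFree || allUsersFree;
}
```

The model treats the operand and user conditions as alternatives, which is one reading of the summary's wording. The real pass works on whole SLP tree entries and their schedule data.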

llvm/test/Transforms/SLPVectorizer/X86/opaque-ptr.ll

%g1 = getelementptr inbounds i32, ptr %r, i64 %sub1		%g1 = getelementptr inbounds i32, ptr %r, i64 %sub1
%g2 = getelementptr inbounds i32, ptr %r, i64 %sub2		%g2 = getelementptr inbounds i32, ptr %r, i64 %sub2
%g3 = getelementptr inbounds i32, ptr %r, i64 %sub3		%g3 = getelementptr inbounds i32, ptr %r, i64 %sub3
ret void		ret void
}		}

define void @test2(i64* %a, i64* %b) {		define void @test2(i64* %a, i64* %b) {
; CHECK-LABEL: @test2(		; CHECK-LABEL: @test2(
; CHECK-NEXT: [[A2:%.*]] = getelementptr inbounds i64, ptr [[A:%.*]], i64 2		; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x ptr> poison, ptr [[A:%.*]], i32 0
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x ptr> poison, ptr [[A]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x ptr> [[TMP1]], ptr [[B:%.*]], i32 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x ptr> [[TMP1]], ptr [[B:%.*]], i32 1
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, <2 x ptr> [[TMP2]], <2 x i64> <i64 1, i64 3>		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr i64, <2 x ptr> [[TMP2]], <2 x i64> <i64 1, i64 3>
		; CHECK-NEXT: [[A2:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 2
; CHECK-NEXT: [[TMP4:%.*]] = ptrtoint <2 x ptr> [[TMP3]] to <2 x i64>		; CHECK-NEXT: [[TMP4:%.*]] = ptrtoint <2 x ptr> [[TMP3]] to <2 x i64>
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 0		; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 0
; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i64>, ptr [[TMP5]], align 8		; CHECK-NEXT: [[TMP6:%.*]] = load <2 x i64>, ptr [[TMP5]], align 8
; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[TMP4]], [[TMP6]]		; CHECK-NEXT: [[TMP7:%.*]] = add <2 x i64> [[TMP4]], [[TMP6]]
; CHECK-NEXT: store <2 x i64> [[TMP7]], ptr [[TMP5]], align 8		; CHECK-NEXT: store <2 x i64> [[TMP7]], ptr [[TMP5]], align 8
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%a1 = getelementptr inbounds i64, i64* %a, i64 1		%a1 = getelementptr inbounds i64, i64* %a, i64 1

llvm/test/Transforms/SLPVectorizer/X86/phi.ll

	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[I_019:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[I_019:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x double> [ [[TMP1]], [[ENTRY]] ], [ [[TMP5:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <2 x double> [ [[TMP1]], [[ENTRY]] ], [ [[TMP5:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+01, double 1.000000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 1.000000e+01, double 1.000000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], <double 4.000000e+00, double 4.000000e+00>			; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> [[TMP3]], <double 4.000000e+00, double 4.000000e+00>
	; CHECK-NEXT: [[TMP5]] = fadd <2 x double> [[TMP4]], <double 4.000000e+00, double 4.000000e+00>
	; CHECK-NEXT: [[INC]] = add nsw i32 [[I_019]], 1			; CHECK-NEXT: [[INC]] = add nsw i32 [[I_019]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 100			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i32 [[INC]], 100
				; CHECK-NEXT: [[TMP5]] = fadd <2 x double> [[TMP4]], <double 4.000000e+00, double 4.000000e+00>
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%arrayidx = getelementptr inbounds double, double* %A, i64 1			%arrayidx = getelementptr inbounds double, double* %A, i64 1
	; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP2]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x float> poison, float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[TMP3]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = phi <4 x float> [ [[TMP2]], [[ENTRY]] ], [ [[TMP19:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <4 x float> [ [[TMP2]], [[ENTRY]] ], [ [[TMP20:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP7:%.*]] = phi <2 x float> [ [[TMP5]], [[ENTRY]] ], [ [[TMP12:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP7:%.*]] = phi <2 x float> [ [[TMP5]], [[ENTRY]] ], [ [[TMP12:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x float> [[TMP7]], i32 0
	; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP8]], 7.000000e+00			; CHECK-NEXT: [[MUL:%.*]] = fmul float [[TMP8]], 7.000000e+00
	; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]			; CHECK-NEXT: [[ADD6]] = fadd float [[R_052]], [[MUL]]
	; CHECK-NEXT: [[TMP9:%.*]] = add nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[TMP9:%.*]] = add nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP9]]			; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX14]], align 4			; CHECK-NEXT: [[TMP10:%.*]] = load float, float* [[ARRAYIDX14]], align 4
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*
	; CHECK-NEXT: [[TMP12]] = load <2 x float>, <2 x float>* [[TMP11]], align 4			; CHECK-NEXT: [[TMP12]] = load <2 x float>, <2 x float>* [[TMP11]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP7]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <2 x float> [[TMP7]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> poison, float [[TMP13]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> poison, float [[TMP13]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <4 x float> [[TMP14]], float [[TMP10]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = insertelement <4 x float> [[TMP14]], float [[TMP10]], i32 1
	; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <2 x float> [[TMP12]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP16:%.*]] = shufflevector <2 x float> [[TMP12]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <4 x float> [[TMP15]], <4 x float> [[TMP16]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <4 x float> [[TMP15]], <4 x float> [[TMP16]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: [[TMP18:%.*]] = fmul <4 x float> [[TMP17]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>			; CHECK-NEXT: [[TMP18:%.*]] = fmul <4 x float> [[TMP17]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+01, float 1.100000e+01>
	; CHECK-NEXT: [[TMP19]] = fadd <4 x float> [[TMP6]], [[TMP18]]			; CHECK-NEXT: [[TMP19:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[TMP20:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP19]], 121
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP20]], 121			; CHECK-NEXT: [[TMP20]] = fadd <4 x float> [[TMP6]], [[TMP18]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x float> [[TMP19]], i32 0			; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x float> [[TMP20]], i32 0
	; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP21]]			; CHECK-NEXT: [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP21]]
	; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x float> [[TMP19]], i32 1			; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x float> [[TMP20]], i32 1
	; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP22]]			; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP22]]
	; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x float> [[TMP19]], i32 2			; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x float> [[TMP20]], i32 2
	; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP23]]			; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP23]]
	; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x float> [[TMP19]], i32 3			; CHECK-NEXT: [[TMP24:%.*]] = extractelement <4 x float> [[TMP20]], i32 3
	; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP24]]			; CHECK-NEXT: [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP24]]
	; CHECK-NEXT: ret float [[ADD31]]			; CHECK-NEXT: ret float [[ADD31]]
	;			;
	entry:			entry:
	%0 = load float, float* %A, align 4			%0 = load float, float* %A, align 4
	%arrayidx1 = getelementptr inbounds float, float* %A, i64 1			%arrayidx1 = getelementptr inbounds float, float* %A, i64 1
	%1 = load float, float* %arrayidx1, align 4			%1 = load float, float* %arrayidx1, align 4
	%arrayidx2 = getelementptr inbounds float, float* %A, i64 2			%arrayidx2 = getelementptr inbounds float, float* %A, i64 2
	; vectorization of same typed phi nodes.			; vectorization of same typed phi nodes.
	define float @sort_phi_type(float* nocapture readonly %A) {			define float @sort_phi_type(float* nocapture readonly %A) {
	; CHECK-LABEL: @sort_phi_type(			; CHECK-LABEL: @sort_phi_type(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x float> [ <float 1.000000e+01, float 1.000000e+01, float 1.000000e+01, float 1.000000e+01>, [[ENTRY]] ], [ [[TMP9:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x float> [ <float 1.000000e+01, float 1.000000e+01, float 1.000000e+01, float 1.000000e+01>, [[ENTRY]] ], [ [[TMP9:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], 4
				; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT]], 128
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[TMP0]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <4 x float> [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP0]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP2]], float [[TMP3]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> [[TMP2]], float [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP0]], i32 3			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP5]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP5]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP0]], i32 2			; CHECK-NEXT: [[TMP7:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP6]], float [[TMP7]], i32 3			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP6]], float [[TMP7]], i32 3
	; CHECK-NEXT: [[TMP9]] = fmul <4 x float> [[TMP8]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+02, float 1.110000e+02>			; CHECK-NEXT: [[TMP9]] = fmul <4 x float> [[TMP8]], <float 8.000000e+00, float 9.000000e+00, float 1.000000e+02, float 1.110000e+02>
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add nsw i64 [[INDVARS_IV]], 4
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i64 [[INDVARS_IV_NEXT]], 128
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[TMP9]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP9]], i32 1
	; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[ADD29:%.*]] = fadd float [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP9]], i32 2			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP9]], i32 2
	; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP12]]			; CHECK-NEXT: [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP12]]
	; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP9]], i32 3			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP9]], i32 3
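The summary also says where these unscheduled vector instructions end up. They go at the beginning of the block if their operands need no scheduling, or at the end of the block if their users are outside it. The first `phi.ll` test above follows this pattern: `[[TMP5]] = fadd`, which is used by the phi and by the store in `for.end`, now sits right before the `br`. The sketch below models that placement over a toy block. `Instr`, `Block` and `placeUnscheduled` are hypothetical. Inserting after the leading phis, rather than at the literal first position, is an assumption of the sketch, made because LLVM IR requires phis to be grouped at the top of a block.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical block model: phis come first, the last entry is the terminator.
struct Instr {
  std::string name;
  bool isPhi = false;
};
using Block = std::vector<Instr>;

// Sketch: where an unscheduled vector instruction lands in its block.
// If none of its operands need scheduling, it goes right after the leading
// phis. Otherwise (its users are phis or live in other blocks), it goes
// immediately before the terminator.
void placeUnscheduled(Block &bb, const Instr &vec, bool operandsNeedNoScheduling) {
  if (operandsNeedNoScheduling) {
    auto it = bb.begin();
    while (it != bb.end() && it->isPhi)
      ++it;
    bb.insert(it, vec);
  } else {
    assert(!bb.empty() && "block must end with a terminator");
    bb.insert(bb.end() - 1, vec);
  }
}
```

With either placement, the scalar instructions that still go through the scheduler keep their relative order. This is why the scalar `add`/`icmp` pairs in the updated CHECK lines now come before the vector code.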

llvm/test/Transforms/SLPVectorizer/X86/pr47629-inseltpoison.ll

	; AVX512F-NEXT: store i32 [[TMP28]], i32* [[TMP25]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[TMP28]], i32* [[TMP25]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21			; AVX512F-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
	; AVX512F-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4			; AVX512F-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4
	; AVX512F-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_3(			; AVX512VL-LABEL: @gather_load_3(
	; AVX512VL-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1:%.*]], i64 0
	; AVX512VL-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1			; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP3]], <4 x i32*> poison, <4 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1			; AVX512VL-NEXT: [[TMP4:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
	; AVX512VL-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP6:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i64 0			; AVX512VL-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 1
	; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1
	; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512VL-NEXT: store i32 [[TMP6]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP8:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP7]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP8:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP4]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1>			; AVX512VL-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1>
	; AVX512VL-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5			; AVX512VL-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5
	; AVX512VL-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*			; AVX512VL-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP7]] to <4 x i32>*
	; AVX512VL-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP11]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP11]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9			; AVX512VL-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
	; AVX512VL-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP14:%.*]] = add i32 [[TMP13]], 2			; AVX512VL-NEXT: [[TMP14:%.*]] = add i32 [[TMP13]], 2
	; AVX512VL-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6			; AVX512VL-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6
	; AVX512VL-NEXT: store i32 [[TMP14]], i32* [[TMP10]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: store i32 [[TMP14]], i32* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6			; AVX512VL-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
	; AVX512VL-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: store i32 [[T16]], i32* [[T13]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T16]], i32* [[T13]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: store i32 [[T20]], i32* [[T17]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T20]], i32* [[T17]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_4(			; AVX512VL-LABEL: @gather_load_4(
	; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
	; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i64 0			; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i64 0
	; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP2:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512VL-NEXT: [[TMP2:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
				; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
	; AVX512VL-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5			; AVX512VL-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
	; AVX512VL-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9			; AVX512VL-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
	; AVX512VL-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6			; AVX512VL-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
	; AVX512VL-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6			; AVX512VL-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
	; AVX512VL-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7			; AVX512VL-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
	; AVX512VL-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21			; AVX512VL-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
	; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP2]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP2]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]
	; AVX2-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP29]], i64 6			; AVX2-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP29]], i64 6
	; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7			; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7
	; AVX2-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]			; AVX2-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]
	; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*			; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX2-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512F-LABEL: @gather_load_div(			; AVX512F-LABEL: @gather_load_div(
	; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i64 0			; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <2 x float*> poison, float* [[TMP1:%.*]], i64 0
	; AVX512F-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float*> [[TMP3]], <4 x float*> poison, <4 x i32> zeroinitializer			; AVX512F-NEXT: [[TMP4:%.*]] = shufflevector <2 x float*> [[TMP3]], <2 x float*> poison, <2 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP4:%.*]] = getelementptr float, <4 x float*> [[SHUFFLE1]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>			; AVX512F-NEXT: [[TMP5:%.*]] = getelementptr float, <2 x float*> [[TMP4]], <2 x i64> <i64 8, i64 5>
	; AVX512F-NEXT: [[TMP5:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i64 0			; AVX512F-NEXT: [[TMP6:%.*]] = insertelement <4 x float*> poison, float* [[TMP1]], i64 0
	; AVX512F-NEXT: [[TMP6:%.*]] = shufflevector <2 x float*> [[TMP5]], <2 x float*> poison, <2 x i32> zeroinitializer			; AVX512F-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float*> [[TMP6]], <4 x float*> poison, <4 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP7:%.*]] = getelementptr float, <2 x float*> [[TMP6]], <2 x i64> <i64 8, i64 5>			; AVX512F-NEXT: [[TMP7:%.*]] = getelementptr float, <4 x float*> [[SHUFFLE1]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>
	; AVX512F-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; AVX512F-NEXT: [[TMP8:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i64 0
	; AVX512F-NEXT: [[TMP9:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i64 0			; AVX512F-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP8]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP10:%.*]] = shufflevector <4 x float*> [[TMP4]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512F-NEXT: [[TMP9:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512F-NEXT: [[TMP11:%.*]] = shufflevector <8 x float*> [[TMP9]], <8 x float*> [[TMP10]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>			; AVX512F-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
	; AVX512F-NEXT: [[TMP12:%.*]] = shufflevector <2 x float*> [[TMP7]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512F-NEXT: [[TMP11:%.*]] = shufflevector <4 x float*> [[TMP7]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512F-NEXT: [[TMP13:%.*]] = shufflevector <8 x float*> [[TMP11]], <8 x float*> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>			; AVX512F-NEXT: [[TMP12:%.*]] = shufflevector <8 x float*> [[TMP8]], <8 x float*> [[TMP11]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>
	; AVX512F-NEXT: [[TMP14:%.*]] = insertelement <8 x float*> [[TMP13]], float* [[TMP8]], i64 7			; AVX512F-NEXT: [[TMP13:%.*]] = shufflevector <2 x float*> [[TMP5]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512F-NEXT: [[TMP15:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP14]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP14:%.*]] = shufflevector <8 x float*> [[TMP12]], <8 x float*> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX512F-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP9]], <8 x float*> poison, <8 x i32> zeroinitializer			; AVX512F-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP10]], i64 7
	; AVX512F-NEXT: [[TMP16:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>			; AVX512F-NEXT: [[TMP16:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP15]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP17:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP16]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP17:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP9]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP18:%.*]] = fdiv <8 x float> [[TMP15]], [[TMP17]]			; AVX512F-NEXT: [[TMP18:%.*]] = fdiv <8 x float> [[TMP16]], [[TMP17]]
	; AVX512F-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*			; AVX512F-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512F-NEXT: store <8 x float> [[TMP18]], <8 x float>* [[TMP19]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store <8 x float> [[TMP18]], <8 x float>* [[TMP19]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_div(			; AVX512VL-LABEL: @gather_load_div(
	; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i64 0			; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <2 x float*> poison, float* [[TMP1:%.*]], i64 0
	; AVX512VL-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float*> [[TMP3]], <4 x float*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[TMP4:%.*]] = shufflevector <2 x float*> [[TMP3]], <2 x float*> poison, <2 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP4:%.*]] = getelementptr float, <4 x float*> [[SHUFFLE1]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>			; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr float, <2 x float*> [[TMP4]], <2 x i64> <i64 8, i64 5>
	; AVX512VL-NEXT: [[TMP5:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i64 0			; AVX512VL-NEXT: [[TMP6:%.*]] = insertelement <4 x float*> poison, float* [[TMP1]], i64 0
	; AVX512VL-NEXT: [[TMP6:%.*]] = shufflevector <2 x float*> [[TMP5]], <2 x float*> poison, <2 x i32> zeroinitializer			; AVX512VL-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float*> [[TMP6]], <4 x float*> poison, <4 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr float, <2 x float*> [[TMP6]], <2 x i64> <i64 8, i64 5>			; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr float, <4 x float*> [[SHUFFLE1]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>
	; AVX512VL-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; AVX512VL-NEXT: [[TMP8:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i64 0
	; AVX512VL-NEXT: [[TMP9:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i64 0			; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP8]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP10:%.*]] = shufflevector <4 x float*> [[TMP4]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512VL-NEXT: [[TMP9:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512VL-NEXT: [[TMP11:%.*]] = shufflevector <8 x float*> [[TMP9]], <8 x float*> [[TMP10]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>			; AVX512VL-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
	; AVX512VL-NEXT: [[TMP12:%.*]] = shufflevector <2 x float*> [[TMP7]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512VL-NEXT: [[TMP11:%.*]] = shufflevector <4 x float*> [[TMP7]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512VL-NEXT: [[TMP13:%.*]] = shufflevector <8 x float*> [[TMP11]], <8 x float*> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>			; AVX512VL-NEXT: [[TMP12:%.*]] = shufflevector <8 x float*> [[TMP8]], <8 x float*> [[TMP11]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>
	; AVX512VL-NEXT: [[TMP14:%.*]] = insertelement <8 x float*> [[TMP13]], float* [[TMP8]], i64 7			; AVX512VL-NEXT: [[TMP13:%.*]] = shufflevector <2 x float*> [[TMP5]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512VL-NEXT: [[TMP15:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP14]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP14:%.*]] = shufflevector <8 x float*> [[TMP12]], <8 x float*> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP9]], <8 x float*> poison, <8 x i32> zeroinitializer			; AVX512VL-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP10]], i64 7
	; AVX512VL-NEXT: [[TMP16:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>			; AVX512VL-NEXT: [[TMP16:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP15]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP17:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP16]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP17:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP9]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP18:%.*]] = fdiv <8 x float> [[TMP15]], [[TMP17]]			; AVX512VL-NEXT: [[TMP18:%.*]] = fdiv <8 x float> [[TMP16]], [[TMP17]]
	; AVX512VL-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*			; AVX512VL-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512VL-NEXT: store <8 x float> [[TMP18]], <8 x float>* [[TMP19]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: store <8 x float> [[TMP18]], <8 x float>* [[TMP19]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: ret void			; AVX512VL-NEXT: ret void
	;			;
	%3 = load float, float* %1, align 4, !tbaa !2			%3 = load float, float* %1, align 4, !tbaa !2
	%4 = getelementptr inbounds float, float* %1, i64 4			%4 = getelementptr inbounds float, float* %1, i64 4
	%5 = load float, float* %4, align 4, !tbaa !2			%5 = load float, float* %4, align 4, !tbaa !2
	%6 = fdiv float %3, %5			%6 = fdiv float %3, %5
	[57 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/pr47629.ll

	[293 lines not shown]
	; AVX512F-NEXT: store i32 [[TMP28]], i32* [[TMP25]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[TMP28]], i32* [[TMP25]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21			; AVX512F-NEXT: [[TMP30:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 21
	; AVX512F-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP31:%.*]] = load i32, i32* [[TMP30]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4			; AVX512F-NEXT: [[TMP32:%.*]] = add i32 [[TMP31]], 4
	; AVX512F-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[TMP32]], i32* [[TMP29]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_3(			; AVX512VL-LABEL: @gather_load_3(
	; AVX512VL-NEXT: [[TMP3:%.*]] = load i32, i32* [[TMP1:%.*]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1:%.*]], i64 0
	; AVX512VL-NEXT: [[TMP4:%.*]] = add i32 [[TMP3]], 1			; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP3]], <4 x i32*> poison, <4 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1			; AVX512VL-NEXT: [[TMP4:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
	; AVX512VL-NEXT: store i32 [[TMP4]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP5:%.*]] = load i32, i32* [[TMP1]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP6:%.*]] = insertelement <4 x i32*> poison, i32* [[TMP1]], i64 0			; AVX512VL-NEXT: [[TMP6:%.*]] = add i32 [[TMP5]], 1
	; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP6]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[TMP0:%.*]], i64 1
	; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512VL-NEXT: store i32 [[TMP6]], i32* [[TMP0]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP8:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP7]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP8:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP4]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1>			; AVX512VL-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], <i32 2, i32 3, i32 4, i32 1>
	; AVX512VL-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5			; AVX512VL-NEXT: [[TMP10:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 5
	; AVX512VL-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP5]] to <4 x i32>*			; AVX512VL-NEXT: [[TMP11:%.*]] = bitcast i32* [[TMP7]] to <4 x i32>*
	; AVX512VL-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP11]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: store <4 x i32> [[TMP9]], <4 x i32>* [[TMP11]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9			; AVX512VL-NEXT: [[TMP12:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 9
	; AVX512VL-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP13:%.*]] = load i32, i32* [[TMP12]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP14:%.*]] = add i32 [[TMP13]], 2			; AVX512VL-NEXT: [[TMP14:%.*]] = add i32 [[TMP13]], 2
	; AVX512VL-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6			; AVX512VL-NEXT: [[TMP15:%.*]] = getelementptr inbounds i32, i32* [[TMP0]], i64 6
	; AVX512VL-NEXT: store i32 [[TMP14]], i32* [[TMP10]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: store i32 [[TMP14]], i32* [[TMP10]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6			; AVX512VL-NEXT: [[TMP16:%.*]] = getelementptr inbounds i32, i32* [[TMP1]], i64 6
	; AVX512VL-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP17:%.*]] = load i32, i32* [[TMP16]], align 4, !tbaa [[TBAA0]]
	[184 lines not shown]
	; AVX512F-NEXT: store i32 [[T16]], i32* [[T13]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T16]], i32* [[T13]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: store i32 [[T20]], i32* [[T17]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T20]], i32* [[T17]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T24]], i32* [[T21]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T28]], i32* [[T25]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store i32 [[T32]], i32* [[T29]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_4(			; AVX512VL-LABEL: @gather_load_4(
	; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
	; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i64 0			; AVX512VL-NEXT: [[TMP1:%.*]] = insertelement <4 x i32*> poison, i32* [[T1:%.*]], i64 0
	; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32*> [[TMP1]], <4 x i32*> poison, <4 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP2:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>			; AVX512VL-NEXT: [[TMP2:%.*]] = getelementptr i32, <4 x i32*> [[SHUFFLE]], <4 x i64> <i64 11, i64 4, i64 15, i64 18>
				; AVX512VL-NEXT: [[T5:%.*]] = getelementptr inbounds i32, i32* [[T0:%.*]], i64 1
	; AVX512VL-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5			; AVX512VL-NEXT: [[T21:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 5
	; AVX512VL-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9			; AVX512VL-NEXT: [[T22:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 9
	; AVX512VL-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6			; AVX512VL-NEXT: [[T25:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 6
	; AVX512VL-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6			; AVX512VL-NEXT: [[T26:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 6
	; AVX512VL-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7			; AVX512VL-NEXT: [[T29:%.*]] = getelementptr inbounds i32, i32* [[T0]], i64 7
	; AVX512VL-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21			; AVX512VL-NEXT: [[T30:%.*]] = getelementptr inbounds i32, i32* [[T1]], i64 21
	; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[T3:%.*]] = load i32, i32* [[T1]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP2]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP3:%.*]] = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*> [[TMP2]], i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> undef), !tbaa [[TBAA0]]
	[219 lines not shown]
	; AVX2-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP29]], i64 6			; AVX2-NEXT: [[TMP48:%.*]] = insertelement <8 x float> [[TMP47]], float [[TMP29]], i64 6
	; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7			; AVX2-NEXT: [[TMP49:%.*]] = insertelement <8 x float> [[TMP48]], float [[TMP33]], i64 7
	; AVX2-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]			; AVX2-NEXT: [[TMP50:%.*]] = fdiv <8 x float> [[TMP41]], [[TMP49]]
	; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*			; AVX2-NEXT: [[TMP51:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX2-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]			; AVX2-NEXT: store <8 x float> [[TMP50]], <8 x float>* [[TMP51]], align 4, !tbaa [[TBAA0]]
	; AVX2-NEXT: ret void			; AVX2-NEXT: ret void
	;			;
	; AVX512F-LABEL: @gather_load_div(			; AVX512F-LABEL: @gather_load_div(
	; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i64 0			; AVX512F-NEXT: [[TMP3:%.*]] = insertelement <2 x float*> poison, float* [[TMP1:%.*]], i64 0
	; AVX512F-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float*> [[TMP3]], <4 x float*> poison, <4 x i32> zeroinitializer			; AVX512F-NEXT: [[TMP4:%.*]] = shufflevector <2 x float*> [[TMP3]], <2 x float*> poison, <2 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP4:%.*]] = getelementptr float, <4 x float*> [[SHUFFLE1]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>			; AVX512F-NEXT: [[TMP5:%.*]] = getelementptr float, <2 x float*> [[TMP4]], <2 x i64> <i64 8, i64 5>
	; AVX512F-NEXT: [[TMP5:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i64 0			; AVX512F-NEXT: [[TMP6:%.*]] = insertelement <4 x float*> poison, float* [[TMP1]], i64 0
	; AVX512F-NEXT: [[TMP6:%.*]] = shufflevector <2 x float*> [[TMP5]], <2 x float*> poison, <2 x i32> zeroinitializer			; AVX512F-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float*> [[TMP6]], <4 x float*> poison, <4 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP7:%.*]] = getelementptr float, <2 x float*> [[TMP6]], <2 x i64> <i64 8, i64 5>			; AVX512F-NEXT: [[TMP7:%.*]] = getelementptr float, <4 x float*> [[SHUFFLE1]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>
	; AVX512F-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; AVX512F-NEXT: [[TMP8:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i64 0
	; AVX512F-NEXT: [[TMP9:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i64 0			; AVX512F-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP8]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512F-NEXT: [[TMP10:%.*]] = shufflevector <4 x float*> [[TMP4]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512F-NEXT: [[TMP9:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512F-NEXT: [[TMP11:%.*]] = shufflevector <8 x float*> [[TMP9]], <8 x float*> [[TMP10]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>			; AVX512F-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
	; AVX512F-NEXT: [[TMP12:%.*]] = shufflevector <2 x float*> [[TMP7]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512F-NEXT: [[TMP11:%.*]] = shufflevector <4 x float*> [[TMP7]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512F-NEXT: [[TMP13:%.*]] = shufflevector <8 x float*> [[TMP11]], <8 x float*> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>			; AVX512F-NEXT: [[TMP12:%.*]] = shufflevector <8 x float*> [[TMP8]], <8 x float*> [[TMP11]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>
	; AVX512F-NEXT: [[TMP14:%.*]] = insertelement <8 x float*> [[TMP13]], float* [[TMP8]], i64 7			; AVX512F-NEXT: [[TMP13:%.*]] = shufflevector <2 x float*> [[TMP5]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512F-NEXT: [[TMP15:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP14]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP14:%.*]] = shufflevector <8 x float*> [[TMP12]], <8 x float*> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX512F-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP9]], <8 x float*> poison, <8 x i32> zeroinitializer			; AVX512F-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP10]], i64 7
	; AVX512F-NEXT: [[TMP16:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>			; AVX512F-NEXT: [[TMP16:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP15]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP17:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP16]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]			; AVX512F-NEXT: [[TMP17:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP9]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512F-NEXT: [[TMP18:%.*]] = fdiv <8 x float> [[TMP15]], [[TMP17]]			; AVX512F-NEXT: [[TMP18:%.*]] = fdiv <8 x float> [[TMP16]], [[TMP17]]
	; AVX512F-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*			; AVX512F-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512F-NEXT: store <8 x float> [[TMP18]], <8 x float>* [[TMP19]], align 4, !tbaa [[TBAA0]]			; AVX512F-NEXT: store <8 x float> [[TMP18]], <8 x float>* [[TMP19]], align 4, !tbaa [[TBAA0]]
	; AVX512F-NEXT: ret void			; AVX512F-NEXT: ret void
	;			;
	; AVX512VL-LABEL: @gather_load_div(			; AVX512VL-LABEL: @gather_load_div(
	; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <4 x float*> poison, float* [[TMP1:%.*]], i64 0			; AVX512VL-NEXT: [[TMP3:%.*]] = insertelement <2 x float*> poison, float* [[TMP1:%.*]], i64 0
	; AVX512VL-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float*> [[TMP3]], <4 x float*> poison, <4 x i32> zeroinitializer			; AVX512VL-NEXT: [[TMP4:%.*]] = shufflevector <2 x float*> [[TMP3]], <2 x float*> poison, <2 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP4:%.*]] = getelementptr float, <4 x float*> [[SHUFFLE1]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>			; AVX512VL-NEXT: [[TMP5:%.*]] = getelementptr float, <2 x float*> [[TMP4]], <2 x i64> <i64 8, i64 5>
	; AVX512VL-NEXT: [[TMP5:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i64 0			; AVX512VL-NEXT: [[TMP6:%.*]] = insertelement <4 x float*> poison, float* [[TMP1]], i64 0
	; AVX512VL-NEXT: [[TMP6:%.*]] = shufflevector <2 x float*> [[TMP5]], <2 x float*> poison, <2 x i32> zeroinitializer			; AVX512VL-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float*> [[TMP6]], <4 x float*> poison, <4 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr float, <2 x float*> [[TMP6]], <2 x i64> <i64 8, i64 5>			; AVX512VL-NEXT: [[TMP7:%.*]] = getelementptr float, <4 x float*> [[SHUFFLE1]], <4 x i64> <i64 10, i64 3, i64 14, i64 17>
	; AVX512VL-NEXT: [[TMP8:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20			; AVX512VL-NEXT: [[TMP8:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i64 0
	; AVX512VL-NEXT: [[TMP9:%.*]] = insertelement <8 x float*> poison, float* [[TMP1]], i64 0			; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP8]], <8 x float*> poison, <8 x i32> zeroinitializer
	; AVX512VL-NEXT: [[TMP10:%.*]] = shufflevector <4 x float*> [[TMP4]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512VL-NEXT: [[TMP9:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>
	; AVX512VL-NEXT: [[TMP11:%.*]] = shufflevector <8 x float*> [[TMP9]], <8 x float*> [[TMP10]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>			; AVX512VL-NEXT: [[TMP10:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 20
	; AVX512VL-NEXT: [[TMP12:%.*]] = shufflevector <2 x float*> [[TMP7]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX512VL-NEXT: [[TMP11:%.*]] = shufflevector <4 x float*> [[TMP7]], <4 x float*> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512VL-NEXT: [[TMP13:%.*]] = shufflevector <8 x float*> [[TMP11]], <8 x float*> [[TMP12]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>			; AVX512VL-NEXT: [[TMP12:%.*]] = shufflevector <8 x float*> [[TMP8]], <8 x float*> [[TMP11]], <8 x i32> <i32 0, i32 8, i32 9, i32 10, i32 11, i32 undef, i32 undef, i32 undef>
	; AVX512VL-NEXT: [[TMP14:%.*]] = insertelement <8 x float*> [[TMP13]], float* [[TMP8]], i64 7			; AVX512VL-NEXT: [[TMP13:%.*]] = shufflevector <2 x float*> [[TMP5]], <2 x float*> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX512VL-NEXT: [[TMP15:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP14]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP14:%.*]] = shufflevector <8 x float*> [[TMP12]], <8 x float*> [[TMP13]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 8, i32 9, i32 undef>
	; AVX512VL-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float*> [[TMP9]], <8 x float*> poison, <8 x i32> zeroinitializer			; AVX512VL-NEXT: [[TMP15:%.*]] = insertelement <8 x float*> [[TMP14]], float* [[TMP10]], i64 7
	; AVX512VL-NEXT: [[TMP16:%.*]] = getelementptr float, <8 x float*> [[SHUFFLE]], <8 x i64> <i64 4, i64 13, i64 11, i64 44, i64 33, i64 30, i64 27, i64 23>			; AVX512VL-NEXT: [[TMP16:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP15]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP17:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP16]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]			; AVX512VL-NEXT: [[TMP17:%.*]] = call <8 x float> @llvm.masked.gather.v8f32.v8p0f32(<8 x float*> [[TMP9]], i32 4, <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 true>, <8 x float> undef), !tbaa [[TBAA0]]
	; AVX512VL-NEXT: [[TMP18:%.*]] = fdiv <8 x float> [[TMP15]], [[TMP17]]			; AVX512VL-NEXT: [[TMP18:%.*]] = fdiv <8 x float> [[TMP16]], [[TMP17]]
	; AVX512VL-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*			; AVX512VL-NEXT: [[TMP19:%.*]] = bitcast float* [[TMP0:%.*]] to <8 x float>*
	; AVX512VL-NEXT: store <8 x float> [[TMP18]], <8 x float>* [[TMP19]], align 4, !tbaa [[TBAA0]]			; AVX512VL-NEXT: store <8 x float> [[TMP18]], <8 x float>* [[TMP19]], align 4, !tbaa [[TBAA0]]
	; AVX512VL-NEXT: ret void			; AVX512VL-NEXT: ret void
	;			;
	; AVX512-LABEL: @gather_load_div(			; AVX512-LABEL: @gather_load_div(
	; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10			; AVX512-NEXT: [[TMP3:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 10
	; AVX512-NEXT: [[TMP4:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i32 0			; AVX512-NEXT: [[TMP4:%.*]] = insertelement <2 x float*> poison, float* [[TMP1]], i32 0
	; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x float*> [[TMP4]], <2 x float*> poison, <2 x i32> zeroinitializer			; AVX512-NEXT: [[TMP5:%.*]] = shufflevector <2 x float*> [[TMP4]], <2 x float*> poison, <2 x i32> zeroinitializer
	[79 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/pr47642.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -instcombine -S < %s | FileCheck %s			; RUN: opt -slp-vectorizer -instcombine -S < %s | FileCheck %s
	; These code should be fully vectorized by D57059 patch			; These code should be fully vectorized by D57059 patch

	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define <4 x i32> @foo(<4 x i32> %x, i32 %f) {			define <4 x i32> @foo(<4 x i32> %x, i32 %f) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: [[VECINIT:%.*]] = insertelement <4 x i32> undef, i32 [[F:%.*]], i64 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[F:%.*]], i64 0
	; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[F]], 1
	; CHECK-NEXT: [[VECINIT1:%.*]] = insertelement <4 x i32> [[VECINIT]], i32 [[ADD]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[F]], i64 0
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <2 x i32> zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <2 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = add nsw <2 x i32> [[TMP2]], <i32 2, i32 3>			; CHECK-NEXT: [[TMP3:%.*]] = add nsw <2 x i32> [[TMP2]], <i32 2, i32 3>
				; CHECK-NEXT: [[VECINIT:%.*]] = insertelement <4 x i32> undef, i32 [[F]], i64 0
				; CHECK-NEXT: [[ADD:%.*]] = add nsw i32 [[F]], 1
				; CHECK-NEXT: [[VECINIT1:%.*]] = insertelement <4 x i32> [[VECINIT]], i32 [[ADD]], i64 1
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[VECINIT51:%.*]] = shufflevector <4 x i32> [[VECINIT1]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[VECINIT51:%.*]] = shufflevector <4 x i32> [[VECINIT1]], <4 x i32> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x i32> [[VECINIT51]]			; CHECK-NEXT: ret <4 x i32> [[VECINIT51]]
	;			;
	%vecinit = insertelement <4 x i32> undef, i32 %f, i32 0			%vecinit = insertelement <4 x i32> undef, i32 %f, i32 0
	%add = add nsw i32 %f, 1			%add = add nsw i32 %f, 1
	%vecinit1 = insertelement <4 x i32> %vecinit, i32 %add, i32 1			%vecinit1 = insertelement <4 x i32> %vecinit, i32 %add, i32 1
	%add2 = add nsw i32 %f, 2			%add2 = add nsw i32 %f, 2
	[22 lines not shown]

llvm/test/Transforms/SLPVectorizer/X86/rgb_phi.ll

	[27 lines not shown]
	; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 2			; CHECK-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 2
	; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[ARRAYIDX2]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load float, float* [[ARRAYIDX2]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[TMP4:%.*]] = phi float [ [[TMP3]], [[ENTRY:%.*]] ], [ [[DOTPRE:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi float [ [[TMP3]], [[ENTRY:%.*]] ], [ [[DOTPRE:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ]
	; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; CHECK-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
	; CHECK-NEXT: [[B_032:%.*]] = phi float [ [[TMP2]], [[ENTRY]] ], [ [[ADD14:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; CHECK-NEXT: [[B_032:%.*]] = phi float [ [[TMP2]], [[ENTRY]] ], [ [[ADD14:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x float> [ [[TMP1]], [[ENTRY]] ], [ [[TMP11:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]			; CHECK-NEXT: [[TMP5:%.*]] = phi <2 x float> [ [[TMP1]], [[ENTRY]] ], [ [[TMP14:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = add nsw i64 [[INDVARS_IV]], 1			; CHECK-NEXT: [[TMP6:%.*]] = add nsw i64 [[INDVARS_IV]], 1
	; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP6]]			; CHECK-NEXT: [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP6]]
	; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX7]], align 4			; CHECK-NEXT: [[TMP7:%.*]] = load float, float* [[ARRAYIDX7]], align 4
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> poison, float [[TMP4]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> poison, float [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> [[TMP8]], float [[TMP7]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> [[TMP8]], float [[TMP7]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x float> [[TMP9]], <float 7.000000e+00, float 8.000000e+00>			; CHECK-NEXT: [[TMP10:%.*]] = fmul <2 x float> [[TMP9]], <float 7.000000e+00, float 8.000000e+00>
	; CHECK-NEXT: [[TMP11]] = fadd <2 x float> [[TMP5]], [[TMP10]]			; CHECK-NEXT: [[TMP11:%.*]] = add nsw i64 [[INDVARS_IV]], 2
	; CHECK-NEXT: [[TMP12:%.*]] = add nsw i64 [[INDVARS_IV]], 2			; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP11]]
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP12]]			; CHECK-NEXT: [[TMP12:%.*]] = load float, float* [[ARRAYIDX12]], align 4
	; CHECK-NEXT: [[TMP13:%.*]] = load float, float* [[ARRAYIDX12]], align 4			; CHECK-NEXT: [[MUL13:%.*]] = fmul float [[TMP12]], 9.000000e+00
	; CHECK-NEXT: [[MUL13:%.*]] = fmul float [[TMP13]], 9.000000e+00
	; CHECK-NEXT: [[ADD14]] = fadd float [[B_032]], [[MUL13]]			; CHECK-NEXT: [[ADD14]] = fadd float [[B_032]], [[MUL13]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 3			; CHECK-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 3
	; CHECK-NEXT: [[TMP14:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; CHECK-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP14]], 121			; CHECK-NEXT: [[CMP:%.*]] = icmp slt i32 [[TMP13]], 121
				; CHECK-NEXT: [[TMP14]] = fadd <2 x float> [[TMP5]], [[TMP10]]
	; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]], label [[FOR_END:%.*]]			; CHECK-NEXT: br i1 [[CMP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]], label [[FOR_END:%.*]]
	; CHECK: for.body.for.body_crit_edge:			; CHECK: for.body.for.body_crit_edge:
	; CHECK-NEXT: [[ARRAYIDX3_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]			; CHECK-NEXT: [[ARRAYIDX3_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]
	; CHECK-NEXT: [[DOTPRE]] = load float, float* [[ARRAYIDX3_PHI_TRANS_INSERT]], align 4			; CHECK-NEXT: [[DOTPRE]] = load float, float* [[ARRAYIDX3_PHI_TRANS_INSERT]], align 4
	; CHECK-NEXT: br label [[FOR_BODY]]			; CHECK-NEXT: br label [[FOR_BODY]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP11]], i32 0			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <2 x float> [[TMP14]], i32 0
	; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP11]], i32 1			; CHECK-NEXT: [[TMP16:%.*]] = extractelement <2 x float> [[TMP14]], i32 1
	; CHECK-NEXT: [[ADD16:%.*]] = fadd float [[TMP15]], [[TMP16]]			; CHECK-NEXT: [[ADD16:%.*]] = fadd float [[TMP15]], [[TMP16]]
	; CHECK-NEXT: [[ADD17:%.*]] = fadd float [[ADD16]], [[ADD14]]			; CHECK-NEXT: [[ADD17:%.*]] = fadd float [[ADD16]], [[ADD14]]
	; CHECK-NEXT: ret float [[ADD17]]			; CHECK-NEXT: ret float [[ADD17]]
	;			;
	entry:			entry:
	%0 = load float, float* %A, align 4			%0 = load float, float* %A, align 4
	%arrayidx1 = getelementptr inbounds float, float* %A, i64 1			%arrayidx1 = getelementptr inbounds float, float* %A, i64 1
	%1 = load float, float* %arrayidx1, align 4			%1 = load float, float* %arrayidx1, align 4

llvm/test/Transforms/SLPVectorizer/X86/shrink_after_reorder2.ll

	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 poison, i32 undef>, i32 [[ADD7:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 poison, i32 undef>, i32 [[ADD7:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = sdiv <2 x i32> [[TMP0]], <i32 2, i32 2>			; CHECK-NEXT: [[TMP1:%.*]] = sdiv <2 x i32> [[TMP0]], <i32 2, i32 2>
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 1, i32 1, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <4 x i32> <i32 1, i32 1, i32 0, i32 0>
	; CHECK-NEXT: switch i32 undef, label [[SW_EPILOG:%.*]] [			; CHECK-NEXT: switch i32 undef, label [[SW_EPILOG:%.*]] [
	; CHECK-NEXT: i32 0, label [[SW_BB:%.*]]			; CHECK-NEXT: i32 0, label [[SW_BB:%.*]]
	; CHECK-NEXT: i32 2, label [[SW_BB]]			; CHECK-NEXT: i32 2, label [[SW_BB]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: sw.bb:			; CHECK: sw.bb:
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i32* [[G]] to <2 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4
	; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 2, i32 0>			; CHECK-NEXT: [[SHRINK_SHUFFLE:%.*]] = shufflevector <4 x i32> [[SHUFFLE]], <4 x i32> poison, <2 x i32> <i32 2, i32 0>
	; CHECK-NEXT: [[TMP4:%.*]] = xor <2 x i32> [[SHRINK_SHUFFLE]], <i32 -1, i32 -1>			; CHECK-NEXT: [[TMP2:%.*]] = xor <2 x i32> [[SHRINK_SHUFFLE]], <i32 -1, i32 -1>
	; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP3]], [[TMP4]]			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[G]] to <2 x i32>*
				; CHECK-NEXT: [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* [[TMP3]], align 4
				; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP4]], [[TMP2]]
	; CHECK-NEXT: br label [[SW_EPILOG]]			; CHECK-NEXT: br label [[SW_EPILOG]]
	; CHECK: sw.epilog:			; CHECK: sw.epilog:
	; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x i32> [ undef, [[ENTRY:%.*]] ], [ [[TMP5]], [[SW_BB]] ]			; CHECK-NEXT: [[TMP6:%.*]] = phi <2 x i32> [ undef, [[ENTRY:%.*]] ], [ [[TMP5]], [[SW_BB]] ]
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> poison, <4 x i32> <i32 1, i32 1, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <2 x i32> [[TMP6]], <2 x i32> poison, <4 x i32> <i32 1, i32 1, i32 0, i32 0>
	; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> poison, [[SHUFFLE]]			; CHECK-NEXT: [[TMP7:%.*]] = sub <4 x i32> poison, [[SHUFFLE]]
	; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i32> [[TMP7]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP8:%.*]] = add <4 x i32> [[TMP7]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[B]] to <4 x i32>*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast i32* [[B]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP9]], align 4			; CHECK-NEXT: store <4 x i32> [[TMP8]], <4 x i32>* [[TMP9]], align 4

llvm/test/Transforms/SLPVectorizer/X86/sitofp-inseltpoison.ll

	}			}

	;			;
	; SITOFP BUILDVECTOR			; SITOFP BUILDVECTOR
	;			;

	define <4 x double> @sitofp_4xi32_4f64(i32 %a0, i32 %a1, i32 %a2, i32 %a3) #0 {			define <4 x double> @sitofp_4xi32_4f64(i32 %a0, i32 %a1, i32 %a2, i32 %a3) #0 {
	; SSE-LABEL: @sitofp_4xi32_4f64(			; SSE-LABEL: @sitofp_4xi32_4f64(
	; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[A0:%.*]], i32 0			; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[A2:%.*]], i32 0
	; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[A1:%.*]], i32 1			; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[A3:%.*]], i32 1
	; SSE-NEXT: [[TMP3:%.*]] = sitofp <2 x i32> [[TMP2]] to <2 x double>			; SSE-NEXT: [[TMP3:%.*]] = sitofp <2 x i32> [[TMP2]] to <2 x double>
	; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[A2:%.*]], i32 0			; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[A0:%.*]], i32 0
	; SSE-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[A3:%.*]], i32 1			; SSE-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[A1:%.*]], i32 1
	; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i32> [[TMP5]] to <2 x double>			; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i32> [[TMP5]] to <2 x double>
	; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SSE-NEXT: [[RES31:%.*]] = shufflevector <4 x double> [[TMP7]], <4 x double> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; SSE-NEXT: [[RES31:%.*]] = shufflevector <4 x double> [[TMP7]], <4 x double> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; SSE-NEXT: ret <4 x double> [[RES31]]			; SSE-NEXT: ret <4 x double> [[RES31]]
	;			;
	; AVX-LABEL: @sitofp_4xi32_4f64(			; AVX-LABEL: @sitofp_4xi32_4f64(
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[A0:%.*]], i32 0			; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[A0:%.*]], i32 0
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[A1:%.*]], i32 1			; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[A1:%.*]], i32 1
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[A2:%.*]], i32 2			; AVX-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[A2:%.*]], i32 2
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[A3:%.*]], i32 3			; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[A3:%.*]], i32 3

llvm/test/Transforms/SLPVectorizer/X86/sitofp.ll

	}			}

	;			;
	; SITOFP BUILDVECTOR			; SITOFP BUILDVECTOR
	;			;

	define <4 x double> @sitofp_4xi32_4f64(i32 %a0, i32 %a1, i32 %a2, i32 %a3) #0 {			define <4 x double> @sitofp_4xi32_4f64(i32 %a0, i32 %a1, i32 %a2, i32 %a3) #0 {
	; SSE-LABEL: @sitofp_4xi32_4f64(			; SSE-LABEL: @sitofp_4xi32_4f64(
	; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[A0:%.*]], i32 0			; SSE-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[A2:%.*]], i32 0
	; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[A1:%.*]], i32 1			; SSE-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[A3:%.*]], i32 1
	; SSE-NEXT: [[TMP3:%.*]] = sitofp <2 x i32> [[TMP2]] to <2 x double>			; SSE-NEXT: [[TMP3:%.*]] = sitofp <2 x i32> [[TMP2]] to <2 x double>
	; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[A2:%.*]], i32 0			; SSE-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> poison, i32 [[A0:%.*]], i32 0
	; SSE-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[A3:%.*]], i32 1			; SSE-NEXT: [[TMP5:%.*]] = insertelement <2 x i32> [[TMP4]], i32 [[A1:%.*]], i32 1
	; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i32> [[TMP5]] to <2 x double>			; SSE-NEXT: [[TMP6:%.*]] = sitofp <2 x i32> [[TMP5]] to <2 x double>
	; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SSE-NEXT: [[TMP7:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; SSE-NEXT: [[TMP8:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; SSE-NEXT: [[RES31:%.*]] = shufflevector <4 x double> [[TMP7]], <4 x double> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; SSE-NEXT: [[RES31:%.*]] = shufflevector <4 x double> [[TMP7]], <4 x double> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; SSE-NEXT: ret <4 x double> [[RES31]]			; SSE-NEXT: ret <4 x double> [[RES31]]
	;			;
	; AVX-LABEL: @sitofp_4xi32_4f64(			; AVX-LABEL: @sitofp_4xi32_4f64(
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[A0:%.*]], i32 0			; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[A0:%.*]], i32 0
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[A1:%.*]], i32 1			; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> [[TMP1]], i32 [[A1:%.*]], i32 1
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[A2:%.*]], i32 2			; AVX-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[A2:%.*]], i32 2
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[A3:%.*]], i32 3			; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP3]], i32 [[A3:%.*]], i32 3

llvm/test/Transforms/SLPVectorizer/X86/stores-non-ordered.ll

	; CHECK-NEXT: [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0			; CHECK-NEXT: [[INN_ADDR:%.*]] = getelementptr inbounds i32, i32* [[INN:%.*]], i64 0
	; CHECK-NEXT: [[LOAD_5:%.*]] = load i32, i32* [[INN_ADDR]], align 4			; CHECK-NEXT: [[LOAD_5:%.*]] = load i32, i32* [[INN_ADDR]], align 4
	; CHECK-NEXT: [[GEP_4:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 1			; CHECK-NEXT: [[GEP_4:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 1
	; CHECK-NEXT: [[LOAD_6:%.*]] = load i32, i32* [[GEP_4]], align 4			; CHECK-NEXT: [[LOAD_6:%.*]] = load i32, i32* [[GEP_4]], align 4
	; CHECK-NEXT: [[GEP_5:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 2			; CHECK-NEXT: [[GEP_5:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 2
	; CHECK-NEXT: [[LOAD_7:%.*]] = load i32, i32* [[GEP_5]], align 4			; CHECK-NEXT: [[LOAD_7:%.*]] = load i32, i32* [[GEP_5]], align 4
	; CHECK-NEXT: [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3			; CHECK-NEXT: [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3
	; CHECK-NEXT: [[LOAD_8:%.*]] = load i32, i32* [[GEP_6]], align 4			; CHECK-NEXT: [[LOAD_8:%.*]] = load i32, i32* [[GEP_6]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_1]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_2]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[LOAD_3]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x i32> [[TMP1]], i32 [[LOAD_4]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_5]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_6]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[LOAD_7]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x i32> [[TMP3]], i32 [[LOAD_8]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = mul <2 x i32> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = mul <2 x i32> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_2]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_1]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> [[TMP6]], i32 [[LOAD_4]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x i32> [[TMP6]], i32 [[LOAD_3]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i32> poison, i32 [[LOAD_5]], i32 0
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> [[TMP8]], i32 [[LOAD_8]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i32> [[TMP8]], i32 [[LOAD_7]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = mul <2 x i32> [[TMP7]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = mul <2 x i32> [[TMP7]], [[TMP9]]
	; CHECK-NEXT: br label [[BLOCK1:%.*]]			; CHECK-NEXT: br label [[BLOCK1:%.*]]
	; CHECK: block1:			; CHECK: block1:
	; CHECK-NEXT: [[GEP_X:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 5			; CHECK-NEXT: [[GEP_X:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 5
	; CHECK-NEXT: [[LOAD_9:%.*]] = load i32, i32* [[GEP_X]], align 4			; CHECK-NEXT: [[LOAD_9:%.*]] = load i32, i32* [[GEP_X]], align 4
	; CHECK-NEXT: br label [[BLOCK2:%.*]]			; CHECK-NEXT: br label [[BLOCK2:%.*]]
	; CHECK: block2:			; CHECK: block2:
	; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0			; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
	; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1			; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
	; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2			; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
	; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3			; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
	; CHECK-NEXT: [[GEP_11:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 4			; CHECK-NEXT: [[GEP_11:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 4
	; CHECK-NEXT: store i32 [[LOAD_9]], i32* [[GEP_9]], align 4			; CHECK-NEXT: store i32 [[LOAD_9]], i32* [[GEP_9]], align 4
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[GEP_10]] to <2 x i32>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast i32* [[GEP_10]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[TMP5]], <2 x i32>* [[TMP11]], align 4			; CHECK-NEXT: store <2 x i32> [[TMP10]], <2 x i32>* [[TMP11]], align 4
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[GEP_7]] to <2 x i32>*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast i32* [[GEP_7]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[TMP10]], <2 x i32>* [[TMP12]], align 4			; CHECK-NEXT: store <2 x i32> [[TMP5]], <2 x i32>* [[TMP12]], align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%in.addr = getelementptr inbounds i32, i32* %in, i64 0			%in.addr = getelementptr inbounds i32, i32* %in, i64 0
	%load.1 = load i32, i32* %in.addr, align 4			%load.1 = load i32, i32* %in.addr, align 4
	%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 1			%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 1
	%load.2 = load i32, i32* %gep.1, align 4			%load.2 = load i32, i32* %gep.1, align 4
	%gep.2 = getelementptr inbounds i32, i32* %in.addr, i64 2			%gep.2 = getelementptr inbounds i32, i32* %in.addr, i64 2
	%load.3 = load i32, i32* %gep.2, align 4			%load.3 = load i32, i32* %gep.2, align 4

llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll

	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> poison, float [[SUB]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> poison, float [[SUB]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP18:%.*]], [[BB3:%.*]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP18:%.*]], [[BB3:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>			; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>
				; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[CONV2]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[CONV2]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP9]], i32 0			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x double> [[TMP9]], i32 0
	; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x double> poison, double [[TMP10]], i32 0			; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x double> poison, double [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP9]], i32 1			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP9]], i32 1

llvm/test/Transforms/SLPVectorizer/slp-max-phi-size.ll

	; MAX32-NEXT: store float [[PHI31]], float* undef, align 4			; MAX32-NEXT: store float [[PHI31]], float* undef, align 4
	; MAX32-NEXT: ret void			; MAX32-NEXT: ret void
	;			;
	; MAX256-LABEL: @phi_float32(			; MAX256-LABEL: @phi_float32(
	; MAX256-NEXT: bb:			; MAX256-NEXT: bb:
	; MAX256-NEXT: br label [[BB1:%.*]]			; MAX256-NEXT: br label [[BB1:%.*]]
	; MAX256: bb1:			; MAX256: bb1:
	; MAX256-NEXT: [[I:%.*]] = fpext half [[HVAL:%.*]] to float			; MAX256-NEXT: [[I:%.*]] = fpext half [[HVAL:%.*]] to float
	; MAX256-NEXT: [[I3:%.*]] = fpext half [[HVAL]] to float
	; MAX256-NEXT: [[I6:%.*]] = fpext half [[HVAL]] to float
	; MAX256-NEXT: [[I9:%.*]] = fpext half [[HVAL]] to float
	; MAX256-NEXT: [[TMP0:%.*]] = insertelement <8 x float> poison, float [[I]], i32 0			; MAX256-NEXT: [[TMP0:%.*]] = insertelement <8 x float> poison, float [[I]], i32 0
	; MAX256-NEXT: [[SHUFFLE11:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <8 x i32> zeroinitializer			; MAX256-NEXT: [[SHUFFLE11:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <8 x i32> zeroinitializer
	; MAX256-NEXT: [[TMP1:%.*]] = insertelement <8 x float> poison, float [[FVAL:%.*]], i32 0			; MAX256-NEXT: [[TMP1:%.*]] = insertelement <8 x float> poison, float [[FVAL:%.*]], i32 0
	; MAX256-NEXT: [[SHUFFLE12:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> poison, <8 x i32> zeroinitializer			; MAX256-NEXT: [[SHUFFLE12:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> poison, <8 x i32> zeroinitializer
	; MAX256-NEXT: [[TMP2:%.*]] = fmul <8 x float> [[SHUFFLE11]], [[SHUFFLE12]]			; MAX256-NEXT: [[TMP2:%.*]] = fmul <8 x float> [[SHUFFLE11]], [[SHUFFLE12]]
	; MAX256-NEXT: [[TMP3:%.*]] = fadd <8 x float> zeroinitializer, [[TMP2]]			; MAX256-NEXT: [[I3:%.*]] = fpext half [[HVAL]] to float
	; MAX256-NEXT: [[TMP4:%.*]] = insertelement <8 x float> poison, float [[I3]], i32 0			; MAX256-NEXT: [[TMP3:%.*]] = insertelement <8 x float> poison, float [[I3]], i32 0
	; MAX256-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float> [[TMP4]], <8 x float> poison, <8 x i32> zeroinitializer			; MAX256-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> poison, <8 x i32> zeroinitializer
	; MAX256-NEXT: [[TMP5:%.*]] = fmul <8 x float> [[SHUFFLE]], [[SHUFFLE12]]			; MAX256-NEXT: [[TMP4:%.*]] = fmul <8 x float> [[SHUFFLE]], [[SHUFFLE12]]
	; MAX256-NEXT: [[TMP6:%.*]] = fadd <8 x float> zeroinitializer, [[TMP5]]			; MAX256-NEXT: [[I6:%.*]] = fpext half [[HVAL]] to float
	; MAX256-NEXT: [[TMP7:%.*]] = insertelement <8 x float> poison, float [[I6]], i32 0			; MAX256-NEXT: [[TMP5:%.*]] = insertelement <8 x float> poison, float [[I6]], i32 0
	; MAX256-NEXT: [[SHUFFLE5:%.*]] = shufflevector <8 x float> [[TMP7]], <8 x float> poison, <8 x i32> zeroinitializer			; MAX256-NEXT: [[SHUFFLE5:%.*]] = shufflevector <8 x float> [[TMP5]], <8 x float> poison, <8 x i32> zeroinitializer
	; MAX256-NEXT: [[TMP8:%.*]] = fmul <8 x float> [[SHUFFLE5]], [[SHUFFLE12]]			; MAX256-NEXT: [[TMP6:%.*]] = fmul <8 x float> [[SHUFFLE5]], [[SHUFFLE12]]
	; MAX256-NEXT: [[TMP9:%.*]] = fadd <8 x float> zeroinitializer, [[TMP8]]			; MAX256-NEXT: [[I9:%.*]] = fpext half [[HVAL]] to float
	; MAX256-NEXT: [[TMP10:%.*]] = insertelement <8 x float> poison, float [[I9]], i32 0			; MAX256-NEXT: [[TMP7:%.*]] = insertelement <8 x float> poison, float [[I9]], i32 0
	; MAX256-NEXT: [[SHUFFLE8:%.*]] = shufflevector <8 x float> [[TMP10]], <8 x float> poison, <8 x i32> zeroinitializer			; MAX256-NEXT: [[SHUFFLE8:%.*]] = shufflevector <8 x float> [[TMP7]], <8 x float> poison, <8 x i32> zeroinitializer
	; MAX256-NEXT: [[TMP11:%.*]] = fmul <8 x float> [[SHUFFLE8]], [[SHUFFLE12]]			; MAX256-NEXT: [[TMP8:%.*]] = fmul <8 x float> [[SHUFFLE8]], [[SHUFFLE12]]
	; MAX256-NEXT: [[TMP12:%.*]] = fadd <8 x float> zeroinitializer, [[TMP11]]			; MAX256-NEXT: [[TMP9:%.*]] = fadd <8 x float> zeroinitializer, [[TMP4]]
				; MAX256-NEXT: [[TMP10:%.*]] = fadd <8 x float> zeroinitializer, [[TMP6]]
				; MAX256-NEXT: [[TMP11:%.*]] = fadd <8 x float> zeroinitializer, [[TMP8]]
				; MAX256-NEXT: [[TMP12:%.*]] = fadd <8 x float> zeroinitializer, [[TMP2]]
	; MAX256-NEXT: switch i32 undef, label [[BB5:%.*]] [			; MAX256-NEXT: switch i32 undef, label [[BB5:%.*]] [
	; MAX256-NEXT: i32 0, label [[BB2:%.*]]			; MAX256-NEXT: i32 0, label [[BB2:%.*]]
	; MAX256-NEXT: i32 1, label [[BB3:%.*]]			; MAX256-NEXT: i32 1, label [[BB3:%.*]]
	; MAX256-NEXT: i32 2, label [[BB4:%.*]]			; MAX256-NEXT: i32 2, label [[BB4:%.*]]
	; MAX256-NEXT: ]			; MAX256-NEXT: ]
	; MAX256: bb3:			; MAX256: bb3:
	; MAX256-NEXT: br label [[BB2]]			; MAX256-NEXT: br label [[BB2]]
	; MAX256: bb4:			; MAX256: bb4:
	; MAX256-NEXT: br label [[BB2]]			; MAX256-NEXT: br label [[BB2]]
	; MAX256: bb5:			; MAX256: bb5:
	; MAX256-NEXT: br label [[BB2]]			; MAX256-NEXT: br label [[BB2]]
	; MAX256: bb2:			; MAX256: bb2:
	; MAX256-NEXT: [[TMP13:%.*]] = phi <8 x float> [ [[TMP6]], [[BB3]] ], [ [[SHUFFLE12]], [[BB4]] ], [ [[SHUFFLE12]], [[BB5]] ], [ [[SHUFFLE12]], [[BB1]] ]			; MAX256-NEXT: [[TMP13:%.*]] = phi <8 x float> [ [[TMP9]], [[BB3]] ], [ [[SHUFFLE12]], [[BB4]] ], [ [[SHUFFLE12]], [[BB5]] ], [ [[SHUFFLE12]], [[BB1]] ]
	; MAX256-NEXT: [[TMP14:%.*]] = phi <8 x float> [ [[TMP9]], [[BB3]] ], [ [[SHUFFLE12]], [[BB4]] ], [ [[TMP9]], [[BB5]] ], [ [[TMP9]], [[BB1]] ]			; MAX256-NEXT: [[TMP14:%.*]] = phi <8 x float> [ [[TMP10]], [[BB3]] ], [ [[SHUFFLE12]], [[BB4]] ], [ [[TMP10]], [[BB5]] ], [ [[TMP10]], [[BB1]] ]
	; MAX256-NEXT: [[TMP15:%.*]] = phi <8 x float> [ [[TMP12]], [[BB3]] ], [ [[TMP12]], [[BB4]] ], [ [[SHUFFLE12]], [[BB5]] ], [ [[TMP12]], [[BB1]] ]			; MAX256-NEXT: [[TMP15:%.*]] = phi <8 x float> [ [[TMP11]], [[BB3]] ], [ [[TMP11]], [[BB4]] ], [ [[SHUFFLE12]], [[BB5]] ], [ [[TMP11]], [[BB1]] ]
	; MAX256-NEXT: [[TMP16:%.*]] = phi <8 x float> [ [[TMP3]], [[BB3]] ], [ [[TMP3]], [[BB4]] ], [ [[TMP3]], [[BB5]] ], [ [[SHUFFLE12]], [[BB1]] ]			; MAX256-NEXT: [[TMP16:%.*]] = phi <8 x float> [ [[TMP12]], [[BB3]] ], [ [[TMP12]], [[BB4]] ], [ [[TMP12]], [[BB5]] ], [ [[SHUFFLE12]], [[BB1]] ]
	; MAX256-NEXT: [[TMP17:%.*]] = extractelement <8 x float> [[TMP14]], i32 7			; MAX256-NEXT: [[TMP17:%.*]] = extractelement <8 x float> [[TMP14]], i32 7
	; MAX256-NEXT: store float [[TMP17]], float* undef, align 4			; MAX256-NEXT: store float [[TMP17]], float* undef, align 4
	; MAX256-NEXT: ret void			; MAX256-NEXT: ret void
	;			;
	; MAX1024-LABEL: @phi_float32(			; MAX1024-LABEL: @phi_float32(
	; MAX1024-NEXT: bb:			; MAX1024-NEXT: bb:
	; MAX1024-NEXT: br label [[BB1:%.*]]			; MAX1024-NEXT: br label [[BB1:%.*]]
	; MAX1024: bb1:			; MAX1024: bb1:
	; MAX1024-NEXT: [[I:%.*]] = fpext half [[HVAL:%.*]] to float			; MAX1024-NEXT: [[I:%.*]] = fpext half [[HVAL:%.*]] to float
	; MAX1024-NEXT: [[I3:%.*]] = fpext half [[HVAL]] to float
	; MAX1024-NEXT: [[I6:%.*]] = fpext half [[HVAL]] to float
	; MAX1024-NEXT: [[I9:%.*]] = fpext half [[HVAL]] to float
	; MAX1024-NEXT: [[TMP0:%.*]] = insertelement <8 x float> poison, float [[I]], i32 0			; MAX1024-NEXT: [[TMP0:%.*]] = insertelement <8 x float> poison, float [[I]], i32 0
	; MAX1024-NEXT: [[SHUFFLE11:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <8 x i32> zeroinitializer			; MAX1024-NEXT: [[SHUFFLE11:%.*]] = shufflevector <8 x float> [[TMP0]], <8 x float> poison, <8 x i32> zeroinitializer
	; MAX1024-NEXT: [[TMP1:%.*]] = insertelement <8 x float> poison, float [[FVAL:%.*]], i32 0			; MAX1024-NEXT: [[TMP1:%.*]] = insertelement <8 x float> poison, float [[FVAL:%.*]], i32 0
	; MAX1024-NEXT: [[SHUFFLE12:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> poison, <8 x i32> zeroinitializer			; MAX1024-NEXT: [[SHUFFLE12:%.*]] = shufflevector <8 x float> [[TMP1]], <8 x float> poison, <8 x i32> zeroinitializer
	; MAX1024-NEXT: [[TMP2:%.*]] = fmul <8 x float> [[SHUFFLE11]], [[SHUFFLE12]]			; MAX1024-NEXT: [[TMP2:%.*]] = fmul <8 x float> [[SHUFFLE11]], [[SHUFFLE12]]
	; MAX1024-NEXT: [[TMP3:%.*]] = fadd <8 x float> zeroinitializer, [[TMP2]]			; MAX1024-NEXT: [[I3:%.*]] = fpext half [[HVAL]] to float
	; MAX1024-NEXT: [[TMP4:%.*]] = insertelement <8 x float> poison, float [[I3]], i32 0			; MAX1024-NEXT: [[TMP3:%.*]] = insertelement <8 x float> poison, float [[I3]], i32 0
	; MAX1024-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float> [[TMP4]], <8 x float> poison, <8 x i32> zeroinitializer			; MAX1024-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x float> [[TMP3]], <8 x float> poison, <8 x i32> zeroinitializer
	; MAX1024-NEXT: [[TMP5:%.*]] = fmul <8 x float> [[SHUFFLE]], [[SHUFFLE12]]			; MAX1024-NEXT: [[TMP4:%.*]] = fmul <8 x float> [[SHUFFLE]], [[SHUFFLE12]]
	; MAX1024-NEXT: [[TMP6:%.*]] = fadd <8 x float> zeroinitializer, [[TMP5]]			; MAX1024-NEXT: [[I6:%.*]] = fpext half [[HVAL]] to float
	; MAX1024-NEXT: [[TMP7:%.*]] = insertelement <8 x float> poison, float [[I6]], i32 0			; MAX1024-NEXT: [[TMP5:%.*]] = insertelement <8 x float> poison, float [[I6]], i32 0
	; MAX1024-NEXT: [[SHUFFLE5:%.*]] = shufflevector <8 x float> [[TMP7]], <8 x float> poison, <8 x i32> zeroinitializer			; MAX1024-NEXT: [[SHUFFLE5:%.*]] = shufflevector <8 x float> [[TMP5]], <8 x float> poison, <8 x i32> zeroinitializer
	; MAX1024-NEXT: [[TMP8:%.*]] = fmul <8 x float> [[SHUFFLE5]], [[SHUFFLE12]]			; MAX1024-NEXT: [[TMP6:%.*]] = fmul <8 x float> [[SHUFFLE5]], [[SHUFFLE12]]
	; MAX1024-NEXT: [[TMP9:%.*]] = fadd <8 x float> zeroinitializer, [[TMP8]]			; MAX1024-NEXT: [[I9:%.*]] = fpext half [[HVAL]] to float
	; MAX1024-NEXT: [[TMP10:%.*]] = insertelement <8 x float> poison, float [[I9]], i32 0			; MAX1024-NEXT: [[TMP7:%.*]] = insertelement <8 x float> poison, float [[I9]], i32 0
	; MAX1024-NEXT: [[SHUFFLE8:%.*]] = shufflevector <8 x float> [[TMP10]], <8 x float> poison, <8 x i32> zeroinitializer			; MAX1024-NEXT: [[SHUFFLE8:%.*]] = shufflevector <8 x float> [[TMP7]], <8 x float> poison, <8 x i32> zeroinitializer
	; MAX1024-NEXT: [[TMP11:%.*]] = fmul <8 x float> [[SHUFFLE8]], [[SHUFFLE12]]			; MAX1024-NEXT: [[TMP8:%.*]] = fmul <8 x float> [[SHUFFLE8]], [[SHUFFLE12]]
	; MAX1024-NEXT: [[TMP12:%.*]] = fadd <8 x float> zeroinitializer, [[TMP11]]			; MAX1024-NEXT: [[TMP9:%.*]] = fadd <8 x float> zeroinitializer, [[TMP4]]
				; MAX1024-NEXT: [[TMP10:%.*]] = fadd <8 x float> zeroinitializer, [[TMP6]]
				; MAX1024-NEXT: [[TMP11:%.*]] = fadd <8 x float> zeroinitializer, [[TMP8]]
				; MAX1024-NEXT: [[TMP12:%.*]] = fadd <8 x float> zeroinitializer, [[TMP2]]
	; MAX1024-NEXT: switch i32 undef, label [[BB5:%.*]] [			; MAX1024-NEXT: switch i32 undef, label [[BB5:%.*]] [
	; MAX1024-NEXT: i32 0, label [[BB2:%.*]]			; MAX1024-NEXT: i32 0, label [[BB2:%.*]]
	; MAX1024-NEXT: i32 1, label [[BB3:%.*]]			; MAX1024-NEXT: i32 1, label [[BB3:%.*]]
	; MAX1024-NEXT: i32 2, label [[BB4:%.*]]			; MAX1024-NEXT: i32 2, label [[BB4:%.*]]
	; MAX1024-NEXT: ]			; MAX1024-NEXT: ]
	; MAX1024: bb3:			; MAX1024: bb3:
	; MAX1024-NEXT: br label [[BB2]]			; MAX1024-NEXT: br label [[BB2]]
	; MAX1024: bb4:			; MAX1024: bb4:
	; MAX1024-NEXT: br label [[BB2]]			; MAX1024-NEXT: br label [[BB2]]
	; MAX1024: bb5:			; MAX1024: bb5:
	; MAX1024-NEXT: br label [[BB2]]			; MAX1024-NEXT: br label [[BB2]]
	; MAX1024: bb2:			; MAX1024: bb2:
	; MAX1024-NEXT: [[TMP13:%.*]] = phi <8 x float> [ [[TMP6]], [[BB3]] ], [ [[SHUFFLE12]], [[BB4]] ], [ [[SHUFFLE12]], [[BB5]] ], [ [[SHUFFLE12]], [[BB1]] ]			; MAX1024-NEXT: [[TMP13:%.*]] = phi <8 x float> [ [[TMP9]], [[BB3]] ], [ [[SHUFFLE12]], [[BB4]] ], [ [[SHUFFLE12]], [[BB5]] ], [ [[SHUFFLE12]], [[BB1]] ]
	; MAX1024-NEXT: [[TMP14:%.*]] = phi <8 x float> [ [[TMP9]], [[BB3]] ], [ [[SHUFFLE12]], [[BB4]] ], [ [[TMP9]], [[BB5]] ], [ [[TMP9]], [[BB1]] ]			; MAX1024-NEXT: [[TMP14:%.*]] = phi <8 x float> [ [[TMP10]], [[BB3]] ], [ [[SHUFFLE12]], [[BB4]] ], [ [[TMP10]], [[BB5]] ], [ [[TMP10]], [[BB1]] ]
	; MAX1024-NEXT: [[TMP15:%.*]] = phi <8 x float> [ [[TMP12]], [[BB3]] ], [ [[TMP12]], [[BB4]] ], [ [[SHUFFLE12]], [[BB5]] ], [ [[TMP12]], [[BB1]] ]			; MAX1024-NEXT: [[TMP15:%.*]] = phi <8 x float> [ [[TMP11]], [[BB3]] ], [ [[TMP11]], [[BB4]] ], [ [[SHUFFLE12]], [[BB5]] ], [ [[TMP11]], [[BB1]] ]
	; MAX1024-NEXT: [[TMP16:%.*]] = phi <8 x float> [ [[TMP3]], [[BB3]] ], [ [[TMP3]], [[BB4]] ], [ [[TMP3]], [[BB5]] ], [ [[SHUFFLE12]], [[BB1]] ]			; MAX1024-NEXT: [[TMP16:%.*]] = phi <8 x float> [ [[TMP12]], [[BB3]] ], [ [[TMP12]], [[BB4]] ], [ [[TMP12]], [[BB5]] ], [ [[SHUFFLE12]], [[BB1]] ]
	; MAX1024-NEXT: [[TMP17:%.*]] = extractelement <8 x float> [[TMP14]], i32 7			; MAX1024-NEXT: [[TMP17:%.*]] = extractelement <8 x float> [[TMP14]], i32 7
	; MAX1024-NEXT: store float [[TMP17]], float* undef, align 4			; MAX1024-NEXT: store float [[TMP17]], float* undef, align 4
	; MAX1024-NEXT: ret void			; MAX1024-NEXT: ret void
	;			;
	bb:			bb:
	br label %bb1			br label %bb1

	bb1:			bb1:

Diff 415799
