This is an archive of the discontinued LLVM Phabricator instance.

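Paths

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll
llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll
llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll
llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll
llvm/test/Transforms/SLPVectorizer/X86/addsub.ll
llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll
llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll
llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll
llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll
llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll
llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll
llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll
llvm/test/Transforms/SLPVectorizer/X86/store-jumbled.ll
llvm/test/Transforms/SLPVectorizer/X86/stores_vectorize.ll
llvm/test/Transforms/SLPVectorizer/X86/supernode.ll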

Differential D101109

[SLP]Improve multinode analysis.
ClosedPublic

Authored by ABataev on Apr 22 2021, 2:01 PM.

Details

Reviewers

spatel
RKSimon
vdmitrie
anton-afanasyev
vporpo

Commits

rGbd053769867f: [SLP]Improve multinode analysis.

Summary

Changes the preliminary multinode analysis:

Introduced scores for reversed loads/extractelements.
Improved shallow score calculation.
Lowered the cost of external uses (no need to consider it several times, just once).
The initial lane for analysis is the one with the minimal possible reorderings.

In general, these changes should reduce compile time and improve the
reordering in many cases.
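
To illustrate the external-use change: in the diff below, the user loop in `getExternalUsesCost` now stops at the first user that incurs a cost, so each value is charged at most once. The following is a minimal standalone sketch of that idea, assuming the users of a value have already been classified. `UseKind` and `usesCostForValue` are hypothetical names for illustration, not LLVM APIs; the two cost constants are copied from the diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of one user of a value (not an LLVM type).
enum class UseKind { SameLane, DiffLane, External };

// Cost constants as they appear in the diff.
constexpr int ExternalUseCost = 1;
constexpr int UserInDiffLaneCost = ExternalUseCost;

// Cost contributed by one value. The first costly user decides the result, so
// a value with many external users is charged once rather than once per user.
// UsersBudget is assumed to be greater than zero.
int usesCostForValue(const std::vector<UseKind> &Users, unsigned UsersBudget) {
  for (UseKind K : Users) {
    if (K == UseKind::DiffLane)
      return UserInDiffLaneCost;
    if (K == UseKind::External)
      return ExternalUseCost;
    // Limit the number of visited users to cap compile time.
    if (--UsersBudget == 0)
      break;
  }
  return 0;
}
```

With this shape, three external users cost the same as one, and a costly user that sits beyond the budget is never visited.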

Part of D57059.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. · Apr 22 2021, 2:01 PM

Herald added subscribers: tmatheson, hiraditya. · Apr 22 2021, 2:01 PM

ABataev requested review of this revision. · Apr 22 2021, 2:01 PM

Herald added a project: Restricted Project. · Apr 22 2021, 2:01 PM

Harbormaster completed remote builds in B100376: Diff 339777. · Apr 22 2021, 3:56 PM

Rebase

Harbormaster completed remote builds in B101150: Diff 340824. · Apr 27 2021, 7:39 AM

ABataev updated this revision to Diff 345547. · May 14 2021, 1:45 PM

Rebase

Harbormaster completed remote builds in B104585: Diff 345547. · May 14 2021, 2:33 PM

ABataev mentioned this in D57059: [SLP] Initial support for the vectorization of the non-power-of-2 vectors. · May 18 2021, 11:04 AM

RKSimon added a reviewer: anton-afanasyev. · May 18 2021, 1:10 PM

SjoerdMeijer added a subscriber: SjoerdMeijer. · May 20 2021, 6:50 AM

Have you been able to investigate any of these instruction increase regressions?

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll
68–69	Regression?
210	Regression?
llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll
74	Regression?
210	Regression?
llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll
596	Regression?

In D101109#2771153, @RKSimon wrote:

Have you been able to investigate any of these instruction increase regressions?

I think most of them can be fixed, requires D103247 and 1 or 2 extra patches to allow tree reordering for larger subsets of trees.

Matt added a subscriber: Matt. · Jun 4 2021, 8:27 AM

rebase?

In D101109#2807872, @RKSimon wrote:

rebase?

Need to prepare 1 or 2 extra patches to fix the regressions introduced in this patch (allow reordering for insertelements etc.). Will rebase it after this.

Rebase

There are still regressions, even after we allowed reordering of insertelements. It is because the reordering is not quite effective. I have an idea of how to improve it (and avoid rebuilding the tree for the second time and improve compile time), will try to implement it next week.

Harbormaster completed remote builds in B108831: Diff 351472. · Jun 11 2021, 10:09 AM

RKSimon added inline comments. · Jun 16 2021, 12:37 AM

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll
138–139	A lot of these tests aren't preserving the broadcast any more - I'm not sure if it really matters although the testnames now look wrong?

ABataev added inline comments. · Jun 16 2021, 4:03 AM

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll
138–139	I'll rename affected test cases

Rebase

Harbormaster completed remote builds in B110281: Diff 353476. · Jun 21 2021, 3:05 PM

rebase?

Rebase

It depends on D105020, which should fix all the regressions caused by this patch

Harbormaster completed remote builds in B113195: Diff 357503. · Jul 9 2021, 8:10 AM

ABataev mentioned this in D105730: [SLP] match logical and/or as reduction candidates. · Jul 12 2021, 5:51 AM

RKSimon added inline comments. · Jul 13 2021, 2:16 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1091	Should this be m_Deferred or m_Specific? I thought m_Deferred was only necessary in the same match call?
1096	Can this be a single line comment?

Rebase, fixes and addressed comments.

Harbormaster completed remote builds in B115608: Diff 360863. · Jul 22 2021, 10:02 AM

Rebase

Harbormaster completed remote builds in B130946: Diff 382655. · Oct 27 2021, 8:23 AM

Rebase

Harbormaster completed remote builds in B130968: Diff 382690. · Oct 27 2021, 9:45 AM

Perhaps the score changes could be split into a separate patch?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1391	Could you add a comment what these `unsigned` values are for? (or perhaps use a struct instead?). Could you also describe the purpose of `HashMap` and what the keys and values are?
1392	Is there any reasoning behind the iteration in reverse? If so could you please add a comment?
1442	NIT: I find `Code` a bit confusing, also perhaps there is no need to refer to `Parent` in the variable name? Perhaps rename to something like `NumOpsWithSameOpcode`?
1443	Could you add a bit more text in the comment about what the hash is used for? I can see that it is used as a key in the `HashMap` above, but could you explain how it is being used?
1465–1476	Could you elaborate a bit on this? If I understand correctly the more similar opcodes we can find, the easier it is to reorder them, therefore this can act as a tie-breaker when the NumOfAPOs is equal?
1478	If I am not mistaken this code will count the consecutive operands with same opcode and BB. Is it because this is a good enough approximation?

vporpo added a reviewer: vporpo. · Nov 6 2021, 3:21 AM

In D101109#3113468, @vporpo wrote:

Perhaps the score changes could be split into a separate patch?

Not alone, they cause regressions. Will try to separate cost-model changes.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1391	Yes, just forgot to add some extra comments, will add them after updates. `std::pair<unsigned, unsigned>` is used to implement a simple voting algorithm and choose the lane with the least number of operands that can freely move about or less profitable because it already has the most optimal set of operands. The first unsigned is a counter for voting, the second unsigned is the counter of lanes with instructions with same/alternate opcodes and same parent basic block.
1392	This is just to be closer to the original results, before this patch, nothing else, if we have multiple lanes with same cost.
1442	Will rename it but I'd rather keep `Parent`, because I compare not only opcodes but the parent too.
1443	It is used to count operands, actually their position id and opcode value. It is used in the voting mechanism to find the lane with the least number of operands that can freely move about or less profitable because it already has the most optimal set of operands. I could use `SmallVector<unsigned>` instead, but a hash code is faster and requires less memory.
1465–1476	If the lane already has operands with the same opcode and same parent, no need to swap the operands in this lane, with a high probability such lane already can be vectorized effectively.
1478	Yes, exactly, in most cases it results in the optimal values in the lane.

Rebase + address comments

Harbormaster completed remote builds in B133989: Diff 386887. · Nov 12 2021, 11:08 AM

tmatheson removed a subscriber: tmatheson. · Nov 12 2021, 2:07 PM

Rebase

Harbormaster completed remote builds in B134587: Diff 387729. · Nov 16 2021, 1:40 PM

Rebase

Harbormaster completed remote builds in B137174: Diff 391377. · Dec 2 2021, 11:08 AM

@vporpo Any more comments?

vporpo added inline comments. · Dec 3 2021, 12:41 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1480	Are you using `NumOpsWithSameOpcodeParent == 0` as a check for the first iteration? Shouldn't you be using `!OpcodeI` instead? I find this code a bit hard to follow, because I can't tell which of the `if` conditions are for checking for the first iteration and which ones are part of the heuristic. Should it be updating the `OpcodeI` and `Parent` only in the first iteration (like below), or should it be doing it whenever there is a mismatch?

if (auto *I = dyn_cast<Instruction>(OpData.V)) {
  // First iteration
  if (!OpcodeI) {
    OpcodeI = I;
    Parent = I->getParent();
  }
  // Mismatch
  if (!getSameOpcode({OpcodeI, I}).getOpcode() || I->getParent() != Parent)
    ++NumOpsWithSameOpcodeParent;
  else
    NumOpsWithSameOpcodeParent = std::min(NumOpsWithSameOpcodeParent-1, 0);
}

Perhaps peeling the first iteration might make the code easier to follow?
1481	Why is `NumOpsWithSameOpcodeParent` set to 1 the first time a mismatch is found? Shouldn't it be set to 0 ?

ABataev added inline comments. · Dec 7 2021, 1:56 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1480	This is again a kind of voting algorithm. This code works every time, we start voting on a value with the new opcode, not only on the first iteration. We just try to find the opcode with not less than NumOperands/2 number of occurrences here, if no such opcode - just choose any of them, there are no profitable elements.
1481	It is a kind of increasing the counter for the first element in the sequence.

vporpo added inline comments. · Dec 7 2021, 3:19 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1481	Yes, it is increasing it, but shouldn't it be decreasing it instead (or letting it remain 0)? This code block executes when there is a mismatch of opcode or parent (or if it is the first iteration), so shouldn't we be decreasing the value of `NumOpsWithSameOpcodeParent` (like in line 1463)? What confuses me here is that `NumOpsWithSameOpcodeParent` looks like a normal counter that counts the opcode/parent matches. So I would expect it to increase by one if the opcode/parents match (like what line 1466 does), and to decrease by one if there is a mismatch. But it seems to be more complicated than that: when it reaches 0 it is forced to 1 even when there is an opcode mismatch. I find this a bit counterintuitive. For example, if we have mismatching opcodes in sequence, I would expect it to keep decreasing, or at least be capped to 0. But it seems like the value of `NumOpsWithSameOpcodeParent` will be 0, then 1, then 0, then 1, like so:

before the loop: 0
iteration 1: 1 (because it was == 0)
iteration 2: 0 (because of opcode mismatch)
iteration 3: 1 (because it was == 0)

ABataev added inline comments. · Dec 7 2021, 3:24 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1481	This is how the voting algorithm works. Here is described the main idea https://www.geeksforgeeks.org/boyer-moore-majority-voting-algorithm/

vporpo added inline comments. · Dec 7 2021, 4:10 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1481	OK that makes sense now, thanks for clarifying! Could you please add a comment saying that this loop is a Boyer-Moore majority voting for finding the majority opcode and the number of times it occurs?

ABataev added inline comments. · Dec 7 2021, 4:15 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1481	Sure, will do it tomorrow.
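
The voting scheme discussed in this thread can be sketched in isolation. This is a minimal standalone version of the Boyer-Moore majority vote, not the code from the patch: plain strings stand in for the (opcode, parent basic block) pair, and `majorityVote` is a hypothetical helper name. As in the classic algorithm, the surviving candidate is guaranteed to be the majority element only if a strict majority exists.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Boyer-Moore majority vote over a sequence of keys.
// Returns the surviving candidate and the final value of the vote counter.
std::pair<std::string, unsigned>
majorityVote(const std::vector<std::string> &Keys) {
  std::string Candidate;
  unsigned Votes = 0;
  for (const std::string &K : Keys) {
    if (Votes == 0) {
      // Start voting on a new candidate; it counts as its own first vote.
      Candidate = K;
      Votes = 1;
    } else if (K == Candidate) {
      ++Votes; // Same key: one more vote for the current candidate.
    } else {
      --Votes; // Different key: cancels one vote.
    }
  }
  return {Candidate, Votes};
}
```

For the all-different input `{"add", "sub", "mul"}` the counter goes 1, 0, 1, which is the 0/1 alternation questioned above: a counter of 0 means "pick a new candidate", and that candidate immediately receives one vote.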

Rebase + improve analysis for extractelements.

Harbormaster completed remote builds in B139065: Diff 394028. · Dec 13 2021, 2:30 PM

LGTM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

1092–1098

Nit: Could you use temporary variables and perhaps try to simplify the expression to make it a bit easier to read, something like:

bool MatchExtract1 = match(V1, m_ExtractElt(m_Value(EV1), m_ConstantInt(Ex1Idx)));
bool MatchExtract2 = match(V2, m_ExtractElt(m_Value(EV2), m_CombineOr(m_ConstantInt(Ex2Idx),  m_Undef())));
bool AcceptedEV2 = !EV2 || (isUndefVector(EV2) && EV2->getType() == EV1->getType()) || EV2 == EV1;
if ((MatchExtract1 && isa<UndefValue>(V2)) ||
    (MatchExtract1 && MatchExtract2 && AcceptedEV2)) {
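
Once both extracts read the same vector (the `EV2 == EV1` case), the diff scores the pair by the signed index distance, in the same way it scores a pair of loads by pointer distance. Below is a minimal standalone sketch of that rule. `scoreByDistance` is a hypothetical name, `std::optional` stands in for the `Optional<int>` returned by `getPointersDiff`, and the score values are copied from the diff.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>

// Score values as they appear in the diff.
constexpr int ScoreConsecutive = 4; // ScoreConsecutiveLoads / ScoreConsecutiveExtracts
constexpr int ScoreReversed = 3;    // ScoreReversedLoads / ScoreReversedExtracts
constexpr int ScoreAltOpcodes = 1;
constexpr int ScoreFail = 0;

// Dist is the signed distance from the first element to the second one, or
// empty when it cannot be computed (possible for loads only).
int scoreByDistance(std::optional<int> Dist, int NumLanes) {
  if (!Dist)
    return ScoreFail;
  // Too far apart: may still be profitable via gathers or shuffles.
  if (std::abs(*Dist) > NumLanes / 2)
    return ScoreAltOpcodes;
  // Forward order scores higher than reversed order.
  return *Dist > 0 ? ScoreConsecutive : ScoreReversed;
}
```

With four lanes, distances of 1 and 2 score 4, a distance of -1 scores 3, and a distance of 3 falls back to 1.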

This revision is now accepted and ready to land. · Dec 13 2021, 3:16 PM

This revision was landed with ongoing or failed builds. · Dec 14 2021, 6:18 AM

Closed by commit rGbd053769867f: [SLP]Improve multinode analysis. (authored by ABataev).

This revision was automatically updated to reflect the committed changes.

ABataev added a commit: rGbd053769867f: [SLP]Improve multinode analysis..

This patch caused many test failures in my application on Power9. Although this patch sounds like it affects SLP, adding -fno-slp-vectorize doesn't improve the pass rate, but changing -O3 to -O0 does.

In D101109#3213510, @ye-luo wrote:

This patch caused many test failures in my application on Power9. Although this patch sounds like it affects SLP, adding -fno-slp-vectorize doesn't improve the pass rate, but changing -O3 to -O0 does.

Hi, do you have a reproducer?

In D101109#3213634, @ABataev wrote:

In D101109#3213510, @ye-luo wrote:

This patch caused many test failures in my application on Power9. Although this patch sounds like it affects SLP, adding -fno-slp-vectorize doesn't improve the pass rate, but changing -O3 to -O0 does.

Hi, do you have a reproducer?

Initially I was not sure where the issue came from and just reported my observation. After a careful inspection, I found it is an interaction between clang and the random number generator in the boost libraries. Since I had little knowledge of the internal details of the library, I decided not to debug it. Instead I just moved my application off boost, and the RNG from the C++ standard library works well with Clang. So I won't work on a reproducer. If boost developers find an issue, they will report bugs. For now, assume everything is good.

In D101109#3218469, @ye-luo wrote:

In D101109#3213634, @ABataev wrote:

In D101109#3213510, @ye-luo wrote:

This patch caused many test failures in my application on Power9. Although this patch sounds like it affects SLP, adding -fno-slp-vectorize doesn't improve the pass rate, but changing -O3 to -O0 does.

Hi, do you have a reproducer?

Initially I was not sure where the issue came from and just reported my observation. After a careful inspection, I found it is an interaction between clang and the random number generator in the boost libraries. Since I had little knowledge of the internal details of the library, I decided not to debug it. Instead I just moved my application off boost, and the RNG from the C++ standard library works well with Clang. So I won't work on a reproducer. If boost developers find an issue, they will report bugs. For now, assume everything is good.

Ok, thanks for letting me know!

Revision Contents

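Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	253 lines
llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll	30 lines
llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll	30 lines
llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll	20 lines
llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/addsub.ll	24 lines
llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll	20 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll	6 lines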

18 lines

4 lines

34 lines

35 lines

44 lines

4 lines

6 lines

2 lines

Diff 394227

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

[1,010 lines not shown]	void clearUsed() {
OpsVec[OpIdx][Lane].IsUsed = false;		OpsVec[OpIdx][Lane].IsUsed = false;
}		}

/// Swap the operand at \p OpIdx1 with that one at \p OpIdx2.		/// Swap the operand at \p OpIdx1 with that one at \p OpIdx2.
void swap(unsigned OpIdx1, unsigned OpIdx2, unsigned Lane) {		void swap(unsigned OpIdx1, unsigned OpIdx2, unsigned Lane) {
std::swap(OpsVec[OpIdx1][Lane], OpsVec[OpIdx2][Lane]);		std::swap(OpsVec[OpIdx1][Lane], OpsVec[OpIdx2][Lane]);
}		}

// The hard-coded scores listed here are not very important. When computing		// The hard-coded scores listed here are not very important, though it shall
// the scores of matching one sub-tree with another, we are basically		// be higher for better matches to improve the resulting cost. When
// counting the number of values that are matching. So even if all scores		// computing the scores of matching one sub-tree with another, we are
// are set to 1, we would still get a decent matching result.		// basically counting the number of values that are matching. So even if all
		// scores are set to 1, we would still get a decent matching result.
// However, sometimes we have to break ties. For example we may have to		// However, sometimes we have to break ties. For example we may have to
// choose between matching loads vs matching opcodes. This is what these		// choose between matching loads vs matching opcodes. This is what these
// scores are helping us with: they provide the order of preference.		// scores are helping us with: they provide the order of preference. Also,
		// this is important if the scalar is externally used or used in another
		// tree entry node in the different lane.

/// Loads from consecutive memory addresses, e.g. load(A[i]), load(A[i+1]).		/// Loads from consecutive memory addresses, e.g. load(A[i]), load(A[i+1]).
static const int ScoreConsecutiveLoads = 3;		static const int ScoreConsecutiveLoads = 4;
		/// Loads from reversed memory addresses, e.g. load(A[i+1]), load(A[i]).
		static const int ScoreReversedLoads = 3;
/// ExtractElementInst from same vector and consecutive indexes.		/// ExtractElementInst from same vector and consecutive indexes.
static const int ScoreConsecutiveExtracts = 3;		static const int ScoreConsecutiveExtracts = 4;
		/// ExtractElementInst from same vector and reversed indices.
		static const int ScoreReversedExtracts = 3;
/// Constants.		/// Constants.
static const int ScoreConstants = 2;		static const int ScoreConstants = 2;
/// Instructions with the same opcode.		/// Instructions with the same opcode.
static const int ScoreSameOpcode = 2;		static const int ScoreSameOpcode = 2;
/// Instructions with alt opcodes (e.g, add + sub).		/// Instructions with alt opcodes (e.g, add + sub).
static const int ScoreAltOpcodes = 1;		static const int ScoreAltOpcodes = 1;
/// Identical instructions (a.k.a. splat or broadcast).		/// Identical instructions (a.k.a. splat or broadcast).
static const int ScoreSplat = 1;		static const int ScoreSplat = 1;
/// Matching with an undef is preferable to failing.		/// Matching with an undef is preferable to failing.
static const int ScoreUndef = 1;		static const int ScoreUndef = 1;
/// Score for failing to find a decent match.		/// Score for failing to find a decent match.
static const int ScoreFail = 0;		static const int ScoreFail = 0;
/// User exteranl to the vectorized code.		/// User exteranl to the vectorized code.
static const int ExternalUseCost = 1;		static const int ExternalUseCost = 1;
/// The user is internal but in a different lane.		/// The user is internal but in a different lane.
static const int UserInDiffLaneCost = ExternalUseCost;		static const int UserInDiffLaneCost = ExternalUseCost;

/// \returns the score of placing \p V1 and \p V2 in consecutive lanes.		/// \returns the score of placing \p V1 and \p V2 in consecutive lanes.
static int getShallowScore(Value *V1, Value *V2, const DataLayout &DL,		static int getShallowScore(Value *V1, Value *V2, const DataLayout &DL,
ScalarEvolution &SE) {		ScalarEvolution &SE, int NumLanes) {
		if (V1 == V2)
		return VLOperands::ScoreSplat;

auto *LI1 = dyn_cast<LoadInst>(V1);		auto *LI1 = dyn_cast<LoadInst>(V1);
auto *LI2 = dyn_cast<LoadInst>(V2);		auto *LI2 = dyn_cast<LoadInst>(V2);
if (LI1 && LI2) {		if (LI1 && LI2) {
if (LI1->getParent() != LI2->getParent())		if (LI1->getParent() != LI2->getParent())
return VLOperands::ScoreFail;		return VLOperands::ScoreFail;

Optional<int> Dist = getPointersDiff(		Optional<int> Dist = getPointersDiff(
LI1->getType(), LI1->getPointerOperand(), LI2->getType(),		LI1->getType(), LI1->getPointerOperand(), LI2->getType(),
LI2->getPointerOperand(), DL, SE, /*StrictCheck=*/true);		LI2->getPointerOperand(), DL, SE, /*StrictCheck=*/true);
return (Dist && *Dist == 1) ? VLOperands::ScoreConsecutiveLoads		if (!Dist)
: VLOperands::ScoreFail;		return VLOperands::ScoreFail;
		// The distance is too large - still may be profitable to use masked
		// loads/gathers.
		if (std::abs(*Dist) > NumLanes / 2)
		return VLOperands::ScoreAltOpcodes;
		// This still will detect consecutive loads, but we might have "holes"
		// in some cases. It is ok for non-power-2 vectorization and may produce
		// better results. It should not affect current vectorization.
		return (*Dist > 0) ? VLOperands::ScoreConsecutiveLoads
		: VLOperands::ScoreReversedLoads;
}		}

auto *C1 = dyn_cast<Constant>(V1);		auto *C1 = dyn_cast<Constant>(V1);
auto *C2 = dyn_cast<Constant>(V2);		auto *C2 = dyn_cast<Constant>(V2);
if (C1 && C2)		if (C1 && C2)
return VLOperands::ScoreConstants;		return VLOperands::ScoreConstants;

// Extracts from consecutive indexes of the same vector better score as		// Extracts from consecutive indexes of the same vector better score as
// the extracts could be optimized away.		// the extracts could be optimized away.
Value *EV;		Value *EV1;
ConstantInt *Ex1Idx, *Ex2Idx;		ConstantInt *Ex1Idx;
if (match(V1, m_ExtractElt(m_Value(EV), m_ConstantInt(Ex1Idx))) &&		if (match(V1, m_ExtractElt(m_Value(EV1), m_ConstantInt(Ex1Idx)))) {
match(V2, m_ExtractElt(m_Deferred(EV), m_ConstantInt(Ex2Idx))) &&		// Undefs are always profitable for extractelements.
Ex1Idx->getZExtValue() + 1 == Ex2Idx->getZExtValue())		if (isa<UndefValue>(V2))
		return VLOperands::ScoreConsecutiveExtracts;
		Value *EV2 = nullptr;
		ConstantInt *Ex2Idx = nullptr;
		if (match(V2,
		m_ExtractElt(m_Value(EV2), m_CombineOr(m_ConstantInt(Ex2Idx),
		m_Undef())))) {
		// Undefs are always profitable for extractelements.
		if (!Ex2Idx)
return VLOperands::ScoreConsecutiveExtracts;		return VLOperands::ScoreConsecutiveExtracts;
		if (isUndefVector(EV2) && EV2->getType() == EV1->getType())
		return VLOperands::ScoreConsecutiveExtracts;
		if (EV2 == EV1) {
		int Idx1 = Ex1Idx->getZExtValue();
		int Idx2 = Ex2Idx->getZExtValue();
		int Dist = Idx2 - Idx1;
		// The distance is too large - still may be profitable to use
		// shuffles.
		if (std::abs(Dist) > NumLanes / 2)
		return VLOperands::ScoreAltOpcodes;
		return (Dist > 0) ? VLOperands::ScoreConsecutiveExtracts
		: VLOperands::ScoreReversedExtracts;
		}
		}
		}

auto *I1 = dyn_cast<Instruction>(V1);		auto *I1 = dyn_cast<Instruction>(V1);
auto *I2 = dyn_cast<Instruction>(V2);		auto *I2 = dyn_cast<Instruction>(V2);
if (I1 && I2) {		if (I1 && I2) {
if (I1 == I2)		if (I1->getParent() != I2->getParent())
return VLOperands::ScoreSplat;		return VLOperands::ScoreFail;
InstructionsState S = getSameOpcode({I1, I2});		InstructionsState S = getSameOpcode({I1, I2});
// Note: Only consider instructions with <= 2 operands to avoid		// Note: Only consider instructions with <= 2 operands to avoid
// complexity explosion.		// complexity explosion.
if (S.getOpcode() && S.MainOp->getNumOperands() <= 2)		if (S.getOpcode() && S.MainOp->getNumOperands() <= 2)
return S.isAltShuffle() ? VLOperands::ScoreAltOpcodes		return S.isAltShuffle() ? VLOperands::ScoreAltOpcodes
: VLOperands::ScoreSameOpcode;		: VLOperands::ScoreSameOpcode;
}		}

if (isa<UndefValue>(V2))		if (isa<UndefValue>(V2))
return VLOperands::ScoreUndef;		return VLOperands::ScoreUndef;

return VLOperands::ScoreFail;		return VLOperands::ScoreFail;
}		}

/// Holds the values and their lane that are taking part in the look-ahead		/// Holds the values and their lanes that are taking part in the look-ahead
/// score calculation. This is used in the external uses cost calculation.		/// score calculation. This is used in the external uses cost calculation.
SmallDenseMap<Value *, int> InLookAheadValues;		/// Need to hold all the lanes in case of splat/broadcast at least to
		/// correctly check for the use in the different lane.
		SmallDenseMap<Value *, SmallSet<int, 4>> InLookAheadValues;

/// \Returns the additinal cost due to uses of \p LHS and \p RHS that are		/// \returns the additional cost due to uses of \p LHS and \p RHS that are
/// either external to the vectorized code, or require shuffling.		/// either external to the vectorized code, or require shuffling.
int getExternalUsesCost(const std::pair<Value *, int> &LHS,		int getExternalUsesCost(const std::pair<Value *, int> &LHS,
const std::pair<Value *, int> &RHS) {		const std::pair<Value *, int> &RHS) {
int Cost = 0;		int Cost = 0;
std::array<std::pair<Value *, int>, 2> Values = {{LHS, RHS}};		std::array<std::pair<Value *, int>, 2> Values = {{LHS, RHS}};
for (int Idx = 0, IdxE = Values.size(); Idx != IdxE; ++Idx) {		for (int Idx = 0, IdxE = Values.size(); Idx != IdxE; ++Idx) {
Value *V = Values[Idx].first;		Value *V = Values[Idx].first;
if (isa<Constant>(V)) {		if (isa<Constant>(V)) {
// Since this is a function pass, it doesn't make semantic sense to		// Since this is a function pass, it doesn't make semantic sense to
// walk the users of a subclass of Constant. The users could be in		// walk the users of a subclass of Constant. The users could be in
// another function, or even another module that happens to be in		// another function, or even another module that happens to be in
// the same LLVMContext.		// the same LLVMContext.
continue;		continue;
}		}

// Calculate the absolute lane, using the minimum relative lane of LHS		// Calculate the absolute lane, using the minimum relative lane of LHS
// and RHS as base and Idx as the offset.		// and RHS as base and Idx as the offset.
int Ln = std::min(LHS.second, RHS.second) + Idx;		int Ln = std::min(LHS.second, RHS.second) + Idx;
assert(Ln >= 0 && "Bad lane calculation");		assert(Ln >= 0 && "Bad lane calculation");
unsigned UsersBudget = LookAheadUsersBudget;		unsigned UsersBudget = LookAheadUsersBudget;
for (User *U : V->users()) {		for (User *U : V->users()) {
if (const TreeEntry *UserTE = R.getTreeEntry(U)) {		if (const TreeEntry *UserTE = R.getTreeEntry(U)) {
// The user is in the VectorizableTree. Check if we need to insert.		// The user is in the VectorizableTree. Check if we need to insert.
auto It = llvm::find(UserTE->Scalars, U);		int UserLn = UserTE->findLaneForValue(U);
assert(It != UserTE->Scalars.end() && "U is in UserTE");
int UserLn = std::distance(UserTE->Scalars.begin(), It);
assert(UserLn >= 0 && "Bad lane");		assert(UserLn >= 0 && "Bad lane");
if (UserLn != Ln)		// If the values are different, check just the line of the current
		// value. If the values are the same, need to add UserInDiffLaneCost
		// only if UserLn does not match both line numbers.
		if ((LHS.first != RHS.first && UserLn != Ln) ||
		(LHS.first == RHS.first && UserLn != LHS.second &&
		UserLn != RHS.second)) {
Cost += UserInDiffLaneCost;		Cost += UserInDiffLaneCost;
		break;
		}
} else {		} else {
// Check if the user is in the look-ahead code.		// Check if the user is in the look-ahead code.
auto It2 = InLookAheadValues.find(U);		auto It2 = InLookAheadValues.find(U);
if (It2 != InLookAheadValues.end()) {		if (It2 != InLookAheadValues.end()) {
// The user is in the look-ahead code. Check the lane.		// The user is in the look-ahead code. Check the lane.
if (It2->second != Ln)		if (!It2->getSecond().contains(Ln)) {
Cost += UserInDiffLaneCost;		Cost += UserInDiffLaneCost;
		break;
		}
} else {		} else {
// The user is neither in SLP tree nor in the look-ahead code.		// The user is neither in SLP tree nor in the look-ahead code.
Cost += ExternalUseCost;		Cost += ExternalUseCost;
		break;
}		}
}		}
// Limit the number of visited uses to cap compilation time.		// Limit the number of visited uses to cap compilation time.
if (--UsersBudget == 0)		if (--UsersBudget == 0)
break;		break;
}		}
}		}
return Cost;		return Cost;
[22 lines not shown]	class VLOperands {
/// Luís F. W. Góes		/// Luís F. W. Góes
int getScoreAtLevelRec(const std::pair<Value *, int> &LHS,		int getScoreAtLevelRec(const std::pair<Value *, int> &LHS,
const std::pair<Value *, int> &RHS, int CurrLevel,		const std::pair<Value *, int> &RHS, int CurrLevel,
int MaxLevel) {		int MaxLevel) {

Value *V1 = LHS.first;		Value *V1 = LHS.first;
Value *V2 = RHS.first;		Value *V2 = RHS.first;
// Get the shallow score of V1 and V2.		// Get the shallow score of V1 and V2.
int ShallowScoreAtThisLevel =		int ShallowScoreAtThisLevel = std::max(
std::max((int)ScoreFail, getShallowScore(V1, V2, DL, SE) -		(int)ScoreFail, getShallowScore(V1, V2, DL, SE, getNumLanes()) -
getExternalUsesCost(LHS, RHS));		getExternalUsesCost(LHS, RHS));
int Lane1 = LHS.second;		int Lane1 = LHS.second;
int Lane2 = RHS.second;		int Lane2 = RHS.second;

// If reached MaxLevel,		// If reached MaxLevel,
// or if V1 and V2 are not instructions,		// or if V1 and V2 are not instructions,
// or if they are SPLAT,		// or if they are SPLAT,
// or if they are not consecutive, early return the current cost.		// or if they are not consecutive,
		// or if profitable to vectorize loads or extractelements, early return
		// the current cost.
auto *I1 = dyn_cast<Instruction>(V1);		auto *I1 = dyn_cast<Instruction>(V1);
auto *I2 = dyn_cast<Instruction>(V2);		auto *I2 = dyn_cast<Instruction>(V2);
if (CurrLevel == MaxLevel || !(I1 && I2) || I1 == I2 ||		if (CurrLevel == MaxLevel || !(I1 && I2) || I1 == I2 ||
ShallowScoreAtThisLevel == VLOperands::ScoreFail ||		ShallowScoreAtThisLevel == VLOperands::ScoreFail ||
(isa<LoadInst>(I1) && isa<LoadInst>(I2) && ShallowScoreAtThisLevel))		(((isa<LoadInst>(I1) && isa<LoadInst>(I2)) ||
		(isa<ExtractElementInst>(I1) && isa<ExtractElementInst>(I2))) &&
		ShallowScoreAtThisLevel))
return ShallowScoreAtThisLevel;		return ShallowScoreAtThisLevel;
assert(I1 && I2 && "Should have early exited.");		assert(I1 && I2 && "Should have early exited.");

// Keep track of in-tree values for determining the external-use cost.		// Keep track of in-tree values for determining the external-use cost.
InLookAheadValues[V1] = Lane1;		InLookAheadValues[V1].insert(Lane1);
InLookAheadValues[V2] = Lane2;		InLookAheadValues[V2].insert(Lane2);

// Contains the I2 operand indexes that got matched with I1 operands.		// Contains the I2 operand indexes that got matched with I1 operands.
SmallSet<unsigned, 4> Op2Used;		SmallSet<unsigned, 4> Op2Used;

// Recursion towards the operands of I1 and I2. We are trying all possbile		// Recursion towards the operands of I1 and I2. We are trying all possible
// operand pairs, and keeping track of the best score.		// operand pairs, and keeping track of the best score.
for (unsigned OpIdx1 = 0, NumOperands1 = I1->getNumOperands();		for (unsigned OpIdx1 = 0, NumOperands1 = I1->getNumOperands();
OpIdx1 != NumOperands1; ++OpIdx1) {		OpIdx1 != NumOperands1; ++OpIdx1) {
// Try to pair op1I with the best operand of I2.		// Try to pair op1I with the best operand of I2.
int MaxTmpScore = 0;		int MaxTmpScore = 0;
unsigned MaxOpIdx2 = 0;		unsigned MaxOpIdx2 = 0;
bool FoundBest = false;		bool FoundBest = false;
// If I2 is commutative try all combinations.		// If I2 is commutative try all combinations.
▲ Show 20 Lines • Show All 107 Lines • ▼ Show 20 Lines	getBestOperand(unsigned OpIdx, int Lane, int LastLane,
if (BestOp.Idx) {		if (BestOp.Idx) {
getData(BestOp.Idx.getValue(), Lane).IsUsed = true;		getData(BestOp.Idx.getValue(), Lane).IsUsed = true;
return BestOp.Idx;		return BestOp.Idx;
}		}
// If we could not find a good match return None.		// If we could not find a good match return None.
return None;		return None;
}		}

/// Helper for reorderOperandVecs. \Returns the lane that we should start		/// Helper for reorderOperandVecs.
/// reordering from. This is the one which has the least number of operands		/// \returns the lane that we should start reordering from. This is the one
/// that can freely move about.		/// which has the least number of operands that can freely move about or
		/// less profitable because it already has the most optimal set of operands.
unsigned getBestLaneToStartReordering() const {		unsigned getBestLaneToStartReordering() const {
unsigned BestLane = 0;
unsigned Min = UINT_MAX;		unsigned Min = UINT_MAX;
for (unsigned Lane = 0, NumLanes = getNumLanes(); Lane != NumLanes;		unsigned SameOpNumber = 0;
++Lane) {		// std::pair<unsigned, unsigned> is used to implement a simple voting
		vporpo: Could you add a comment explaining what these `unsigned` values are for (or perhaps use a struct instead)? Could you also describe the purpose of `HashMap` and what its keys and values are?
		ABataev (author): Yes, I just forgot to add some extra comments; I will add them after the updates. `std::pair<unsigned, unsigned>` is used to implement a simple voting algorithm that chooses the lane with the least number of operands that can freely move about, or the lane that is less profitable because it already has the most optimal set of operands. The first `unsigned` is a counter for voting; the second `unsigned` is the counter of lanes with instructions with same/alternate opcodes and the same parent basic block.
unsigned NumFreeOps = getMaxNumOperandsThatCanBeReordered(Lane);		// algorithm and choose the lane with the least number of operands that
		vporpo: Is there any reasoning behind the iteration in reverse? If so, could you please add a comment?
		ABataev (author): This is just to stay closer to the original results from before this patch when multiple lanes have the same cost, nothing else.
if (NumFreeOps < Min) {		// can freely move about or less profitable because it already has the
Min = NumFreeOps;		// most optimal set of operands. The first unsigned is a counter for
BestLane = Lane;		// voting, the second unsigned is the counter of lanes with instructions
		// with same/alternate opcodes and same parent basic block.
		MapVector<unsigned, std::pair<unsigned, unsigned>> HashMap;
		// Try to be closer to the original results, if we have multiple lanes
		// with same cost. If 2 lanes have the same cost, use the one with the
		// lowest index.
		for (int I = getNumLanes(); I > 0; --I) {
		unsigned Lane = I - 1;
		OperandsOrderData NumFreeOpsHash =
		getMaxNumOperandsThatCanBeReordered(Lane);
		// Compare the number of operands that can move and choose the one with
		// the least number.
		if (NumFreeOpsHash.NumOfAPOs < Min) {
		Min = NumFreeOpsHash.NumOfAPOs;
		SameOpNumber = NumFreeOpsHash.NumOpsWithSameOpcodeParent;
		HashMap.clear();
		HashMap[NumFreeOpsHash.Hash] = std::make_pair(1, Lane);
		} else if (NumFreeOpsHash.NumOfAPOs == Min &&
		NumFreeOpsHash.NumOpsWithSameOpcodeParent < SameOpNumber) {
		// Select the most optimal lane in terms of number of operands that
		// should be moved around.
		SameOpNumber = NumFreeOpsHash.NumOpsWithSameOpcodeParent;
		HashMap[NumFreeOpsHash.Hash] = std::make_pair(1, Lane);
		} else if (NumFreeOpsHash.NumOfAPOs == Min &&
		NumFreeOpsHash.NumOpsWithSameOpcodeParent == SameOpNumber) {
		++HashMap[NumFreeOpsHash.Hash].first;
		}
		}
		// Select the lane with the minimum counter.
		unsigned BestLane = 0;
		unsigned CntMin = UINT_MAX;
		for (const auto &Data : reverse(HashMap)) {
		if (Data.second.first < CntMin) {
		CntMin = Data.second.first;
		BestLane = Data.second.second;
}		}
}		}
return BestLane;		return BestLane;
}		}
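
The lane vote in the right-hand `getBestLaneToStartReordering` can be approximated outside LLVM roughly as below. This is a minimal standalone sketch, not the patch itself: `LaneData`, `Group`, `bucket`, and `pickStartLane` are hypothetical names, the per-lane summaries are passed in precomputed, and a plain `std::vector` stands in for `MapVector` to keep insertion order. As in the diff, a bucket first created in the full-tie branch starts with lane 0 recorded, which is what a value-initialized `std::pair` would give.

```cpp
#include <cassert>
#include <climits>
#include <vector>

// Hypothetical per-lane summary; in the patch this role is played by
// OperandsOrderData.
struct LaneData {
  unsigned NumOfAPOs;
  unsigned NumOpsWithSameOpcodeParent;
  unsigned Hash;
};

// One voting bucket: lanes whose operand layout hashes to the same value.
struct Group {
  unsigned Hash;
  unsigned Votes;
  unsigned Lane;
};

// Finds the bucket for Hash, appending a zeroed one if it is missing.
// The vector keeps insertion order (assumed stand-in for MapVector).
static Group &bucket(std::vector<Group> &Groups, unsigned Hash) {
  for (Group &G : Groups)
    if (G.Hash == Hash)
      return G;
  Groups.push_back({Hash, 0, 0});
  return Groups.back();
}

unsigned pickStartLane(const std::vector<LaneData> &Lanes) {
  unsigned Min = UINT_MAX;
  unsigned SameOpNumber = 0;
  std::vector<Group> Groups;
  // Walk the lanes from last to first, as the diff does.
  for (unsigned I = Lanes.size(); I > 0; --I) {
    unsigned Lane = I - 1;
    const LaneData &D = Lanes[Lane];
    if (D.NumOfAPOs < Min) {
      // Strictly fewer movable operands: restart the vote.
      Min = D.NumOfAPOs;
      SameOpNumber = D.NumOpsWithSameOpcodeParent;
      Groups.clear();
      Group &G = bucket(Groups, D.Hash);
      G.Votes = 1;
      G.Lane = Lane;
    } else if (D.NumOfAPOs == Min &&
               D.NumOpsWithSameOpcodeParent < SameOpNumber) {
      // Tie on APOs, broken by the opcode/parent counter.
      SameOpNumber = D.NumOpsWithSameOpcodeParent;
      Group &G = bucket(Groups, D.Hash);
      G.Votes = 1;
      G.Lane = Lane;
    } else if (D.NumOfAPOs == Min &&
               D.NumOpsWithSameOpcodeParent == SameOpNumber) {
      // Full tie: one more vote for this operand layout.
      ++bucket(Groups, D.Hash).Votes;
    }
  }
  // Take the lane recorded for the bucket with the fewest votes, scanning
  // the buckets in reverse insertion order.
  unsigned BestLane = 0;
  unsigned CntMin = UINT_MAX;
  for (auto It = Groups.rbegin(); It != Groups.rend(); ++It) {
    if (It->Votes < CntMin) {
      CntMin = It->Votes;
      BestLane = It->Lane;
    }
  }
  return BestLane;
}
```

With three lanes where the last one has a layout hash that no other lane shares, the sketch picks that last lane, because its bucket holds the smallest vote count.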

/// \Returns the maximum number of operands that are allowed to be reordered		/// Data structure that helps to reorder operands.
/// for \p Lane. This is used as a heuristic for selecting the first lane to		struct OperandsOrderData {
/// start operand reordering.		/// The best number of operands with the same APOs, which can be
unsigned getMaxNumOperandsThatCanBeReordered(unsigned Lane) const {		/// reordered.
		unsigned NumOfAPOs = UINT_MAX;
		/// Number of operands with the same/alternate instruction opcode and
		/// parent.
		unsigned NumOpsWithSameOpcodeParent = 0;
		vporpo: NIT: I find `Code` a bit confusing. Also, perhaps there is no need to refer to `Parent` in the variable name? Perhaps rename it to something like `NumOpsWithSameOpcode`?
		ABataev (author): I will rename it, but I'd rather keep `Parent`, because I compare not only the opcodes but the parent too.
		/// Hash for the actual operands ordering.
		vporpo: Could you add a bit more text to the comment about what the hash is used for? I can see that it is used as a key in the `HashMap` above, but could you explain how it is being used?
		ABataev (author): It is used to count operands, actually their position id and opcode value. It is used in the voting mechanism to find the lane with the least number of operands that can freely move about, or the lane that is less profitable because it already has the most optimal set of operands. I could use `SmallVector<unsigned>` instead, but a hash code is faster and requires less memory.
		/// Used to count operands, actually their position id and opcode
		/// value. It is used in the voting mechanism to find the lane with the
		/// least number of operands that can freely move about or less profitable
		/// because it already has the most optimal set of operands. Can be
		/// replaced with SmallVector<unsigned> instead but hash code is faster
		/// and requires less memory.
		unsigned Hash = 0;
		};
		/// \returns the maximum number of operands that are allowed to be reordered
		/// for \p Lane and the number of compatible instructions(with the same
		/// parent/opcode). This is used as a heuristic for selecting the first lane
		/// to start operand reordering.
		OperandsOrderData getMaxNumOperandsThatCanBeReordered(unsigned Lane) const {
unsigned CntTrue = 0;		unsigned CntTrue = 0;
unsigned NumOperands = getNumOperands();		unsigned NumOperands = getNumOperands();
// Operands with the same APO can be reordered. We therefore need to count		// Operands with the same APO can be reordered. We therefore need to count
// how many of them we have for each APO, like this: Cnt[APO] = x.		// how many of them we have for each APO, like this: Cnt[APO] = x.
// Since we only have two APOs, namely true and false, we can avoid using		// Since we only have two APOs, namely true and false, we can avoid using
// a map. Instead we can simply count the number of operands that		// a map. Instead we can simply count the number of operands that
// correspond to one of them (in this case the 'true' APO), and calculate		// correspond to one of them (in this case the 'true' APO), and calculate
// the other by subtracting it from the total number of operands.		// the other by subtracting it from the total number of operands.
for (unsigned OpIdx = 0; OpIdx != NumOperands; ++OpIdx)		// Operands with the same instruction opcode and parent are more
if (getData(OpIdx, Lane).APO)		// profitable since we don't need to move them in many cases, with a high
		// probability such lane already can be vectorized effectively.
		bool AllUndefs = true;
		unsigned NumOpsWithSameOpcodeParent = 0;
		Instruction *OpcodeI = nullptr;
		BasicBlock *Parent = nullptr;
		unsigned Hash = 0;
		for (unsigned OpIdx = 0; OpIdx != NumOperands; ++OpIdx) {
		const OperandData &OpData = getData(OpIdx, Lane);
		if (OpData.APO)
++CntTrue;		++CntTrue;
		vporpo: Could you elaborate a bit on this? If I understand correctly, the more similar opcodes we can find, the easier it is to reorder them, so this can act as a tie-breaker when the NumOfAPOs is equal?
		ABataev (author): If the lane already has operands with the same opcode and the same parent, there is no need to swap the operands in this lane; with high probability such a lane can already be vectorized effectively.
unsigned CntFalse = NumOperands - CntTrue;		// Use Boyer-Moore majority voting for finding the majority opcode and
return std::max(CntTrue, CntFalse);		// the number of times it occurs.
		vporpo: If I am not mistaken, this code will count the consecutive operands with the same opcode and BB. Is it because this is a good enough approximation?
		ABataev (author): Yes, exactly; in most cases it results in the optimal values in the lane.
		if (auto *I = dyn_cast<Instruction>(OpData.V)) {
		if (!OpcodeI || !getSameOpcode({OpcodeI, I}).getOpcode() ||
		vporpo: Are you using `NumOpsWithSameOpcodeParent == 0` as a check for the first iteration? Shouldn't you be using `!OpcodeI` instead? I find this code a bit hard to follow, because I can't tell which of the `if` conditions check for the first iteration and which ones are part of the heuristic. Should it be updating `OpcodeI` and `Parent` only in the first iteration (like below), or should it be doing it whenever there is a mismatch?
		    if (auto *I = dyn_cast<Instruction>(OpData.V)) {
		      // First iteration
		      if (!OpcodeI) {
		        OpcodeI = I;
		        Parent = I->getParent();
		      }
		      // Mismatch
		      if (!getSameOpcode({OpcodeI, I}).getOpcode() || I->getParent() != Parent)
		        ++NumOpsWithSameOpcodeParent;
		      else
		        NumOpsWithSameOpcodeParent = std::min(NumOpsWithSameOpcodeParent-1, 0);
		    }
		Perhaps peeling the first iteration might make the code easier to follow?
		ABataev (author): This is again a kind of voting algorithm. This code runs every time we start voting on a value with a new opcode, not only on the first iteration. We just try to find the opcode with at least NumOperands/2 occurrences here; if there is no such opcode, we just choose any of them, since there are no profitable elements.
		I->getParent() != Parent) {
		vporpo: Why is `NumOpsWithSameOpcodeParent` set to 1 the first time a mismatch is found? Shouldn't it be set to 0?
		ABataev (author): It is a way of increasing the counter for the first element in the sequence.
		vporpo: Yes, it is increasing it, but shouldn't it be decreasing it instead (or letting it remain 0)? This code block executes when there is a mismatch of opcode or parent (or if it is the first iteration), so shouldn't we be decreasing the value of `NumOpsWithSameOpcodeParent` (like in line 1463)? What confuses me here is that `NumOpsWithSameOpcodeParent` looks like a normal counter that counts the opcode/parent matches. So I would expect it to increase by one if the opcodes/parents match (like what line 1466 does), and to decrease by one if there is a mismatch. But it seems to be more complicated than that: when it reaches 0 it is forced to 1 even when there is an opcode mismatch. I find this a bit counterintuitive. For example, if we have mismatching opcodes in sequence, I would expect it to keep decreasing, or at least be capped at 0. But it seems like the value of `NumOpsWithSameOpcodeParent` will be 0, then 1, then 0, then 1, like so:
		    before the loop: 0
		    iteration 1: 1 (because it was == 0)
		    iteration 2: 0 (because of opcode mismatch)
		    iteration 3: 1 (because it was == 0)
		ABataev (author): This is how the voting algorithm works. The main idea is described here: https://www.geeksforgeeks.org/boyer-moore-majority-voting-algorithm/
		vporpo: OK, that makes sense now, thanks for clarifying! Could you please add a comment saying that this loop is a Boyer-Moore majority vote for finding the majority opcode and the number of times it occurs?
		ABataev (author): Sure, will do it tomorrow.
		if (NumOpsWithSameOpcodeParent == 0) {
		NumOpsWithSameOpcodeParent = 1;
		OpcodeI = I;
		Parent = I->getParent();
		} else {
		--NumOpsWithSameOpcodeParent;
		}
		} else {
		++NumOpsWithSameOpcodeParent;
		}
		}
		Hash = hash_combine(
		Hash, hash_value((OpIdx + 1) * (OpData.V->getValueID() + 1)));
		AllUndefs = AllUndefs && isa<UndefValue>(OpData.V);
		}
		if (AllUndefs)
		return {};
		OperandsOrderData Data;
		Data.NumOfAPOs = std::max(CntTrue, NumOperands - CntTrue);
		Data.NumOpsWithSameOpcodeParent = NumOpsWithSameOpcodeParent;
		Data.Hash = Hash;
		return Data;
}		}
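
The counter behaviour discussed in the inline thread (0, 1, 0, 1 on a run of mismatches) is what a Boyer-Moore majority vote produces. A minimal standalone sketch follows, assuming each operand's "opcode plus parent block" is reduced to a single integer key; `Key` and `majorityVote` are hypothetical names and are not part of the patch. Note that the returned counter is the final vote balance, not an exact occurrence count.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the (opcode, parent block) identity of an operand.
using Key = int;

// Boyer-Moore majority vote. Returns the surviving candidate and the final
// vote balance. If some key occupies more than half of Ops, it is guaranteed
// to be the surviving candidate; otherwise the candidate is arbitrary.
std::pair<Key, unsigned> majorityVote(const std::vector<Key> &Ops) {
  Key Candidate = 0;
  unsigned Votes = 0;
  for (Key K : Ops) {
    if (Votes == 0) {
      // No standing candidate: adopt the current key with one vote.
      Candidate = K;
      Votes = 1;
    } else if (K == Candidate) {
      ++Votes; // Match: strengthen the candidate.
    } else {
      --Votes; // Mismatch: cancel one vote.
    }
  }
  return {Candidate, Votes};
}
```

For the keys `{1, 1, 2, 1}` the balance goes 1, 2, 1, 2 and key 1 survives; for `{1, 2, 3}` it goes 1, 0, 1 and the last key survives with a balance of 1, matching the trace in the thread.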

/// Go through the instructions in VL and append their operands.		/// Go through the instructions in VL and append their operands.
void appendOperandsOfVL(ArrayRef<Value *> VL) {		void appendOperandsOfVL(ArrayRef<Value *> VL) {
assert(!VL.empty() && "Bad VL");		assert(!VL.empty() && "Bad VL");
assert((empty() || VL.size() == getNumLanes()) &&		assert((empty() || VL.size() == getNumLanes()) &&
"Expected same number of lanes");		"Expected same number of lanes");
assert(isa<Instruction>(VL[0]) && "Expected instruction");		assert(isa<Instruction>(VL[0]) && "Expected instruction");
▲ Show 20 Lines • Show All 1,501 Lines • ▼ Show 20 Lines

void BoUpSLP::reorderTopToBottom() {		void BoUpSLP::reorderTopToBottom() {
// Maps VF to the graph nodes.		// Maps VF to the graph nodes.
DenseMap<unsigned, SmallPtrSet<TreeEntry *, 4>> VFToOrderedEntries;		DenseMap<unsigned, SmallPtrSet<TreeEntry *, 4>> VFToOrderedEntries;
// ExtractElement gather nodes which can be vectorized and need to handle		// ExtractElement gather nodes which can be vectorized and need to handle
// their ordering.		// their ordering.
DenseMap<const TreeEntry *, OrdersType> GathersToOrders;		DenseMap<const TreeEntry *, OrdersType> GathersToOrders;
// Find all reorderable nodes with the given VF.		// Find all reorderable nodes with the given VF.
// Currently the are vectorized loads,extracts + some gathering of extracts.		// Currently the are vectorized stores,loads,extracts + some gathering of
		// extracts.
for_each(VectorizableTree, [this, &VFToOrderedEntries, &GathersToOrders](		for_each(VectorizableTree, [this, &VFToOrderedEntries, &GathersToOrders](
const std::unique_ptr<TreeEntry> &TE) {		const std::unique_ptr<TreeEntry> &TE) {
if (Optional<OrdersType> CurrentOrder =		if (Optional<OrdersType> CurrentOrder =
getReorderingData(TE.get(), /*TopToBottom=*/true)) {		getReorderingData(TE.get(), /*TopToBottom=*/true)) {
VFToOrderedEntries[TE->Scalars.size()].insert(TE.get());		VFToOrderedEntries[TE->Scalars.size()].insert(TE.get());
if (TE->State != TreeEntry::Vectorize)		if (TE->State != TreeEntry::Vectorize)
GathersToOrders.try_emplace(TE.get(), *CurrentOrder);		GathersToOrders.try_emplace(TE.get(), *CurrentOrder);
}		}
▲ Show 20 Lines • Show All 604 Lines • ▼ Show 20 Lines	if (getTreeEntry(I)) {
<< ") is already in tree.\n");		<< ") is already in tree.\n");
if (TryToFindDuplicates(S))		if (TryToFindDuplicates(S))
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
return;		return;
}		}
}		}

// If any of the scalars is marked as a value that needs to stay scalar, then
// we need to gather the scalars.
// The reduction nodes (stored in UserIgnoreList) also should stay scalar.		// The reduction nodes (stored in UserIgnoreList) also should stay scalar.
for (Value *V : VL) {		for (Value *V : VL) {
if (MustGather.count(V) || is_contained(UserIgnoreList, V)) {		if (is_contained(UserIgnoreList, V)) {
LLVM_DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");
if (TryToFindDuplicates(S))		if (TryToFindDuplicates(S))
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
return;		return;
}		}
}		}

▲ Show 20 Lines • Show All 6,532 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll

Show First 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	;
%tmp2.0 = add i64 %tmp0.0, %tmp0.1		%tmp2.0 = add i64 %tmp0.0, %tmp0.1
%tmp2.1 = add i64 %tmp1.0, %tmp1.1		%tmp2.1 = add i64 %tmp1.0, %tmp1.1
store i64 %tmp2.0, i64* %c.0, align 8		store i64 %tmp2.0, i64* %c.0, align 8
store i64 %tmp2.1, i64* %c.1, align 8		store i64 %tmp2.1, i64* %c.1, align 8
ret void		ret void
}		}

define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32(		; CHECK-LABEL: @build_vec_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[V0:%.*]], [[V1:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[V0:%.*]], [[V1:%.*]]
		RKSimon: Regression?
; CHECK-NEXT: [[TMP2:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <4 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 3, i32 6>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 3, i32 6>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: ret <4 x i32> [[TMP5]]		; CHECK-NEXT: ret <4 x i32> [[TMP5]]
;		;
%v0.0 = extractelement <4 x i32> %v0, i32 0		%v0.0 = extractelement <4 x i32> %v0, i32 0
%v0.1 = extractelement <4 x i32> %v0, i32 1		%v0.1 = extractelement <4 x i32> %v0, i32 1
▲ Show 20 Lines • Show All 84 Lines • ▼ Show 20 Lines	;
%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1		%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1
%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2		%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2
%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3		%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3
ret <4 x i32> %tmp2.3		ret <4 x i32> %tmp2.3
}		}

define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_3_binops(		; CHECK-LABEL: @build_vec_v4i32_3_binops(
; CHECK-NEXT: [[V0_0:%.*]] = extractelement <2 x i32> [[V0:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]
; CHECK-NEXT: [[V0_1:%.*]] = extractelement <2 x i32> [[V0]], i64 1		; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[V1_0:%.*]] = extractelement <2 x i32> [[V1:%.*]], i64 0		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[V1_1:%.*]] = extractelement <2 x i32> [[V1]], i64 1		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP1_0:%.*]] = mul i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP7:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP1_1:%.*]] = mul i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <2 x i32> <i32 1, i32 1>
; CHECK-NEXT: [[TMP1:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <2 x i32> zeroinitializer		; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP6]], [[TMP8]]
; CHECK-NEXT: [[TMP3:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <2 x i32> <i32 1, i32 1>
; CHECK-NEXT: [[TMP2_0:%.*]] = add i32 [[TMP0_0]], [[TMP0_1]]
; CHECK-NEXT: [[TMP2_1:%.*]] = add i32 [[TMP1_0]], [[TMP1_1]]
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP3_0:%.*]] = insertelement <4 x i32> poison, i32 [[TMP2_0]], i64 0
; CHECK-NEXT: [[TMP3_1:%.*]] = insertelement <4 x i32> [[TMP3_0]], i32 [[TMP2_1]], i64 1
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <4 x i32> [[TMP3_1]], <4 x i32> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]		; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
Show All 13 Lines	;
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @reduction_v4i32(		; CHECK-LABEL: @reduction_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[V0:%.*]], [[V1:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[V0:%.*]], [[V1:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 7, i32 2>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 7, i32 2>
		RKSimon: Regression?
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP6:%.*]] = lshr <4 x i32> [[TMP5]], <i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[TMP6:%.*]] = lshr <4 x i32> [[TMP5]], <i32 15, i32 15, i32 15, i32 15>
; CHECK-NEXT: [[TMP7:%.*]] = and <4 x i32> [[TMP6]], <i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[TMP7:%.*]] = and <4 x i32> [[TMP6]], <i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[TMP8:%.*]] = mul nuw <4 x i32> [[TMP7]], <i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[TMP8:%.*]] = mul nuw <4 x i32> [[TMP7]], <i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]
; CHECK-NEXT: [[TMP10:%.*]] = xor <4 x i32> [[TMP9]], [[TMP8]]		; CHECK-NEXT: [[TMP10:%.*]] = xor <4 x i32> [[TMP9]], [[TMP8]]
; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP10]])		; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP10]])
▲ Show 20 Lines • Show All 47 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll

Show First 20 Lines • Show All 65 Lines • ▼ Show 20 Lines

define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define <4 x i32> @build_vec_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32(		; CHECK-LABEL: @build_vec_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[V0:%.*]], [[V1:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = add <4 x i32> [[V0:%.*]], [[V1:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = sub <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP2:%.*]] = sub <4 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 3, i32 6>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 3, i32 6>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: ret <4 x i32> [[TMP5]]		; CHECK-NEXT: ret <4 x i32> [[TMP5]]
		RKSimon: Regression?
;		;
%v0.0 = extractelement <4 x i32> %v0, i32 0		%v0.0 = extractelement <4 x i32> %v0, i32 0
%v0.1 = extractelement <4 x i32> %v0, i32 1		%v0.1 = extractelement <4 x i32> %v0, i32 1
%v0.2 = extractelement <4 x i32> %v0, i32 2		%v0.2 = extractelement <4 x i32> %v0, i32 2
%v0.3 = extractelement <4 x i32> %v0, i32 3		%v0.3 = extractelement <4 x i32> %v0, i32 3
%v1.0 = extractelement <4 x i32> %v1, i32 0		%v1.0 = extractelement <4 x i32> %v1, i32 0
%v1.1 = extractelement <4 x i32> %v1, i32 1		%v1.1 = extractelement <4 x i32> %v1, i32 1
%v1.2 = extractelement <4 x i32> %v1, i32 2		%v1.2 = extractelement <4 x i32> %v1, i32 2
▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines	;
%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1		%tmp2.1 = insertelement <4 x i32> %tmp2.0, i32 %tmp1.1, i32 1
%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2		%tmp2.2 = insertelement <4 x i32> %tmp2.1, i32 %tmp1.2, i32 2
%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3		%tmp2.3 = insertelement <4 x i32> %tmp2.2, i32 %tmp1.3, i32 3
ret <4 x i32> %tmp2.3		ret <4 x i32> %tmp2.3
}		}

define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {		define <4 x i32> @build_vec_v4i32_3_binops(<2 x i32> %v0, <2 x i32> %v1) {
; CHECK-LABEL: @build_vec_v4i32_3_binops(		; CHECK-LABEL: @build_vec_v4i32_3_binops(
; CHECK-NEXT: [[V0_0:%.*]] = extractelement <2 x i32> [[V0:%.*]], i64 0		; CHECK-NEXT: [[TMP1:%.*]] = add <2 x i32> [[V0:%.*]], [[V1:%.*]]
; CHECK-NEXT: [[V0_1:%.*]] = extractelement <2 x i32> [[V0]], i64 1		; CHECK-NEXT: [[TMP2:%.*]] = mul <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[V1_0:%.*]] = extractelement <2 x i32> [[V1:%.*]], i64 0		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 1, i32 2>
; CHECK-NEXT: [[V1_1:%.*]] = extractelement <2 x i32> [[V1]], i64 1		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> [[TMP2]], <2 x i32> <i32 0, i32 3>
; CHECK-NEXT: [[TMP0_0:%.*]] = add i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP5:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP0_1:%.*]] = add i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP1_0:%.*]] = mul i32 [[V0_0]], [[V1_0]]		; CHECK-NEXT: [[TMP7:%.*]] = xor <2 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP1_1:%.*]] = mul i32 [[V0_1]], [[V1_1]]		; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x i32> [[TMP7]], <2 x i32> poison, <2 x i32> <i32 1, i32 1>
; CHECK-NEXT: [[TMP1:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP9:%.*]] = add <2 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <2 x i32> zeroinitializer		; CHECK-NEXT: [[TMP10:%.*]] = add <2 x i32> [[TMP6]], [[TMP8]]
; CHECK-NEXT: [[TMP3:%.*]] = xor <2 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <2 x i32> [[TMP9]], <2 x i32> [[TMP10]], <4 x i32> <i32 0, i32 1, i32 2, i32 3>
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP3]], <2 x i32> poison, <2 x i32> <i32 1, i32 1>
; CHECK-NEXT: [[TMP2_0:%.*]] = add i32 [[TMP0_0]], [[TMP0_1]]
; CHECK-NEXT: [[TMP2_1:%.*]] = add i32 [[TMP1_0]], [[TMP1_1]]
; CHECK-NEXT: [[TMP5:%.*]] = add <2 x i32> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP3_0:%.*]] = insertelement <4 x i32> undef, i32 [[TMP2_0]], i64 0
; CHECK-NEXT: [[TMP3_1:%.*]] = insertelement <4 x i32> [[TMP3_0]], i32 [[TMP2_1]], i64 1
; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP3_31:%.*]] = shufflevector <4 x i32> [[TMP3_1]], <4 x i32> [[TMP6]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]		; CHECK-NEXT: ret <4 x i32> [[TMP3_31]]
;		;
%v0.0 = extractelement <2 x i32> %v0, i32 0		%v0.0 = extractelement <2 x i32> %v0, i32 0
%v0.1 = extractelement <2 x i32> %v0, i32 1		%v0.1 = extractelement <2 x i32> %v0, i32 1
%v1.0 = extractelement <2 x i32> %v1, i32 0		%v1.0 = extractelement <2 x i32> %v1, i32 0
%v1.1 = extractelement <2 x i32> %v1, i32 1		%v1.1 = extractelement <2 x i32> %v1, i32 1
%tmp0.0 = add i32 %v0.0, %v1.0		%tmp0.0 = add i32 %v0.0, %v1.0
%tmp0.1 = add i32 %v0.1, %v1.1		%tmp0.1 = add i32 %v0.1, %v1.1
Show All 13 Lines	;
%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3		%tmp3.3 = insertelement <4 x i32> %tmp3.2, i32 %tmp2.3, i32 3
ret <4 x i32> %tmp3.3		ret <4 x i32> %tmp3.3
}		}

define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {		define i32 @reduction_v4i32(<4 x i32> %v0, <4 x i32> %v1) {
; CHECK-LABEL: @reduction_v4i32(		; CHECK-LABEL: @reduction_v4i32(
; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[V0:%.*]], [[V1:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[V0:%.*]], [[V1:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[V0]], [[V1]]		; CHECK-NEXT: [[TMP2:%.*]] = add <4 x i32> [[V0]], [[V1]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 7, i32 2>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 1, i32 4, i32 7, i32 2>
		RKSimon: Regression?
; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>		; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 5, i32 6, i32 3>
; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], [[TMP3]]		; CHECK-NEXT: [[TMP5:%.*]] = add <4 x i32> [[TMP4]], [[TMP3]]
; CHECK-NEXT: [[TMP6:%.*]] = lshr <4 x i32> [[TMP5]], <i32 15, i32 15, i32 15, i32 15>		; CHECK-NEXT: [[TMP6:%.*]] = lshr <4 x i32> [[TMP5]], <i32 15, i32 15, i32 15, i32 15>
; CHECK-NEXT: [[TMP7:%.*]] = and <4 x i32> [[TMP6]], <i32 65537, i32 65537, i32 65537, i32 65537>		; CHECK-NEXT: [[TMP7:%.*]] = and <4 x i32> [[TMP6]], <i32 65537, i32 65537, i32 65537, i32 65537>
; CHECK-NEXT: [[TMP8:%.*]] = mul nuw <4 x i32> [[TMP7]], <i32 65535, i32 65535, i32 65535, i32 65535>		; CHECK-NEXT: [[TMP8:%.*]] = mul nuw <4 x i32> [[TMP7]], <i32 65535, i32 65535, i32 65535, i32 65535>
; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]		; CHECK-NEXT: [[TMP9:%.*]] = add <4 x i32> [[TMP8]], [[TMP5]]
; CHECK-NEXT: [[TMP10:%.*]] = xor <4 x i32> [[TMP9]], [[TMP8]]		; CHECK-NEXT: [[TMP10:%.*]] = xor <4 x i32> [[TMP9]], [[TMP8]]
; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP10]])		; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[TMP10]])

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

	; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0			; CHECK-NEXT: [[V1_LANE_0:%.*]] = extractelement <9 x double> [[V_1]], i32 0
	; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1			; CHECK-NEXT: [[V1_LANE_1:%.*]] = extractelement <9 x double> [[V_1]], i32 1
	; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2			; CHECK-NEXT: [[V1_LANE_2:%.*]] = extractelement <9 x double> [[V_1]], i32 2
	; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3			; CHECK-NEXT: [[V1_LANE_3:%.*]] = extractelement <9 x double> [[V_1]], i32 3
	; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16			; CHECK-NEXT: [[V_2:%.*]] = load <4 x double>, <4 x double>* [[PTR_2:%.*]], align 16
	; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0			; CHECK-NEXT: [[V2_LANE_0:%.*]] = extractelement <4 x double> [[V_2]], i32 0
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[A_LANE_0:%.*]] = fmul double [[V1_LANE_0]], [[V2_LANE_2]]			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x double> poison, double [[V1_LANE_0]], i32 0
	; CHECK-NEXT: [[A_LANE_1:%.*]] = fmul double [[V1_LANE_2]], [[V2_LANE_1]]			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[V1_LANE_2]], i32 1
	; CHECK-NEXT: [[A_LANE_2:%.*]] = fmul double [[V1_LANE_1]], [[V2_LANE_2]]			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[V1_LANE_1]], i32 2
	; CHECK-NEXT: [[A_LANE_3:%.*]] = fmul double [[V1_LANE_3]], [[V2_LANE_0]]			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[V1_LANE_3]], i32 3
	; CHECK-NEXT: [[A_INS_0:%.*]] = insertelement <9 x double> undef, double [[A_LANE_0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[A_INS_1:%.*]] = insertelement <9 x double> [[A_INS_0]], double [[A_LANE_1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> [[TMP4]], double [[V2_LANE_1]], i32 1
	; CHECK-NEXT: [[A_INS_2:%.*]] = insertelement <9 x double> [[A_INS_1]], double [[A_LANE_2]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x double> [[TMP5]], double [[V2_LANE_2]], i32 2
	; CHECK-NEXT: [[A_INS_3:%.*]] = insertelement <9 x double> [[A_INS_2]], double [[A_LANE_3]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x double> [[TMP6]], double [[V2_LANE_0]], i32 3
				; CHECK-NEXT: [[TMP8:%.*]] = fmul <4 x double> [[TMP3]], [[TMP7]]
				; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x double> [[TMP8]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[V1_LANE_0]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: call void @use(double [[V1_LANE_1]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])			; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <9 x double> [[A_INS_3]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[TMP9]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <9 x double> %v.1, i32 3			%v1.lane.3 = extractelement <9 x double> %v.1, i32 3

llvm/test/Transforms/SLPVectorizer/X86/PR39774.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 | FileCheck %s --check-prefix=CHECK			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-6 | FileCheck %s --check-prefix=CHECK
	; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-8 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION			; RUN: opt -slp-vectorizer -S < %s -mtriple=x86_64-unknown-linux-gnu -mcpu=skylake -slp-threshold=-7 -slp-min-tree-size=6 | FileCheck %s --check-prefix=FORCE_REDUCTION

	define void @Test(i32) {			define void @Test(i32) {
	; CHECK-LABEL: @Test(			; CHECK-LABEL: @Test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP10:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP10:%.*]], [[LOOP]] ], [ zeroinitializer, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x i32> [[TMP1]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>

llvm/test/Transforms/SLPVectorizer/X86/addsub.ll

Show First 20 Lines • Show All 336 Lines • ▼ Show 20 Lines	;
store double %13, double* %14		store double %13, double* %14
ret void		ret void
}		}

define void @vec_shuff_reorder() #0 {		define void @vec_shuff_reorder() #0 {
; CHECK-LABEL: @vec_shuff_reorder(		; CHECK-LABEL: @vec_shuff_reorder(
; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 0), align 4		; CHECK-NEXT: [[TMP1:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 0), align 4
; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 0), align 4		; CHECK-NEXT: [[TMP2:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 0), align 4
; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 1) to <2 x float>*), align 4		; CHECK-NEXT: [[TMP3:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 1), align 4
; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 1) to <2 x float>*), align 4		; CHECK-NEXT: [[TMP4:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 1), align 4
; CHECK-NEXT: [[TMP5:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 3), align 4		; CHECK-NEXT: [[TMP5:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 2) to <2 x float>*), align 4
; CHECK-NEXT: [[TMP6:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 3), align 4		; CHECK-NEXT: [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 2) to <2 x float>*), align 4
; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> poison, float [[TMP2]], i32 0		; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0
; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP7]], float [[TMP3]], i32 1
; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP7]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>		; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP5]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x float> [[TMP9]], float [[TMP5]], i32 3		; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> [[TMP9]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x float> poison, float [[TMP1]], i32 0		; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x float> poison, float [[TMP2]], i32 0
; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x float> [[TMP11]], float [[TMP4]], i32 1
; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <4 x float> [[TMP11]], <4 x float> [[TMP12]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>		; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x float> [[TMP6]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP14:%.*]] = insertelement <4 x float> [[TMP13]], float [[TMP6]], i32 3		; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <4 x float> [[TMP12]], <4 x float> [[TMP13]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
; CHECK-NEXT: [[TMP15:%.*]] = fadd <4 x float> [[TMP10]], [[TMP14]]		; CHECK-NEXT: [[TMP15:%.*]] = fadd <4 x float> [[TMP10]], [[TMP14]]
; CHECK-NEXT: [[TMP16:%.*]] = fsub <4 x float> [[TMP10]], [[TMP14]]		; CHECK-NEXT: [[TMP16:%.*]] = fsub <4 x float> [[TMP10]], [[TMP14]]
; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <4 x float> [[TMP15]], <4 x float> [[TMP16]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>		; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <4 x float> [[TMP15]], <4 x float> [[TMP16]], <4 x i32> <i32 0, i32 5, i32 2, i32 7>
; CHECK-NEXT: store <4 x float> [[TMP17]], <4 x float>* bitcast ([4 x float]* @fc to <4 x float>*), align 4		; CHECK-NEXT: store <4 x float> [[TMP17]], <4 x float>* bitcast ([4 x float]* @fc to <4 x float>*), align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%1 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 0), align 4		%1 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 0), align 4
%2 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 0), align 4		%2 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 0), align 4

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

	@cle32 = external unnamed_addr global [32 x i32], align 16			@cle32 = external unnamed_addr global [32 x i32], align 16


	; Check that we correctly detect a splat/broadcast by leveraging the			; Check that we correctly detect a splat/broadcast by leveraging the
	; commutativity property of `xor`.			; commutativity property of `xor`.

	define void @splat(i8 %a, i8 %b, i8 %c) {			define void @splat(i8 %a, i8 %b, i8 %c) {
	; SSE-LABEL: @splat(			; SSE-LABEL: @splat(
	; SSE-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0			; SSE-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0
	; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[TMP1]], <16 x i8> poison, <16 x i32> zeroinitializer			; SSE-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1
	; SSE-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0			; SSE-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	; SSE-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> [[TMP2]], i8 [[B:%.*]], i32 1			; SSE-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0
	; SSE-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i8> [[TMP3]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; SSE-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i8> [[TMP3]], <16 x i8> poison, <16 x i32> zeroinitializer
	; SSE-NEXT: [[TMP4:%.*]] = xor <16 x i8> [[SHUFFLE]], [[SHUFFLE1]]			; SSE-NEXT: [[TMP4:%.*]] = xor <16 x i8> [[SHUFFLE]], [[SHUFFLE1]]
	; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16			; SSE-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16
	; SSE-NEXT: ret void			; SSE-NEXT: ret void
	;			;
	; AVX-LABEL: @splat(			; AVX-LABEL: @splat(
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0			; AVX-NEXT: [[TMP1:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[TMP1]], <16 x i8> poison, <16 x i32> zeroinitializer			; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> [[TMP1]], i8 [[B:%.*]], i32 1
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <16 x i8> poison, i8 [[A:%.*]], i32 0			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i8> [[TMP2]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>
	; AVX-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> [[TMP2]], i8 [[B:%.*]], i32 1			; AVX-NEXT: [[TMP3:%.*]] = insertelement <16 x i8> poison, i8 [[C:%.*]], i32 0
	; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i8> [[TMP3]], <16 x i8> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i8> [[TMP3]], <16 x i8> poison, <16 x i32> zeroinitializer
	; AVX-NEXT: [[TMP4:%.*]] = xor <16 x i8> [[SHUFFLE]], [[SHUFFLE1]]			; AVX-NEXT: [[TMP4:%.*]] = xor <16 x i8> [[SHUFFLE]], [[SHUFFLE1]]
	; AVX-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16			; AVX-NEXT: store <16 x i8> [[TMP4]], <16 x i8>* bitcast ([32 x i8]* @cle to <16 x i8>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%1 = xor i8 %c, %a			%1 = xor i8 %c, %a
	store i8 %1, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 0), align 16			store i8 %1, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 0), align 16
	%2 = xor i8 %a, %c			%2 = xor i8 %a, %c
	store i8 %2, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 1)			store i8 %2, i8* getelementptr inbounds ([32 x i8], [32 x i8]* @cle, i64 0, i64 1)

llvm/test/Transforms/SLPVectorizer/X86/crash_exceed_scheduling.ll

	; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX22:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = extractelement <2 x double> [[TMP6]], i32 0
	; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]			; CHECK-NEXT: [[IX2:%.*]] = fmul double [[TMP8]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> [[TMP2]], double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP6]], [[TMP9]]
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP5]]
	; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = fmul fast <2 x double> [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef			; CHECK-NEXT: [[IXX101:%.*]] = fsub double undef, undef
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP7]], i32 0			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x double> poison, double [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP1]], i32 1			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x double> [[TMP13]], double [[TMP7]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP13]], [[TMP14]]			; CHECK-NEXT: [[TMP15:%.*]] = fmul fast <2 x double> [[TMP14]], undef
	; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [			; CHECK-NEXT: switch i32 undef, label [[BB1:%.*]] [
	; CHECK-NEXT: i32 0, label [[BB2:%.*]]			; CHECK-NEXT: i32 0, label [[BB2:%.*]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: br label [[LABEL:%.*]]			; CHECK-NEXT: br label [[LABEL:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: br label [[LABEL]]			; CHECK-NEXT: br label [[LABEL]]
	; CHECK: label:			; CHECK: label:

llvm/test/Transforms/SLPVectorizer/X86/crash_smallpt.ll

	; CHECK: for.cond36.preheader:			; CHECK: for.cond36.preheader:
	; CHECK-NEXT: br i1 undef, label [[FOR_BODY42_LR_PH_US:%.*]], label [[_Z5CLAMPD_EXIT_1:%.*]]			; CHECK-NEXT: br i1 undef, label [[FOR_BODY42_LR_PH_US:%.*]], label [[_Z5CLAMPD_EXIT_1:%.*]]
	; CHECK: cond.false51.us:			; CHECK: cond.false51.us:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: cond.true48.us:			; CHECK: cond.true48.us:
	; CHECK-NEXT: br i1 undef, label [[COND_TRUE63_US:%.*]], label [[COND_FALSE66_US:%.*]]			; CHECK-NEXT: br i1 undef, label [[COND_TRUE63_US:%.*]], label [[COND_FALSE66_US:%.*]]
	; CHECK: cond.false66.us:			; CHECK: cond.false66.us:
	; CHECK-NEXT: [[ADD_I276_US:%.*]] = fadd double 0.000000e+00, undef			; CHECK-NEXT: [[ADD_I276_US:%.*]] = fadd double 0.000000e+00, undef
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[ADD_I276_US]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double poison, double 0xBFA5CC2D1960285F>, double [[ADD_I276_US]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> [[TMP0]], <double 0.000000e+00, double 0xBFA5CC2D1960285F>			; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> <double 0.000000e+00, double undef>, [[TMP0]]
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.400000e+02, double 1.400000e+02>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.400000e+02, double 1.400000e+02>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 5.000000e+01, double 5.200000e+01>			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], <double 5.000000e+01, double 5.200000e+01>
	; CHECK-NEXT: [[TMP4:%.*]] = fmul <2 x double> undef, [[TMP1]]			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP1]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[AGG_TMP99208_SROA_0_0_IDX]] to <2 x double>*			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP5]], align 8			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double poison, double undef>, double [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[AGG_TMP101211_SROA_0_0_IDX]] to <2 x double>*			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP5]], i32 1
	; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP6]], [[TMP7]]
				; CHECK-NEXT: [[TMP9:%.*]] = bitcast double* [[AGG_TMP99208_SROA_0_0_IDX]] to <2 x double>*
				; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP9]], align 8
				; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[AGG_TMP101211_SROA_0_0_IDX]] to <2 x double>*
				; CHECK-NEXT: store <2 x double> [[TMP8]], <2 x double>* [[TMP10]], align 8
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: cond.true63.us:			; CHECK: cond.true63.us:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: for.body42.lr.ph.us:			; CHECK: for.body42.lr.ph.us:
	; CHECK-NEXT: br i1 undef, label [[COND_TRUE48_US:%.*]], label [[COND_FALSE51_US:%.*]]			; CHECK-NEXT: br i1 undef, label [[COND_TRUE48_US:%.*]], label [[COND_FALSE51_US:%.*]]
	; CHECK: _Z5clampd.exit.1:			; CHECK: _Z5clampd.exit.1:
	; CHECK-NEXT: br label [[FOR_COND36_PREHEADER]]			; CHECK-NEXT: br label [[FOR_COND36_PREHEADER]]
	;			;

llvm/test/Transforms/SLPVectorizer/X86/extractelement.ll

	; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]			; CHECK-NEXT: [[X1X1:%.*]] = fmul float [[X1]], [[X1]]
	; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]			; CHECK-NEXT: [[ADD:%.*]] = fadd float [[X0X0]], [[X1X1]]
	; CHECK-NEXT: ret float [[ADD]]			; CHECK-NEXT: ret float [[ADD]]
	;			;
	; THRESH1-LABEL: @f_used_twice_in_tree(			; THRESH1-LABEL: @f_used_twice_in_tree(
	; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH1-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1
	; THRESH1-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH1-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0
	; THRESH1-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH1-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1
	; THRESH1-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[X]], [[TMP3]]			; THRESH1-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]
	; THRESH1-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH1-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
	; THRESH1-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1			; THRESH1-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]			; THRESH1-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH1-NEXT: ret float [[ADD]]			; THRESH1-NEXT: ret float [[ADD]]
	;			;
	; THRESH2-LABEL: @f_used_twice_in_tree(			; THRESH2-LABEL: @f_used_twice_in_tree(
	; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1			; THRESH2-NEXT: [[TMP1:%.*]] = extractelement <2 x float> [[X:%.*]], i32 1
	; THRESH2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0			; THRESH2-NEXT: [[TMP2:%.*]] = insertelement <2 x float> poison, float [[TMP1]], i32 0
	; THRESH2-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1			; THRESH2-NEXT: [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1
	; THRESH2-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[X]], [[TMP3]]			; THRESH2-NEXT: [[TMP4:%.*]] = fmul <2 x float> [[TMP3]], [[X]]
	; THRESH2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; THRESH2-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
	; THRESH2-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1			; THRESH2-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]			; THRESH2-NEXT: [[ADD:%.*]] = fadd float [[TMP5]], [[TMP6]]
	; THRESH2-NEXT: ret float [[ADD]]			; THRESH2-NEXT: ret float [[ADD]]
	;			;
	%x0 = extractelement <2 x float> %x, i32 0			%x0 = extractelement <2 x float> %x, i32 0
	%x1 = extractelement <2 x float> %x, i32 1			%x1 = extractelement <2 x float> %x, i32 1
	%x0x0 = fmul float %x0, %x1			%x0x0 = fmul float %x0, %x1
	%x1x1 = fmul float %x1, %x1			%x1x1 = fmul float %x1, %x1
	%add = fadd float %x0x0, %x1x1			%add = fadd float %x0x0, %x1x1
	ret float %add			ret float %add
	}			}

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	%struct.sw = type { float, float, float, float }			%struct.sw = type { float, float, float, float }

	define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {			define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0			; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_SW]], %struct.sw* [[V]], i64 0, i32 1			; CHECK-NEXT: [[Y:%.*]] = getelementptr inbounds [[STRUCT_SW]], %struct.sw* [[V]], i64 0, i32 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[X]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[X]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 16			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 16
				; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP3:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP3:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> <float poison, float undef, float poison, float poison>, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <4 x float> [[TMP4]], <4 x float> [[TMP5]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP7]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>			; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> poison, [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <4 x float> [[TMP8]], float [[TMP3]], i32 2			; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], poison
	; CHECK-NEXT: [[TMP10:%.*]] = fmul <4 x float> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], poison
	; CHECK-NEXT: [[TMP11:%.*]] = fadd <4 x float> poison, [[TMP10]]			; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x float> [[TMP9]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = fadd <4 x float> [[TMP11]], poison			; CHECK-NEXT: [[VEC1:%.*]] = insertelement <2 x float> undef, float [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = fadd <4 x float> [[TMP12]], poison			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x float> [[TMP9]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x float> [[TMP13]], i32 0			; CHECK-NEXT: [[VEC2:%.*]] = insertelement <2 x float> [[VEC1]], float [[TMP11]], i32 1
	; CHECK-NEXT: [[VEC1:%.*]] = insertelement <2 x float> undef, float [[TMP14]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x float> [[TMP9]], i32 2
	; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x float> [[TMP13]], i32 1			; CHECK-NEXT: [[VEC3:%.*]] = insertelement <2 x float> undef, float [[TMP12]], i32 0
	; CHECK-NEXT: [[VEC2:%.*]] = insertelement <2 x float> [[VEC1]], float [[TMP15]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x float> [[TMP9]], i32 3
	; CHECK-NEXT: [[TMP16:%.*]] = extractelement <4 x float> [[TMP13]], i32 2			; CHECK-NEXT: [[VEC4:%.*]] = insertelement <2 x float> [[VEC3]], float [[TMP13]], i32 1
	; CHECK-NEXT: [[VEC3:%.*]] = insertelement <2 x float> undef, float [[TMP16]], i32 0
	; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x float> [[TMP13]], i32 3
	; CHECK-NEXT: [[VEC4:%.*]] = insertelement <2 x float> [[VEC3]], float [[TMP17]], i32 1
	; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VEC2]], 0			; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[VEC2]], 0
	; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[VEC4]], 1			; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[VEC4]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]
	;			;
	entry:			entry:
	%0 = load float, float* undef, align 4			%0 = load float, float* undef, align 4
	%x = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 0			%x = getelementptr inbounds %struct.sw, %struct.sw* %v, i64 0, i32 0
	%1 = load float, float* %x, align 16			%1 = load float, float* %x, align 16
	Show All 27 Lines

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll

	Show All 31 Lines
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[IDX2]] to <2 x double>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[IDX4]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[IDX4]] to <2 x double>*
	; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: [[TMP5:%.*]] = load <2 x double>, <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP8]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP9]], [[TMP8]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8			; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1
	%idx2 = getelementptr inbounds double, double* %array, i64 2			%idx2 = getelementptr inbounds double, double* %array, i64 2
	▲ Show 20 Lines • Show All 121 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[IDX6]] to <2 x double>*
	; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8			; CHECK-NEXT: [[TMP7:%.*]] = load <2 x double>, <2 x double>* [[TMP6]], align 8
	; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP5]], [[TMP7]]			; CHECK-NEXT: [[TMP9:%.*]] = fadd fast <2 x double> [[TMP5]], [[TMP7]]
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP8]], <2 x double> [[TMP9]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP8]], <2 x double> [[TMP9]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP11:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP12:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP12:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP13:%.*]] = shufflevector <2 x double> [[TMP11]], <2 x double> [[TMP12]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP14:%.*]] = fadd fast <2 x double> [[TMP13]], [[TMP10]]			; CHECK-NEXT: [[TMP14:%.*]] = fadd fast <2 x double> [[TMP10]], [[TMP13]]
	; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP15:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP14]], <2 x double>* [[TMP15]], align 8			; CHECK-NEXT: store <2 x double> [[TMP14]], <2 x double>* [[TMP15]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1
	%idx2 = getelementptr inbounds double, double* %array, i64 2			%idx2 = getelementptr inbounds double, double* %array, i64 2
	▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0			; CHECK-NEXT: [[IDXA0:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0
	; CHECK-NEXT: [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0			; CHECK-NEXT: [[IDXB0:%.*]] = getelementptr inbounds double, double* [[B:%.*]], i64 0
	; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0			; CHECK-NEXT: [[IDXC0:%.*]] = getelementptr inbounds double, double* [[C:%.*]], i64 0
	; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0			; CHECK-NEXT: [[IDXD0:%.*]] = getelementptr inbounds double, double* [[D:%.*]], i64 0
	; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1			; CHECK-NEXT: [[IDXA1:%.*]] = getelementptr inbounds double, double* [[A]], i64 1
	; CHECK-NEXT: [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2			; CHECK-NEXT: [[IDXB2:%.*]] = getelementptr inbounds double, double* [[B]], i64 2
	; CHECK-NEXT: [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2			; CHECK-NEXT: [[IDXA2:%.*]] = getelementptr inbounds double, double* [[A]], i64 2
	; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1			; CHECK-NEXT: [[IDXB1:%.*]] = getelementptr inbounds double, double* [[B]], i64 1
	; CHECK-NEXT: [[A0:%.*]] = load double, double* [[IDXA0]], align 8			; CHECK-NEXT: [[B0:%.*]] = load double, double* [[IDXB0]], align 8
	; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8			; CHECK-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8
	; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8			; CHECK-NEXT: [[D0:%.*]] = load double, double* [[IDXD0]], align 8
	; CHECK-NEXT: [[A1:%.*]] = load double, double* [[IDXA1]], align 8			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXA0]] to <2 x double>*
				; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[B2:%.*]] = load double, double* [[IDXB2]], align 8			; CHECK-NEXT: [[B2:%.*]] = load double, double* [[IDXB2]], align 8
	; CHECK-NEXT: [[A2:%.*]] = load double, double* [[IDXA2]], align 8			; CHECK-NEXT: [[A2:%.*]] = load double, double* [[IDXA2]], align 8
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXB0]] to <2 x double>*			; CHECK-NEXT: [[B1:%.*]] = load double, double* [[IDXB1]], align 8
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[B0]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[B2]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[A1]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = fsub fast <2 x double> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> [[TMP4]], double [[B2]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A2]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = fsub fast <2 x double> [[TMP3]], [[TMP5]]			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[D0]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[B1]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x double> [[TMP7]], double [[A2]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP6]], [[TMP8]]
	; CHECK-NEXT: [[TMP9:%.*]] = fsub fast <2 x double> [[TMP8]], [[TMP1]]			; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP9]]
	; CHECK-NEXT: [[TMP10:%.*]] = fadd fast <2 x double> [[TMP9]], [[TMP6]]
	; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0			; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds double, double* [[S:%.*]], i64 0
	; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1			; CHECK-NEXT: [[IDXS1:%.*]] = getelementptr inbounds double, double* [[S]], i64 1
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8			; CHECK-NEXT: store <2 x double> [[TMP10]], <2 x double>* [[TMP11]], align 8
	; CHECK-NEXT: store double [[A1]], double* [[EXT1:%.*]], align 8			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP1]], i32 1
				; CHECK-NEXT: store double [[TMP12]], double* [[EXT1:%.*]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%IdxA0 = getelementptr inbounds double, double* %A, i64 0			%IdxA0 = getelementptr inbounds double, double* %A, i64 0
	%IdxB0 = getelementptr inbounds double, double* %B, i64 0			%IdxB0 = getelementptr inbounds double, double* %B, i64 0
	%IdxC0 = getelementptr inbounds double, double* %C, i64 0			%IdxC0 = getelementptr inbounds double, double* %C, i64 0
	%IdxD0 = getelementptr inbounds double, double* %D, i64 0			%IdxD0 = getelementptr inbounds double, double* %D, i64 0

	▲ Show 20 Lines • Show All 317 Lines • ▼ Show 20 Lines

	; Same as @ChecksExtractScores, but the extratelement vector operands do not match.			; Same as @ChecksExtractScores, but the extratelement vector operands do not match.
	define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double>* %vecPtr1, <2 x double>* %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {			define void @ChecksExtractScores_different_vectors(double* %storeArray, double* %array, <2 x double>* %vecPtr1, <2 x double>* %vecPtr2, <2 x double>* %vecPtr3, <2 x double>* %vecPtr4) {
	; CHECK-LABEL: @ChecksExtractScores_different_vectors(			; CHECK-LABEL: @ChecksExtractScores_different_vectors(
	; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0			; CHECK-NEXT: [[IDX0:%.*]] = getelementptr inbounds double, double* [[ARRAY:%.*]], i64 0
	; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1			; CHECK-NEXT: [[IDX1:%.*]] = getelementptr inbounds double, double* [[ARRAY]], i64 1
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[IDX0]] to <2 x double>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
	; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4			; CHECK-NEXT: [[LOADVEC:%.*]] = load <2 x double>, <2 x double>* [[VECPTR1:%.*]], align 4
				RKSimon: Regression?
	; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4			; CHECK-NEXT: [[LOADVEC2:%.*]] = load <2 x double>, <2 x double>* [[VECPTR2:%.*]], align 4
	; CHECK-NEXT: [[EXTRA0:%.*]] = extractelement <2 x double> [[LOADVEC]], i32 0			; CHECK-NEXT: [[EXTRA0:%.*]] = extractelement <2 x double> [[LOADVEC]], i32 0
	; CHECK-NEXT: [[EXTRA1:%.*]] = extractelement <2 x double> [[LOADVEC2]], i32 1			; CHECK-NEXT: [[EXTRA1:%.*]] = extractelement <2 x double> [[LOADVEC2]], i32 1
	; CHECK-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4			; CHECK-NEXT: [[LOADVEC3:%.*]] = load <2 x double>, <2 x double>* [[VECPTR3:%.*]], align 4
	; CHECK-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4			; CHECK-NEXT: [[LOADVEC4:%.*]] = load <2 x double>, <2 x double>* [[VECPTR4:%.*]], align 4
	; CHECK-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0			; CHECK-NEXT: [[EXTRB0:%.*]] = extractelement <2 x double> [[LOADVEC3]], i32 0
	; CHECK-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1			; CHECK-NEXT: [[EXTRB1:%.*]] = extractelement <2 x double> [[LOADVEC4]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[EXTRA1]], i32 0			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> poison, double [[EXTRA1]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[EXTRB0]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <2 x double> [[TMP3]], double [[EXTRB0]], i32 1
	; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], [[TMP2]]			; CHECK-NEXT: [[TMP5:%.*]] = fmul <2 x double> [[TMP4]], [[TMP2]]
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP5]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> poison, double [[EXTRA0]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[EXTRB1]], i32 1			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <2 x double> [[TMP6]], double [[EXTRB1]], i32 1
	; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP7]], [[TMP2]]			; CHECK-NEXT: [[TMP8:%.*]] = fmul <2 x double> [[TMP7]], [[TMP2]]
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[TMP8]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP9:%.*]] = fadd <2 x double> [[SHUFFLE]], [[TMP8]]
	; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0			; CHECK-NEXT: [[SIDX0:%.*]] = getelementptr inbounds double, double* [[STOREARRAY:%.*]], i64 0
	; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1			; CHECK-NEXT: [[SIDX1:%.*]] = getelementptr inbounds double, double* [[STOREARRAY]], i64 1
	; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*			; CHECK-NEXT: [[TMP10:%.*]] = bitcast double* [[SIDX0]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8			; CHECK-NEXT: store <2 x double> [[TMP9]], <2 x double>* [[TMP10]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%idx0 = getelementptr inbounds double, double* %array, i64 0			%idx0 = getelementptr inbounds double, double* %array, i64 0
	%idx1 = getelementptr inbounds double, double* %array, i64 1			%idx1 = getelementptr inbounds double, double* %array, i64 1
	Show All 25 Lines

llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll

Show First 20 Lines • Show All 129 Lines • ▼ Show 20 Lines	lp:
%to_2 = getelementptr double, double * %to, i64 1		%to_2 = getelementptr double, double * %to, i64 1
store double %v1_1, double *%to		store double %v1_1, double *%to
store double %v1_2, double *%to_2		store double %v1_2, double *%to_2
br i1 undef, label %lp, label %ext		br i1 undef, label %lp, label %ext

ext:		ext:
ret void		ret void
}		}

define void @shuffle_nodes_match1(double * noalias %from, double * noalias %to, double %v1, double %v2) {		define void @shuffle_nodes_match1(double * noalias %from, double * noalias %to, double %v1, double %v2) {
		RKSimon: A lot of these tests aren't preserving the broadcast any more - I'm not sure if it really matters, although the test names now look wrong?
		ABataev: I'll rename affected test cases
; CHECK-LABEL: @shuffle_nodes_match1(		; CHECK-LABEL: @shuffle_nodes_match1(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[LP:%.*]]		; CHECK-NEXT: br label [[LP:%.*]]
; CHECK: lp:		; CHECK: lp:
; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[FROM_1:%.*]] = getelementptr double, double* [[FROM:%.*]], i32 1		; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
; CHECK-NEXT: [[V0_1:%.*]] = load double, double* [[FROM]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
; CHECK-NEXT: [[V0_2:%.*]] = load double, double* [[FROM_1]], align 4		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V0_2]], i64 0		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 1
; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[P]], i64 1		; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[SHUFFLE]]
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V0_1]], i64 0		; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP2]], <2 x double> poison, <2 x i32> zeroinitializer		; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 4
; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]		; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
; CHECK: ext:		; CHECK: ext:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
br label %lp		br label %lp

lp:		lp:
Show All 15 Lines
define void @vecload_vs_broadcast4(double * noalias %from, double * noalias %to, double %v1, double %v2) {		define void @vecload_vs_broadcast4(double * noalias %from, double * noalias %to, double %v1, double %v2) {
; CHECK-LABEL: @vecload_vs_broadcast4(		; CHECK-LABEL: @vecload_vs_broadcast4(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[LP:%.*]]		; CHECK-NEXT: br label [[LP:%.*]]
; CHECK: lp:		; CHECK: lp:
; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 1		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 1
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>		; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP2]], [[SHUFFLE]]
; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP2]], [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*		; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 4
; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]		; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
; CHECK: ext:		; CHECK: ext:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
br label %lp		br label %lp

lp:		lp:
Show All 14 Lines


define void @shuffle_nodes_match2(double * noalias %from, double * noalias %to, double %v1, double %v2) {		define void @shuffle_nodes_match2(double * noalias %from, double * noalias %to, double %v1, double %v2) {
; CHECK-LABEL: @shuffle_nodes_match2(		; CHECK-LABEL: @shuffle_nodes_match2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br label [[LP:%.*]]		; CHECK-NEXT: br label [[LP:%.*]]
; CHECK: lp:		; CHECK: lp:
; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[P:%.*]] = phi double [ 1.000000e+00, [[LP]] ], [ 0.000000e+00, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[FROM_1:%.*]] = getelementptr double, double* [[FROM:%.*]], i32 1		; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[FROM:%.*]] to <2 x double>*
; CHECK-NEXT: [[V0_1:%.*]] = load double, double* [[FROM]], align 4		; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 4
; CHECK-NEXT: [[V0_2:%.*]] = load double, double* [[FROM_1]], align 4		; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[V0_1]], i64 0		; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[P]], i64 1
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x double> [[TMP0]], <2 x double> poison, <2 x i32> zeroinitializer		; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[SHUFFLE]], [[TMP2]]
; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[V0_2]], i64 0		; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[P]], i64 1		; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 4
; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[TO:%.*]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 4
; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]		; CHECK-NEXT: br i1 undef, label [[LP]], label [[EXT:%.*]]
; CHECK: ext:		; CHECK: ext:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
br label %lp		br label %lp

lp:		lp:
▲ Show 20 Lines • Show All 104 Lines • ▼ Show 20 Lines
; c[1] = b[1]+a[1]; // swapped b[1] and a[1]		; c[1] = b[1]+a[1]; // swapped b[1] and a[1]

define void @load_reorder_double(double* nocapture %c, double* noalias nocapture readonly %a, double* noalias nocapture readonly %b){		define void @load_reorder_double(double* nocapture %c, double* noalias nocapture readonly %a, double* noalias nocapture readonly %b){
; CHECK-LABEL: @load_reorder_double(		; CHECK-LABEL: @load_reorder_double(
; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*		; CHECK-NEXT: [[TMP1:%.*]] = bitcast double* [[B:%.*]] to <2 x double>*
; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4		; CHECK-NEXT: [[TMP2:%.*]] = load <2 x double>, <2 x double>* [[TMP1]], align 4
; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*		; CHECK-NEXT: [[TMP3:%.*]] = bitcast double* [[A:%.*]] to <2 x double>*
; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4		; CHECK-NEXT: [[TMP4:%.*]] = load <2 x double>, <2 x double>* [[TMP3]], align 4
; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP4]], [[TMP2]]		; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP2]], [[TMP4]]
; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[C:%.*]] to <2 x double>*		; CHECK-NEXT: [[TMP6:%.*]] = bitcast double* [[C:%.*]] to <2 x double>*
; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 4		; CHECK-NEXT: store <2 x double> [[TMP5]], <2 x double>* [[TMP6]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%1 = load double, double* %a		%1 = load double, double* %a
%2 = load double, double* %b		%2 = load double, double* %b
%3 = fadd double %1, %2		%3 = fadd double %1, %2
store double %3, double* %c		store double %3, double* %c
▲ Show 20 Lines • Show All 113 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/store-jumbled.ll

	Show All 16 Lines
	; CHECK-NEXT: [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3			; CHECK-NEXT: [[GEP_6:%.*]] = getelementptr inbounds i32, i32* [[INN_ADDR]], i64 3
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[INN_ADDR]] to <4 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = bitcast i32* [[INN_ADDR]] to <4 x i32>*
	; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4			; CHECK-NEXT: [[TMP4:%.*]] = load <4 x i32>, <4 x i32>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP5:%.*]] = mul <4 x i32> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0			; CHECK-NEXT: [[GEP_7:%.*]] = getelementptr inbounds i32, i32* [[OUT:%.*]], i64 0
	; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1			; CHECK-NEXT: [[GEP_8:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 1
	; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2			; CHECK-NEXT: [[GEP_9:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 2
	; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3			; CHECK-NEXT: [[GEP_10:%.*]] = getelementptr inbounds i32, i32* [[OUT]], i64 3
	; CHECK-NEXT: [[REORDER_SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 0, i32 2>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 1, i32 3, i32 0, i32 2>
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[GEP_7]] to <4 x i32>*
	; CHECK-NEXT: store <4 x i32> [[REORDER_SHUFFLE]], <4 x i32>* [[TMP6]], align 4			; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 undef
	;			;
	%in.addr = getelementptr inbounds i32, i32* %in, i64 0			%in.addr = getelementptr inbounds i32, i32* %in, i64 0
	%load.1 = load i32, i32* %in.addr, align 4			%load.1 = load i32, i32* %in.addr, align 4
	%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 1			%gep.1 = getelementptr inbounds i32, i32* %in.addr, i64 1
	%load.2 = load i32, i32* %gep.1, align 4			%load.2 = load i32, i32* %gep.1, align 4
	%gep.2 = getelementptr inbounds i32, i32* %in.addr, i64 2			%gep.2 = getelementptr inbounds i32, i32* %in.addr, i64 2
	%load.3 = load i32, i32* %gep.2, align 4			%load.3 = load i32, i32* %gep.2, align 4
	Show All 25 Lines

llvm/test/Transforms/SLPVectorizer/X86/stores_vectorize.ll

	Show First 20 Lines • Show All 91 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 3			; CHECK-NEXT: [[ARRAYIDX11:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 3
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[P3]] to <4 x i64>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast i64* [[P3]] to <4 x i64>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <4 x i64>, <4 x i64>* [[TMP0]], align 8
	; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 11			; CHECK-NEXT: [[ARRAYIDX12:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 11
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[ARRAYIDX1]] to <4 x i64>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i64* [[ARRAYIDX1]] to <4 x i64>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* [[TMP2]], align 8			; CHECK-NEXT: [[TMP3:%.*]] = load <4 x i64>, <4 x i64>* [[TMP2]], align 8
	; CHECK-NEXT: [[TMP4:%.*]] = shl <4 x i64> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = shl <4 x i64> [[TMP1]], [[TMP3]]
	; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 4			; CHECK-NEXT: [[ARRAYIDX14:%.*]] = getelementptr inbounds i64, i64* [[P3]], i64 4
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x i64> [[TMP4]], <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i64> [[TMP4]], <4 x i64> poison, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
	; CHECK-NEXT: [[TMP6:%.*]] = bitcast i64* [[ARRAYIDX14]] to <4 x i64>*			; CHECK-NEXT: [[TMP5:%.*]] = bitcast i64* [[ARRAYIDX14]] to <4 x i64>*
	; CHECK-NEXT: store <4 x i64> [[TMP5]], <4 x i64>* [[TMP6]], align 8			; CHECK-NEXT: store <4 x i64> [[SHUFFLE]], <4 x i64>* [[TMP5]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = load i64, i64* %p3, align 8			%0 = load i64, i64* %p3, align 8
	%arrayidx1 = getelementptr inbounds i64, i64* %p3, i64 8			%arrayidx1 = getelementptr inbounds i64, i64* %p3, i64 8
	%1 = load i64, i64* %arrayidx1, align 8			%1 = load i64, i64* %arrayidx1, align 8
	%shl = shl i64 %0, %1			%shl = shl i64 %0, %1
	%arrayidx2 = getelementptr inbounds i64, i64* %p3, i64 7			%arrayidx2 = getelementptr inbounds i64, i64* %p3, i64 7
	▲ Show 20 Lines • Show All 191 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/supernode.ll

	Show All 17 Lines
	; ENABLED-NEXT: [[A0:%.*]] = load double, double* [[IDXA0]], align 8			; ENABLED-NEXT: [[A0:%.*]] = load double, double* [[IDXA0]], align 8
	; ENABLED-NEXT: [[A1:%.*]] = load double, double* [[IDXA1]], align 8			; ENABLED-NEXT: [[A1:%.*]] = load double, double* [[IDXA1]], align 8
	; ENABLED-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXB0]] to <2 x double>*			; ENABLED-NEXT: [[TMP0:%.*]] = bitcast double* [[IDXB0]] to <2 x double>*
	; ENABLED-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; ENABLED-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; ENABLED-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8			; ENABLED-NEXT: [[C0:%.*]] = load double, double* [[IDXC0]], align 8
	; ENABLED-NEXT: [[C1:%.*]] = load double, double* [[IDXC1]], align 8			; ENABLED-NEXT: [[C1:%.*]] = load double, double* [[IDXC1]], align 8
	; ENABLED-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0			; ENABLED-NEXT: [[TMP2:%.*]] = insertelement <2 x double> poison, double [[A0]], i32 0
	; ENABLED-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[C1]], i32 1			; ENABLED-NEXT: [[TMP3:%.*]] = insertelement <2 x double> [[TMP2]], double [[C1]], i32 1
	; ENABLED-NEXT: [[TMP4:%.*]] = fadd fast <2 x double> [[TMP3]], [[TMP1]]			; ENABLED-NEXT: [[TMP4:%.*]] = fadd fast <2 x double> [[TMP1]], [[TMP3]]
	; ENABLED-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0			; ENABLED-NEXT: [[TMP5:%.*]] = insertelement <2 x double> poison, double [[C0]], i32 0
	; ENABLED-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A1]], i32 1			; ENABLED-NEXT: [[TMP6:%.*]] = insertelement <2 x double> [[TMP5]], double [[A1]], i32 1
	; ENABLED-NEXT: [[TMP7:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP6]]			; ENABLED-NEXT: [[TMP7:%.*]] = fadd fast <2 x double> [[TMP4]], [[TMP6]]
	; ENABLED-NEXT: [[TMP8:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*			; ENABLED-NEXT: [[TMP8:%.*]] = bitcast double* [[IDXS0]] to <2 x double>*
	; ENABLED-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8			; ENABLED-NEXT: store <2 x double> [[TMP7]], <2 x double>* [[TMP8]], align 8
	; ENABLED-NEXT: ret void			; ENABLED-NEXT: ret void
	;			;
	entry:			entry:
	▲ Show 20 Lines • Show All 293 Lines • Show Last 20 Lines

Revision Contents

Diff 394227
