This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve multinode analysis.
ClosedPublic

Authored by ABataev on Apr 22 2021, 2:01 PM.

Details

Summary

Changes the preliminary multinode analysis:

  1. Introduced scores for reversed loads/extractelements.
  2. Improved shallow score calculation.
  3. Lowered the cost of external uses (no need to consider it several times, just once).
  4. The initial lane for the analysis is the one that requires the fewest reorderings.

In general, these changes should reduce compile time and improve the
reordering in many cases.

Part of D57059.
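For illustration, the reversed-load scoring from point 1 can be sketched roughly as follows. This is a simplified standalone model, not the actual SLPVectorizer code; the score constants and the precomputed distance parameter are hypothetical.

#include <cstdint>
#include <optional>

// Hypothetical score constants: reversed-consecutive loads are rewarded
// similarly to forward-consecutive ones, since the lanes can be reordered.
constexpr int ScoreConsecutiveLoads = 3;
constexpr int ScoreReversedLoads = 3;
constexpr int ScoreFail = 0;

// Score a pair of loads given the distance between their addresses in
// elements (std::nullopt if the addresses are not provably related).
int scoreLoadPair(std::optional<int64_t> DistInElements) {
  if (!DistInElements)
    return ScoreFail;
  if (*DistInElements == 1)
    return ScoreConsecutiveLoads; // Forward-consecutive pair.
  if (*DistInElements == -1)
    return ScoreReversedLoads;    // Consecutive, but in reverse order.
  return ScoreFail;
}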

Diff Detail

Event Timeline

ABataev created this revision.Apr 22 2021, 2:01 PM
ABataev requested review of this revision.Apr 22 2021, 2:01 PM
Herald added a project: Restricted Project.Apr 22 2021, 2:01 PM
ABataev updated this revision to Diff 345547.May 14 2021, 1:45 PM

Have you been able to investigate any of these instruction increase regressions?

llvm/test/Transforms/SLPVectorizer/AArch64/transpose-inseltpoison.ll
87

Regression?

231

Regression?

llvm/test/Transforms/SLPVectorizer/AArch64/transpose.ll
83

Regression?

231

Regression?

llvm/test/Transforms/SLPVectorizer/X86/lookahead.ll
592

Regression?

Have you been able to investigate any of these instruction increase regressions?

I think most of them can be fixed; it requires D103247 and 1 or 2 extra patches to allow tree reordering for larger subsets of trees.

Matt added a subscriber: Matt.Jun 4 2021, 8:27 AM

rebase?

Need to prepare 1 or 2 extra patches to fix the regressions introduced in this patch (allow reordering for insertelements etc.). Will rebase it after this.

There are still regressions, even after we allowed reordering of insertelements. This is because the reordering is not quite effective. I have an idea of how to improve it (and avoid rebuilding the tree a second time, improving compile time); I'll try to implement it next week.

RKSimon added inline comments.Jun 16 2021, 12:37 AM
llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll
139

A lot of these tests aren't preserving the broadcast any more - I'm not sure if it really matters, although the test names now look wrong?

ABataev added inline comments.Jun 16 2021, 4:03 AM
llvm/test/Transforms/SLPVectorizer/X86/operandorder.ll
139

I'll rename the affected test cases.

It depends on D105020, which should fix all the regressions caused by this patch.

RKSimon added inline comments.Jul 13 2021, 2:16 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1015

Should this be m_Deferred or m_Specific? I thought m_Deferred was only necessary in the same match call?
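For reference, the difference as I understand it, as a minimal sketch (the helper functions are made up for illustration):

#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

// m_Deferred(X): X is bound by m_Value(X) earlier in the *same* match()
// call, and the second occurrence must resolve to that same value,
// e.g. V == X + X.
static bool isSelfAdd(Value *V) {
  Value *X;
  return match(V, m_Add(m_Value(X), m_Deferred(X)));
}

// m_Specific(X): X is already known from outside the match() call, and the
// operand must be exactly that value.
static bool isMulByKnownValue(Value *V, Value *X) {
  return match(V, m_Mul(m_Specific(X), m_Value()));
}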

1020

Can this be a single line comment?

ABataev updated this revision to Diff 360863.Jul 22 2021, 10:02 AM

Rebase, fixes and addressed comments.

vporpo added a subscriber: vporpo.Nov 6 2021, 3:20 AM

Perhaps the score changes could be split into a separate patch?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1317

Could you add a comment explaining what these unsigned values are for (or perhaps use a struct instead)?
Could you also describe the purpose of HashMap and what the keys and values are?

1318

Is there any reasoning behind the iteration in reverse? If so, could you please add a comment?

1329

NIT: I find Code a bit confusing; also, perhaps there is no need to refer to Parent in the variable name? Perhaps rename it to something like NumOpsWithSameOpcode?

1330

Could you add a bit more text to the comment about what the hash is used for? I can see that it is used as a key in the HashMap above, but could you explain how it is being used?

1336–1363

Could you elaborate a bit on this? If I understand correctly, the more similar opcodes we can find, the easier it is to reorder them, so this can act as a tie-breaker when NumOfAPOs is equal?

1366

If I am not mistaken, this code counts the consecutive operands with the same opcode and BB. Is that because this is a good enough approximation?

Perhaps the score changes could be split into a separate patch?

Not on their own; they cause regressions. I will try to separate the cost-model changes.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1317

Yes, I just forgot to add some extra comments; I will add them after the updates.
std::pair<unsigned, unsigned> is used to implement a simple voting algorithm and to choose the lane with the fewest operands that can freely move about, or the lane that is less profitable to reorder because it already has the most optimal set of operands.
The first unsigned is a counter for voting; the second unsigned is a counter of lanes with instructions with same/alternate opcodes and the same parent basic block.
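Roughly, it is something like this (an illustrative sketch with made-up names, not the exact code from the patch):

#include "llvm/ADT/DenseMap.h"
#include <utility>

// Key: a hash of a lane's operands (position ids + opcodes).
// Value: .first  - how many lanes voted for this operand pattern,
//        .second - how many of those lanes have operands with same/alternate
//                  opcodes and the same parent basic block.
llvm::DenseMap<unsigned, std::pair<unsigned, unsigned>> HashMap;

// Each lane casts a vote for its operand pattern; the lane whose pattern
// wins the vote is taken as the starting lane for the analysis.
void voteForLane(unsigned LaneHash, bool SameOpcodeAndParent) {
  auto &Votes = HashMap[LaneHash];
  ++Votes.first;
  if (SameOpcodeAndParent)
    ++Votes.second;
}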

1318

This is just to stay closer to the original results from before this patch when we have multiple lanes with the same cost, nothing else.

1329

I will rename it, but I'd rather keep Parent in the name, because I compare not only the opcodes but the parents too.

1330

It is used to count operands, actually their position ids and opcode values. It is used in the voting mechanism to find the lane with the fewest operands that can freely move about, or the lane that is less profitable to reorder because it already has the most optimal set of operands. I could use a SmallVector<unsigned> instead, but a hash code is faster and requires less memory.
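A rough sketch of how such a hash key can be built (illustrative only; hash_combine_range is the real LLVM API, the rest of the names are made up):

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instruction.h"

// Fold each operand's position id and opcode into one hash code, so a lane's
// whole operand set becomes a cheap map key instead of a SmallVector<unsigned>
// of (position, opcode) pairs.
llvm::hash_code hashLaneOperands(llvm::ArrayRef<llvm::Value *> Operands) {
  llvm::SmallVector<unsigned, 16> Keys;
  for (unsigned Idx = 0, E = Operands.size(); Idx != E; ++Idx) {
    unsigned Opcode = 0;
    if (auto *I = llvm::dyn_cast<llvm::Instruction>(Operands[Idx]))
      Opcode = I->getOpcode();
    Keys.push_back(Idx);    // Position id.
    Keys.push_back(Opcode); // Opcode (0 for non-instructions).
  }
  return llvm::hash_combine_range(Keys.begin(), Keys.end());
}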

1336–1363

If the lane already has operands with the same opcode and the same parent, there is no need to swap the operands in this lane; with high probability, such a lane can already be vectorized effectively.

1366

Yes, exactly, in most cases it results in the optimal values in the lane.

ABataev updated this revision to Diff 386887.Nov 12 2021, 10:35 AM

Rebase + address comments

@vporpo Any more comments?

vporpo added inline comments.Dec 3 2021, 12:41 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1368

Are you using NumOpsWithSameOpcodeParent == 0 as a check for the first iteration? Shouldn't you be using !OpcodeI instead?

I find this code a bit hard to follow, because I can't tell which of the if conditions are for checking for the first iteration and which ones are part of the heuristic. Should it be updating the OpcodeI and Parent only in the first iteration (like below), or should it be doing it whenever there is a mismatch?

if (auto *I = dyn_cast<Instruction>(OpData.V)) {
  // First iteration
  if (!OpcodeI) {
    OpcodeI = I;
    Parent = I->getParent();
  }
  // Mismatch: decrement (capped at 0); match: increment.
  if (!getSameOpcode({OpcodeI, I}).getOpcode() || I->getParent() != Parent)
    NumOpsWithSameOpcodeParent = std::max(NumOpsWithSameOpcodeParent - 1, 0);
  else
    ++NumOpsWithSameOpcodeParent;
}

Perhaps peeling the first iteration might make the code easier to follow?
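For example, the peeled variant could look roughly like this (again just a sketch, reusing the made-up names from above):

// Peeled first iteration: seed the candidate opcode/parent and the counter.
if (auto *I = dyn_cast<Instruction>(Operands.front().V)) {
  OpcodeI = I;
  Parent = I->getParent();
  NumOpsWithSameOpcodeParent = 1;
}
// Remaining iterations only apply the heuristic: +1 on a match,
// -1 (capped at 0) on a mismatch.
for (const auto &OpData : llvm::drop_begin(Operands)) {
  auto *I = dyn_cast<Instruction>(OpData.V);
  if (!I)
    continue;
  if (getSameOpcode({OpcodeI, I}).getOpcode() && I->getParent() == Parent)
    ++NumOpsWithSameOpcodeParent;
  else
    NumOpsWithSameOpcodeParent = std::max(NumOpsWithSameOpcodeParent - 1, 0);
}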

1369

Why is NumOpsWithSameOpcodeParent set to 1 the first time a mismatch is found? Shouldn't it be set to 0?

ABataev added inline comments.Dec 7 2021, 1:56 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1368

This is again a kind of voting algorithm. This code runs every time we start voting on a value with a new opcode, not only on the first iteration. We just try to find an opcode with at least NumOperands/2 occurrences here; if there is no such opcode, we just choose any of them, since there are no profitable elements.

1369

It is a way of counting the first element in the new sequence.

vporpo added inline comments.Dec 7 2021, 3:19 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1369

Yes, it is increasing it, but shouldn't it be decreasing it instead (or letting it remain 0)? This code block executes when there is a mismatch of opcode or parent (or if it is the first iteration), so shouldn't we be decreasing the value of `NumOpsWithSameOpcodeParent` (like in line 1463)?

What confuses me here is that NumOpsWithSameOpcodeParent looks like a normal counter that counts the opcode/parent matches. So I would expect it to increase by one if the opcodes/parents match (like what line 1466 does), and to decrease by one if there is a mismatch. But it seems to be more complicated than that: when it reaches 0 it is forced to 1 even when there is an opcode mismatch. I find this a bit counterintuitive.

For example, if we have mismatching opcodes in sequence, I would expect it to keep decreasing, or at least be capped at 0. But it seems like the value of NumOpsWithSameOpcodeParent will be 0, then 1, then 0, then 1, like so:

before the loop: 0
iteration 1:  1 (because it was == 0)
iteration 2:  0 (because of opcode mismatch)
iteration 3:  1 (because it was == 0)

ABataev added inline comments.Dec 7 2021, 3:24 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1369

This is how the voting algorithm works. The main idea is described here: https://www.geeksforgeeks.org/boyer-moore-majority-voting-algorithm/
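For reference, a minimal standalone sketch of that voting scheme applied to opcodes (illustrative only, not the patch code):

#include <vector>

// Boyer-Moore majority vote: keep a candidate and a counter; a matching
// element increments the counter, a mismatch decrements it, and when the
// counter hits zero the current element becomes the new candidate with a
// count of 1 (which is the "set to 1" discussed above).
unsigned majorityOpcode(const std::vector<unsigned> &Opcodes) {
  unsigned Candidate = 0;
  int Count = 0;
  for (unsigned Opcode : Opcodes) {
    if (Count == 0) {
      Candidate = Opcode;
      Count = 1;
    } else if (Opcode == Candidate) {
      ++Count;
    } else {
      --Count;
    }
  }
  // A second pass would be needed to confirm a true majority exists.
  return Candidate;
}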

vporpo added inline comments.Dec 7 2021, 4:10 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1369

OK, that makes sense now, thanks for clarifying! Could you please add a comment saying that this loop is a Boyer-Moore majority vote for finding the majority opcode and the number of times it occurs?

ABataev added inline comments.Dec 7 2021, 4:15 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1369

Sure, will do it tomorrow.

ABataev updated this revision to Diff 394028.Dec 13 2021, 1:19 PM

Rebase + improve analysis for extractelements.

vporpo accepted this revision.Dec 13 2021, 3:16 PM

LGTM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1016–1022

Nit: Could you use temporary variables and perhaps try to simplify the expression to make it a bit easier to read, something like:

bool MatchExtract1 = match(V1, m_ExtractElt(m_Value(EV1), m_ConstantInt(Ex1Idx)));
bool MatchExtract2 = match(V2, m_ExtractElt(m_Value(EV2), m_CombineOr(m_ConstantInt(Ex2Idx), m_Undef())));
bool AcceptedEV2 = !EV2 || (isUndefVector(EV2) && EV2->getType() == EV1->getType()) || EV2 == EV1;
if ((MatchExtract1 && isa<UndefValue>(V2)) ||
    (MatchExtract1 && MatchExtract2 && AcceptedEV2)) {

This revision is now accepted and ready to land.Dec 13 2021, 3:16 PM
This revision was landed with ongoing or failed builds.Dec 14 2021, 6:18 AM
This revision was automatically updated to reflect the committed changes.
ye-luo added a subscriber: ye-luo.Dec 29 2021, 11:59 PM

This patch caused many test failures in my application on Power9. Although this patch sounds like it affects SLP, adding -fno-slp-vectorize doesn't improve the pass rate, but changing -O3 to -O0 does.

This patch caused many test failures in my application on Power9. Although this patch sounds like it affects SLP, adding -fno-slp-vectorize doesn't improve the pass rate, but changing -O3 to -O0 does.

Hi, do you have a reproducer?

ye-luo added a comment.Jan 3 2022, 3:21 PM

This patch caused many test failures in my application on Power9. Although this patch sounds like it affects SLP, adding -fno-slp-vectorize doesn't improve the pass rate, but changing -O3 to -O0 does.

Hi, do you have a reproducer?

Initially I was not sure where the issue came from and just reported my observation. After a careful inspection, I found it is an interaction between clang and the random number generator in the boost libraries. Since I had little knowledge of the internal details of the library, I decided not to debug it. Instead, I just moved my application off boost, and the RNG from the C++ standard library works well with Clang. So I won't work on a reproducer. If the boost developers find an issue, they will report bugs. Right now, assume everything is good.

This patch caused many test failures in my application on Power9. Although this patch sounds like it affects SLP, adding -fno-slp-vectorize doesn't improve the pass rate, but changing -O3 to -O0 does.

Hi, do you have a reproducer?

Initially I was not sure where the issue came from and just reported my observation. After a careful inspection, I found it is an interaction between clang and the random number generator in the boost libraries. Since I had little knowledge of the internal details of the library, I decided not to debug it. Instead, I just moved my application off boost, and the RNG from the C++ standard library works well with Clang. So I won't work on a reproducer. If the boost developers find an issue, they will report bugs. Right now, assume everything is good.

Ok, thanks for letting me know!