This is an archive of the discontinued LLVM Phabricator instance.

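Paths

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll
llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll
llvm/test/Transforms/SLPVectorizer/X86/PR35865-inseltpoison.ll
llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll
llvm/test/Transforms/SLPVectorizer/X86/alternate-cmp-swapped-pred.ll
llvm/test/Transforms/SLPVectorizer/X86/broadcast_long.ll
llvm/test/Transforms/SLPVectorizer/X86/buildvector-same-lane-insert.ll
llvm/test/Transforms/SLPVectorizer/X86/cmp-as-alternate-ops.ll
llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll
llvm/test/Transforms/SLPVectorizer/X86/crash_7zip.ll
llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll
llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling.ll
llvm/test/Transforms/SLPVectorizer/X86/extract-scalar-from-undef.ll
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll
llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll
llvm/test/Transforms/SLPVectorizer/X86/landing_pad.ll
llvm/test/Transforms/SLPVectorizer/X86/load-partial-vector-shuffle.ll
llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll
llvm/test/Transforms/SLPVectorizer/X86/partail.ll
llvm/test/Transforms/SLPVectorizer/X86/phi-undef-input.ll
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll
llvm/test/Transforms/SLPVectorizer/X86/reused-undefs.ll
llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll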

Differential D127119

[SLP]Fix undef handling in gather function.
Needs Review · Public

Authored by ABataev on Jun 6 2022, 8:40 AM.

Details

Reviewers

RKSimon
nlopes
hvdijk
vporpo

Summary

SLP cannot replace undefs with poison without extra checks. It also needs to
preserve the original order of the scalars if it decides not to shuffle the
final buildvector sequence.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ABataev created this revision. Jun 6 2022, 8:40 AM

Herald added a project: Restricted Project. Jun 6 2022, 8:40 AM

Herald added a subscriber: hiraditya.

ABataev requested review of this revision. Jun 6 2022, 8:40 AM

Herald added a project: Restricted Project. Jun 6 2022, 8:40 AM

Harbormaster completed remote builds in B168071: Diff 434497. Jun 6 2022, 9:27 AM

Rebase

vporpo added inline comments. Jun 6 2022, 11:26 AM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	Doesn't an undef here mean that lane 4 can potentially be poison? Wouldn't that be incorrect?

Harbormaster completed remote builds in B168103: Diff 434535. Jun 6 2022, 11:32 AM

hvdijk added inline comments. Jun 6 2022, 11:33 AM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	A mask of undef means the result is undef, not poison, even if the input contains poison elements. See https://llvm.org/docs/LangRef.html#shufflevector-instruction (which is less clear than I would have liked, it used to explicitly say undef, it now says "undefined" to be more readable, but it's not obvious that it is still meant to refer to undef rather than poison)

nikic added a subscriber: nikic. Jun 6 2022, 11:37 AM

nikic added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	This will be switched to return a poison element in the future, so it would be best not to rely on it. (cc @nlopes).

nlopes added inline comments. Jun 6 2022, 11:46 AM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	Right, thanks! We need to switch that behavior to fix a bunch of optimizations.

hvdijk added inline comments. Jun 6 2022, 11:51 AM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	D93818 linked for reference. Okay, are you saying for the purpose of SLPVectorizer, you would prefer that we already treat undef masks as selecting poison? That should always result in code that is already correct today, but it may not always be optimal under the current rules.

vporpo added inline comments. Jun 6 2022, 11:55 AM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	Thanks for the explanation!
38 ↗	(On Diff #434535)	`SHUFFLE` is: `LD LD LD Undef`. `SHUFFLE1` is: `LD LD LD LD`. Shouldn't we reuse `SHUFFLE` for both operands of `mul`?

nlopes added a subscriber: aqjune. Jun 6 2022, 12:05 PM

nlopes added inline comments.

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	That would be awesome, yes! It would also motivate us to get that done (@aqjune :)

ABataev added inline comments. Jun 6 2022, 12:07 PM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	I'll try to update this patch to follow this new rule

ABataev added inline comments. Jun 6 2022, 12:10 PM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	Just one question. Shall I assume that <0, 0, 0, 0> and <0, 0, 0, undef> can be safely merged to <0, 0, 0, 0> then? Or is it better to wait for D93818?

hvdijk added inline comments. Jun 6 2022, 12:13 PM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	I think my D127073 actually already achieved that. The concerns there about regressions are for cases where, under the current rules, we can use undef masks, but under the proposed new rules for undef masks we cannot, as we need to ensure the result is not poison. However, this diff is intended to also perform some additional optimisations, so it may still be worth updating this one.

ABataev added inline comments. Jun 6 2022, 12:15 PM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	Yep, my update would be pretty similar to your original patch, just some situations will be handled a bit better. Just need to clarify the question, if shuffle <ld, ld, ld, undef> can be safely replaced by shuffle <ld, ld, ld, ld>

hvdijk added inline comments. Jun 6 2022, 12:19 PM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	If we ensure that the vectorizer only ever generates a mask of <0, 0, 0, undef> when element 3 is meant to be poison, it will be valid to merge that with <0, 0, 0, 0> even without waiting for D93818, I think: we do not have to support arbitrary existing shufflevectors here that may rely on undef masks giving undef results, we only have to worry about the new shufflevectors we generate.

nlopes added inline comments. Jun 6 2022, 1:19 PM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	"Just need to clarify the question, if shuffle <ld, ld, ld, undef> can be safely replaced by shuffle <ld, ld, ld, ld>" It's correct to do that in just 2 cases: ld is known to be non-poison (e.g. via ValueTracking), or ld is present in the expression tree already. So x * undef can be replaced with x * x, but not with x * y. If instead of undef you have poison, then it can safely be replaced with ld. We've been moving away from initializing vectors with undef, so hopefully undef should be rare in vectors these days. The only exception left is shufflevector. It would be nice to duplicate some of the SLP tests to use poison rather than undef to ensure the code is simplifying this appropriately.
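
The rule in the comment above can be checked exhaustively on a toy model. This is only a sketch under stated assumptions, not LLVM code: a lane is modelled as `std::optional<int>` with `std::nullopt` standing for poison, undef is assumed to be free to take any non-poison value from a tiny domain, and every name here (`Lane`, `mul`, `isAllowedOutcome`, the `replaceUndefWith...` helpers) is hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy model (an assumption for illustration, not LLVM's representation):
// a lane is either poison (std::nullopt) or a concrete integer.
using Lane = std::optional<int>;

// Small value domain that undef and the operands may range over.
static const std::vector<int> Domain = {0, 1, 2, 3};

// Multiplication propagates poison from either operand.
Lane mul(Lane A, Lane B) {
  if (!A || !B)
    return std::nullopt;
  return *A * *B;
}

// Returns true if Result is an allowed outcome of `X * undef`, where undef
// may be any non-poison value from Domain.
bool isAllowedOutcome(Lane X, Lane Result) {
  if (!X)
    return true; // The original is already poison, so anything refines it.
  if (!Result)
    return false; // Poison would be newly introduced.
  for (int U : Domain)
    if (*X * U == *Result)
      return true;
  return false;
}

// All lanes to test: poison plus every value in Domain.
std::vector<Lane> allLanes() {
  std::vector<Lane> Lanes;
  Lanes.push_back(std::nullopt);
  for (int V : Domain)
    Lanes.push_back(V);
  return Lanes;
}

// `x * undef` -> `x * x`: the replacement is already in the expression.
bool replaceUndefWithSameOperandIsSafe() {
  for (Lane X : allLanes())
    if (!isAllowedOutcome(X, mul(X, X)))
      return false;
  return true;
}

// `x * undef` -> `x * y` where y is known to be non-poison.
bool replaceUndefWithNonPoisonOperandIsSafe() {
  for (Lane X : allLanes())
    for (int Y : Domain)
      if (!isAllowedOutcome(X, mul(X, Y)))
        return false;
  return true;
}

// `x * undef` -> `x * y` for an unrelated y that may be poison.
bool replaceUndefWithOtherOperandIsSafe() {
  for (Lane X : allLanes())
    for (Lane Y : allLanes())
      if (!isAllowedOutcome(X, mul(X, Y)))
        return false;
  return true;
}
```

In this model the first two replacements hold for every input, while the third fails as soon as x is a normal value and y is poison, which is the case the comment warns about.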

hvdijk added inline comments. Jun 6 2022, 1:27 PM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	This explanation is correct for arbitrary existing shuffles, but I wrote earlier that I do not think that is the case we are concerned with. We are concerned with new shuffle instructions that we generate. If we know that the undef mask was generated by SLPVectorizer (because it is a mask that is not even actually the operand to a shufflevector instruction yet, because it is a mask that we have built up in order to potentially generate a shufflevector instruction), and we decide that in SLPVectorizer, we only generate an undef mask if we originally wanted the result to be poison, that is also a case where we can treat an undef mask as selecting poison.
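
The convention described in the comment above, where an undef mask element produced by the vectorizer itself is read as "poison is acceptable here", can be sketched as a mask merge. This is a hypothetical helper for illustration, not code from the patch: `UndefLane` stands in for LLVM's `UndefMaskElem` sentinel (assumed to be -1), and `mergeMasks` is an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Stand-in for LLVM's UndefMaskElem sentinel (assumed to be -1 here).
constexpr int UndefLane = -1;

// Hypothetical helper: merges two shuffle masks over the same source vector.
// An UndefLane entry is treated as "poison wanted", so it may take whatever
// index the other mask asks for. Returns std::nullopt if the masks disagree
// on a lane that both of them define.
std::optional<std::vector<int>> mergeMasks(const std::vector<int> &A,
                                           const std::vector<int> &B) {
  if (A.size() != B.size())
    return std::nullopt;
  std::vector<int> Merged(A.size(), UndefLane);
  for (std::size_t I = 0; I < A.size(); ++I) {
    if (A[I] == UndefLane)
      Merged[I] = B[I];
    else if (B[I] == UndefLane || B[I] == A[I])
      Merged[I] = A[I];
    else
      return std::nullopt; // Both lanes are defined but select different elements.
  }
  return Merged;
}
```

With this reading, `<0, 0, 0, undef>` and `<0, 0, 0, 0>` merge to `<0, 0, 0, 0>`. The merge is only sound for masks the vectorizer built itself; an existing shufflevector may rely on an undef mask element yielding undef rather than poison.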

aqjune added inline comments. Jun 6 2022, 5:51 PM

llvm/test/Transforms/SLPVectorizer/X86/diamond_broadcast_extra_shuffle.ll
37 ↗	(On Diff #434535)	Let me make some progress on that patch and its friends. I think I will have some time for this.

Address comments

Harbormaster completed remote builds in B168282: Diff 434790. Jun 7 2022, 6:43 AM

Comparing the changes to the tests in this diff to those in D127073, I am seeing a number of tests where we have more shufflevectors, and none where we have fewer. Are there improvements that are not as obvious to see?

In D127119#3564461, @hvdijk wrote:

Comparing the changes to the tests in this diff to those in D127073, I am seeing a number of tests where we have more shufflevectors, and none where we have fewer. Are there improvements that are not as obvious to see?

Yes, there are. These extra shuffles are caused by changes in performExtractsShuffleAction() and in the IsIdenticalOrLessDefined lambda; these changes (they treat UndefMaskElem as possible poison) increase the number of shuffles. Without them there are fewer shuffles, but these extra changes are required for correct handling of UndefMaskElem as poison.

In D127119#3564522, @ABataev wrote:

In D127119#3564461, @hvdijk wrote:

Comparing the changes to the tests in this diff to those in D127073, I am seeing a number of tests where we have more shufflevectors, and none where we have fewer. Are there improvements that are not as obvious to see?

Yes, there are. These extra shuffles are caused by changes in performExtractsShuffleAction() and in the IsIdenticalOrLessDefined lambda; these changes (they treat UndefMaskElem as possible poison) increase the number of shuffles. Without them there are fewer shuffles, but these extra changes are required for correct handling of UndefMaskElem as poison.

That suggests that D127073 still results in incorrect code in some cases. I was under the impression that it was already correct, just not optimal. Can you point to specific tests where you believe D127073 results in wrong code?

Taking a random example

--- a/llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll
@@ -19,9 +19,11 @@ define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {
 ; CHECK-NEXT:    [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], undef
 ; CHECK-NEXT:    [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], undef
 ; CHECK-NEXT:    [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
-; CHECK-NEXT:    [[TMP11:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:    [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP10]], 0
-; CHECK-NEXT:    [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP11]], 1
+; CHECK-NEXT:    [[TMP11:%.*]] = shufflevector <2 x float> undef, <2 x float> [[TMP10]], <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT:    [[TMP12:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT:    [[TMP13:%.*]] = shufflevector <2 x float> undef, <2 x float> [[TMP12]], <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT:    [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP11]], 0
+; CHECK-NEXT:    [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP13]], 1
 ; CHECK-NEXT:    ret { <2 x float>, <2 x float> } [[INS2]]
 ;
 entry:

The extra shufflevectors here, TMP11 and TMP13, are identity shuffles that should not have been generated, they do not change the handling of undef/poison.
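
To make the "identity shuffle" observation concrete: a two-operand shufflevector is redundant when its mask forwards one operand lane for lane. The following is a minimal sketch with a hypothetical helper name (`forwardedOperand`), not the check SLPVectorizer uses, and it ignores undef mask elements.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: for a two-operand shuffle whose inputs each have
// NumElts lanes, returns 0 or 1 if the mask forwards that operand unchanged,
// and -1 otherwise. Undef mask elements are not handled in this sketch.
int forwardedOperand(const std::vector<int> &Mask, int NumElts) {
  if (static_cast<int>(Mask.size()) != NumElts)
    return -1; // Result width differs from the inputs: an extract or a widening.
  bool FromFirst = true;
  bool FromSecond = true;
  for (int I = 0; I < NumElts; ++I) {
    FromFirst &= Mask[I] == I;            // Lane I of the first operand.
    FromSecond &= Mask[I] == NumElts + I; // Lane I of the second operand.
  }
  if (FromFirst)
    return 0;
  if (FromSecond)
    return 1;
  return -1;
}
```

For TMP11 and TMP13 in the excerpt, both inputs are `<2 x float>` and the mask is `<2, 3>`, so the helper reports operand 1: the result equals TMP10 (or TMP12). For TMP12 the inputs are `<4 x float>` and the result has two lanes, so that shuffle is a real extract.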

In D127119#3564531, @hvdijk wrote:
In D127119#3564522, @ABataev wrote:

In D127119#3564461, @hvdijk wrote:

Comparing the changes to the tests in this diff to those in D127073, I am seeing a number of tests where we have more shufflevectors, and none where we have fewer. Are there improvements that are not as obvious to see?

Yes, there are. These extra shuffles are caused by changes in performExtractsShuffleAction() and in the IsIdenticalOrLessDefined lambda; these changes (they treat UndefMaskElem as possible poison) increase the number of shuffles. Without them there are fewer shuffles, but these extra changes are required for correct handling of UndefMaskElem as poison.

That suggests that D127073 still results in incorrect code in some cases. I was under the impression that it was already correct, just not optimal. Can you point to specific tests where you believe D127073 results in wrong code?

Taking a random example
--- a/llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll
+++ b/llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll
@@ -19,9 +19,11 @@ define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {
 ; CHECK-NEXT:    [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], undef
 ; CHECK-NEXT:    [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], undef
 ; CHECK-NEXT:    [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
-; CHECK-NEXT:    [[TMP11:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
-; CHECK-NEXT:    [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP10]], 0
-; CHECK-NEXT:    [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP11]], 1
+; CHECK-NEXT:    [[TMP11:%.*]] = shufflevector <2 x float> undef, <2 x float> [[TMP10]], <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT:    [[TMP12:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT:    [[TMP13:%.*]] = shufflevector <2 x float> undef, <2 x float> [[TMP12]], <2 x i32> <i32 2, i32 3>
+; CHECK-NEXT:    [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP11]], 0
+; CHECK-NEXT:    [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP13]], 1
 ; CHECK-NEXT:    ret { <2 x float>, <2 x float> } [[INS2]]
 ;
 entry:
The extra shufflevectors here, TMP11 and TMP13, are identity shuffles that should not have been generated, they do not change the handling of undef/poison.

Yes, I see some extra shuffles that can be removed safely. Working on the improvements.

Address comments.

Harbormaster completed remote builds in B168396: Diff 434939. Jun 7 2022, 3:00 PM

Thanks, I am seeing improvements with this new version. I'll try to go over the changes in more detail later, some initial superficial comments now.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6994	This extra overload does not look like it adds anything: it takes a parameter `U` if and only if `U` is `Value *`. The previous overload takes `Value *` if and only if `U` is `Value *`. That is the same thing; this new overload does not appear to add anything and the file compiles successfully for me if I remove this.
6999	This extra overload goes against the documentation above `ValueSelect`. The documentation for `ValueSelect` says it takes a `Value *`, and if a `Value *` is wanted, returns that value, otherwise returns a default value. This extra overload is needed because `ValueSelect::get` now gets called with a `const TreeEntry *` in the instantiation of `performExtractsShuffleAction<const TreeEntry>`, but that is contrary to its documentation and it is not clear what it is supposed to mean.
7030	`a && b || !a && c` is more simply expressed as `a ? b : c`.
7039	`(a ? b : c) == b` is more simply expressed as `a || c == b`.
7045	This comment seems like it does not match how `IsCompleteIdentity` is set: we can get `IsCompleteIdentity` to be false even when we select elements only from a single vector.
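
The two simplifications suggested for lines 7030 and 7039 can be confirmed by brute force. This is a small standalone sketch with hypothetical function names; it uses small integers for `b` and `c` in the second identity, since those operands need not be boolean.

```cpp
#include <cassert>

// Checks `a && b || !a && c` against `a ? b : c` for all boolean inputs.
bool selectIdentityHolds() {
  for (int A = 0; A < 2; ++A)
    for (int B = 0; B < 2; ++B)
      for (int C = 0; C < 2; ++C) {
        bool a = A != 0, b = B != 0, c = C != 0;
        bool Original = (a && b) || (!a && c);
        bool Simplified = a ? b : c;
        if (Original != Simplified)
          return false;
      }
  return true;
}

// Checks `(a ? b : c) == b` against `a || c == b` for a boolean a and a
// small range of integer values for b and c.
bool compareIdentityHolds() {
  for (int A = 0; A < 2; ++A)
    for (int B = -2; B <= 2; ++B)
      for (int C = -2; C <= 2; ++C) {
        bool a = A != 0;
        bool Original = (a ? B : C) == B;
        bool Simplified = a || (C == B);
        if (Original != Simplified)
          return false;
      }
  return true;
}
```

Both functions return true, so the shorter forms are drop-in replacements as long as evaluating `b` and `c` has no side effects.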

Address comments

Harbormaster completed remote builds in B169767: Diff 436845. Jun 14 2022, 12:35 PM

Ping!

vporpo added inline comments. Jun 22 2022, 12:20 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9146	These `->getAggregateElement(...)` and `->getOperand(...)` expressions repeat several times in the code and make this code quite verbose. Is there a way to avoid this repetition?

Address comments

Harbormaster completed remote builds in B171929: Diff 439867. Jun 24 2022, 1:31 PM

vporpo added inline comments. Jun 24 2022, 1:37 PM

llvm/test/Transforms/SLPVectorizer/X86/buildvector-same-lane-insert.ll
50	TMP9 seems to be redundant here; it looks like it is a copy of TMP3. TMP8 is: TMP3[0], undef. TMP9 is: TMP3[0], TMP3[1]. I guess this was an issue even before this patch: TMP8 was a copy of TMP3, so the TMP8 shufflevector was redundant.

Address comments

Harbormaster completed remote builds in B172464: Diff 440605. Jun 28 2022, 7:47 AM

vporpo added inline comments. Jun 28 2022, 12:02 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9100	Please add a comment describing the lambda.
9165–9168	This also repeats in line 9054. Perhaps move it to a lambda like `getOperandIndex(SI2, VecOp2)` ?

hvdijk added inline comments. Jun 28 2022, 12:03 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
9165–9168	Actually... That just looks very wrong, `=` rather than `==`.

Address comments.

Harbormaster completed remote builds in B172560: Diff 440728. Jun 28 2022, 1:52 PM

Fixes and improvements.

Harbormaster completed remote builds in B173255: Diff 441713. Jul 1 2022, 10:53 AM

Rebase + fixes

Harbormaster completed remote builds in B180771: Diff 451973. Aug 11 2022, 4:23 PM

Rebase

Harbormaster completed remote builds in B182922: Diff 454949. Aug 23 2022, 4:05 PM

Rebase

Herald added a subscriber: pcwang-thead. Sep 1 2022, 6:34 AM

Harbormaster completed remote builds in B184564: Diff 457251. Sep 1 2022, 7:36 AM

Rebase

Harbormaster completed remote builds in B185848: Diff 459069. Sep 9 2022, 8:57 AM

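Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	235 lines
llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll	24 lines
llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll	10 lines
llvm/test/Transforms/SLPVectorizer/X86/PR35865-inseltpoison.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/alternate-cmp-swapped-pred.ll	16 lines
llvm/test/Transforms/SLPVectorizer/X86/broadcast_long.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/buildvector-same-lane-insert.ll	10 lines
llvm/test/Transforms/SLPVectorizer/X86/cmp-as-alternate-ops.ll	8 lines
llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll	6 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_7zip.ll	13 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling.ll	11 lines
llvm/test/Transforms/SLPVectorizer/X86/extract-scalar-from-undef.ll	22 lines
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll	4 lines
llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll	7 lines
llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll	9 lines
llvm/test/Transforms/SLPVectorizer/X86/landing_pad.ll	12 lines
llvm/test/Transforms/SLPVectorizer/X86/load-partial-vector-shuffle.ll	7 lines
llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll	4 lines
llvm/test/Transforms/SLPVectorizer/X86/partail.ll	35 lines
llvm/test/Transforms/SLPVectorizer/X86/phi-undef-input.ll	12 lines
llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/reused-undefs.ll	4 lines
llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll	13 lines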

Diff 457251

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

@@ 6,864 lines not shown @@ for (Instruction *Inst : OrderedScalars) {
     }

     PrevInst = Inst;
   }

   return Cost;
 }

+/// Checks if the buildvector last insertelement, with \p I as a part of this
+/// buildvector, dominates all other uses of \p Root.
+static bool dominatesAllUses(InsertElementInst *I, InsertElementInst *Root) {
+  if (I->getParent() != Root->getParent())
+    return false;
+  // Find top insertelement of the buildvector.
+  Value *V = I;
+  InsertElementInst *Top = I;
+  while (auto *IE = dyn_cast<InsertElementInst>(V)) {
+    if (!IE->hasOneUse())
+      break;
+    Top = IE;
+    V = IE->user_back();
+  }
+  if (Top->getParent() != Root->getParent())
+    return false;
+  BasicBlock *BB = Top->getParent();
+  unsigned Cnt = 0;
+  for (const User *U : Root->users()) {
+    auto *UInst = dyn_cast<Instruction>(U);
+    if (!UInst || UInst->getParent() != BB)
+      continue;
+    if (UInst->comesBefore(Top) || UInst == Top) {
+      ++Cnt;
+      if (Cnt > 1)
+        return false;
+    }
+  }
+  return true;
+}
+
 /// Check if two insertelement instructions are from the same buildvector.
 static bool areTwoInsertFromSameBuildVector(
     InsertElementInst *VU, InsertElementInst *V,
     function_ref<Value *(InsertElementInst *)> GetBaseOperand) {
   // Instructions must be from the same basic blocks.
   if (VU->getParent() != V->getParent())
     return false;
   // Checks if 2 insertelements are from the same buildvector.
   if (VU->getType() != V->getType())
     return false;
   // Multiple used inserts are separate nodes.
   if (!VU->hasOneUse() && !V->hasOneUse())
     return false;
+  auto &&IsAllowedToBeInBuildVector = [](InsertElementInst *V,
+                                         InsertElementInst *TopI) {
+    return V->hasOneUse() || dominatesAllUses(TopI, V);
+  };
   auto *IE1 = VU;
   auto *IE2 = V;
   unsigned Idx1 = *getInsertIndex(IE1);
   unsigned Idx2 = *getInsertIndex(IE2);
   // Go through the vector operand of insertelement instructions trying to find
   // either VU as the original vector for IE2 or V as the original vector for
   // IE1.
   do {
     if (IE2 == VU)
-      return VU->hasOneUse();
+      return IsAllowedToBeInBuildVector(VU, V);
     if (IE1 == V)
-      return V->hasOneUse();
+      return IsAllowedToBeInBuildVector(V, VU);
     if (IE1) {
       if ((IE1 != VU && !IE1->hasOneUse()) ||
           getInsertIndex(IE1).value_or(Idx2) == Idx2)
         IE1 = nullptr;
       else
         IE1 = dyn_cast_or_null<InsertElementInst>(GetBaseOperand(IE1));
     }
     if (IE2) {
@@ 33 lines not shown @@ if (I2 && ((I2 == IE2 || I2->hasOneUse())) &&
         getInsertIndex(I2).value_or(Idx1) != Idx1)
       I2 = dyn_cast<InsertElementInst>(I2->getOperand(0));
   } while ((I1 && PrevI1 != I1) || (I2 && PrevI2 != I2));
   llvm_unreachable("Two different buildvectors not expected.");
 }

 namespace {
 /// Returns incoming Value *, if the requested type is Value * too, or a default
-/// value, otherwise.
+/// value, otherwise. Also, if the requested type is Value * and the incoming
+/// value is not Value *, returns nullptr.
 struct ValueSelect {
   template <typename U>
   static typename std::enable_if<std::is_same<Value *, U>::value, Value *>::type
   get(Value *V) {
     return V;
   }
   template <typename U>
+  static typename std::enable_if<!std::is_same<Value *, U>::value, Value *>::type
+  get(U) {
+    return nullptr;
+  }
+  template <typename U>
   static typename std::enable_if<!std::is_same<Value *, U>::value, U>::type
   get(Value *) {
     return U();
   }
 };
 } // namespace

 /// Does the analysis of the provided shuffle masks and performs the requested
 /// actions on the vectors with the given shuffle masks. It tries to do it in
 /// several steps.
 /// 1. If the Base vector is not undef vector, resizing the very first mask to
 /// have common VF and perform action for 2 input vectors (including non-undef
@@ 9 lines not shown @@ static T *performExtractsShuffleAction(
     MutableArrayRef<std::pair<T *, SmallVector<int>>> ShuffleMask, Value *Base,
     function_ref<unsigned(T *)> GetVF,
     function_ref<std::pair<T *, bool>(T *, ArrayRef<int>)> ResizeAction,
     function_ref<T *(ArrayRef<int>, ArrayRef<T *>)> Action) {
   assert(!ShuffleMask.empty() && "Empty list of shuffles for inserts.");
   SmallVector<int> Mask(ShuffleMask.begin()->second);
   auto VMIt = std::next(ShuffleMask.begin());
   T *Prev = nullptr;
-  bool IsBaseNotUndef = !isUndefVector(Base);
+  Value *FirstV = ValueSelect::get<T *>(ShuffleMask.begin()->first);
+  bool IsBaseNotUndef =
+      !(FirstV ? (isa<PoisonValue>(Base) ||
+                  (isUndefVector(Base) && isGuaranteedNotToBePoison(FirstV)))
+               : isUndefVector(Base));
   if (IsBaseNotUndef) {
     // Base is not undef, need to combine it with the next subvectors.
     std::pair<T *, bool> Res = ResizeAction(ShuffleMask.begin()->first, Mask);
-    for (unsigned Idx = 0, VF = Mask.size(); Idx < VF; ++Idx) {
-      if (Mask[Idx] == UndefMaskElem)
+    bool IsCompleteIdentity = true;
+    for (int Idx = 0, VF = Mask.size(); Idx < VF; ++Idx) {
+      if (Mask[Idx] == UndefMaskElem) {
         Mask[Idx] = Idx;
-      else
+        IsCompleteIdentity = false;
+      } else {
+        IsCompleteIdentity &= (Res.second || Mask[Idx] == Idx);
         Mask[Idx] = (Res.second ? Idx : Mask[Idx]) + VF;
       }
+    }
+    if (IsCompleteIdentity) {
+      // Found complete identity, i.e. elements only from single vector are
+      // selected in the identity order.
+      Prev = Res.first;
+      std::iota(Mask.begin(), Mask.end(), 0);
+    } else {
       auto *V = ValueSelect::get<T *>(Base);
       (void)V;
       assert((!V || GetVF(V) == Mask.size()) &&
              "Expected base vector of VF number of elements.");
       Prev = Action(Mask, {nullptr, Res.first});
+    }
   } else if (ShuffleMask.size() == 1) {
     // Base is undef and only 1 vector is shuffled - perform the action only for
     // single vector, if the mask is not the identity mask.
     std::pair<T *, bool> Res = ResizeAction(ShuffleMask.begin()->first, Mask);
     if (Res.second)
       // Identity mask is found.
       Prev = Res.first;
     else
▲ Show 20 Lines • Show All 796 Lines • ▼ Show 20 Lines	Value BoUpSLP::createBuildVector(ArrayRef<Value > VL) {
unsigned VF = VL.size();		unsigned VF = VL.size();
// Exploit possible reuse of values across lanes.		// Exploit possible reuse of values across lanes.
SmallVector<int> ReuseShuffleIndicies;		SmallVector<int> ReuseShuffleIndicies;
SmallVector<Value *> UniqueValues;		SmallVector<Value *> UniqueValues;
if (VL.size() > 2) {		if (VL.size() > 2) {
DenseMap<Value *, unsigned> UniquePositions;		DenseMap<Value *, unsigned> UniquePositions;
unsigned NumValues =		unsigned NumValues =
std::distance(VL.begin(), find_if(reverse(VL), [](Value *V) {		std::distance(VL.begin(), find_if(reverse(VL), [](Value *V) {
return !isa<UndefValue>(V);		return !isa<PoisonValue>(V);
}).base());		}).base());
VF = std::max<unsigned>(VF, PowerOf2Ceil(NumValues));		VF = std::min<unsigned>(VF, PowerOf2Ceil(NumValues));
int UniqueVals = 0;		bool IsIdentity = true;
		int UndefIdx = UndefMaskElem;
		int ReplaceUndefIdx = UndefMaskElem;
for (Value *V : VL.drop_back(VL.size() - VF)) {		for (Value *V : VL.drop_back(VL.size() - VF)) {
if (isa<UndefValue>(V)) {		if (isa<PoisonValue>(V)) {
ReuseShuffleIndicies.emplace_back(UndefMaskElem);		ReuseShuffleIndicies.emplace_back(UndefMaskElem);
continue;		continue;
}		}
if (isConstant(V)) {		if (isConstant(V)) {
		if (isa<UndefValue>(V)) {
		if (UndefIdx == UndefMaskElem) {
		UndefIdx = UniqueValues.size();
		UniqueValues.emplace_back(V);
		}
		ReuseShuffleIndicies.emplace_back(UndefIdx);
		} else {
ReuseShuffleIndicies.emplace_back(UniqueValues.size());		ReuseShuffleIndicies.emplace_back(UniqueValues.size());
UniqueValues.emplace_back(V);		UniqueValues.emplace_back(V);
		}
continue;		continue;
}		}
auto Res = UniquePositions.try_emplace(V, UniqueValues.size());		auto Res = UniquePositions.try_emplace(V, UniqueValues.size());
ReuseShuffleIndicies.emplace_back(Res.first->second);		ReuseShuffleIndicies.emplace_back(Res.first->second);
if (Res.second) {		if (Res.second) {
UniqueValues.emplace_back(V);		UniqueValues.emplace_back(V);
++UniqueVals;		if (ReplaceUndefIdx == UndefMaskElem)
}		ReplaceUndefIdx = Res.first->second;
}		} else {
if (UniqueVals == 1 && UniqueValues.size() == 1) {		IsIdentity = false;
// Emit pure splat vector.
ReuseShuffleIndicies.append(VF - ReuseShuffleIndicies.size(),
UndefMaskElem);
} else if (UniqueValues.size() >= VF - 1 \|\| UniqueValues.size() <= 1) {
if (UniqueValues.empty()) {
assert(all_of(VL, UndefValue::classof) && "Expected list of undefs.");
NumValues = VF;
}		}
ReuseShuffleIndicies.clear();
UniqueValues.clear();
UniqueValues.append(VL.begin(), std::next(VL.begin(), NumValues));
}		}
UniqueValues.append(VF - UniqueValues.size(),		// Check if undef values can be safely replaced by some non-const vals.
		if (UndefIdx != UndefMaskElem && ReplaceUndefIdx != UndefMaskElem &&
		!IsIdentity) {
		int MinIdx = std::min(ReplaceUndefIdx, UndefIdx);
		for (unsigned I = 0; I < VF; ++I)
		if (ReuseShuffleIndicies[I] == UndefIdx ||
		ReuseShuffleIndicies[I] == ReplaceUndefIdx)
		ReuseShuffleIndicies[I] = MinIdx;
		if (MinIdx != ReplaceUndefIdx) {
		std::swap(UniqueValues[ReplaceUndefIdx], UniqueValues[UndefIdx]);
		UndefIdx = ReplaceUndefIdx;
		}
		UniqueValues.erase(std::next(UniqueValues.begin(), UndefIdx));
		}
		if (!IsIdentity) {
		UniqueValues.append((UniqueValues.size() == 1 ? VL.size() : VF) -
		UniqueValues.size(),
PoisonValue::get(VL[0]->getType()));		PoisonValue::get(VL[0]->getType()));
		ReuseShuffleIndicies.append(VL.size() - VF, UndefMaskElem);
VL = UniqueValues;		VL = UniqueValues;
		} else {
		ReuseShuffleIndicies.clear();
		}
}		}

ShuffleInstructionBuilder ShuffleBuilder(Builder, VF, GatherShuffleSeq,		ShuffleInstructionBuilder ShuffleBuilder(Builder, VF, GatherShuffleSeq,
CSEBlocks);		CSEBlocks);
Value *Vec = gather(VL);		Value *Vec = gather(VL);
if (!ReuseShuffleIndicies.empty()) {		if (!ReuseShuffleIndicies.empty()) {
ShuffleBuilder.addMask(ReuseShuffleIndicies);		ShuffleBuilder.addMask(ReuseShuffleIndicies);
Vec = ShuffleBuilder.finalize(Vec);		Vec = ShuffleBuilder.finalize(Vec);
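
To make the new gather logic easier to follow, here is a minimal standalone sketch of the first loop in the right-hand column: poison lanes get an undefined mask element, all `undef` scalars share one slot, and the remaining scalars are deduplicated. This is not the patch's code. `GatherPlan`, `buildGatherPlan`, and `kUndefMaskElem` are hypothetical names, scalars are modelled as plain strings, and both the handling of other constants and the later undef-replacement step are left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for LLVM's UndefMaskElem (assumption: -1 marks an unused lane).
constexpr int kUndefMaskElem = -1;

// Hypothetical result type: the deduplicated scalars plus the reuse mask
// that maps every original lane to a slot in UniqueValues.
struct GatherPlan {
  std::vector<std::string> UniqueValues;
  std::vector<int> ReuseMask;
};

// Simplified model of the loop: "poison" and "undef" are spelled as strings,
// every other string is treated as a non-constant scalar.
GatherPlan buildGatherPlan(const std::vector<std::string> &Scalars) {
  GatherPlan Plan;
  std::unordered_map<std::string, int> Position;
  int UndefIdx = kUndefMaskElem;
  for (const std::string &S : Scalars) {
    if (S == "poison") {
      // Poison lanes need no source value at all.
      Plan.ReuseMask.push_back(kUndefMaskElem);
      continue;
    }
    if (S == "undef") {
      // All undef lanes share a single slot, created on first use.
      if (UndefIdx == kUndefMaskElem) {
        UndefIdx = static_cast<int>(Plan.UniqueValues.size());
        Plan.UniqueValues.push_back(S);
      }
      Plan.ReuseMask.push_back(UndefIdx);
      continue;
    }
    // Non-constant scalars are deduplicated by value.
    auto [It, Inserted] =
        Position.try_emplace(S, static_cast<int>(Plan.UniqueValues.size()));
    if (Inserted)
      Plan.UniqueValues.push_back(S);
    Plan.ReuseMask.push_back(It->second);
  }
  assert(Plan.ReuseMask.size() == Scalars.size());
  return Plan;
}
```

In this model the key difference from the left-hand column is that `undef` keeps a real slot in the gathered vector instead of being folded into an undefined mask element.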
[... 159 lines not shown ...]
case Instruction::InsertElement: {
if (!IsIdentity || NumElts != NumScalars) {		if (!IsIdentity || NumElts != NumScalars) {
V = Builder.CreateShuffleVector(V, Mask);		V = Builder.CreateShuffleVector(V, Mask);
if (auto *I = dyn_cast<Instruction>(V)) {		if (auto *I = dyn_cast<Instruction>(V)) {
GatherShuffleSeq.insert(I);		GatherShuffleSeq.insert(I);
CSEBlocks.insert(I->getParent());		CSEBlocks.insert(I->getParent());
}		}
}		}

if ((!IsIdentity || Offset != 0 ||		bool IsUndefFirstOp = isUndefVector(FirstInsert->getOperand(0));
!isUndefVector(FirstInsert->getOperand(0))) &&		if ((!IsIdentity || Offset != 0 || !IsUndefFirstOp) &&
NumElts != NumScalars) {		NumElts != NumScalars) {
SmallVector<int> InsertMask(NumElts);		SmallVector<int> InsertMask(NumElts, UndefMaskElem);
		if (!IsUndefFirstOp)
std::iota(InsertMask.begin(), InsertMask.end(), 0);		std::iota(InsertMask.begin(), InsertMask.end(), 0);
for (unsigned I = 0; I < NumElts; I++) {		for (unsigned I = 0; I < NumElts; I++) {
if (Mask[I] != UndefMaskElem)		if (Mask[I] != UndefMaskElem)
InsertMask[Offset + I] = NumElts + I;		InsertMask[Offset + I] = NumElts + I;
}		}

V = Builder.CreateShuffleVector(		V = Builder.CreateShuffleVector(
FirstInsert->getOperand(0), V, InsertMask,		FirstInsert->getOperand(0), V, InsertMask,
cast<Instruction>(E->Scalars.back())->getName());		cast<Instruction>(E->Scalars.back())->getName());
[... 845 lines not shown ...]
for (Instruction *II : reverse(Inserts)) {
NewInst = II;		NewInst = II;
}		}
LastInsert->replaceAllUsesWith(NewInst);		LastInsert->replaceAllUsesWith(NewInst);
for (InsertElementInst *IE : reverse(ShuffledInserts[I].InsertElements)) {		for (InsertElementInst *IE : reverse(ShuffledInserts[I].InsertElements)) {
IE->replaceUsesOfWith(IE->getOperand(0),		IE->replaceUsesOfWith(IE->getOperand(0),
PoisonValue::get(IE->getOperand(0)->getType()));		PoisonValue::get(IE->getOperand(0)->getType()));
IE->replaceUsesOfWith(IE->getOperand(1),		IE->replaceUsesOfWith(IE->getOperand(1),
PoisonValue::get(IE->getOperand(1)->getType()));		PoisonValue::get(IE->getOperand(1)->getType()));
		if (IE == FirstInsert && dominatesAllUses(LastInsert, FirstInsert))
		FirstInsert->replaceAllUsesWith(NewInst);
eraseInstruction(IE);		eraseInstruction(IE);
}		}
CSEBlocks.insert(LastInsert->getParent());		CSEBlocks.insert(LastInsert->getParent());
}		}

// For each vectorized value:		// For each vectorized value:
for (auto &TEPtr : VectorizableTree) {		for (auto &TEPtr : VectorizableTree) {
TreeEntry *Entry = TEPtr.get();		TreeEntry *Entry = TEPtr.get();
[... 95 lines not shown ...]
auto &&IsIdenticalOrLessDefined = [this](Instruction *I1, Instruction *I2,
if (I1->getType() != I2->getType())		if (I1->getType() != I2->getType())
return false;		return false;
auto *SI1 = dyn_cast<ShuffleVectorInst>(I1);		auto *SI1 = dyn_cast<ShuffleVectorInst>(I1);
auto *SI2 = dyn_cast<ShuffleVectorInst>(I2);		auto *SI2 = dyn_cast<ShuffleVectorInst>(I2);
if (!SI1 || !SI2)		if (!SI1 || !SI2)
return I1->isIdenticalTo(I2);		return I1->isIdenticalTo(I2);
if (SI1->isIdenticalTo(SI2))		if (SI1->isIdenticalTo(SI2))
return true;		return true;
for (int I = 0, E = SI1->getNumOperands(); I < E; ++I)		for (int I = 0, E = SI1->getNumOperands(); I < E; ++I) {
if (SI1->getOperand(I) != SI2->getOperand(I))		Value *SIOp = SI1->getOperand(I);
		if (isa<UndefValue>(SIOp) &&
		SIOp->getType() == SI2->getOperand(I)->getType())
		continue;
		if (!any_of(SI2->operands(), [SIOp](Value *Op) { return Op == SIOp; }) ||
		any_of(SI2->operands(), [SI1](Value *Op) { return Op == SI1; }))
return false;		return false;
		}
// Check if the second instruction is more defined than the first one.		// Check if the second instruction is more defined than the first one.
NewMask.assign(SI2->getShuffleMask().begin(), SI2->getShuffleMask().end());		NewMask.assign(SI2->getShuffleMask().begin(), SI2->getShuffleMask().end());
ArrayRef<int> SM1 = SI1->getShuffleMask();		ArrayRef<int> SM1 = SI1->getShuffleMask();
// Count trailing undefs in the mask to check the final number of used		// Count trailing undefs in the mask to check the final number of used
// registers.		// registers.
		auto &&IsStrictUndef = [](Value *V) {
		return isa_and_nonnull<UndefValue>(V) && !isa<PoisonValue>(V);
		};
unsigned LastUndefsCnt = 0;		unsigned LastUndefsCnt = 0;
		unsigned VF = cast<VectorType>(SI1->getOperand(0)->getType())
		->getElementCount()
		.getKnownMinValue();
		// Tries to fetch the vector element (scalar value) for \p VecOp using given
		vporpo: Please add a comment describing the lambda.
		// index \p Idx. \p RootOp is used to stop lookup through shuffles, if \p
		// RootOp is a root of the \p VecOp shuffle.
		auto &&GetVecElem = [VF](Value *&VecOp, int Idx, Value *RootOp) -> Value * {
		unsigned EVF = VF;
		while (auto *SVOp = dyn_cast<ShuffleVectorInst>(VecOp)) {
		VecOp = SVOp->getOperand(Idx / EVF);
		Idx = SVOp->getMaskValue(Idx % EVF);
		auto *FTy = dyn_cast<FixedVectorType>(VecOp->getType());
		if (!FTy)
		break;
		EVF = FTy->getElementCount().getKnownMinValue();
		if (Idx == UndefMaskElem)
		return PoisonValue::get(
		cast<VectorType>(VecOp->getType())->getElementType());
		if (VecOp == RootOp)
		break;
		}
		if (isa<PoisonValue>(VecOp))
		return PoisonValue::get(
		cast<VectorType>(VecOp->getType())->getElementType());
		if (isa<UndefValue>(VecOp))
		return UndefValue::get(
		cast<VectorType>(VecOp->getType())->getElementType());
		if (auto *CV1 = dyn_cast<ConstantVector>(VecOp))
		return CV1->getAggregateElement(Idx % EVF);
		return nullptr;
		};
for (int I = 0, E = NewMask.size(); I < E; ++I) {		for (int I = 0, E = NewMask.size(); I < E; ++I) {
if (SM1[I] == UndefMaskElem)		if (SM1[I] == UndefMaskElem)
++LastUndefsCnt;		++LastUndefsCnt;
else		else
LastUndefsCnt = 0;		LastUndefsCnt = 0;
if (NewMask[I] != UndefMaskElem && SM1[I] != UndefMaskElem &&		if (NewMask[I] == UndefMaskElem) {
NewMask[I] != SM1[I])
return false;
if (NewMask[I] == UndefMaskElem)
NewMask[I] = SM1[I];		NewMask[I] = SM1[I];
		continue;
		}
		Value *VecOp1 = SI2->getOperand(NewMask[I] / VF);
		unsigned VecOp2Pos = SM1[I] / VF;
		Value *VecOp2 =
		SM1[I] == UndefMaskElem ? nullptr : SI1->getOperand(VecOp2Pos);
		if (SM1[I] == UndefMaskElem || (NewMask[I] == SM1[I] && VecOp1 == VecOp2))
		continue;
		// Check if one mask can be safely replaced by another.
		Value *CV1Elem = GetVecElem(VecOp1, NewMask[I], VecOp2);
		Value *CV2Elem = GetVecElem(VecOp2, SM1[I], VecOp1);
		if (isa_and_nonnull<PoisonValue>(CV1Elem) ||
		vporpo: These `->getAggregateElement(...)` and `->getOperand(...)` expressions repeat several times in the code and make this code quite verbose. Is there a way to avoid this repetition?
		(IsStrictUndef(CV1Elem) &&
		(IsStrictUndef(CV2Elem) ||
		(CV2Elem && isGuaranteedNotToBePoison(CV2Elem)) ||
		(!CV2Elem && isGuaranteedNotToBePoison(VecOp2))))) {
		NewMask[I] = SM1[I] % VF + VecOp2Pos * VF;
		continue;
		}
		if ((CV1Elem && CV1Elem == CV2Elem) ||
		isa_and_nonnull<PoisonValue>(CV2Elem) \|\|
		(IsStrictUndef(CV2Elem) &&
		(IsStrictUndef(CV1Elem) ||
		(CV1Elem && isGuaranteedNotToBePoison(CV1Elem)) ||
		(!CV1Elem && isGuaranteedNotToBePoison(VecOp1))))) {
		NewMask[I] = NewMask[I] % VF + VecOp2Pos * VF;
		continue;
		}
		return false;
}		}
// Check if the last undefs actually change the final number of used vector		// Check if the last undefs actually change the final number of used vector
// registers.		// registers.
return SM1.size() - LastUndefsCnt > 1 &&		return SM1.size() - LastUndefsCnt > 1 &&
TTI->getNumberOfParts(SI1->getType()) ==		TTI->getNumberOfParts(SI1->getType()) ==
		vporpo: This also repeats in line 9054. Perhaps move it to a lambda like `getOperandIndex(SI2, VecOp2)`?
		hvdijk: Actually... That just looks very wrong, `=` rather than `==`.
TTI->getNumberOfParts(		TTI->getNumberOfParts(
FixedVectorType::get(SI1->getType()->getElementType(),		FixedVectorType::get(SI1->getType()->getElementType(),
SM1.size() - LastUndefsCnt));		SM1.size() - LastUndefsCnt));
};		};
// Perform O(N^2) search over the gather/shuffle sequences and merge identical		// Perform O(N^2) search over the gather/shuffle sequences and merge identical
// instructions. TODO: We can further optimize this scan if we split the		// instructions. TODO: We can further optimize this scan if we split the
// instructions into different buckets based on the insert lane.		// instructions into different buckets based on the insert lane.
SmallVector<Instruction *, 16> Visited;		SmallVector<Instruction *, 16> Visited;
[... 3,517 lines not shown ...]
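
For readers comparing the two columns of `IsIdenticalOrLessDefined`, the following is a small sketch of the pre-patch lane rule only: two masks are compatible when no lane is defined differently in both, and the merged mask takes whichever definition exists. The name `mergeShuffleMasks` and the constant `kUndefMaskElem` are hypothetical, and masks are plain `std::vector<int>`. The sketch deliberately ignores everything the patch adds on top, namely the per-element poison/undef reasoning via `GetVecElem` and `isGuaranteedNotToBePoison`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Stand-in for LLVM's UndefMaskElem (assumption: -1 marks an undefined lane).
constexpr int kUndefMaskElem = -1;

// Returns the merged mask if First is identical to, or less defined than,
// Second lane by lane; returns std::nullopt when some lane is defined in
// both masks with different indices.
std::optional<std::vector<int>>
mergeShuffleMasks(const std::vector<int> &First,
                  const std::vector<int> &Second) {
  if (First.size() != Second.size())
    return std::nullopt;
  std::vector<int> Merged(Second);
  for (std::size_t I = 0; I < Merged.size(); ++I) {
    // Both lanes defined but selecting different elements: not mergeable.
    if (Merged[I] != kUndefMaskElem && First[I] != kUndefMaskElem &&
        Merged[I] != First[I])
      return std::nullopt;
    // Fill an undefined lane from the other mask (which may also be undefined).
    if (Merged[I] == kUndefMaskElem)
      Merged[I] = First[I];
  }
  return Merged;
}
```

The patch replaces the unconditional "fill the undefined lane" step with checks on what the selected element actually is, since treating an `undef` element as freely replaceable is only sound when the replacement cannot be poison.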

llvm/test/Transforms/SLPVectorizer/AArch64/tsc-s116.ll

	[... 13 lines not shown ...]
	; contiguous. The score estimation needs to be corrected, so that these 4 loads			; contiguous. The score estimation needs to be corrected, so that these 4 loads
	; are not selected for vectorization. Instead we should vectorize with			; are not selected for vectorization. Instead we should vectorize with
	; contiguous loads, from %a plus offsets 0 to 3, or offsets 1 to 4.			; contiguous loads, from %a plus offsets 0 to 3, or offsets 1 to 4.

	define void @s116_modified(float* %a) {			define void @s116_modified(float* %a) {
	; CHECK-LABEL: @s116_modified(			; CHECK-LABEL: @s116_modified(
	; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 0			; CHECK-NEXT: [[GEP0:%.*]] = getelementptr inbounds float, float* [[A:%.*]], i64 0
	; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1			; CHECK-NEXT: [[GEP1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1
	; CHECK-NEXT: [[GEP2:%.*]] = getelementptr inbounds float, float* [[A]], i64 2			; CHECK-NEXT: [[GEP3:%.*]] = getelementptr inbounds float, float* [[A]], i64 3
	; CHECK-NEXT: [[GEP4:%.*]] = getelementptr inbounds float, float* [[A]], i64 4
	; CHECK-NEXT: [[LD1:%.*]] = load float, float* [[GEP1]], align 4
	; CHECK-NEXT: [[LD0:%.*]] = load float, float* [[GEP0]], align 4			; CHECK-NEXT: [[LD0:%.*]] = load float, float* [[GEP0]], align 4
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP2]] to <2 x float>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast float* [[GEP1]] to <2 x float>*
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
	; CHECK-NEXT: [[LD4:%.*]] = load float, float* [[GEP4]], align 4			; CHECK-NEXT: [[TMP3:%.*]] = bitcast float* [[GEP3]] to <2 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x float> poison, float [[LD0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = load <2 x float>, <2 x float>* [[TMP3]], align 4
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> poison, float [[LD0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <4 x float> [[TMP3]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 4, i32 5, i32 3>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x float> [[TMP5]], float [[LD4]], i32 3			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> [[TMP6]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x float> poison, float [[LD1]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x float> [[TMP7]], float [[LD1]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP7]], <4 x float> [[TMP8]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP8]], <4 x float> [[TMP4]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <4 x i32> <i32 0, i32 0, i32 1, i32 2>
	; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <4 x float> [[TMP6]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fmul fast <4 x float> [[TMP9]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[GEP0]] to <4 x float>*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast float* [[GEP0]] to <4 x float>*
	; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP11]], align 4			; CHECK-NEXT: store <4 x float> [[TMP10]], <4 x float>* [[TMP11]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%gep0 = getelementptr inbounds float, float* %a, i64 0			%gep0 = getelementptr inbounds float, float* %a, i64 0
	%gep1 = getelementptr inbounds float, float* %a, i64 1			%gep1 = getelementptr inbounds float, float* %a, i64 1
	%gep2 = getelementptr inbounds float, float* %a, i64 2			%gep2 = getelementptr inbounds float, float* %a, i64 2
	%gep3 = getelementptr inbounds float, float* %a, i64 3			%gep3 = getelementptr inbounds float, float* %a, i64 3
	[... 18 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/AArch64/vectorize-free-extracts-inserts.ll

	[... 282 lines not shown ...]
	; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1			; CHECK-NEXT: [[V2_LANE_1:%.*]] = extractelement <4 x double> [[V_2]], i32 1
	; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2			; CHECK-NEXT: [[V2_LANE_2:%.*]] = extractelement <4 x double> [[V_2]], i32 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x double> poison, double [[V1_LANE_0]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x double> poison, double [[V1_LANE_0]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[V1_LANE_2]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x double> [[TMP0]], double [[V1_LANE_2]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[V1_LANE_1]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x double> [[TMP1]], double [[V1_LANE_1]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[V1_LANE_3]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x double> [[TMP2]], double [[V1_LANE_3]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x double> poison, double [[V2_LANE_2]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x double> poison, double [[V2_LANE_2]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> [[TMP4]], double [[V2_LANE_1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x double> [[TMP4]], double [[V2_LANE_1]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x double> [[TMP5]], double [[V2_LANE_2]], i32 2			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <4 x double> [[TMP5]], double [[V2_LANE_0]], i32 2
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <4 x double> [[TMP6]], double [[V2_LANE_0]], i32 3			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x double> [[TMP6]], <4 x double> poison, <4 x i32> <i32 0, i32 1, i32 0, i32 2>
	; CHECK-NEXT: [[TMP8:%.*]] = fmul <4 x double> [[TMP3]], [[TMP7]]			; CHECK-NEXT: [[TMP7:%.*]] = fmul <4 x double> [[TMP3]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x double> [[TMP8]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x double> [[TMP7]], <4 x double> poison, <9 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: call void @use(double [[V1_LANE_0]])			; CHECK-NEXT: call void @use(double [[V1_LANE_0]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_1]])			; CHECK-NEXT: call void @use(double [[V1_LANE_1]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_2]])			; CHECK-NEXT: call void @use(double [[V1_LANE_2]])
	; CHECK-NEXT: call void @use(double [[V1_LANE_3]])			; CHECK-NEXT: call void @use(double [[V1_LANE_3]])
	; CHECK-NEXT: store <9 x double> [[TMP9]], <9 x double>* [[PTR_1]], align 8			; CHECK-NEXT: store <9 x double> [[TMP8]], <9 x double>* [[PTR_1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8			%v.1 = load <9 x double>, <9 x double>* %ptr.1, align 8
	%v1.lane.0 = extractelement <9 x double> %v.1, i32 0			%v1.lane.0 = extractelement <9 x double> %v.1, i32 0
	%v1.lane.1 = extractelement <9 x double> %v.1, i32 1			%v1.lane.1 = extractelement <9 x double> %v.1, i32 1
	%v1.lane.2 = extractelement <9 x double> %v.1, i32 2			%v1.lane.2 = extractelement <9 x double> %v.1, i32 2
	%v1.lane.3 = extractelement <9 x double> %v.1, i32 3			%v1.lane.3 = extractelement <9 x double> %v.1, i32 3
	[... 368 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/PR35865-inseltpoison.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s			; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s

	define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {			define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {
	; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(			; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float>			; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> poison, <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>			; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> poison, <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 8, i32 9, i32 undef, i32 undef>
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = extractelement <16 x half> undef, i32 4			%0 = extractelement <16 x half> undef, i32 4
	%conv.i.4.i = fpext half %0 to float			%conv.i.4.i = fpext half %0 to float
	%1 = bitcast float %conv.i.4.i to i32			%1 = bitcast float %conv.i.4.i to i32
	%vecins.i.4.i = insertelement <8 x i32> poison, i32 %1, i32 4			%vecins.i.4.i = insertelement <8 x i32> poison, i32 %1, i32 4
	%2 = extractelement <16 x half> undef, i32 5			%2 = extractelement <16 x half> undef, i32 5
	%conv.i.5.i = fpext half %2 to float			%conv.i.5.i = fpext half %2 to float
	%3 = bitcast float %conv.i.5.i to i32			%3 = bitcast float %conv.i.5.i to i32
	%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5			%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/PR35865.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s			; RUN: opt -slp-vectorizer < %s -S -o - -mtriple=x86_64-apple-macosx10.10.0 -mcpu=core2 | FileCheck %s

	define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {			define void @_Z10fooConvertPDv4_xS0_S0_PKS_() {
	; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(			; CHECK-LABEL: @_Z10fooConvertPDv4_xS0_S0_PKS_(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4			; CHECK-NEXT: [[TMP0:%.*]] = extractelement <16 x half> undef, i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5			; CHECK-NEXT: [[TMP1:%.*]] = extractelement <16 x half> undef, i32 5
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x half> poison, half [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x half> [[TMP2]], half [[TMP1]], i32 1
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float>			; CHECK-NEXT: [[TMP4:%.*]] = fpext <2 x half> [[TMP3]] to <2 x float>
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = bitcast <2 x float> [[TMP4]] to <2 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP5]], <2 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> undef, <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 6, i32 7>			; CHECK-NEXT: [[VECINS_I_5_I1:%.*]] = shufflevector <8 x i32> undef, <8 x i32> [[TMP6]], <8 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 8, i32 9, i32 undef, i32 undef>
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%0 = extractelement <16 x half> undef, i32 4			%0 = extractelement <16 x half> undef, i32 4
	%conv.i.4.i = fpext half %0 to float			%conv.i.4.i = fpext half %0 to float
	%1 = bitcast float %conv.i.4.i to i32			%1 = bitcast float %conv.i.4.i to i32
	%vecins.i.4.i = insertelement <8 x i32> undef, i32 %1, i32 4			%vecins.i.4.i = insertelement <8 x i32> undef, i32 %1, i32 4
	%2 = extractelement <16 x half> undef, i32 5			%2 = extractelement <16 x half> undef, i32 5
	%conv.i.5.i = fpext half %2 to float			%conv.i.5.i = fpext half %2 to float
	%3 = bitcast float %conv.i.5.i to i32			%3 = bitcast float %conv.i.5.i to i32
	%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5			%vecins.i.5.i = insertelement <8 x i32> %vecins.i.4.i, i32 %3, i32 5
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/alternate-cmp-swapped-pred.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -mtriple=x86_64-unknown -slp-vectorizer -S | FileCheck %s			; RUN: opt < %s -mtriple=x86_64-unknown -slp-vectorizer -S | FileCheck %s

	define i16 @test(i16 %call37) {			define i16 @test(i16 %call37) {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CALL:%.*]] = load i16, i16* undef, align 2			; CHECK-NEXT: [[CALL:%.*]] = load i16, i16* undef, align 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i16> <i16 poison, i16 0, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 0>, i16 [[CALL37:%.*]], i32 4			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i16> <i16 poison, i16 0, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 0>, i16 [[CALL37:%.*]], i32 4
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i16> [[TMP0]], i16 [[CALL]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i16> [[TMP0]], i16 [[CALL]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i16> <i16 0, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 poison, i16 0>, i16 [[CALL37]], i32 3			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i16> <i16 0, i16 0, i16 0, i16 poison, i16 0, i16 0, i16 0, i16 poison>, i16 [[CALL37]], i32 3
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i16> [[TMP2]], i16 [[CALL37]], i32 6			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[TMP2]], <8 x i16> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 3, i32 6>
	; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <8 x i16> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <8 x i16> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP5:%.*]] = icmp sgt <8 x i16> [[TMP1]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = icmp sgt <8 x i16> [[TMP1]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x i1> [[TMP4]], <8 x i1> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 11, i32 12, i32 5, i32 14, i32 7>			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <8 x i1> [[TMP3]], <8 x i1> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 11, i32 12, i32 5, i32 14, i32 7>
	; CHECK-NEXT: [[TMP7:%.*]] = zext <8 x i1> [[TMP6]] to <8 x i16>			; CHECK-NEXT: [[TMP6:%.*]] = zext <8 x i1> [[TMP5]] to <8 x i16>
	; CHECK-NEXT: [[TMP8:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[TMP7]])			; CHECK-NEXT: [[TMP7:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[TMP6]])
	; CHECK-NEXT: [[OP_RDX:%.*]] = add i16 [[TMP8]], 0			; CHECK-NEXT: [[OP_RDX:%.*]] = add i16 [[TMP7]], 0
	; CHECK-NEXT: ret i16 [[OP_RDX]]			; CHECK-NEXT: ret i16 [[OP_RDX]]
	;			;
	entry:			entry:
	%call = load i16, i16* undef, align 2			%call = load i16, i16* undef, align 2
	%0 = icmp slt i16 %call, 0			%0 = icmp slt i16 %call, 0
	%cond = zext i1 %0 to i16			%cond = zext i1 %0 to i16
	%1 = add i16 %cond, 0			%1 = add i16 %cond, 0
	%2 = icmp slt i16 0, 0			%2 = icmp slt i16 0, 0
	[... 22 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/broadcast_long.ll

	[... 12 lines not shown ...]
	; YAML-NEXT: - TreeSize: '2'			; YAML-NEXT: - TreeSize: '2'

	define void @bcast_long(i32 *%A, i32 *%S) {			define void @bcast_long(i32 *%A, i32 *%S) {
	; CHECK-LABEL: @bcast_long(			; CHECK-LABEL: @bcast_long(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[A0:%.*]] = load i32, i32* [[A:%.*]], align 8			; CHECK-NEXT: [[A0:%.*]] = load i32, i32* [[A:%.*]], align 8
	; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds i32, i32* [[S:%.*]], i64 0			; CHECK-NEXT: [[IDXS0:%.*]] = getelementptr inbounds i32, i32* [[S:%.*]], i64 0
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[A0]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <8 x i32> poison, i32 [[A0]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> <i32 0, i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP0]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IDXS0]] to <8 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[IDXS0]] to <8 x i32>*
	; CHECK-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* [[TMP1]], align 8			; CHECK-NEXT: store <8 x i32> [[SHUFFLE]], <8 x i32>* [[TMP1]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%A0 = load i32, i32 *%A, align 8			%A0 = load i32, i32 *%A, align 8

	%idxS0 = getelementptr inbounds i32, i32* %S, i64 0			%idxS0 = getelementptr inbounds i32, i32* %S, i64 0
	[... 18 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/buildvector-same-lane-insert.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	;RUN: opt -S -slp-vectorizer -mtriple=x86_64-unknown-linux-android23 < %s | FileCheck %s			;RUN: opt -S -slp-vectorizer -mtriple=x86_64-unknown-linux-android23 < %s | FileCheck %s

	define void @test() {			define void @test() {
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, ptr undef, i32 2			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, ptr undef, i32 2
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, ptr undef, align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, ptr undef, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x float> [[TMP2]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x float> [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = fcmp olt float [[TMP6]], [[TMP5]]			; CHECK-NEXT: [[TMP7:%.*]] = fcmp olt float [[TMP6]], [[TMP5]]
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x float> zeroinitializer, float 0.000000e+00, i64 0			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> undef, <2 x float> [[TMP3]], <2 x i32> <i32 2, i32 1>
				; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x float> zeroinitializer, float 0.000000e+00, i64 0
	; CHECK-NEXT: store <2 x float> zeroinitializer, ptr null, align 4			; CHECK-NEXT: store <2 x float> zeroinitializer, ptr null, align 4
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP2]], <2 x i32> <i32 3, i32 1>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP2]], <2 x i32> <i32 3, i32 1>
	; CHECK-NEXT: store <2 x float> zeroinitializer, ptr null, align 4			; CHECK-NEXT: store <2 x float> zeroinitializer, ptr null, align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%1 = getelementptr inbounds float, ptr undef, i32 2			%1 = getelementptr inbounds float, ptr undef, i32 2
	%2 = load float, ptr %1, align 4			%2 = load float, ptr %1, align 4
	%3 = load float, ptr undef, align 4			%3 = load float, ptr undef, align 4
	%4 = fsub float %2, %3			%4 = fsub float %2, %3
	%5 = getelementptr inbounds float, ptr undef, i32 3			%5 = getelementptr inbounds float, ptr undef, i32 3
	Show All 14 Lines
	; CHECK-LABEL: @test1(			; CHECK-LABEL: @test1(
	; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, ptr undef, i32 2			; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds float, ptr undef, i32 2
	; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[TMP1]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = load <2 x float>, ptr [[TMP1]], align 4
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, ptr undef, align 4			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, ptr undef, align 4
	; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x float> [[TMP2]], [[TMP3]]			; CHECK-NEXT: [[TMP4:%.*]] = fsub <2 x float> [[TMP2]], [[TMP3]]
	; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x float> [[TMP4]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x float> [[TMP4]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = fcmp olt float [[TMP6]], [[TMP5]]			; CHECK-NEXT: [[TMP7:%.*]] = fcmp olt float [[TMP6]], [[TMP5]]
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <2 x i32> <i32 0, i32 1>			; CHECK-NEXT: store <2 x float> [[TMP3]], ptr null, align 4
				; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP2]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: store <2 x float> [[TMP8]], ptr null, align 4			; CHECK-NEXT: store <2 x float> [[TMP8]], ptr null, align 4
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP2]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: store <2 x float> [[TMP9]], ptr null, align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
				vporpo (inline comment): TMP9 seems to be redundant here; it looks like it is a copy of TMP3:
				TMP8 is: TMP3[0], undef
				TMP9 is: TMP3[0], TMP3[1]
				I guess this was an issue even before this patch: TMP8 was a copy of TMP3, so the TMP8 shufflevector was redundant.
	;			;
	%1 = getelementptr inbounds float, ptr undef, i32 2			%1 = getelementptr inbounds float, ptr undef, i32 2
	%2 = load float, ptr %1, align 4			%2 = load float, ptr %1, align 4
	%3 = load float, ptr undef, align 4			%3 = load float, ptr undef, align 4
	%4 = fsub float %2, %3			%4 = fsub float %2, %3
	%5 = getelementptr inbounds float, ptr undef, i32 3			%5 = getelementptr inbounds float, ptr undef, i32 3
	%6 = load float, ptr %5, align 4			%6 = load float, ptr %5, align 4
	%7 = getelementptr inbounds float, ptr undef, i32 1			%7 = getelementptr inbounds float, ptr undef, i32 1
	Show All 11 Lines

llvm/test/Transforms/SLPVectorizer/X86/cmp-as-alternate-ops.ll

	Show First 20 Lines • Show All 47 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 poison, i32 0, i32 poison, i32 0>, i32 [[CONV_I32_I_I_I:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 poison, i32 0, i32 poison, i32 0>, i32 [[CONV_I32_I_I_I:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[CONV_I32_I_I_I1]], i32 2			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[CONV_I32_I_I_I1]], i32 2
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[TMP1]], zeroinitializer			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[TMP1]], zeroinitializer
	; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <4 x i32> [[TMP1]], zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = icmp slt <4 x i32> [[TMP1]], zeroinitializer
	; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i1> [[TMP2]], <4 x i1> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>			; CHECK-NEXT: [[TMP4:%.*]] = shufflevector <4 x i1> [[TMP2]], <4 x i1> [[TMP3]], <4 x i32> <i32 0, i32 1, i32 6, i32 3>
	; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x float> zeroinitializer, <4 x float> zeroinitializer			; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x float> zeroinitializer, <4 x float> zeroinitializer
	; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP5]], zeroinitializer			; CHECK-NEXT: [[TMP6:%.*]] = fadd <4 x float> [[TMP5]], zeroinitializer
	; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <2 x i32> <i32 0, i32 1>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x float> zeroinitializer, <2 x float> [[TMP7]], <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <4 x float> [[TMP6]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[DOTFCA_0_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } zeroinitializer, <2 x float> [[TMP7]], 0
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x float> zeroinitializer, <2 x float> [[TMP9]], <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[DOTFCA_1_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } [[DOTFCA_0_INSERT]], <2 x float> [[TMP8]], 1
	; CHECK-NEXT: [[DOTFCA_0_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } zeroinitializer, <2 x float> [[TMP8]], 0
	; CHECK-NEXT: [[DOTFCA_1_INSERT:%.*]] = insertvalue { <2 x float>, <2 x float> } [[DOTFCA_0_INSERT]], <2 x float> [[TMP10]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } zeroinitializer			; CHECK-NEXT: ret { <2 x float>, <2 x float> } zeroinitializer
	;			;
	entry:			entry:
	%cmp.i.i.i.i.i = icmp slt i32 0, 0			%cmp.i.i.i.i.i = icmp slt i32 0, 0
	%cond.i.i.i.i = select i1 %cmp.i.i.i.i.i, float 0.000000e+00, float 0.000000e+00			%cond.i.i.i.i = select i1 %cmp.i.i.i.i.i, float 0.000000e+00, float 0.000000e+00
	%conv.i32.i.i.i1 = fptosi float 0.000000e+00 to i32			%conv.i32.i.i.i1 = fptosi float 0.000000e+00 to i32
	%cmp.i.i34.i.i.i = icmp slt i32 %conv.i32.i.i.i1, 0			%cmp.i.i34.i.i.i = icmp slt i32 %conv.i32.i.i.i1, 0
	%cond.i35.i.i.i = select i1 %cmp.i.i34.i.i.i, float 0.000000e+00, float 0.000000e+00			%cond.i35.i.i.i = select i1 %cmp.i.i34.i.i.i, float 0.000000e+00, float 0.000000e+00
	Show All 18 Lines

llvm/test/Transforms/SLPVectorizer/X86/commutativity.ll

	Show First 20 Lines • Show All 91 Lines • ▼ Show 20 Lines
	; AVX-LABEL: @same_opcode_on_one_side(			; AVX-LABEL: @same_opcode_on_one_side(
	; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0			; AVX-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> poison, i32 [[C:%.*]], i32 0
	; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer			; AVX-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> zeroinitializer
	; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0			; AVX-NEXT: [[TMP2:%.*]] = insertelement <4 x i32> poison, i32 [[A:%.*]], i32 0
	; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer			; AVX-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> zeroinitializer
	; AVX-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; AVX-NEXT: [[TMP3:%.*]] = add <4 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[B:%.*]], i32 1			; AVX-NEXT: [[TMP4:%.*]] = insertelement <4 x i32> [[TMP2]], i32 [[B:%.*]], i32 1
	; AVX-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[C]], i32 2			; AVX-NEXT: [[TMP5:%.*]] = insertelement <4 x i32> [[TMP4]], i32 [[C]], i32 2
	; AVX-NEXT: [[TMP6:%.*]] = insertelement <4 x i32> [[TMP5]], i32 [[A]], i32 3			; AVX-NEXT: [[SHUFFLE2:%.*]] = shufflevector <4 x i32> [[TMP5]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 0>
	; AVX-NEXT: [[TMP7:%.*]] = xor <4 x i32> [[TMP3]], [[TMP6]]			; AVX-NEXT: [[TMP6:%.*]] = xor <4 x i32> [[TMP3]], [[SHUFFLE2]]
	; AVX-NEXT: store <4 x i32> [[TMP7]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16			; AVX-NEXT: store <4 x i32> [[TMP6]], <4 x i32>* bitcast ([32 x i32]* @cle32 to <4 x i32>*), align 16
	; AVX-NEXT: ret void			; AVX-NEXT: ret void
	;			;
	%add1 = add i32 %c, %a			%add1 = add i32 %c, %a
	%add2 = add i32 %c, %a			%add2 = add i32 %c, %a
	%add3 = add i32 %a, %c			%add3 = add i32 %a, %c
	%add4 = add i32 %c, %a			%add4 = add i32 %c, %a
	%1 = xor i32 %add1, %a			%1 = xor i32 %add1, %a
	store i32 %1, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 0), align 16			store i32 %1, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 0), align 16
	%2 = xor i32 %b, %add2			%2 = xor i32 %b, %add2
	store i32 %2, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 1)			store i32 %2, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 1)
	%3 = xor i32 %c, %add3			%3 = xor i32 %c, %add3
	store i32 %3, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 2)			store i32 %3, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 2)
	%4 = xor i32 %a, %add4			%4 = xor i32 %a, %add4
	store i32 %4, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 3)			store i32 %4, i32* getelementptr inbounds ([32 x i32], [32 x i32]* @cle32, i64 0, i64 3)
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/crash_7zip.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s			; RUN: opt < %s -basic-aa -slp-vectorizer -dce -S -mtriple=x86_64-apple-macosx10.8.0 -mcpu=corei7 | FileCheck %s

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
	target triple = "x86_64-apple-macosx10.8.0"			target triple = "x86_64-apple-macosx10.8.0"

	%struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334 = type { %struct._CLzmaProps.0.27.54.81.102.123.144.165.180.195.228.258.333, i16, i8, i8*, i32, i32, i64, i64, i32, i32, i32, [4 x i32], i32, i32, i32, i32, i32, [20 x i8] }			%struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334 = type { %struct._CLzmaProps.0.27.54.81.102.123.144.165.180.195.228.258.333, i16, i8, i8*, i32, i32, i64, i64, i32, i32, i32, [4 x i32], i32, i32, i32, i32, i32, [20 x i8] }
	%struct._CLzmaProps.0.27.54.81.102.123.144.165.180.195.228.258.333 = type { i32, i32, i32, i32 }			%struct._CLzmaProps.0.27.54.81.102.123.144.165.180.195.228.258.333 = type { i32, i32, i32, i32 }

	define fastcc void @LzmaDec_DecodeReal2(%struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p) {			define fastcc void @LzmaDec_DecodeReal2(%struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p) {
	; CHECK-LABEL: @LzmaDec_DecodeReal2(			; CHECK-LABEL: @LzmaDec_DecodeReal2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[RANGE20_I:%.*]] = getelementptr inbounds [[STRUCT_CLZMADEC_1_28_55_82_103_124_145_166_181_196_229_259_334:%.*]], %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* [[P:%.*]], i64 0, i32 4			; CHECK-NEXT: [[RANGE20_I:%.*]] = getelementptr inbounds [[STRUCT_CLZMADEC_1_28_55_82_103_124_145_166_181_196_229_259_334:%.*]], %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* [[P:%.*]], i64 0, i32 4
	; CHECK-NEXT: br label [[DO_BODY66_I:%.*]]			; CHECK-NEXT: br label [[DO_BODY66_I:%.*]]
	; CHECK: do.body66.i:			; CHECK: do.body66.i:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i32> [ [[TMP3:%.*]], [[DO_COND_I:%.*]] ], [ undef, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <2 x i32> [ [[TMP4:%.*]], [[DO_COND_I:%.*]] ], [ undef, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP1:%.*]] = select <2 x i1> undef, <2 x i32> undef, <2 x i32> [[TMP0]]			; CHECK-NEXT: [[TMP1:%.*]] = select <2 x i1> undef, <2 x i32> undef, <2 x i32> [[TMP0]]
				; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x i32> <i32 undef, i32 poison>, <2 x i32> [[TMP1]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: br i1 undef, label [[DO_COND_I]], label [[IF_ELSE_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[DO_COND_I]], label [[IF_ELSE_I:%.*]]
	; CHECK: if.else.i:			; CHECK: if.else.i:
	; CHECK-NEXT: [[TMP2:%.*]] = sub <2 x i32> [[TMP1]], undef			; CHECK-NEXT: [[TMP3:%.*]] = sub <2 x i32> [[TMP1]], undef
	; CHECK-NEXT: br label [[DO_COND_I]]			; CHECK-NEXT: br label [[DO_COND_I]]
	; CHECK: do.cond.i:			; CHECK: do.cond.i:
	; CHECK-NEXT: [[TMP3]] = phi <2 x i32> [ [[TMP2]], [[IF_ELSE_I]] ], [ [[TMP1]], [[DO_BODY66_I]] ]			; CHECK-NEXT: [[TMP4]] = phi <2 x i32> [ [[TMP3]], [[IF_ELSE_I]] ], [ [[TMP2]], [[DO_BODY66_I]] ]
	; CHECK-NEXT: br i1 undef, label [[DO_BODY66_I]], label [[DO_END1006_I:%.*]]			; CHECK-NEXT: br i1 undef, label [[DO_BODY66_I]], label [[DO_END1006_I:%.*]]
	; CHECK: do.end1006.i:			; CHECK: do.end1006.i:
	; CHECK-NEXT: [[TMP4:%.*]] = select <2 x i1> undef, <2 x i32> undef, <2 x i32> [[TMP3]]			; CHECK-NEXT: [[TMP5:%.*]] = select <2 x i1> undef, <2 x i32> undef, <2 x i32> [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = bitcast i32* [[RANGE20_I]] to <2 x i32>*			; CHECK-NEXT: [[TMP6:%.*]] = bitcast i32* [[RANGE20_I]] to <2 x i32>*
	; CHECK-NEXT: store <2 x i32> [[TMP4]], <2 x i32>* [[TMP5]], align 4			; CHECK-NEXT: store <2 x i32> [[TMP5]], <2 x i32>* [[TMP6]], align 4
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%range20.i = getelementptr inbounds %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334, %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p, i64 0, i32 4			%range20.i = getelementptr inbounds %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334, %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p, i64 0, i32 4
	%code21.i = getelementptr inbounds %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334, %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p, i64 0, i32 5			%code21.i = getelementptr inbounds %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334, %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* %p, i64 0, i32 5
	br label %do.body66.i			br label %do.body66.i

	do.body66.i: ; preds = %do.cond.i, %entry			do.body66.i: ; preds = %do.cond.i, %entry
	Show All 23 Lines

llvm/test/Transforms/SLPVectorizer/X86/crash_lencod.ll

	Show First 20 Lines • Show All 123 Lines • ▼ Show 20 Lines
	}			}

	define fastcc void @dct36(double* %inbuf) {			define fastcc void @dct36(double* %inbuf) {
	; CHECK-LABEL: @dct36(			; CHECK-LABEL: @dct36(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, double* [[INBUF:%.*]], i64 1			; CHECK-NEXT: [[ARRAYIDX44:%.*]] = getelementptr inbounds double, double* [[INBUF:%.*]], i64 1
	; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[INBUF]] to <2 x double>*			; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[INBUF]] to <2 x double>*
	; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8			; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> [[TMP1]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <2 x double> <double poison, double undef>, <2 x double> [[TMP1]], <2 x i32> <i32 3, i32 1>
	; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]			; CHECK-NEXT: [[TMP3:%.*]] = fadd <2 x double> [[TMP1]], [[TMP2]]
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[ARRAYIDX44]] to <2 x double>*			; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[ARRAYIDX44]] to <2 x double>*
	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8			; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%arrayidx41 = getelementptr inbounds double, double* %inbuf, i64 2			%arrayidx41 = getelementptr inbounds double, double* %inbuf, i64 2
	%arrayidx44 = getelementptr inbounds double, double* %inbuf, i64 1			%arrayidx44 = getelementptr inbounds double, double* %inbuf, i64 1
	%0 = load double, double* %arrayidx44, align 8			%0 = load double, double* %arrayidx44, align 8
	%add46 = fadd double %0, undef			%add46 = fadd double %0, undef
	store double %add46, double* %arrayidx41, align 8			store double %add46, double* %arrayidx41, align 8
	%1 = load double, double* %inbuf, align 8			%1 = load double, double* %inbuf, align 8
	%add49 = fadd double %1, %0			%add49 = fadd double %1, %0
	store double %add49, double* %arrayidx44, align 8			store double %add49, double* %arrayidx44, align 8
	ret void			ret void
	}			}

llvm/test/Transforms/SLPVectorizer/X86/crash_scheduling.ll

	Show All 14 Lines
	; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03			; CHECK-NEXT: [[ADD:%.*]] = fadd double [[MUL20]], 8.192000e+03
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[P1:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[P1:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[P2:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[P2:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.638400e+04, double 1.638400e+04>			; CHECK-NEXT: [[TMP2:%.*]] = fmul <2 x double> [[TMP1]], <double 1.638400e+04, double 1.638400e+04>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> <double 0.000000e+00, double poison>, double [[ADD]], i32 1			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <2 x double> <double 0.000000e+00, double poison>, double [[ADD]], i32 1
	; CHECK-NEXT: br label [[FOR_BODY:%.*]]			; CHECK-NEXT: br label [[FOR_BODY:%.*]]
	; CHECK: for.body:			; CHECK: for.body:
	; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[INDVARS_IV266:%.*]] = phi i64 [ 0, [[BB1]] ], [ [[INDVARS_IV_NEXT267:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x double> [ [[TMP3]], [[BB1]] ], [ [[TMP6:%.*]], [[FOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <2 x double> [ [[TMP3]], [[BB1]] ], [ [[TMP7:%.*]], [[FOR_BODY]] ]
	; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[TMP4]])			; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> undef, <2 x double> [[TMP4]], <2 x i32> <i32 2, i32 1>
				; CHECK-NEXT: [[X13:%.*]] = tail call i32 @_xfn(<2 x double> [[TMP5]])
	; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB1]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, !tbaa [[TBAA0:![0-9]+]]			; CHECK-NEXT: store i32 [[X13]], i32* [[ARRAYIDX]], align 4, !tbaa [[TBAA0:![0-9]+]]
	; CHECK-NEXT: [[TMP5:%.*]] = shufflevector <2 x double> [[TMP4]], <2 x double> poison, <2 x i32> <i32 1, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <2 x double> undef, <2 x double> [[TMP4]], <2 x i32> <i32 3, i32 1>
	; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[TMP5]])			; CHECK-NEXT: [[X14:%.*]] = tail call i32 @_xfn(<2 x double> [[TMP6]])
	; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]			; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [256 x i32], [256 x i32]* [[TAB2]], i64 0, i64 [[INDVARS_IV266]]
	; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, !tbaa [[TBAA0]]			; CHECK-NEXT: store i32 [[X14]], i32* [[ARRAYIDX26]], align 4, !tbaa [[TBAA0]]
	; CHECK-NEXT: [[TMP6]] = fadd <2 x double> [[TMP2]], [[TMP4]]			; CHECK-NEXT: [[TMP7]] = fadd <2 x double> [[TMP2]], [[TMP4]]
	; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1			; CHECK-NEXT: [[INDVARS_IV_NEXT267]] = add nuw nsw i64 [[INDVARS_IV266]], 1
	; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256			; CHECK-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT267]], 256
	; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]			; CHECK-NEXT: br i1 [[EXITCOND]], label [[RETURN:%.*]], label [[FOR_BODY]]
	; CHECK: return:			; CHECK: return:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%tab1 = alloca [256 x i32], align 16			%tab1 = alloca [256 x i32], align 16
	▲ Show 20 Lines • Show All 41 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/extract-scalar-from-undef.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -slp-vectorizer -mtriple=x86_64-apple-macosx -mattr=+avx2 < %s | FileCheck %s			; RUN: opt -S -slp-vectorizer -mtriple=x86_64-apple-macosx -mattr=+avx2 < %s | FileCheck %s

	define i64 @foo(i32 %tmp7) {			define i64 @foo(i32 %tmp7) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 poison, i32 0>, i32 [[TMP7:%.*]], i32 2			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> <i32 0, i32 0, i32 poison, i32 0>, i32 [[TMP7:%.*]], i32 2
	; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = sub <4 x i32> [[TMP0]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 2, i32 3, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 undef, i32 4			; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <8 x i32> <i32 poison, i32 poison, i32 undef, i32 poison, i32 poison, i32 undef, i32 poison, i32 undef>, <8 x i32> [[TMP2]], <8 x i32> <i32 8, i32 9, i32 undef, i32 11, i32 12, i32 undef, i32 undef, i32 undef>
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP3]], <8 x i32> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 2, i32 3, i32 undef, i32 4, i32 undef>			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 undef, i32 6
	; CHECK-NEXT: [[TMP4:%.*]] = sub nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[SHUFFLE]]			; CHECK-NEXT: [[TMP5:%.*]] = sub nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[TMP4]]
	; CHECK-NEXT: [[TMP5:%.*]] = add nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[SHUFFLE]]			; CHECK-NEXT: [[TMP6:%.*]] = add nsw <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 undef, i32 0, i32 undef, i32 0>, [[TMP4]]
	; CHECK-NEXT: [[TMP6:%.*]] = shufflevector <8 x i32> [[TMP4]], <8 x i32> [[TMP5]], <8 x i32> <i32 0, i32 9, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>			; CHECK-NEXT: [[TMP7:%.*]] = shufflevector <8 x i32> [[TMP5]], <8 x i32> [[TMP6]], <8 x i32> <i32 0, i32 9, i32 2, i32 3, i32 12, i32 13, i32 6, i32 7>
	; CHECK-NEXT: [[TMP7:%.*]] = add <8 x i32> zeroinitializer, [[TMP6]]			; CHECK-NEXT: [[TMP8:%.*]] = add <8 x i32> zeroinitializer, [[TMP7]]
	; CHECK-NEXT: [[TMP8:%.*]] = xor <8 x i32> [[TMP7]], zeroinitializer			; CHECK-NEXT: [[TMP9:%.*]] = xor <8 x i32> [[TMP8]], zeroinitializer
	; CHECK-NEXT: [[TMP9:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP8]])			; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v8i32(<8 x i32> [[TMP9]])
	; CHECK-NEXT: [[TMP10:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> zeroinitializer)			; CHECK-NEXT: [[TMP11:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> zeroinitializer)
	; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP9]], [[TMP10]]			; CHECK-NEXT: [[OP_RDX:%.*]] = add i32 [[TMP10]], [[TMP11]]
	; CHECK-NEXT: [[TMP64:%.*]] = zext i32 [[OP_RDX]] to i64			; CHECK-NEXT: [[TMP64:%.*]] = zext i32 [[OP_RDX]] to i64
	; CHECK-NEXT: ret i64 [[TMP64]]			; CHECK-NEXT: ret i64 [[TMP64]]
	;			;
	bb:			bb:
	%tmp = sub i32 0, 0			%tmp = sub i32 0, 0
	%tmp2 = sub nsw i32 0, %tmp			%tmp2 = sub nsw i32 0, %tmp
	%tmp3 = add i32 0, %tmp2			%tmp3 = add i32 0, %tmp2
	%tmp4 = xor i32 %tmp3, 0			%tmp4 = xor i32 %tmp3, 0
	Show All 40 Lines

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

	Show First 20 Lines • Show All 299 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: [[TMP11:%.*]] = icmp ne <2 x i32> [[TMP10]], zeroinitializer			; CHECK-NEXT: [[TMP11:%.*]] = icmp ne <2 x i32> [[TMP10]], zeroinitializer
	; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0			; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
	; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1			; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0			; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
	; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1			; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
	; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]			; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
	; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP18]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>			; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> poison, <4 x float> [[TMP18]], <4 x i32> <i32 undef, i32 undef, i32 4, i32 5>
	; CHECK-NEXT: ret <4 x float> [[RD1]]			; CHECK-NEXT: ret <4 x float> [[RD1]]
	;			;
	%c0 = extractelement <4 x i32> %c, i32 0			%c0 = extractelement <4 x i32> %c, i32 0
	%c1 = extractelement <4 x i32> %c, i32 1			%c1 = extractelement <4 x i32> %c, i32 1
	%c2 = extractelement <4 x i32> %c, i32 2			%c2 = extractelement <4 x i32> %c, i32 2
	%c3 = extractelement <4 x i32> %c, i32 3			%c3 = extractelement <4 x i32> %c, i32 3
	%a0 = extractelement <4 x float> %a, i32 0			%a0 = extractelement <4 x float> %a, i32 0
	%a1 = extractelement <4 x float> %a, i32 1			%a1 = extractelement <4 x float> %a, i32 1
	▲ Show 20 Lines • Show All 232 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

Show All 38 Lines	;
ret <4 x float> %rd		ret <4 x float> %rd
}		}

define <8 x float> @simple_select2(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {		define <8 x float> @simple_select2(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
; CHECK-LABEL: @simple_select2(		; CHECK-LABEL: @simple_select2(
; CHECK-NEXT: [[TMP1:%.*]] = icmp ne <4 x i32> [[C:%.*]], zeroinitializer		; CHECK-NEXT: [[TMP1:%.*]] = icmp ne <4 x i32> [[C:%.*]], zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x float> [[A:%.*]], <4 x float> [[B:%.*]]		; CHECK-NEXT: [[TMP2:%.*]] = select <4 x i1> [[TMP1]], <4 x float> [[A:%.*]], <4 x float> [[B:%.*]]
; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <8 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 undef, i32 3>		; CHECK-NEXT: [[TMP3:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <8 x i32> <i32 0, i32 undef, i32 1, i32 undef, i32 2, i32 undef, i32 undef, i32 3>
; CHECK-NEXT: [[RD1:%.*]] = shufflevector <8 x float> undef, <8 x float> [[TMP3]], <8 x i32> <i32 8, i32 1, i32 10, i32 3, i32 12, i32 5, i32 6, i32 15>		; CHECK-NEXT: [[RD1:%.*]] = shufflevector <8 x float> undef, <8 x float> [[TMP3]], <8 x i32> <i32 8, i32 undef, i32 10, i32 undef, i32 12, i32 undef, i32 undef, i32 15>
; CHECK-NEXT: ret <8 x float> [[RD1]]		; CHECK-NEXT: ret <8 x float> [[RD1]]
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
▲ Show 20 Lines • Show All 279 Lines • ▼ Show 20 Lines
; CHECK-NEXT: [[TMP11:%.*]] = icmp ne <2 x i32> [[TMP10]], zeroinitializer		; CHECK-NEXT: [[TMP11:%.*]] = icmp ne <2 x i32> [[TMP10]], zeroinitializer
; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0		; CHECK-NEXT: [[TMP12:%.*]] = insertelement <2 x float> poison, float [[A2]], i32 0
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1		; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x float> [[TMP12]], float [[A3]], i32 1
; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0		; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x float> poison, float [[B2]], i32 0
; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1		; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[B3]], i32 1
; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]		; CHECK-NEXT: [[TMP16:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP13]], <2 x float> [[TMP15]]
; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP17:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x float> [[TMP16]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP18]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>		; CHECK-NEXT: [[RD1:%.*]] = shufflevector <4 x float> undef, <4 x float> [[TMP18]], <4 x i32> <i32 undef, i32 undef, i32 4, i32 5>
; CHECK-NEXT: ret <4 x float> [[RD1]]		; CHECK-NEXT: ret <4 x float> [[RD1]]
;		;
%c0 = extractelement <4 x i32> %c, i32 0		%c0 = extractelement <4 x i32> %c, i32 0
%c1 = extractelement <4 x i32> %c, i32 1		%c1 = extractelement <4 x i32> %c, i32 1
%c2 = extractelement <4 x i32> %c, i32 2		%c2 = extractelement <4 x i32> %c, i32 2
%c3 = extractelement <4 x i32> %c, i32 3		%c3 = extractelement <4 x i32> %c, i32 3
%a0 = extractelement <4 x float> %a, i32 0		%a0 = extractelement <4 x float> %a, i32 0
%a1 = extractelement <4 x float> %a, i32 1		%a1 = extractelement <4 x float> %a, i32 1
▲ Show 20 Lines • Show All 232 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/insert-shuffle.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -o - -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	%struct.sw = type { float, float, float, float }			%struct.sw = type { float, float, float, float }

	define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {			define { <2 x float>, <2 x float> } @foo(%struct.sw* %v) {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP0:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0			; CHECK-NEXT: [[X:%.*]] = getelementptr inbounds [[STRUCT_SW:%.*]], %struct.sw* [[V:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = load float, float* undef, align 4			; CHECK-NEXT: [[TMP1:%.*]] = load float, float* undef, align 4
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[X]] to <2 x float>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast float* [[X]] to <2 x float>*
	; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 16			; CHECK-NEXT: [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 16
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 0, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> poison, <4 x i32> <i32 1, i32 0, i32 0, i32 1>
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> poison, float [[TMP0]], i32 0			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x float> <float poison, float undef, float poison, float undef>, float [[TMP0]], i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP1]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <4 x float> [[TMP4]], float [[TMP1]], i32 2
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x float> [[TMP5]], <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 1, i32 undef>			; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[TMP5]]
	; CHECK-NEXT: [[TMP6:%.*]] = fmul <4 x float> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[TMP6]], undef			; CHECK-NEXT: [[TMP7:%.*]] = fadd <4 x float> [[TMP6]], undef
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], undef			; CHECK-NEXT: [[TMP8:%.*]] = fadd <4 x float> [[TMP7]], undef
	; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], undef			; CHECK-NEXT: [[TMP9:%.*]] = fadd <4 x float> [[TMP8]], undef
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 0, i32 1>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
	; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x float> [[TMP9]], <4 x float> poison, <2 x i32> <i32 2, i32 3>
	; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP10]], 0			; CHECK-NEXT: [[INS1:%.*]] = insertvalue { <2 x float>, <2 x float> } undef, <2 x float> [[TMP10]], 0
	; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP11]], 1			; CHECK-NEXT: [[INS2:%.*]] = insertvalue { <2 x float>, <2 x float> } [[INS1]], <2 x float> [[TMP11]], 1
	; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]			; CHECK-NEXT: ret { <2 x float>, <2 x float> } [[INS2]]

llvm/test/Transforms/SLPVectorizer/X86/jumbled-load-multiuse.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux -mattr=+sse4.2 | FileCheck %s

	@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@a = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4
	@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4			@b = common local_unnamed_addr global [4 x i32] zeroinitializer, align 4

	define i32 @fn1() {			define i32 @fn1() {
	; CHECK-LABEL: @fn1(			; CHECK-LABEL: @fn1(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4			; CHECK-NEXT: [[TMP0:%.*]] = load <4 x i32>, <4 x i32>* bitcast ([4 x i32]* @b to <4 x i32>*), align 4
	; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer			; CHECK-NEXT: [[TMP1:%.*]] = icmp sgt <4 x i32> [[TMP0]], zeroinitializer
	; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> <i32 8, i32 poison, i32 ptrtoint (i32 ()* @fn1 to i32), i32 ptrtoint (i32 ()* @fn1 to i32)>, <4 x i32> [[TMP0]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>			; CHECK-NEXT: [[TMP2:%.*]] = shufflevector <4 x i32> <i32 8, i32 poison, i32 ptrtoint (i32 ()* @fn1 to i32), i32 poison>, <4 x i32> [[TMP0]], <4 x i32> <i32 0, i32 5, i32 2, i32 3>
	; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[TMP2]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 2>
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>			; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP1]], <4 x i32> [[SHUFFLE]], <4 x i32> <i32 0, i32 6, i32 0, i32 0>
	; CHECK-NEXT: store <4 x i32> [[SHUFFLE]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <4 x i32> [[TMP3]], <4 x i32> poison, <4 x i32> <i32 1, i32 2, i32 3, i32 0>
				; CHECK-NEXT: store <4 x i32> [[SHUFFLE1]], <4 x i32>* bitcast ([4 x i32]* @a to <4 x i32>*), align 4
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4			%0 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 0), align 4
	%cmp = icmp sgt i32 %0, 0			%cmp = icmp sgt i32 %0, 0
	%cond = select i1 %cmp, i32 8, i32 0			%cond = select i1 %cmp, i32 8, i32 0
	store i32 %cond, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i64 0, i32 3), align 4			store i32 %cond, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i64 0, i32 3), align 4
	%1 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 1), align 4			%1 = load i32, i32* getelementptr ([4 x i32], [4 x i32]* @b, i64 0, i32 1), align 4

llvm/test/Transforms/SLPVectorizer/X86/landing_pad.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=slp-vectorizer,verify -slp-threshold=-99999 -S | FileCheck %s			; RUN: opt < %s -passes=slp-vectorizer,verify -slp-threshold=-99999 -S | FileCheck %s

	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define void @foo() personality i32* ()* @bar {			define void @foo() personality i32* ()* @bar {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: bb1:			; CHECK-NEXT: bb1:
	; CHECK-NEXT: br label [[BB3:%.*]]			; CHECK-NEXT: br label [[BB3:%.*]]
	; CHECK: bb2.loopexit:			; CHECK: bb2.loopexit:
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x i32> [ [[SHUFFLE:%.*]], [[BB9:%.*]] ], [ poison, [[BB2_LOOPEXIT:%.*]] ]			; CHECK-NEXT: [[TMP0:%.*]] = phi <4 x i32> [ [[TMP10:%.*]], [[BB9:%.*]] ], [ poison, [[BB2_LOOPEXIT:%.*]] ]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP5:%.*]], [[BB6:%.*]] ], [ poison, [[BB1:%.*]] ]			; CHECK-NEXT: [[TMP1:%.*]] = phi <2 x i32> [ [[TMP5:%.*]], [[BB6:%.*]] ], [ poison, [[BB1:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = invoke i32 poison(i8 addrspace(1)* nonnull poison, i32 0, i32 0, i32 poison) [ "deopt"() ]			; CHECK-NEXT: [[TMP2:%.*]] = invoke i32 poison(i8 addrspace(1)* nonnull poison, i32 0, i32 0, i32 poison) [ "deopt"() ]
	; CHECK-NEXT: to label [[BB4:%.*]] unwind label [[BB10:%.*]]			; CHECK-NEXT: to label [[BB4:%.*]] unwind label [[BB10:%.*]]
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: br i1 poison, label [[BB11:%.*]], label [[BB5:%.*]]			; CHECK-NEXT: br i1 poison, label [[BB11:%.*]], label [[BB5:%.*]]
	; CHECK: bb5:			; CHECK: bb5:
	; CHECK-NEXT: br label [[BB7:%.*]]			; CHECK-NEXT: br label [[BB7:%.*]]
	; CHECK: bb6:			; CHECK: bb6:
	; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x i32> [ <i32 0, i32 poison>, [[BB8:%.*]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x i32> [ <i32 0, i32 poison>, [[BB8:%.*]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP5]] = insertelement <2 x i32> poison, i32 [[TMP4]], i32 1			; CHECK-NEXT: [[TMP5]] = insertelement <2 x i32> poison, i32 [[TMP4]], i32 1
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb7:			; CHECK: bb7:
	; CHECK-NEXT: [[LOCAL_5_84111:%.*]] = phi i32 [ poison, [[BB8]] ], [ poison, [[BB5]] ]			; CHECK-NEXT: [[LOCAL_5_84111:%.*]] = phi i32 [ poison, [[BB8]] ], [ poison, [[BB5]] ]
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[LOCAL_5_84111]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i32> poison, i32 [[LOCAL_5_84111]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = invoke i32 poison(i8 addrspace(1)* nonnull poison, i32 poison, i32 poison, i32 poison) [ "deopt"() ]			; CHECK-NEXT: [[TMP7:%.*]] = invoke i32 poison(i8 addrspace(1)* nonnull poison, i32 poison, i32 poison, i32 poison) [ "deopt"() ]
	; CHECK-NEXT: to label [[BB8]] unwind label [[BB12:%.*]]			; CHECK-NEXT: to label [[BB8]] unwind label [[BB12:%.*]]
	; CHECK: bb8:			; CHECK: bb8:
	; CHECK-NEXT: br i1 poison, label [[BB7]], label [[BB6]]			; CHECK-NEXT: br i1 poison, label [[BB7]], label [[BB6]]
	; CHECK: bb9:			; CHECK: bb9:
	; CHECK-NEXT: [[INDVARS_IV528799:%.*]] = phi i64 [ poison, [[BB10]] ], [ poison, [[BB12]] ]			; CHECK-NEXT: [[INDVARS_IV528799:%.*]] = phi i64 [ poison, [[BB10]] ], [ poison, [[BB12]] ]
	; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x i32> [ [[SHUFFLE1:%.*]], [[BB10]] ], [ [[TMP11:%.*]], [[BB12]] ]			; CHECK-NEXT: [[TMP8:%.*]] = phi <2 x i32> [ [[SHUFFLE:%.*]], [[BB10]] ], [ [[TMP12:%.*]], [[BB12]] ]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x i32> [[TMP8]], <2 x i32> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[SHUFFLE]] = shufflevector <4 x i32> [[TMP9]], <4 x i32> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>			; CHECK-NEXT: [[TMP10]] = shufflevector <4 x i32> poison, <4 x i32> [[TMP9]], <4 x i32> <i32 undef, i32 undef, i32 4, i32 5>
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	; CHECK: bb10:			; CHECK: bb10:
	; CHECK-NEXT: [[TMP10:%.*]] = phi <2 x i32> [ [[TMP1]], [[BB3]] ]			; CHECK-NEXT: [[TMP11:%.*]] = phi <2 x i32> [ [[TMP1]], [[BB3]] ]
	; CHECK-NEXT: [[LANDING_PAD68:%.*]] = landingpad { i8*, i32 }			; CHECK-NEXT: [[LANDING_PAD68:%.*]] = landingpad { i8*, i32 }
	; CHECK-NEXT: cleanup			; CHECK-NEXT: cleanup
	; CHECK-NEXT: [[SHUFFLE1]] = shufflevector <2 x i32> [[TMP10]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>			; CHECK-NEXT: [[SHUFFLE]] = shufflevector <2 x i32> [[TMP11]], <2 x i32> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: br label [[BB9]]			; CHECK-NEXT: br label [[BB9]]
	; CHECK: bb11:			; CHECK: bb11:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	; CHECK: bb12:			; CHECK: bb12:
	; CHECK-NEXT: [[TMP11]] = phi <2 x i32> [ [[TMP6]], [[BB7]] ]			; CHECK-NEXT: [[TMP12]] = phi <2 x i32> [ [[TMP6]], [[BB7]] ]
	; CHECK-NEXT: [[LANDING_PAD149:%.*]] = landingpad { i8*, i32 }			; CHECK-NEXT: [[LANDING_PAD149:%.*]] = landingpad { i8*, i32 }
	; CHECK-NEXT: cleanup			; CHECK-NEXT: cleanup
	; CHECK-NEXT: br label [[BB9]]			; CHECK-NEXT: br label [[BB9]]
	;			;
	bb1:			bb1:
	br label %bb3			br label %bb3

	bb2.loopexit:			bb2.loopexit:

llvm/test/Transforms/SLPVectorizer/X86/load-partial-vector-shuffle.ll

	;			;
	; AVX-LABEL: @load_00123456(			; AVX-LABEL: @load_00123456(
	; AVX-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i16, ptr [[DATA:%.*]], i64 2			; AVX-NEXT: [[ARRAYIDX2:%.*]] = getelementptr inbounds i16, ptr [[DATA:%.*]], i64 2
	; AVX-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i16, ptr [[DATA]], i64 3			; AVX-NEXT: [[ARRAYIDX3:%.*]] = getelementptr inbounds i16, ptr [[DATA]], i64 3
	; AVX-NEXT: [[TMP1:%.*]] = load <2 x i16>, ptr [[DATA]], align 2			; AVX-NEXT: [[TMP1:%.*]] = load <2 x i16>, ptr [[DATA]], align 2
	; AVX-NEXT: [[T2:%.*]] = load i16, ptr [[ARRAYIDX2]], align 2			; AVX-NEXT: [[T2:%.*]] = load i16, ptr [[ARRAYIDX2]], align 2
	; AVX-NEXT: [[TMP2:%.*]] = load <4 x i16>, ptr [[ARRAYIDX3]], align 2			; AVX-NEXT: [[TMP2:%.*]] = load <4 x i16>, ptr [[ARRAYIDX3]], align 2
	; AVX-NEXT: [[TMP3:%.*]] = shufflevector <2 x i16> [[TMP1]], <2 x i16> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP3:%.*]] = shufflevector <2 x i16> [[TMP1]], <2 x i16> poison, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[VECINIT2_I_I2:%.*]] = shufflevector <8 x i16> [[TMP3]], <8 x i16> [[TMP3]], <8 x i32> <i32 0, i32 8, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>			; AVX-NEXT: [[TMP4:%.*]] = shufflevector <8 x i16> undef, <8 x i16> [[TMP3]], <8 x i32> <i32 8, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
				; AVX-NEXT: [[VECINIT2_I_I2:%.*]] = shufflevector <8 x i16> [[TMP4]], <8 x i16> [[TMP3]], <8 x i32> <i32 0, i32 8, i32 9, i32 3, i32 4, i32 5, i32 6, i32 7>
	; AVX-NEXT: [[VECINIT3_I_I:%.*]] = insertelement <8 x i16> [[VECINIT2_I_I2]], i16 [[T2]], i64 3			; AVX-NEXT: [[VECINIT3_I_I:%.*]] = insertelement <8 x i16> [[VECINIT2_I_I2]], i16 [[T2]], i64 3
	; AVX-NEXT: [[TMP4:%.*]] = shufflevector <4 x i16> [[TMP2]], <4 x i16> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>			; AVX-NEXT: [[TMP5:%.*]] = shufflevector <4 x i16> [[TMP2]], <4 x i16> poison, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef>
	; AVX-NEXT: [[VECINIT7_I_I1:%.*]] = shufflevector <8 x i16> [[VECINIT3_I_I]], <8 x i16> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>			; AVX-NEXT: [[VECINIT7_I_I1:%.*]] = shufflevector <8 x i16> [[VECINIT3_I_I]], <8 x i16> [[TMP5]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
	; AVX-NEXT: [[T7:%.*]] = bitcast <8 x i16> [[VECINIT7_I_I1]] to <2 x i64>			; AVX-NEXT: [[T7:%.*]] = bitcast <8 x i16> [[VECINIT7_I_I1]] to <2 x i64>
	; AVX-NEXT: ret <2 x i64> [[T7]]			; AVX-NEXT: ret <2 x i64> [[T7]]
	;			;
	%arrayidx1 = getelementptr inbounds i16, ptr %data, i64 1			%arrayidx1 = getelementptr inbounds i16, ptr %data, i64 1
	%arrayidx2 = getelementptr inbounds i16, ptr %data, i64 2			%arrayidx2 = getelementptr inbounds i16, ptr %data, i64 2
	%arrayidx3 = getelementptr inbounds i16, ptr %data, i64 3			%arrayidx3 = getelementptr inbounds i16, ptr %data, i64 3
	%arrayidx4 = getelementptr inbounds i16, ptr %data, i64 4			%arrayidx4 = getelementptr inbounds i16, ptr %data, i64 4
	%arrayidx5 = getelementptr inbounds i16, ptr %data, i64 5			%arrayidx5 = getelementptr inbounds i16, ptr %data, i64 5

llvm/test/Transforms/SLPVectorizer/X86/matched-shuffled-entries.ll

	; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_1:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef			; CHECK-NEXT: [[ADD78_2:%.*]] = add nsw i32 undef, undef
	; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef			; CHECK-NEXT: [[SUB102_3:%.*]] = sub nsw i32 undef, undef
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[SUB102_1]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <16 x i32> poison, i32 [[SUB102_1]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[ADD94_1]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <16 x i32> [[TMP0]], i32 [[ADD94_1]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD78_1]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <16 x i32> [[TMP1]], i32 [[ADD78_1]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[SUB86_1]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <16 x i32> [[TMP2]], i32 [[SUB86_1]], i32 3
	; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[ADD78_2]], i32 4			; CHECK-NEXT: [[TMP4:%.*]] = insertelement <16 x i32> [[TMP3]], i32 [[ADD78_2]], i32 4
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP4]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 4, i32 4, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <16 x i32> [[TMP4]], <16 x i32> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 2, i32 3, i32 4, i32 0, i32 5, i32 5, i32 0, i32 0, i32 0, i32 0, i32 0>
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> poison, i32 [[SUB86_1]], i32 0			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <16 x i32> poison, i32 [[SUB86_1]], i32 0
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> [[TMP5]], i32 [[ADD78_1]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <16 x i32> [[TMP5]], i32 [[ADD78_1]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[ADD94_1]], i32 2			; CHECK-NEXT: [[TMP7:%.*]] = insertelement <16 x i32> [[TMP6]], i32 [[ADD94_1]], i32 2
	; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[SUB102_1]], i32 3			; CHECK-NEXT: [[TMP8:%.*]] = insertelement <16 x i32> [[TMP7]], i32 [[SUB102_1]], i32 3
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[SUB102_3]], i32 4			; CHECK-NEXT: [[TMP9:%.*]] = insertelement <16 x i32> [[TMP8]], i32 [[SUB102_3]], i32 4
	; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i32> [[TMP9]], <16 x i32> poison, <16 x i32> <i32 undef, i32 undef, i32 undef, i32 undef, i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 4, i32 undef, i32 undef, i32 4>			; CHECK-NEXT: [[SHUFFLE1:%.*]] = shufflevector <16 x i32> [[TMP9]], <16 x i32> poison, <16 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 2, i32 3, i32 4, i32 0, i32 0, i32 0, i32 0, i32 5, i32 0, i32 0, i32 5>
	; CHECK-NEXT: [[TMP10:%.*]] = add nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP10:%.*]] = add nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP11:%.*]] = sub nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]			; CHECK-NEXT: [[TMP11:%.*]] = sub nsw <16 x i32> [[SHUFFLE]], [[SHUFFLE1]]
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <16 x i32> [[TMP10]], <16 x i32> [[TMP11]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>			; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <16 x i32> [[TMP10]], <16 x i32> [[TMP11]], <16 x i32> <i32 0, i32 1, i32 18, i32 19, i32 4, i32 5, i32 22, i32 23, i32 8, i32 9, i32 26, i32 27, i32 12, i32 13, i32 30, i32 31>
	; CHECK-NEXT: [[TMP13:%.*]] = lshr <16 x i32> [[TMP12]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>			; CHECK-NEXT: [[TMP13:%.*]] = lshr <16 x i32> [[TMP12]], <i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15, i32 15>
	; CHECK-NEXT: [[TMP14:%.*]] = and <16 x i32> [[TMP13]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>			; CHECK-NEXT: [[TMP14:%.*]] = and <16 x i32> [[TMP13]], <i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537, i32 65537>
	; CHECK-NEXT: [[TMP15:%.*]] = mul nuw <16 x i32> [[TMP14]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>			; CHECK-NEXT: [[TMP15:%.*]] = mul nuw <16 x i32> [[TMP14]], <i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535, i32 65535>
	; CHECK-NEXT: [[TMP16:%.*]] = add <16 x i32> [[TMP15]], [[TMP12]]			; CHECK-NEXT: [[TMP16:%.*]] = add <16 x i32> [[TMP15]], [[TMP12]]
	; CHECK-NEXT: [[TMP17:%.*]] = xor <16 x i32> [[TMP16]], [[TMP15]]			; CHECK-NEXT: [[TMP17:%.*]] = xor <16 x i32> [[TMP16]], [[TMP15]]

llvm/test/Transforms/SLPVectorizer/X86/partail.ll

	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: if.end:			; CHECK: if.end:
	; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef			; CHECK-NEXT: [[SUB14:%.*]] = sub nsw i32 [[Y_POS:%.*]], undef
	; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2			; CHECK-NEXT: [[SHR15:%.*]] = ashr i32 [[SUB14]], 2
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[SHR15]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i32> poison, i32 [[SHR15]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[SUB14]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i32> [[TMP0]], i32 [[SUB14]], i32 1
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <4 x i32> [[TMP1]], <4 x i32> poison, <4 x i32> <i32 0, i32 1, i32 1, i32 1>
	; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], <i32 0, i32 -1, i32 -5, i32 -9>			; CHECK-NEXT: [[TMP2:%.*]] = icmp sgt <4 x i32> [[SHUFFLE]], <i32 0, i32 -1, i32 -5, i32 -9>
	; CHECK-NEXT: [[TMP3:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[TMP0]], <4 x i32> zeroinitializer			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i32> <i32 poison, i32 undef, i32 undef, i32 undef>, i32 [[SHR15]], i32 0
	; CHECK-NEXT: [[TMP4:%.*]] = icmp slt <4 x i32> [[TMP3]], undef			; CHECK-NEXT: [[TMP4:%.*]] = select <4 x i1> [[TMP2]], <4 x i32> [[TMP3]], <4 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP5:%.*]] = select <4 x i1> [[TMP4]], <4 x i32> [[TMP3]], <4 x i32> undef			; CHECK-NEXT: [[TMP5:%.*]] = icmp slt <4 x i32> [[TMP4]], undef
	; CHECK-NEXT: [[TMP6:%.*]] = sext <4 x i32> [[TMP5]] to <4 x i64>			; CHECK-NEXT: [[TMP6:%.*]] = select <4 x i1> [[TMP5]], <4 x i32> [[TMP4]], <4 x i32> undef
	; CHECK-NEXT: [[TMP7:%.*]] = trunc <4 x i64> [[TMP6]] to <4 x i32>			; CHECK-NEXT: [[TMP7:%.*]] = sext <4 x i32> [[TMP6]] to <4 x i64>
	; CHECK-NEXT: [[TMP8:%.*]] = extractelement <4 x i32> [[TMP7]], i32 0			; CHECK-NEXT: [[TMP8:%.*]] = trunc <4 x i64> [[TMP7]] to <4 x i32>
	; CHECK-NEXT: [[TMP9:%.*]] = sext i32 [[TMP8]] to i64			; CHECK-NEXT: [[TMP9:%.*]] = extractelement <4 x i32> [[TMP8]], i32 0
	; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = sext i32 [[TMP9]] to i64
	; CHECK-NEXT: [[TMP10:%.*]] = extractelement <4 x i32> [[TMP7]], i32 1			; CHECK-NEXT: [[ARRAYIDX31:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP10]]
	; CHECK-NEXT: [[TMP11:%.*]] = sext i32 [[TMP10]] to i64			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <4 x i32> [[TMP8]], i32 1
	; CHECK-NEXT: [[ARRAYIDX31_1:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP11]]			; CHECK-NEXT: [[TMP12:%.*]] = sext i32 [[TMP11]] to i64
	; CHECK-NEXT: [[TMP12:%.*]] = extractelement <4 x i32> [[TMP7]], i32 2			; CHECK-NEXT: [[ARRAYIDX31_1:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP12]]
	; CHECK-NEXT: [[TMP13:%.*]] = sext i32 [[TMP12]] to i64			; CHECK-NEXT: [[TMP13:%.*]] = extractelement <4 x i32> [[TMP8]], i32 2
	; CHECK-NEXT: [[ARRAYIDX31_2:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP13]]			; CHECK-NEXT: [[TMP14:%.*]] = sext i32 [[TMP13]] to i64
	; CHECK-NEXT: [[TMP14:%.*]] = extractelement <4 x i32> [[TMP7]], i32 3			; CHECK-NEXT: [[ARRAYIDX31_2:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP14]]
	; CHECK-NEXT: [[TMP15:%.*]] = sext i32 [[TMP14]] to i64			; CHECK-NEXT: [[TMP15:%.*]] = extractelement <4 x i32> [[TMP8]], i32 3
	; CHECK-NEXT: [[ARRAYIDX31_3:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP15]]			; CHECK-NEXT: [[TMP16:%.*]] = sext i32 [[TMP15]] to i64
				; CHECK-NEXT: [[ARRAYIDX31_3:%.*]] = getelementptr inbounds i16*, i16** undef, i64 [[TMP16]]
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	br label %land.lhs.true			br label %land.lhs.true

	land.lhs.true: ; preds = %entry			land.lhs.true: ; preds = %entry
	br i1 undef, label %if.then, label %if.end			br i1 undef, label %if.then, label %if.end


llvm/test/Transforms/SLPVectorizer/X86/phi-undef-input.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -slp-threshold=-1000 -mtriple=x86_64 -S | FileCheck %s			; RUN: opt < %s -slp-vectorizer -slp-threshold=-1000 -mtriple=x86_64 -S | FileCheck %s

	; The inputs to vector phi should remain undef.			; The inputs to vector phi should remain undef.

	define i32 @phi3UndefInput(i1 %cond, i8 %arg0, i8 %arg1, i8 %arg2, i8 %arg3) {			define i32 @phi3UndefInput(i1 %cond, i8 %arg0, i8 %arg1, i8 %arg2, i8 %arg3) {
	; CHECK-LABEL: @phi3UndefInput(			; CHECK-LABEL: @phi3UndefInput(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 poison, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 undef, i8 undef, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 undef, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 0, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 0, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG0:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG1:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	[... 21 lines not shown ...]
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG0:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG0:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG2:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG3:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 poison, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 poison, i8 poison, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	[... 20 lines not shown ...]
	; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]			; CHECK-NEXT: br i1 [[COND:%.*]], label [[BB2:%.*]], label [[BB3:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x i8> poison, i8 [[ARG1:%.*]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG3:%.*]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x i8> [[TMP0]], i8 [[ARG3:%.*]], i32 1
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG0:%.*]], i32 2			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i8> [[TMP1]], i8 [[ARG0:%.*]], i32 2
	; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG2:%.*]], i32 3			; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i8> [[TMP2]], i8 [[ARG2:%.*]], i32 3
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 poison>, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[TMP4:%.*]] = phi <4 x i8> [ [[TMP3]], [[BB2]] ], [ <i8 0, i8 0, i8 poison, i8 undef>, [[ENTRY:%.*]] ]
	; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>			; CHECK-NEXT: [[TMP5:%.*]] = zext <4 x i8> [[TMP4]] to <4 x i32>
	; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])			; CHECK-NEXT: [[TMP6:%.*]] = call i32 @llvm.vector.reduce.or.v4i32(<4 x i32> [[TMP5]])
	; CHECK-NEXT: ret i32 [[TMP6]]			; CHECK-NEXT: ret i32 [[TMP6]]
	;			;
	entry:			entry:
	br i1 %cond, label %bb2, label %bb3			br i1 %cond, label %bb2, label %bb3

	bb2:			bb2:
	[... 16 lines not shown ...]
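
The phi-undef-input.ll checks above differ between the two columns only in the last lane of the entry operand: poison in the left-hand column, undef in the right-hand column. As a rough illustration of why that lane matters, here is a toy Python model of lane-wise refinement. It is not LLVM code; the names `lane_refines` and `vector_refines` are hypothetical, and the scalar input `[0, 0, 0, undef]` is an assumption, since the original scalar IR is among the lines not shown.

```python
# Toy model only: lanes are Python ints or the markers below.
POISON = "poison"
UNDEF = "undef"


def lane_refines(src, tgt):
    """Return True if replacing lane value src with tgt is a legal refinement."""
    if src == POISON:
        return True           # poison may be replaced by anything
    if src == UNDEF:
        return tgt != POISON  # undef may stay undef or become a concrete value
    return tgt == src         # a concrete value must be preserved


def vector_refines(src, tgt):
    """Lane-wise check over two equally sized constant vectors."""
    return len(src) == len(tgt) and all(
        lane_refines(s, t) for s, t in zip(src, tgt)
    )


# Assumed scalar input for the first test: three zeros and one undef lane.
scalar_lanes = [0, 0, 0, UNDEF]

# Left-hand column: last lane rewritten to poison.
print(vector_refines(scalar_lanes, [0, 0, 0, POISON]))
# Right-hand column: last lane kept as undef.
print(vector_refines(scalar_lanes, [0, 0, 0, UNDEF]))
```

Under this model only the vector that keeps undef in the last lane is a refinement of the assumed input, which matches the summary's point that undef cannot be replaced with poison without extra checks.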

llvm/test/Transforms/SLPVectorizer/X86/remark_extract_broadcast.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -mattr=sse2 -slp-vectorizer -pass-remarks-output=%t < %s -slp-threshold=-2 | FileCheck %s			; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -mattr=sse2 -slp-vectorizer -pass-remarks-output=%t < %s -slp-threshold=-2 | FileCheck %s
	; RUN: FileCheck --input-file=%t --check-prefix=YAML %s			; RUN: FileCheck --input-file=%t --check-prefix=YAML %s

	define void @fextr(i16* %ptr) {			define void @fextr(i16* %ptr) {
	; CHECK-LABEL: @fextr(			; CHECK-LABEL: @fextr(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[LD:%.*]] = load <8 x i16>, <8 x i16>* undef, align 16			; CHECK-NEXT: [[LD:%.*]] = load <8 x i16>, <8 x i16>* undef, align 16
	; CHECK-NEXT: br label [[T:%.*]]			; CHECK-NEXT: br label [[T:%.*]]
	; CHECK: t:			; CHECK: t:
	; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 0			; CHECK-NEXT: [[P0:%.*]] = getelementptr inbounds i16, i16* [[PTR:%.*]], i64 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[LD]], <8 x i16> poison, <8 x i32> <i32 0, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i16> [[LD]], <8 x i16> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: [[TMP0:%.*]] = add <8 x i16> [[LD]], [[SHUFFLE]]			; CHECK-NEXT: [[TMP0:%.*]] = add <8 x i16> [[LD]], [[SHUFFLE]]
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i16* [[P0]] to <8 x i16>*
	; CHECK-NEXT: store <8 x i16> [[TMP0]], <8 x i16>* [[TMP1]], align 2			; CHECK-NEXT: store <8 x i16> [[TMP0]], <8 x i16>* [[TMP1]], align 2
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	; YAML: Pass: slp-vectorizer			; YAML: Pass: slp-vectorizer
	; YAML-NEXT: Name: StoresVectorized			; YAML-NEXT: Name: StoresVectorized
	; YAML-NEXT: Function: fextr			; YAML-NEXT: Function: fextr
	[... 45 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/reused-undefs.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -S -slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -slp-threshold=-1000 < %s | FileCheck %s			; RUN: opt -S -slp-vectorizer -mtriple=x86_64-unknown-linux-gnu -slp-threshold=-1000 < %s | FileCheck %s

	define i32 @main(i32 %0) {			define i32 @main(i32 %0) {
	; CHECK-LABEL: @main(			; CHECK-LABEL: @main(
	; CHECK-NEXT: for.cond.preheader:			; CHECK-NEXT: for.cond.preheader:
	; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[FOR_INC_PREHEADER:%.*]]			; CHECK-NEXT: br i1 false, label [[FOR_END:%.*]], label [[FOR_INC_PREHEADER:%.*]]
	; CHECK: for.inc.preheader:			; CHECK: for.inc.preheader:
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 poison, i32 poison>, i32 [[TMP0:%.*]], i32 6			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 0, i32 0, i32 poison, i32 undef>, i32 [[TMP0:%.*]], i32 6
	; CHECK-NEXT: br i1 false, label [[FOR_END]], label [[L1_PREHEADER:%.*]]			; CHECK-NEXT: br i1 false, label [[FOR_END]], label [[L1_PREHEADER:%.*]]
	; CHECK: for.end:			; CHECK: for.end:
	; CHECK-NEXT: [[DOTPR:%.*]] = phi i32 [ 0, [[FOR_INC_PREHEADER]] ], [ 0, [[FOR_COND_PREHEADER:%.*]] ]			; CHECK-NEXT: [[DOTPR:%.*]] = phi i32 [ 0, [[FOR_INC_PREHEADER]] ], [ 0, [[FOR_COND_PREHEADER:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[DOTPR]], i32 0			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <8 x i32> poison, i32 [[DOTPR]], i32 0
	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> poison, <8 x i32> <i32 undef, i32 undef, i32 undef, i32 0, i32 0, i32 0, i32 0, i32 0>			; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <8 x i32> [[TMP2]], <8 x i32> poison, <8 x i32> zeroinitializer
	; CHECK-NEXT: br label [[L1_PREHEADER]]			; CHECK-NEXT: br label [[L1_PREHEADER]]
	; CHECK: L1.preheader:			; CHECK: L1.preheader:
	; CHECK-NEXT: [[TMP3:%.*]] = phi <8 x i32> [ [[SHUFFLE]], [[FOR_END]] ], [ [[TMP1]], [[FOR_INC_PREHEADER]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi <8 x i32> [ [[SHUFFLE]], [[FOR_END]] ], [ [[TMP1]], [[FOR_INC_PREHEADER]] ]
	; CHECK-NEXT: ret i32 0			; CHECK-NEXT: ret i32 0
	;			;
	for.cond.preheader:			for.cond.preheader:
	br i1 false, label %for.end, label %for.inc.preheader			br i1 false, label %for.end, label %for.inc.preheader

	[... 18 lines not shown ...]

llvm/test/Transforms/SLPVectorizer/X86/vectorize-widest-phis.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -slp-vectorizer -S -mcpu=cascadelake -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s			; RUN: opt -slp-vectorizer -S -mcpu=cascadelake -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

	define void @foo() {			define void @foo() {
	; CHECK-LABEL: @foo(			; CHECK-LABEL: @foo(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[CONV:%.*]] = uitofp i16 undef to float			; CHECK-NEXT: [[CONV:%.*]] = uitofp i16 undef to float
	; CHECK-NEXT: [[SUB:%.*]] = fsub float 6.553500e+04, undef			; CHECK-NEXT: [[SUB:%.*]] = fsub float 6.553500e+04, undef
	; CHECK-NEXT: br label [[BB1:%.*]]			; CHECK-NEXT: br label [[BB1:%.*]]
	; CHECK: bb1:			; CHECK: bb1:
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> poison, float [[SUB]], i32 0			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <4 x float> <float poison, float poison, float undef, float undef>, float [[SUB]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <4 x float> [[TMP0]], float [[CONV]], i32 1
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP14:%.*]], [[BB3:%.*]] ]			; CHECK-NEXT: [[TMP2:%.*]] = phi <4 x float> [ [[TMP1]], [[BB1]] ], [ [[TMP15:%.*]], [[BB3:%.*]] ]
	; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8			; CHECK-NEXT: [[TMP3:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB3]], label [[BB4:%.*]]
	; CHECK: bb4:			; CHECK: bb4:
	; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>			; CHECK-NEXT: [[TMP4:%.*]] = fpext <4 x float> [[TMP2]] to <4 x double>
	; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double			; CHECK-NEXT: [[CONV2:%.*]] = uitofp i16 undef to double
	; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP3]], i32 1			; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[CONV2]], i32 1			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[CONV2]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP7:%.*]] = fsub <2 x double> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP5]], [[TMP6]]			; CHECK-NEXT: [[TMP8:%.*]] = fadd <2 x double> [[TMP5]], [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP9:%.*]] = shufflevector <2 x double> [[TMP7]], <2 x double> [[TMP8]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[TMP10:%.*]] = shufflevector <2 x double> [[TMP9]], <2 x double> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[TMP11:%.*]] = fcmp ogt <4 x double> [[TMP10]], [[TMP4]]			; CHECK-NEXT: [[TMP11:%.*]] = shufflevector <4 x double> <double poison, double poison, double undef, double undef>, <4 x double> [[TMP10]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
	; CHECK-NEXT: [[TMP12:%.*]] = fptrunc <4 x double> [[TMP10]] to <4 x float>			; CHECK-NEXT: [[TMP12:%.*]] = fcmp ogt <4 x double> [[TMP11]], [[TMP4]]
	; CHECK-NEXT: [[TMP13:%.*]] = select <4 x i1> [[TMP11]], <4 x float> [[TMP2]], <4 x float> [[TMP12]]			; CHECK-NEXT: [[TMP13:%.*]] = fptrunc <4 x double> [[TMP10]] to <4 x float>
				; CHECK-NEXT: [[TMP14:%.*]] = select <4 x i1> [[TMP12]], <4 x float> [[TMP2]], <4 x float> [[TMP13]]
	; CHECK-NEXT: br label [[BB3]]			; CHECK-NEXT: br label [[BB3]]
	; CHECK: bb3:			; CHECK: bb3:
	; CHECK-NEXT: [[TMP14]] = phi <4 x float> [ [[TMP13]], [[BB4]] ], [ [[TMP2]], [[BB2]] ]			; CHECK-NEXT: [[TMP15]] = phi <4 x float> [ [[TMP14]], [[BB4]] ], [ [[TMP2]], [[BB2]] ]
	; CHECK-NEXT: br label [[BB2]]			; CHECK-NEXT: br label [[BB2]]
	;			;
	entry:			entry:
	%conv = uitofp i16 undef to float			%conv = uitofp i16 undef to float
	%sub = fsub float 6.553500e+04, undef			%sub = fsub float 6.553500e+04, undef
	br label %bb1			br label %bb1

	bb1:			bb1:
	[... 39 lines not shown ...]

Diff 457251