This is an archive of the discontinued LLVM Phabricator instance.

Differential D124309

[SLP] Steer for the best chance in tryToVectorize() when rooting with binary ops.
ClosedPublic

Authored by vdmitrie on Apr 22 2022, 3:48 PM.

Details

Reviewers

vporpo
RKSimon
ABataev
spatel

Commits

rG88b9e46fb54c: [SLP] Steer for the best chance in tryToVectorize() when rooting with binary…

Summary

The tryToVectorize() method implements one of the search paths for vectorizable tree roots in the SLP vectorizer,
specifically for binary and comparison operations. The order in which scalar pairs were probed
was fixed by the implementation: first the instruction's operands, then climbing over one operand if
the instruction is its sole user, and then the same for the other operand if the previous
attempts failed. The problem with this approach is that these options can contain more than one
vectorizable tree candidate, and the best one is not necessarily the one encountered first.
Building a vectorizable tree for each possible combination just to evaluate it is expensive.
However, we already have the look-ahead heuristics, which we use to find the best pick among
operands of commutative instructions. They calculate a cumulative score for candidates in two
consecutive lanes. This patch uses these heuristics to choose the best pair among
several combinations. We only try the one that looks most promising for vectorization.
An additional benefit is that we reduce the total number of vectorization trees built for probes,
because we skip the unprofitable-looking ones early.
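
The selection step described in the summary can be illustrated with a minimal standalone sketch. It is not the patch's code: `pickBestCandidate`, `kScoreFail`, and the plain integer scores are hypothetical stand-ins for the look-ahead scoring, and a `-1` sentinel is used here where the patch returns an `Optional<int>`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the "failed" score threshold of the heuristics.
constexpr int kScoreFail = 0;

// Returns the index of the highest score strictly above kScoreFail, or -1 if
// no candidate clears that threshold. On ties the earliest candidate wins.
int pickBestCandidate(const std::vector<int> &Scores) {
  int BestScore = kScoreFail;
  int BestIndex = -1;
  for (std::size_t I = 0; I < Scores.size(); ++I) {
    if (Scores[I] > BestScore) {
      BestScore = Scores[I];
      BestIndex = static_cast<int>(I);
    }
  }
  return BestIndex;
}
```

Only the pair at the returned index would then be handed to the expensive tree-building probe; when the result is `-1`, no probe is attempted at all.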

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vdmitrie created this revision. Apr 22 2022, 3:48 PM

Herald added a project: Restricted Project. Apr 22 2022, 3:48 PM

Herald added a subscriber: hiraditya.

vdmitrie requested review of this revision. Apr 22 2022, 3:48 PM

Herald added a project: Restricted Project. Apr 22 2022, 3:48 PM

Herald added a subscriber: llvm-commits.

Hi Valery, thanks for the patch. This looks good! Could you split it into 2 patches: one that renames lookahead heuristics (NFC) and the second (functional) one?

llvm/test/Transforms/SLPVectorizer/AArch64/invalid_type.ll
7	Do you need to update this test?

In D124309#3469277, @ABataev wrote:

Hi Valery, thanks for the patch. This looks good! Could you split it into 2 patches: one that renames lookahead heuristics (NFC) and the second (functional) one?

Hi Alexey,
yeah, will do that. I actually have it split locally exactly like you suggested.

vdmitrie added inline comments. Apr 22 2022, 4:30 PM

llvm/test/Transforms/SLPVectorizer/AArch64/invalid_type.ll
7	I added a check for vector types at line 9176, so we no longer reach the emission of the remark.

Harbormaster completed remote builds in B160980: Diff 424642. Apr 22 2022, 4:34 PM

vporpo added inline comments. Apr 22 2022, 4:38 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1046–1047	nit: It also holds the operands of a VL, so it is probably best to mention this here too.
2016	nit: perhaps `findBestRootPair` ?
2018	We could also have a separate max-depth limit for this, because I guess it will not run as frequently as the other one, so we could have a higher depth if required.
9227	nit: I think we can drop `.hasValue()`, `if (!BestCandidate)` should work fine.

vdmitrie added inline comments. Apr 22 2022, 5:13 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1046–1047	Hm, I'm not sure I understand what you mean here. You might be misled by the way the diff is shown here. This is not a renaming of the existing VLOperands. This is a new helper class formed basically by a couple of routines, getShallowScore and getScoreAtLevelRec, which were pulled out of VLOperands along with the score constants. The class does not store anything from VL. It only needs the total number of lanes for scoring. I even changed both methods to be const.
2016	Thanks for the suggestion. Will apply it with the next rebase.
2018	Will do. Any suggestion about the option name?
9227	sure

vdmitrie mentioned this in D124313: [SLP][NFC] Outline lookahead heuristics into a separate helper class. Apr 22 2022, 5:19 PM

vporpo added inline comments. Apr 22 2022, 5:22 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1046–1047	Oops, yeah you are right, I got confused by the diff, sorry about that. Nice, thanks for refactoring it, it looks much better this way :D
2018	Hmm, if we are sticking to using `root` to describe this, perhaps `RootLookAheadMaxDepth`?

https://reviews.llvm.org/D124313 is the NFC split.

vdmitrie mentioned this in rGedf7bed87b77: [SLP][NFC] Outline lookahead heuristics into a separate helper class. Apr 22 2022, 7:00 PM

rebased + applied suggestions

Harbormaster completed remote builds in B161006: Diff 424685. Apr 22 2022, 8:29 PM

This revision is now accepted and ready to land. Apr 25 2022, 4:17 AM

This revision was landed with ongoing or failed builds. Apr 25 2022, 12:26 PM

Closed by commit rG88b9e46fb54c: [SLP] Steer for the best chance in tryToVectorize() when rooting with binary… (authored by vdmitrie).

This revision was automatically updated to reflect the committed changes.

vdmitrie added a commit: rG88b9e46fb54c: [SLP] Steer for the best chance in tryToVectorize() when rooting with binary….

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	73 lines
llvm/test/Transforms/SLPVectorizer/AArch64/invalid_type.ll	9 lines
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll	2 lines
llvm/test/Transforms/SLPVectorizer/X86/vectorize-pair-path.ll	31 lines

Diff 424997

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

Show First 20 Lines • Show All 158 Lines • ▼ Show 20 Lines	static cl::opt<unsigned> MinTreeSize(
cl::desc("Only vectorize small trees if they are fully vectorizable"));		cl::desc("Only vectorize small trees if they are fully vectorizable"));

// The maximum depth that the look-ahead score heuristic will explore.		// The maximum depth that the look-ahead score heuristic will explore.
// The higher this value, the higher the compilation time overhead.		// The higher this value, the higher the compilation time overhead.
static cl::opt<int> LookAheadMaxDepth(		static cl::opt<int> LookAheadMaxDepth(
"slp-max-look-ahead-depth", cl::init(2), cl::Hidden,		"slp-max-look-ahead-depth", cl::init(2), cl::Hidden,
cl::desc("The maximum look-ahead depth for operand reordering scores"));		cl::desc("The maximum look-ahead depth for operand reordering scores"));

		// The maximum depth that the look-ahead score heuristic will explore
		// when it probing among candidates for vectorization tree roots.
		// The higher this value, the higher the compilation time overhead but unlike
		// similar limit for operands ordering this is less frequently used, hence
		// impact of higher value is less noticeable.
		static cl::opt<int> RootLookAheadMaxDepth(
		"slp-max-root-look-ahead-depth", cl::init(2), cl::Hidden,
		cl::desc("The maximum look-ahead depth for searching best rooting option"));

static cl::opt<bool>		static cl::opt<bool>
ViewSLPTree("view-slp-tree", cl::Hidden,		ViewSLPTree("view-slp-tree", cl::Hidden,
cl::desc("Display the SLP trees with Graphviz"));		cl::desc("Display the SLP trees with Graphviz"));

// Limit the number of alias checks. The limit is chosen so that		// Limit the number of alias checks. The limit is chosen so that
// it has no negative effect on the llvm benchmarks.		// it has no negative effect on the llvm benchmarks.
static const unsigned AliasedCheckLimit = 10;		static const unsigned AliasedCheckLimit = 10;

▲ Show 20 Lines • Show All 854 Lines • ▼ Show 20 Lines	#ifndef NDEBUG
void dump(raw_ostream &OS) const {		void dump(raw_ostream &OS) const {
OS << "{User:" << (UserTE ? std::to_string(UserTE->Idx) : "null")		OS << "{User:" << (UserTE ? std::to_string(UserTE->Idx) : "null")
<< " EdgeIdx:" << EdgeIdx << "}";		<< " EdgeIdx:" << EdgeIdx << "}";
}		}
LLVM_DUMP_METHOD void dump() const { dump(dbgs()); }		LLVM_DUMP_METHOD void dump() const { dump(dbgs()); }
#endif		#endif
};		};

/// A helper class used for scoring candidates for two consecutive lanes.		/// A helper class used for scoring candidates for two consecutive lanes.
class LookAheadHeuristics {		class LookAheadHeuristics {
const DataLayout &DL;		const DataLayout &DL;
ScalarEvolution &SE;		ScalarEvolution &SE;
const BoUpSLP &R;		const BoUpSLP &R;
int NumLanes; // Total number of lanes (aka vectorization factor).		int NumLanes; // Total number of lanes (aka vectorization factor).
int MaxLevel; // The maximum recursion depth for accumulating score.		int MaxLevel; // The maximum recursion depth for accumulating score.

public:		public:
LookAheadHeuristics(const DataLayout &DL, ScalarEvolution &SE,		LookAheadHeuristics(const DataLayout &DL, ScalarEvolution &SE,
▲ Show 20 Lines • Show All 948 Lines • ▼ Show 20 Lines	LLVM_DUMP_METHOD raw_ostream &print(raw_ostream &OS) const {
return OS;		return OS;
}		}

/// Debug print.		/// Debug print.
LLVM_DUMP_METHOD void dump() const { print(dbgs()); }		LLVM_DUMP_METHOD void dump() const { print(dbgs()); }
#endif		#endif
};		};

		/// Evaluate each pair in \p Candidates and return index into \p Candidates
		/// for a pair which have highest score deemed to have best chance to form
		/// root of profitable tree to vectorize. Return None if no candidate scored
		/// above the LookAheadHeuristics::ScoreFail.
		Optional<int>
		findBestRootPair(ArrayRef<std::pair<Value *, Value *>> Candidates) {
		LookAheadHeuristics LookAhead(*DL, *SE, *this, /*NumLanes=*/2,
		RootLookAheadMaxDepth);
		int BestScore = LookAheadHeuristics::ScoreFail;
		Optional<int> Index = None;
		for (int I : seq<int>(0, Candidates.size())) {
		int Score = LookAhead.getScoreAtLevelRec(Candidates[I].first,
		Candidates[I].second,
		/*U1=*/nullptr, /*U2=*/nullptr,
		/*Level=*/1, None);
		if (Score > BestScore) {
		BestScore = Score;
		Index = I;
		}
		}
		return Index;
		}

/// Checks if the instruction is marked for deletion.		/// Checks if the instruction is marked for deletion.
bool isDeleted(Instruction *I) const { return DeletedInstructions.count(I); }		bool isDeleted(Instruction *I) const { return DeletedInstructions.count(I); }

/// Removes an instruction from its block and eventually deletes it.		/// Removes an instruction from its block and eventually deletes it.
/// It's like Instruction::eraseFromParent() except that the actual deletion		/// It's like Instruction::eraseFromParent() except that the actual deletion
/// is delayed until BoUpSLP is destructed.		/// is delayed until BoUpSLP is destructed.
void eraseInstruction(Instruction *I) {		void eraseInstruction(Instruction *I) {
DeletedInstructions.insert(I);		DeletedInstructions.insert(I);
▲ Show 20 Lines • Show All 7,134 Lines • ▼ Show 20 Lines	bool SLPVectorizerPass::tryToVectorizeList(ArrayRef<Value *> VL, BoUpSLP &R,
}		}
return Changed;		return Changed;
}		}

bool SLPVectorizerPass::tryToVectorize(Instruction *I, BoUpSLP &R) {		bool SLPVectorizerPass::tryToVectorize(Instruction *I, BoUpSLP &R) {
if (!I)		if (!I)
return false;		return false;

if (!isa<BinaryOperator>(I) && !isa<CmpInst>(I))		if ((!isa<BinaryOperator>(I) && !isa<CmpInst>(I)) ||
		isa<VectorType>(I->getType()))
return false;		return false;

Value *P = I->getParent();		Value *P = I->getParent();

// Vectorize in current basic block only.		// Vectorize in current basic block only.
auto *Op0 = dyn_cast<Instruction>(I->getOperand(0));		auto *Op0 = dyn_cast<Instruction>(I->getOperand(0));
auto *Op1 = dyn_cast<Instruction>(I->getOperand(1));		auto *Op1 = dyn_cast<Instruction>(I->getOperand(1));
if (!Op0 || !Op1 || Op0->getParent() != P || Op1->getParent() != P)		if (!Op0 || !Op1 || Op0->getParent() != P || Op1->getParent() != P)
return false;		return false;

// Try to vectorize V.		// First collect all possible candidates
if (tryToVectorizePair(Op0, Op1, R))		SmallVector<std::pair<Value *, Value *>, 4> Candidates;
return true;		Candidates.emplace_back(Op0, Op1);

auto *A = dyn_cast<BinaryOperator>(Op0);		auto *A = dyn_cast<BinaryOperator>(Op0);
auto *B = dyn_cast<BinaryOperator>(Op1);		auto *B = dyn_cast<BinaryOperator>(Op1);
// Try to skip B.		// Try to skip B.
if (B && B->hasOneUse()) {		if (A && B && B->hasOneUse()) {
auto *B0 = dyn_cast<BinaryOperator>(B->getOperand(0));		auto *B0 = dyn_cast<BinaryOperator>(B->getOperand(0));
auto *B1 = dyn_cast<BinaryOperator>(B->getOperand(1));		auto *B1 = dyn_cast<BinaryOperator>(B->getOperand(1));
if (B0 && B0->getParent() == P && tryToVectorizePair(A, B0, R))		if (B0 && B0->getParent() == P)
return true;		Candidates.emplace_back(A, B0);
if (B1 && B1->getParent() == P && tryToVectorizePair(A, B1, R))		if (B1 && B1->getParent() == P)
return true;		Candidates.emplace_back(A, B1);
}		}

// Try to skip A.		// Try to skip A.
if (A && A->hasOneUse()) {		if (B && A && A->hasOneUse()) {
auto *A0 = dyn_cast<BinaryOperator>(A->getOperand(0));		auto *A0 = dyn_cast<BinaryOperator>(A->getOperand(0));
auto *A1 = dyn_cast<BinaryOperator>(A->getOperand(1));		auto *A1 = dyn_cast<BinaryOperator>(A->getOperand(1));
if (A0 && A0->getParent() == P && tryToVectorizePair(A0, B, R))		if (A0 && A0->getParent() == P)
return true;		Candidates.emplace_back(A0, B);
if (A1 && A1->getParent() == P && tryToVectorizePair(A1, B, R))		if (A1 && A1->getParent() == P)
return true;		Candidates.emplace_back(A1, B);
}		}

		if (Candidates.size() == 1)
		return tryToVectorizePair(Op0, Op1, R);

		// We have multiple options. Try to pick the single best.
		Optional<int> BestCandidate = R.findBestRootPair(Candidates);
		if (!BestCandidate)
return false;		return false;
		return tryToVectorizePair(Candidates[*BestCandidate].first,
		Candidates[*BestCandidate].second, R);
}		}

namespace {		namespace {

/// Model horizontal reductions.		/// Model horizontal reductions.
///		///
/// A horizontal reduction is a tree of reduction instructions that has values		/// A horizontal reduction is a tree of reduction instructions that has values
/// that can be put into a vector as its leaves. For example:		/// that can be put into a vector as its leaves. For example:
▲ Show 20 Lines • Show All 1,744 Lines • Show Last 20 Lines
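
The candidate-collection logic in the new tryToVectorize() can be approximated on a toy expression tree. The sketch below is an illustration under simplifying assumptions, not LLVM code: `Node`, `Pair`, and `collectCandidates` are hypothetical names, `NumUses == 1` stands in for `hasOneUse()`, and the same-basic-block checks from the patch are omitted.

```cpp
#include <utility>
#include <vector>

// Hypothetical toy IR node: up to two operands and a use count.
struct Node {
  bool IsBinOp = false;                  // models isa<BinaryOperator>
  int NumUses = 1;                       // NumUses == 1 models hasOneUse()
  const Node *Op[2] = {nullptr, nullptr};
};

using Pair = std::pair<const Node *, const Node *>;

// Collects the operand pair of Root plus the pairs obtained by "skipping"
// one single-use binary operand, in the same order as the patch.
std::vector<Pair> collectCandidates(const Node &Root) {
  std::vector<Pair> Candidates;
  const Node *Op0 = Root.Op[0];
  const Node *Op1 = Root.Op[1];
  if (!Op0 || !Op1)
    return Candidates;
  Candidates.emplace_back(Op0, Op1);

  const Node *A = Op0->IsBinOp ? Op0 : nullptr;
  const Node *B = Op1->IsBinOp ? Op1 : nullptr;
  // Try to skip B: pair A with each binary operand of B.
  if (A && B && B->NumUses == 1)
    for (const Node *BOp : B->Op)
      if (BOp && BOp->IsBinOp)
        Candidates.emplace_back(A, BOp);
  // Try to skip A: pair each binary operand of A with B.
  if (A && B && A->NumUses == 1)
    for (const Node *AOp : A->Op)
      if (AOp && AOp->IsBinOp)
        Candidates.emplace_back(AOp, B);
  return Candidates;
}
```

When only the initial operand pair is collected, the patch probes it directly; otherwise the collected pairs are scored and only the best-scoring one is probed.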

llvm/test/Transforms/SLPVectorizer/AArch64/invalid_type.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -pass-remarks-missed=slp-vectorizer 2>&1 | FileCheck %s			; RUN: opt < %s -slp-vectorizer -S -pass-remarks-missed=slp-vectorizer 2>&1 | FileCheck %s

	target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
	target triple = "aarch64-unknown-linux-gnu"			target triple = "aarch64-unknown-linux-gnu"

	; This test check that slp vectorizer is not trying to vectorize instructions already vectorized.			; This test check that slp vectorizer is not trying to vectorize instructions already vectorized.
	; CHECK: remark: <unknown>:0:0: Cannot SLP vectorize list: type <16 x i8> is unsupported by vectorizer

	define void @vector() {			define void @vector() {
				; CHECK-LABEL: @vector(
				; CHECK-NEXT: [[LOAD0:%.*]] = tail call <16 x i8> @vector.load(<16 x i8>* undef, i32 1)
				; CHECK-NEXT: [[LOAD1:%.*]] = tail call <16 x i8> @vector.load(<16 x i8>* undef, i32 2)
				; CHECK-NEXT: [[ADD:%.*]] = add <16 x i8> [[LOAD1]], [[LOAD0]]
				; CHECK-NEXT: tail call void @vector.store(<16 x i8> [[ADD]], <16 x i8>* undef, i32 1)
				; CHECK-NEXT: ret void
				;
	%load0 = tail call <16 x i8> @vector.load(<16 x i8> *undef, i32 1)			%load0 = tail call <16 x i8> @vector.load(<16 x i8> *undef, i32 1)
	%load1 = tail call <16 x i8> @vector.load(<16 x i8> *undef, i32 2)			%load1 = tail call <16 x i8> @vector.load(<16 x i8> *undef, i32 2)
	%add = add <16 x i8> %load1, %load0			%add = add <16 x i8> %load1, %load0
	tail call void @vector.store(<16 x i8> %add, <16 x i8>* undef, i32 1)			tail call void @vector.store(<16 x i8> %add, <16 x i8>* undef, i32 1)
	ret void			ret void
	}			}

	declare <16 x i8> @vector.load(<16 x i8>*, i32)			declare <16 x i8> @vector.load(<16 x i8>*, i32)
	declare void @vector.store(<16 x i8>, <16 x i8>*, i32)			declare void @vector.store(<16 x i8>, <16 x i8>*, i32)

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector-inseltpoison.ll

	Show First 20 Lines • Show All 151 Lines • ▼ Show 20 Lines
	; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3			; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3
	; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0			; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0
	; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q3]], i32 1			; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q3]], i32 1
	; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]			; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]
	; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]			; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]
	; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0			; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0
	; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[Q5]], i32 1			; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[Q5]], i32 1
	; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]			; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]
	; MINTREESIZE-NEXT: [[TMP11:%.*]] = insertelement <2 x float> poison, float [[Q6]], i32 0
	; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[Q5]], i32 1
	; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]			; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]
	; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])			; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])
	; MINTREESIZE-NEXT: ret <4 x float> undef			; MINTREESIZE-NEXT: ret <4 x float> undef
	;			;
	%c0 = extractelement <4 x i32> %c, i32 0			%c0 = extractelement <4 x i32> %c, i32 0
	%c1 = extractelement <4 x i32> %c, i32 1			%c1 = extractelement <4 x i32> %c, i32 1
	%c2 = extractelement <4 x i32> %c, i32 2			%c2 = extractelement <4 x i32> %c, i32 2
	%c3 = extractelement <4 x i32> %c, i32 3			%c3 = extractelement <4 x i32> %c, i32 3
	▲ Show 20 Lines • Show All 469 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll

	Show First 20 Lines • Show All 186 Lines • ▼ Show 20 Lines
	; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3			; MINTREESIZE-NEXT: [[Q3:%.*]] = extractelement <4 x float> [[RD]], i32 3
	; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0			; MINTREESIZE-NEXT: [[TMP7:%.*]] = insertelement <2 x float> poison, float [[Q2]], i32 0
	; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q3]], i32 1			; MINTREESIZE-NEXT: [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[Q3]], i32 1
	; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]			; MINTREESIZE-NEXT: [[Q4:%.*]] = fadd float [[Q0]], [[Q1]]
	; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]			; MINTREESIZE-NEXT: [[Q5:%.*]] = fadd float [[Q2]], [[Q3]]
	; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0			; MINTREESIZE-NEXT: [[TMP9:%.*]] = insertelement <2 x float> poison, float [[Q4]], i32 0
	; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[Q5]], i32 1			; MINTREESIZE-NEXT: [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[Q5]], i32 1
	; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]			; MINTREESIZE-NEXT: [[Q6:%.*]] = fadd float [[Q4]], [[Q5]]
	; MINTREESIZE-NEXT: [[TMP11:%.*]] = insertelement <2 x float> poison, float [[Q6]], i32 0
	; MINTREESIZE-NEXT: [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[Q5]], i32 1
	; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]			; MINTREESIZE-NEXT: [[QI:%.*]] = fcmp olt float [[Q6]], [[Q5]]
	; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])			; MINTREESIZE-NEXT: call void @llvm.assume(i1 [[QI]])
	; MINTREESIZE-NEXT: ret <4 x float> undef			; MINTREESIZE-NEXT: ret <4 x float> undef
	;			;
	%c0 = extractelement <4 x i32> %c, i32 0			%c0 = extractelement <4 x i32> %c, i32 0
	%c1 = extractelement <4 x i32> %c, i32 1			%c1 = extractelement <4 x i32> %c, i32 1
	%c2 = extractelement <4 x i32> %c, i32 2			%c2 = extractelement <4 x i32> %c, i32 2
	%c3 = extractelement <4 x i32> %c, i32 3			%c3 = extractelement <4 x i32> %c, i32 3
	▲ Show 20 Lines • Show All 469 Lines • Show Last 20 Lines

llvm/test/Transforms/SLPVectorizer/X86/vectorize-pair-path.ll

	Show All 11 Lines
	; encountered first (like here).			; encountered first (like here).

	define double @root_selection(double %a, double %b, double %c, double %d) local_unnamed_addr #0 {			define double @root_selection(double %a, double %b, double %c, double %d) local_unnamed_addr #0 {
	; CHECK-LABEL: @root_selection(			; CHECK-LABEL: @root_selection(
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[A:%.*]], i32 0			; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> poison, double [[A:%.*]], i32 0
	; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[B:%.*]], i32 1			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> [[TMP1]], double [[B:%.*]], i32 1
	; CHECK-NEXT: [[TMP3:%.*]] = fdiv fast <2 x double> [[TMP2]], <double 7.000000e+00, double 5.000000e+00>			; CHECK-NEXT: [[TMP3:%.*]] = fdiv fast <2 x double> [[TMP2]], <double 7.000000e+00, double 5.000000e+00>
	; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP3]], i32 1			; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x double> [[TMP3]], i32 1
	; CHECK-NEXT: [[I11:%.*]] = fmul fast double [[TMP4]], undef			; CHECK-NEXT: [[I09:%.*]] = fmul fast double [[TMP4]], undef
				; CHECK-NEXT: [[I10:%.*]] = fsub fast double undef, [[I09]]
	; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <2 x double> [[TMP3]], <double 3.000000e+00, double undef>			; CHECK-NEXT: [[TMP5:%.*]] = fmul fast <2 x double> [[TMP3]], <double 3.000000e+00, double undef>
	; CHECK-NEXT: [[TMP6:%.*]] = fmul fast <2 x double> undef, [[TMP5]]			; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[I10]], i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = fsub fast <2 x double> undef, [[TMP5]]			; CHECK-NEXT: [[TMP7:%.*]] = fmul fast <2 x double> [[TMP6]], [[TMP5]]
	; CHECK-NEXT: [[TMP8:%.*]] = shufflevector <2 x double> [[TMP6]], <2 x double> [[TMP7]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[TMP8:%.*]] = fsub fast <2 x double> [[TMP7]], <double undef, double 1.100000e+01>
	; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x double> <double undef, double poison>, double [[I11]], i32 1			; CHECK-NEXT: [[TMP9:%.*]] = fmul fast <2 x double> [[TMP8]], <double 4.000000e+00, double 1.200000e+01>
	; CHECK-NEXT: [[TMP10:%.*]] = fsub fast <2 x double> [[TMP8]], [[TMP9]]			; CHECK-NEXT: [[TMP10:%.*]] = fdiv fast <2 x double> [[TMP9]], <double 1.400000e+00, double 1.400000e+00>
	; CHECK-NEXT: [[TMP11:%.*]] = fmul fast <2 x double> [[TMP8]], [[TMP9]]			; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x double> [[TMP10]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = shufflevector <2 x double> [[TMP10]], <2 x double> [[TMP11]], <2 x i32> <i32 0, i32 3>			; CHECK-NEXT: [[I07:%.*]] = fadd fast double undef, [[TMP11]]
	; CHECK-NEXT: [[TMP13:%.*]] = fmul fast <2 x double> [[TMP12]], <double 4.000000e+00, double 1.100000e+01>			; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x double> [[TMP10]], i32 1
	; CHECK-NEXT: [[TMP14:%.*]] = fsub fast <2 x double> [[TMP12]], <double 4.000000e+00, double 1.100000e+01>			; CHECK-NEXT: [[I16:%.*]] = fadd fast double [[I07]], [[TMP12]]
	; CHECK-NEXT: [[TMP15:%.*]] = shufflevector <2 x double> [[TMP13]], <2 x double> [[TMP14]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP16:%.*]] = fdiv fast <2 x double> [[TMP15]], <double 1.400000e+00, double 1.200000e+01>
	; CHECK-NEXT: [[TMP17:%.*]] = fmul fast <2 x double> [[TMP15]], <double 1.400000e+00, double 1.200000e+01>
	; CHECK-NEXT: [[TMP18:%.*]] = shufflevector <2 x double> [[TMP16]], <2 x double> [[TMP17]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP19:%.*]] = fadd fast <2 x double> [[TMP18]], <double undef, double 1.400000e+00>
	; CHECK-NEXT: [[TMP20:%.*]] = fdiv fast <2 x double> [[TMP18]], <double undef, double 1.400000e+00>
	; CHECK-NEXT: [[TMP21:%.*]] = shufflevector <2 x double> [[TMP19]], <2 x double> [[TMP20]], <2 x i32> <i32 0, i32 3>
	; CHECK-NEXT: [[TMP22:%.*]] = extractelement <2 x double> [[TMP21]], i32 0
	; CHECK-NEXT: [[TMP23:%.*]] = extractelement <2 x double> [[TMP21]], i32 1
	; CHECK-NEXT: [[I16:%.*]] = fadd fast double [[TMP22]], [[TMP23]]
	; CHECK-NEXT: [[I17:%.*]] = fadd fast double [[I16]], [[C:%.*]]			; CHECK-NEXT: [[I17:%.*]] = fadd fast double [[I16]], [[C:%.*]]
	; CHECK-NEXT: [[I18:%.*]] = fadd fast double [[I17]], [[D:%.*]]			; CHECK-NEXT: [[I18:%.*]] = fadd fast double [[I17]], [[D:%.*]]
	; CHECK-NEXT: ret double [[I18]]			; CHECK-NEXT: ret double [[I18]]
	;			;
	%i01 = fdiv fast double %a, 7.0			%i01 = fdiv fast double %a, 7.0
	%i02 = fmul fast double %i01, 3.0			%i02 = fmul fast double %i01, 3.0
	%i03 = fmul fast double undef, %i02			%i03 = fmul fast double undef, %i02
	%i04 = fsub fast double %i03, undef			%i04 = fsub fast double %i03, undef
	Show All 18 Lines