This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
llvm/test/Transforms/SLPVectorizer/X86/vectorize-pair-path.ll

Differential D125287

[SLP] Improve root steering by building actual trees instead of calling the look-ahead heuristic
Needs Review, Public

Authored by vporpo on May 9 2022, 8:15 PM.


Details

Reviewers

vdmitrie
ABataev
RKSimon
dmgreen

Summary

Finding the best roots using the lookahead heuristic is not as accurate as
building short trees and comparing their cost.
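A minimal sketch of the selection loop the summary describes, written as standalone C++ rather than against LLVM: each candidate root pair is probed by a caller-supplied callback that builds a small tree and returns its cost, and the cheapest pair wins. The `Candidate` and `TreeCostFn` types, the name `findBestRootPairByCost`, and the use of plain `int` instead of `InstructionCost` are illustrative assumptions. In the patch itself, the probe corresponds to `buildTree()` limited by `RootLookAheadMaxSize`, followed by reordering, `buildExternalUses()`, `computeMinimumValueSizes()` and `getTreeCost()`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a candidate root is a pair of value IDs, and the
// callback builds a size-limited tree for it and returns its cost (negative
// means profitable), or std::nullopt if the tree should be skipped.
using Candidate = std::pair<int, int>;
using TreeCostFn = std::function<std::optional<int>(const Candidate &)>;

// Returns the index of the candidate whose probe tree is cheapest, or
// std::nullopt if every candidate was skipped.
std::optional<std::size_t>
findBestRootPairByCost(const std::vector<Candidate> &Candidates,
                       const TreeCostFn &BuildAndCost) {
  int BestCost = std::numeric_limits<int>::max();
  std::optional<std::size_t> Index;
  for (std::size_t I = 0; I < Candidates.size(); ++I) {
    std::optional<int> Cost = BuildAndCost(Candidates[I]);
    if (!Cost)
      continue; // Plays the role of isTreeTinyAndNotFullyVectorizable().
    if (*Cost < BestCost) {
      BestCost = *Cost;
      Index = I;
    }
  }
  return Index;
}
```

Compared with a single lookahead score per pair, this pays for one bounded tree build per candidate, which is the compile-time concern raised in the review.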

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vporpo created this revision. May 9 2022, 8:15 PM

Herald added a project: Restricted Project. May 9 2022, 8:15 PM

Herald added a subscriber: hiraditya.

vporpo requested review of this revision. May 9 2022, 8:15 PM

Herald added a project: Restricted Project. May 9 2022, 8:15 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B163623: Diff 428277. May 9 2022, 8:16 PM

This fixes a regression in SingleSource/Benchmarks/Misc/flops-5.c. Increasing RootLookAheadMaxDepth doesn't fix the issue either. Building small trees instead of calling the lookahead heuristic seems to be more accurate in this case.

Updated checks in tests.

Harbormaster completed remote builds in B163625: Diff 428280. May 9 2022, 8:43 PM

ABataev added inline comments. May 10 2022, 4:33 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	I'm afraid of increasing compile time. All this stuff includes scheduling, which may take lots of time for large basic blocks.

vporpo added inline comments. May 10 2022, 8:42 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	What if we set a flag to disable scheduling for these types of fast tree estimations?

vdmitrie added inline comments. May 10 2022, 8:45 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	Yep, I agree, this will be more expensive for compile time. What about combining both worlds? I mean, first try to use the lookahead heuristic to get the single best pair, and only if we can't narrow it down to just one pair, switch to probing by building trees. I believe this will not happen too frequently. We can also increase the lookahead depth to make it even less frequent that we need to build a vectorizable tree.
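A sketch of this "combine both worlds" idea, assuming the lookahead scores for all candidates are already available as integers: the cheap score decides when there is a unique winner, and only tied candidates are probed with the expensive tree-building cost estimate. The function `pickRootHybrid` and the `ProbeCost` callback are hypothetical and not part of SLPVectorizer.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical hybrid root selection: lookahead scores first, cost probing
// only to break ties among the best-scoring candidates.
std::optional<std::size_t>
pickRootHybrid(const std::vector<int> &Scores, int ScoreFail,
               const std::function<int(std::size_t)> &ProbeCost) {
  int BestScore = ScoreFail;
  std::vector<std::size_t> Tied;
  for (std::size_t I = 0; I < Scores.size(); ++I) {
    if (Scores[I] > BestScore) {
      BestScore = Scores[I];
      Tied.assign(1, I); // New best score: restart the tie list.
    } else if (!Tied.empty() && Scores[I] == BestScore) {
      Tied.push_back(I); // Same best score: remember the tie.
    }
  }
  if (Tied.empty())
    return std::nullopt; // No candidate scored above ScoreFail.
  if (Tied.size() == 1)
    return Tied.front(); // Unique winner: no tree building needed.
  // Tie: run the expensive probe, but only on the tied candidates.
  std::size_t Best = Tied.front();
  int BestCost = ProbeCost(Best);
  for (std::size_t K = 1; K < Tied.size(); ++K) {
    int Cost = ProbeCost(Tied[K]);
    if (Cost < BestCost) {
      BestCost = Cost;
      Best = Tied[K];
    }
  }
  return Best;
}
```

The design goal is that the common case (a unique best score) costs no more than today, and tree building is paid only when the heuristic cannot decide.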

ABataev added inline comments. May 10 2022, 8:55 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	The problem with this fix is that it tries to avoid/mask the problem, not fix it. The fact that LookAhead.getScoreAtLevelRec does not work here means that we're doing something wrong there or missing something. It would be good to try to improve LookAhead.getScoreAtLevelRec.

vdmitrie added inline comments. May 10 2022, 9:03 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	This problem does not actually have a perfect solution. This is a heuristic and it will always miss something. You can improve it to fix one particular case, but eventually there will be another instance of the same problem.

ABataev added inline comments. May 10 2022, 9:08 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	Yes, sure. But if the heuristic misses something, it is better to try to tweak the heuristic rather than use an actual cost/vectorization attempt and just ignore the heuristic, which exists exactly for this purpose.

vdmitrie added inline comments. May 10 2022, 9:22 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	No, "just ignore" is not what I said. We definitely should use the heuristic. But when we reach its limits, we could use more fine-grained tools.

ABataev added inline comments. May 10 2022, 9:33 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	I rather doubt that building a graph can be called a "fine grained tool". This is a different tool, not intended for analysis. We can extract some functionality out of there (into a separate function/member function) and make the heuristic smarter, but not use graph building directly. The same problem with the heuristic may exist in some other places; we need to handle them too.

vdmitrie added inline comments. May 10 2022, 10:15 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	Cost modeling is the tool I referred to as a "fine grained tool". We have to build a graph to run it, so it's a sort of necessary evil. In this sense, turning off the scheduler in order to use the CM as a finer-grained heuristic does not sound like a crazy idea.

ABataev added inline comments. May 10 2022, 10:19 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	That's what I don't agree with here. The cost model is not the tool for modelling. For modelling we have the heuristic. If it is not good, we need to tweak it.

vporpo added inline comments. May 10 2022, 10:33 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	One issue with the lookahead search is that it tries both sides of commutative operations, so it doesn't scale if we need to increase the depth. So we need a different tool for testing deeper trees. I agree that there may be something that the lookahead heuristic is missing here, but I would argue that it is the wrong tool for the job. The buildTree() logic is much more accurate for this. Reusing the existing buildTree logic with some compromises (e.g., limiting size and disabling scheduling) seems like a good compromise to me.
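To make the scaling argument concrete, here is a deliberately simplified model. It is an assumption, not the actual `getScoreAtLevelRec()` logic, which also avoids reusing operands and applies other cut-offs: scoring a pair of commutative binary instructions tries all 2 x 2 operand pairings and recurses into each, so the work grows by roughly 4x per extra level, whereas a tree builder that commits to one operand order visits each bundle once. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model: each non-leaf level scores the current pair and then
// recursively scores all 4 operand pairings of two binary commutative ops.
std::uint64_t lookAheadPairEvals(unsigned Level, unsigned MaxLevel) {
  if (Level >= MaxLevel)
    return 1; // Leaf: score this pair without looking further.
  return 1 + 4 * lookAheadPairEvals(Level + 1, MaxLevel);
}

// For comparison: committing to one operand order per node visits each
// bundle once, i.e. 2^Depth - 1 bundles for a full binary tree.
std::uint64_t fullTreeBundles(unsigned Depth) {
  return (std::uint64_t{1} << Depth) - 1;
}
```

In this model, raising the maximum level from 2 to 4 takes the number of pair evaluations from 5 to 85, while a full binary tree of depth 4 contains only 15 bundles.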

ABataev added inline comments. May 10 2022, 10:37 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	I would oppose that. I would not use buildTree() for estimation. If there is a part which can be used for better estimation, it is better to extract it into a separate function/class and then reuse it in the heuristic and in the actual graph building separately.

vporpo added inline comments. May 10 2022, 10:40 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	What is the reasoning for opposing it?

vdmitrie added inline comments. May 10 2022, 10:45 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	> That's what I don't agree with here. The cost model is not the tool for modelling. For modelling we have the heuristic. If it is not good, we need to tweak it.
It would be nice if you explained why you are against using the CM for selecting a candidate. The cost model, as its name suggests, is supposed to be used for modeling.

ABataev added inline comments. May 10 2022, 10:46 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	Bad design decision. We have 4 stages:
1. Analysis.
2. Tree building.
3. Cost estimation.
4. Codegen.
You want to make a circular dependence between Analysis and Tree building/Cost estimation. But I'm not against reusing some of the code from buildTree()/cost estimation for the analysis phase. I'm just saying that this functionality must be extracted and then reused for the analysis and for the tree building/cost estimation (if possible, to reduce maintenance burden).

ABataev added inline comments. May 10 2022, 10:47 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	Because it is not modelling, it is analysis.
2028–2042	I mean, you want to use it not for modelling but for analysis.

vporpo added inline comments. May 10 2022, 10:56 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	I think the high level design that you showed is not very accurate. We are actually doing multiple "tree builds" and "cost estimations" before generating code even in the current design. I don't see the "circular dependency" issue being introduced by this.

vdmitrie added inline comments. May 10 2022, 10:57 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	The distinction between the two is moot.

ABataev added inline comments. May 10 2022, 11:01 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	That's not because we mix analysis/modelling/estimation, but just because cost estimation shows that the tree is not profitable. The question is not about the number of attempts; it is about the design.
2028–2042	Weak argument.

vdmitrie added inline comments. May 10 2022, 12:21 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	What about an alternative solution, which is kind of a step back, but where buildTree+CM is not used for analysis? If the lookahead heuristic cannot find a single best pair, findBestRootPair returns all indices that give the maximum score. The caller then uses the approach it used before: it tries to vectorize each one until it finds the first that is profitable.
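A sketch of this alternative, assuming integer lookahead scores and a caller-supplied `TryVectorize` callback (both hypothetical, as are the function names): `findBestRootPairs` returns every index that ties for the maximum score, and the caller attempts them in order until one is profitable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical variant of findBestRootPair: return all indices that share the
// maximum score (empty if no candidate beats ScoreFail).
std::vector<std::size_t> findBestRootPairs(const std::vector<int> &Scores,
                                           int ScoreFail) {
  std::vector<std::size_t> Best;
  int BestScore = ScoreFail;
  for (std::size_t I = 0; I < Scores.size(); ++I) {
    if (Scores[I] > BestScore) {
      BestScore = Scores[I];
      Best.assign(1, I); // New maximum: drop earlier, lower-scoring indices.
    } else if (!Best.empty() && Scores[I] == BestScore) {
      Best.push_back(I); // Tie with the current maximum.
    }
  }
  return Best;
}

// Caller side: try each tied candidate in order and stop at the first one
// that TryVectorize reports as profitable.
std::optional<std::size_t>
vectorizeFirstProfitable(const std::vector<std::size_t> &Indices,
                         const std::function<bool(std::size_t)> &TryVectorize) {
  for (std::size_t I : Indices)
    if (TryVectorize(I))
      return I;
  return std::nullopt;
}
```

This only changes behavior when several candidates tie; if the heuristic picks a unique but wrong winner, the caller never sees the alternatives.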

vporpo added inline comments. May 10 2022, 1:02 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	I don't see any strong argument against using buildTree+CostModel as long as buildTree is fast enough. The argument that this is somehow changing the design makes little sense, because the pass already follows this buildTree+CostModel design. The only exception is perhaps the lookahead search, which is actually an example of a design to avoid: it uses its own custom tree building and cost modeling, and requires special maintenance. Also, the argument that we should extract some of the functionality and place it in a separate component is not very strong. Replicating similar functionality in multiple places is something that a good design should avoid. It just increases the maintenance overhead and will inevitably lead to divergence.

ABataev added inline comments. May 10 2022, 1:14 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	> What about an alternative solution, which is kind of a step back, but where buildTree+CM is not used for analysis? If the lookahead heuristic cannot find a single best pair, findBestRootPair returns all indices that give the maximum score. The caller then uses the approach it used before: it tries to vectorize each one until it finds the first that is profitable.
Yes, it may work as a quick solution.

vporpo added inline comments. May 10 2022, 4:38 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
2028–2042	> If the lookahead heuristic cannot find a single best pair, findBestRootPair returns all indices that give the maximum score. The caller then uses the approach it used before: it tries to vectorize each one until it finds the first that is profitable.
That won't work, because the lookahead finds a single best pair, but it turns out to be the wrong one.

Disabled the scheduler for the fast buildTree().

I checked the compile time overhead with perf on the lit test, and it is about the same as the version before @vdmitrie's patch 88b9e46fb54c.

Harbormaster completed remote builds in B163983: Diff 428774. May 11 2022, 2:04 PM

I think that providing a buildTreeFastAndGetCost()-style function is a decent solution for these types of problems, but I guess this needs more discussion. Adding @RKSimon and @dmgreen.

vporpo added reviewers: RKSimon, dmgreen. May 16 2022, 11:13 AM

vdmitrie added inline comments. May 16 2022, 11:48 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
849	(If we finally agree to take this path) It should probably be possible to introduce a simulation mode in BlockScheduling rather than guarding each BS interface call.
898	This description update seems to be a leftover from the previous diff (i.e. not intentional).

Fixed stale comment and added DisableScheduling flag to BlockScheduling.
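A minimal sketch of the "simulation mode" suggested in the inline comment at line 849: the flag lives inside the scheduler, so each entry point returns early instead of every caller guarding its own call. The class `BlockSchedulingSketch`, its methods, the `ExpensiveOps` counter, and the choice to treat every bundle as schedulable while scheduling is disabled are assumptions for illustration; they do not reproduce LLVM's `BlockScheduling` interface.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scheduler with a built-in simulation mode.
class BlockSchedulingSketch {
public:
  explicit BlockSchedulingSketch(bool Disable) : DisableScheduling(Disable) {}

  // Returns true if the bundle can be scheduled.
  bool tryScheduleBundle(const std::vector<int> &Bundle) {
    if (DisableScheduling)
      return true; // Simulation mode: assume schedulable, skip all work.
    ++ExpensiveOps; // Stand-in for the dependency computation.
    return !Bundle.empty();
  }

  void cancelScheduling(const std::vector<int> & /*Bundle*/) {
    if (DisableScheduling)
      return; // Nothing was scheduled, so there is nothing to undo.
    ++ExpensiveOps; // Stand-in for resetting the bundle's scheduling state.
  }

  unsigned getExpensiveOps() const { return ExpensiveOps; }

private:
  bool DisableScheduling;
  unsigned ExpensiveOps = 0;
};
```

Keeping the check inside the class means call sites in the tree builder stay unchanged whether the tree is being built for real or only probed for cost.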

Harbormaster completed remote builds in B164735: Diff 429827. May 16 2022, 1:15 PM

ping

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	61 lines
llvm/test/Transforms/SLPVectorizer/X86/vectorize-pair-path.ll	17 lines

Diff 428277

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp


[… 158 unchanged lines …]	static cl::opt<unsigned> MinTreeSize(
cl::desc("Only vectorize small trees if they are fully vectorizable"));		cl::desc("Only vectorize small trees if they are fully vectorizable"));

// The maximum depth that the look-ahead score heuristic will explore.		// The maximum depth that the look-ahead score heuristic will explore.
// The higher this value, the higher the compilation time overhead.		// The higher this value, the higher the compilation time overhead.
static cl::opt<int> LookAheadMaxDepth(		static cl::opt<int> LookAheadMaxDepth(
"slp-max-look-ahead-depth", cl::init(2), cl::Hidden,		"slp-max-look-ahead-depth", cl::init(2), cl::Hidden,
cl::desc("The maximum look-ahead depth for operand reordering scores"));		cl::desc("The maximum look-ahead depth for operand reordering scores"));

// The maximum depth that the look-ahead score heuristic will explore		// The maximum tree size that will use when probing among candidates for
// when it probing among candidates for vectorization tree roots.		// vectorization tree roots. The higher this value, the higher the compilation
// The higher this value, the higher the compilation time overhead but unlike		// time overhead but unlike similar limit for operands ordering this is less
// similar limit for operands ordering this is less frequently used, hence		// frequently used, hence impact of higher value is less noticeable.
// impact of higher value is less noticeable.		static cl::opt<unsigned> RootLookAheadMaxSize(
static cl::opt<int> RootLookAheadMaxDepth(		"slp-root-look-ahead-max-size", cl::init(5), cl::Hidden,
"slp-max-root-look-ahead-depth", cl::init(2), cl::Hidden,		cl::desc("The maximum tree size for searching best rooting option"));
cl::desc("The maximum look-ahead depth for searching best rooting option"));

static cl::opt<bool>		static cl::opt<bool>
ViewSLPTree("view-slp-tree", cl::Hidden,		ViewSLPTree("view-slp-tree", cl::Hidden,
cl::desc("Display the SLP trees with Graphviz"));		cl::desc("Display the SLP trees with Graphviz"));

// Limit the number of alias checks. The limit is chosen so that		// Limit the number of alias checks. The limit is chosen so that
// it has no negative effect on the llvm benchmarks.		// it has no negative effect on the llvm benchmarks.
static const unsigned AliasedCheckLimit = 10;		static const unsigned AliasedCheckLimit = 10;
[… 651 unchanged lines …]
}		}

namespace slpvectorizer {		namespace slpvectorizer {

/// Bottom Up SLP Vectorizer.		/// Bottom Up SLP Vectorizer.
class BoUpSLP {		class BoUpSLP {
struct TreeEntry;		struct TreeEntry;
struct ScheduleData;		struct ScheduleData;
		/// Limit the size of the SLP tree to this many nodes.
		Optional<unsigned> MaxTreeSize;

public:		public:
using ValueList = SmallVector<Value *, 8>;		using ValueList = SmallVector<Value *, 8>;
using InstrList = SmallVector<Instruction *, 16>;		using InstrList = SmallVector<Instruction *, 16>;
using ValueSet = SmallPtrSet<Value *, 16>;		using ValueSet = SmallPtrSet<Value *, 16>;
using StoreList = SmallVector<StoreInst *, 8>;		using StoreList = SmallVector<StoreInst *, 8>;
using ExtraValueToDebugLocsMap =		using ExtraValueToDebugLocsMap =
MapVector<Value *, SmallVector<Instruction *, 2>>;		MapVector<Value *, SmallVector<Instruction *, 2>>;
using OrdersType = SmallVector<unsigned, 4>;		using OrdersType = SmallVector<unsigned, 4>;

BoUpSLP(Function *Func, ScalarEvolution *Se, TargetTransformInfo *Tti,		BoUpSLP(Function *Func, ScalarEvolution *Se, TargetTransformInfo *Tti,
TargetLibraryInfo *TLi, AAResults *Aa, LoopInfo *Li,		TargetLibraryInfo *TLi, AAResults *Aa, LoopInfo *Li,
DominatorTree *Dt, AssumptionCache *AC, DemandedBits *DB,		DominatorTree *Dt, AssumptionCache *AC, DemandedBits *DB,
const DataLayout *DL, OptimizationRemarkEmitter *ORE)		const DataLayout *DL, OptimizationRemarkEmitter *ORE)
: BatchAA(*Aa), F(Func), SE(Se), TTI(Tti), TLI(TLi), LI(Li),		: BatchAA(*Aa), F(Func), SE(Se), TTI(Tti), TLI(TLi), LI(Li),
[… 31 unchanged lines …]	public:
/// holding live values over call sites.		/// holding live values over call sites.
InstructionCost getSpillCost() const;		InstructionCost getSpillCost() const;

/// \returns the vectorization cost of the subtree that starts at \p VL.		/// \returns the vectorization cost of the subtree that starts at \p VL.
/// A negative number means that this is profitable.		/// A negative number means that this is profitable.
InstructionCost getTreeCost(ArrayRef<Value *> VectorizedVals = None);		InstructionCost getTreeCost(ArrayRef<Value *> VectorizedVals = None);

/// Construct a vectorizable tree that starts at \p Roots, ignoring users for		/// Construct a vectorizable tree that starts at \p Roots, ignoring users for
/// the purpose of scheduling and extraction in the \p UserIgnoreLst.		/// the purpose of scheduling and extraction in the \p UserIgnoreLst. The tree
		/// will have at least \p MaxSize nodes if this argument is passed.
void buildTree(ArrayRef<Value *> Roots,		void buildTree(ArrayRef<Value *> Roots,
ArrayRef<Value *> UserIgnoreLst = None);		ArrayRef<Value *> UserIgnoreLst = None,
		Optional<unsigned> MaxSize = None);

/// Builds external uses of the vectorized scalars, i.e. the list of		/// Builds external uses of the vectorized scalars, i.e. the list of
/// vectorized scalars to be extracted, their lanes and their scalar users. \p		/// vectorized scalars to be extracted, their lanes and their scalar users. \p
/// ExternallyUsedValues contains additional list of external uses to handle		/// ExternallyUsedValues contains additional list of external uses to handle
/// vectorization of reductions.		/// vectorization of reductions.
void		void
buildExternalUses(const ExtraValueToDebugLocsMap &ExternallyUsedValues = {});		buildExternalUses(const ExtraValueToDebugLocsMap &ExternallyUsedValues = {});

[… 1,106 unchanged lines …]	#endif
};		};

/// Evaluate each pair in \p Candidates and return index into \p Candidates		/// Evaluate each pair in \p Candidates and return index into \p Candidates
/// for a pair which have highest score deemed to have best chance to form		/// for a pair which have highest score deemed to have best chance to form
/// root of profitable tree to vectorize. Return None if no candidate scored		/// root of profitable tree to vectorize. Return None if no candidate scored
/// above the LookAheadHeuristics::ScoreFail.		/// above the LookAheadHeuristics::ScoreFail.
Optional<int>		Optional<int>
findBestRootPair(ArrayRef<std::pair<Value *, Value *>> Candidates) {		findBestRootPair(ArrayRef<std::pair<Value *, Value *>> Candidates) {
LookAheadHeuristics LookAhead(DL, SE, this, /*NumLanes=*/2,		InstructionCost BestCost = InstructionCost::getMax();
RootLookAheadMaxDepth);
int BestScore = LookAheadHeuristics::ScoreFail;
Optional<int> Index = None;		Optional<int> Index = None;

for (int I : seq<int>(0, Candidates.size())) {		for (int I : seq<int>(0, Candidates.size())) {
int Score = LookAhead.getScoreAtLevelRec(Candidates[I].first,		SmallVector<Value *, 2> Roots(
Candidates[I].second,		{Candidates[I].first, Candidates[I].second});
/*U1=*/nullptr, /*U2=*/nullptr,		buildTree(Roots, /*UserIgnoreList=*/None,
/*Level=*/1, None);		RootLookAheadMaxSize.getValue());
if (Score > BestScore) {		if (isTreeTinyAndNotFullyVectorizable())
BestScore = Score;		continue;
		reorderTopToBottom();
		reorderBottomToTop(!isa<InsertElementInst>(Roots.front()));
		buildExternalUses();

		computeMinimumValueSizes();
		InstructionCost Cost = getTreeCost();
		if (Cost < BestCost) {
		BestCost = Cost;
Index = I;		Index = I;
}		}
}		}
return Index;		return Index;
}		}

/// Checks if the instruction is marked for deletion.		/// Checks if the instruction is marked for deletion.
bool isDeleted(Instruction *I) const { return DeletedInstructions.count(I); }		bool isDeleted(Instruction *I) const { return DeletedInstructions.count(I); }

[… 2,033 unchanged lines …]	for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
<< Lane << " from " << *Scalar << ".\n");		<< Lane << " from " << *Scalar << ".\n");
ExternalUses.push_back(ExternalUser(Scalar, U, FoundLane));		ExternalUses.push_back(ExternalUser(Scalar, U, FoundLane));
}		}
}		}
}		}
}		}

void BoUpSLP::buildTree(ArrayRef<Value *> Roots,		void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
ArrayRef<Value *> UserIgnoreLst) {		ArrayRef<Value *> UserIgnoreLst,
		Optional<unsigned> MaxSize) {
deleteTree();		deleteTree();
		MaxTreeSize = MaxSize;
UserIgnoreList = UserIgnoreLst;		UserIgnoreList = UserIgnoreLst;
if (!allSameType(Roots))		if (!allSameType(Roots))
return;		return;
buildTree_rec(Roots, 0, EdgeInfo());		buildTree_rec(Roots, 0, EdgeInfo());
}		}

namespace {		namespace {
/// Tracks the state we can represent the loads in the given sequence.		/// Tracks the state we can represent the loads in the given sequence.
[… 195 unchanged lines …]	if (NumUniqueScalarValues == VL.size()) {
return false;		return false;
}		}
VL = UniqueValues;		VL = UniqueValues;
}		}
return true;		return true;
};		};

InstructionsState S = getSameOpcode(VL);		InstructionsState S = getSameOpcode(VL);

		if (MaxTreeSize && VectorizableTree.size() == *MaxTreeSize) {
		LLVM_DEBUG(dbgs() << "SLP: Gathering due to max tree size " << *MaxTreeSize
		<< ".\n");
		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
		ReuseShuffleIndicies);
		return;
		}

if (Depth == RecursionMaxDepth) {		if (Depth == RecursionMaxDepth) {
LLVM_DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");		LLVM_DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");
if (TryToFindDuplicates(S))		if (TryToFindDuplicates(S))
newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,		newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx,
ReuseShuffleIndicies);		ReuseShuffleIndicies);
return;		return;
}		}

[… 7,113 unchanged lines …]
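A simplified model of the size cap that the patch adds to `buildTree_rec()`: every bundle becomes a node, and once the tree holds `MaxTreeSize` nodes, further bundles are emitted as gather (non-vectorized) nodes without recursing into their operands. The `NodeSketch` type, the assumption that every vectorized node has two operand bundles, and the helper names are hypothetical. Because gather nodes also count toward the tree size in this model, the sketch compares with `>=` so that every bundle reached after the cap is gathered.

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct NodeSketch {
  bool Vectorized; // false means a gather node.
};

void buildTreeRecSketch(unsigned Depth, unsigned MaxDepth,
                        std::optional<unsigned> MaxTreeSize,
                        std::vector<NodeSketch> &Tree) {
  // Size cap reached or recursion limit hit: gather and stop recursing.
  if ((MaxTreeSize && Tree.size() >= *MaxTreeSize) || Depth == MaxDepth) {
    Tree.push_back(NodeSketch{/*Vectorized=*/false});
    return;
  }
  Tree.push_back(NodeSketch{/*Vectorized=*/true});
  // Model each vectorized node as a binary operation with two operands.
  buildTreeRecSketch(Depth + 1, MaxDepth, MaxTreeSize, Tree);
  buildTreeRecSketch(Depth + 1, MaxDepth, MaxTreeSize, Tree);
}

unsigned countVectorized(const std::vector<NodeSketch> &Tree) {
  unsigned Count = 0;
  for (const NodeSketch &N : Tree)
    Count += N.Vectorized ? 1u : 0u;
  return Count;
}
```

With `MaxDepth = 3`, the uncapped model builds 15 nodes, while a cap of 3 yields 7 nodes, of which only 3 are vectorized.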

llvm/test/Transforms/SLPVectorizer/X86/vectorize-pair-path.ll

	[… 55 unchanged lines …]

	attributes #0 = { "unsafe-fp-math"="true" }			attributes #0 = { "unsafe-fp-math"="true" }

	; This test checks that root steering works and that the code gets vectorized.			; This test checks that root steering works and that the code gets vectorized.

	define void @root_steering() {			define void @root_steering() {
	; CHECK-LABEL: @root_steering(			; CHECK-LABEL: @root_steering(
	; CHECK-NEXT: bb:			; CHECK-NEXT: bb:
	; CHECK-NEXT: [[CHAIN2_2:%.*]] = fadd double 4.000000e-01, 5.000000e-01
	; CHECK-NEXT: [[CHAIN2_1:%.*]] = fmul double 3.000000e-01, [[CHAIN2_2]]
	; CHECK-NEXT: [[ROOT5:%.*]] = fadd double 2.000000e-01, [[CHAIN2_1]]
	; CHECK-NEXT: [[ROOT3:%.*]] = fmul double 3.000000e-01, 2.000000e-01			; CHECK-NEXT: [[ROOT3:%.*]] = fmul double 3.000000e-01, 2.000000e-01
	; CHECK-NEXT: [[MUL:%.*]] = fmul double [[ROOT3]], 1.000000e-01			; CHECK-NEXT: [[MUL:%.*]] = fmul double [[ROOT3]], 1.000000e-01
	; CHECK-NEXT: [[CHAINB_3:%.*]] = fadd double 3.000000e-01, 4.000000e-01			; CHECK-NEXT: [[CHAINB_3:%.*]] = fadd double 3.000000e-01, 4.000000e-01
	; CHECK-NEXT: [[CHAINB_2:%.*]] = fmul double 2.000000e-01, [[CHAINB_3]]			; CHECK-NEXT: [[CHAINB_2:%.*]] = fmul double 2.000000e-01, [[CHAINB_3]]
	; CHECK-NEXT: [[CHAINB_1:%.*]] = fadd double 1.000000e-01, [[CHAINB_2]]			; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> <double 5.000000e-01, double poison>, double [[CHAINB_2]], i32 1
	; CHECK-NEXT: [[ROOT4:%.*]] = fmul double [[MUL]], [[CHAINB_1]]			; CHECK-NEXT: [[TMP1:%.*]] = fadd <2 x double> <double 4.000000e-01, double 1.000000e-01>, [[TMP0]]
	; CHECK-NEXT: [[ROOT2:%.*]] = fadd double 1.000000e-01, [[ROOT4]]			; CHECK-NEXT: [[TMP2:%.*]] = insertelement <2 x double> <double 3.000000e-01, double poison>, double [[MUL]], i32 1
	; CHECK-NEXT: [[ROOT1:%.*]] = fmul double [[ROOT3]], [[ROOT5]]			; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], [[TMP1]]
	; CHECK-NEXT: [[DIV:%.*]] = fdiv double [[ROOT1]], [[ROOT2]]			; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> <double 2.000000e-01, double 1.000000e-01>, [[TMP3]]
				; CHECK-NEXT: [[TMP5:%.*]] = extractelement <2 x double> [[TMP4]], i32 0
				; CHECK-NEXT: [[ROOT1:%.*]] = fmul double [[ROOT3]], [[TMP5]]
				; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP4]], i32 1
				; CHECK-NEXT: [[DIV:%.*]] = fdiv double [[ROOT1]], [[TMP6]]
	; CHECK-NEXT: [[SEED:%.*]] = fcmp ogt double [[DIV]], 3.000000e-01			; CHECK-NEXT: [[SEED:%.*]] = fcmp ogt double [[DIV]], 3.000000e-01
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	bb:			bb:
	%chain2_2 = fadd double 0.4, 0.5			%chain2_2 = fadd double 0.4, 0.5
	%chain2_1 = fmul double 0.3, %chain2_2			%chain2_1 = fmul double 0.3, %chain2_2
	%root5 = fadd double 0.2, %chain2_1			%root5 = fadd double 0.2, %chain2_1

	[… 15 unchanged lines …]