This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Improve comments and naming of functions/variables/members, NFC.
ClosedPublic

Authored by ABataev on May 18 2017, 9:33 AM.

Event Timeline

ABataev created this revision.May 18 2017, 9:33 AM
anemet edited edge metadata.May 22 2017, 8:55 AM

Thanks for improving this!

lib/Transforms/Vectorize/SLPVectorizer.cpp
4751–4752

setAnalyzingOperands

4751–4752

getNextOperand

4751–4752

areOperandsAnalyzed

4752

The first word needs to be a verb; how about analyzingInstruction?

4775–4782

Please mention the actual traversal used. It seems this is pre-order. If that is true, why do you even need to track the operand index? Iterative pre-order is:

while (!Stack.empty()) {
  Value *V = Stack.back();
  Stack.pop_back();
  visit(V); // Pre-order: process the node before its operands.
  // Push operands in reverse so the leftmost operand is visited first.
  if (auto *U = dyn_cast<User>(V))
    for (Value *Op : reverse(U->operands()))
      Stack.push_back(Op);
}
ABataev updated this revision to Diff 100082.May 24 2017, 7:13 AM

Address comments from Adam.

anemet added inline comments.May 24 2017, 9:53 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

Again, which order?

ABataev added inline comments.May 24 2017, 10:11 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

DFS

anemet added inline comments.May 24 2017, 10:12 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

pre/in/post?

ABataev added inline comments.May 24 2017, 10:15 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

pre

anemet added inline comments.May 24 2017, 10:20 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

Okay, then please update the comment. Also, please answer my question below about why we chose to track the edges with an iterative pre-order traversal. If it's unnecessary, please fix it or add a FIXME. Thank you.

ABataev added inline comments.May 24 2017, 11:01 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

It simplifies limiting the traversal depth of the function. We could use a simple pre-order traversal, but we would still need to keep the tree node level on the Stack, for example, to limit the recursion depth.

anemet added inline comments.May 25 2017, 2:26 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

I am not sure I see a major difference; do you have an example?

4795–4796

It would be good to add a comment explaining why we need to clear P beyond the first level.

ABataev added inline comments.May 26 2017, 6:37 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

I'm just saying that there is no major difference. We won't gain anything by using pre-order traversal compared with DFS pre-order traversal. We will still need to track the current tree depth to be able to cut off the vectorization process if the tree depth exceeds RecursionMaxDepth (see line 4879).

4795–4796

Ok

ABataev updated this revision to Diff 100398.May 26 2017, 6:45 AM

Added a comment regarding assigning a nullptr value to P.

anemet added inline comments.May 26 2017, 10:44 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

I'm just saying that there is no major difference. We won't gain anything by using pre-order traversal compared with DFS pre-order traversal.

Now I am confused. Pre-order traversal *is* DFS pre-order traversal.

ABataev added inline comments.May 26 2017, 10:53 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

I'm sorry, I meant that you're suggesting changing the traversal to BFS.

anemet added inline comments.May 26 2017, 11:00 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

No, it's still pre-order. My point was that for pre-order, unlike post-order, you don't need to keep track of the level of processing that has been performed on a node (i.e., getOperandIndex()). See the code one comment down.

ABataev added inline comments.May 26 2017, 12:03 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

Yes, I understand this. But how do we track the tree depth in your code? We need to limit the depth by RecursionMaxDepth.

anemet added inline comments.May 31 2017, 8:35 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

The depth could be stored with each stack element -- that would still be simpler than keeping track of the operand.

It would also properly describe the intent; the depth is only maintained to control the recursion limit.
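
A minimal sketch of what this suggestion could look like (Root, visit, and RecursionMaxDepth stand in for the surrounding SLPVectorizer code; this is an illustration, not the actual patch):

// Iterative pre-order traversal where each stack element carries the
// depth at which the value was found, so no per-node operand index is
// needed. Assumes llvm/ADT/SmallVector.h, llvm/ADT/STLExtras.h (for
// reverse), and <tuple> (for std::tie).
SmallVector<std::pair<Value *, unsigned>, 8> Stack;
Stack.emplace_back(Root, 0);
while (!Stack.empty()) {
  Value *V;
  unsigned Level;
  std::tie(V, Level) = Stack.pop_back_val();
  visit(V); // Pre-order: process the node before its operands.
  // Only descend while we are under the recursion limit.
  if (Level + 1 >= RecursionMaxDepth)
    continue;
  if (auto *U = dyn_cast<User>(V))
    for (Value *Op : reverse(U->operands()))
      Stack.emplace_back(Op, Level + 1);
}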

ABataev added inline comments.Jun 1 2017, 6:07 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

I believe this is a matter of taste. Besides, the same approach is used in the matchAssociativeReduction function, and I believe it is better to use the same approach in all functions instead of using different algorithms to implement the same idea.

anemet added inline comments.Jun 1 2017, 11:45 AM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4778–4779

But it is *not* the same idea. matchAssociativeReduction performs a *post-order* traversal. So it's the wrong example to support your case!

I did see that you were copying the code from there (already a bad idea). I think that both of these should be rewritten with df_iterator and po_iterator respectively.

Let's just add a FIXME in the code and move on: "This performs pre-order traversal. FIXME: can we do this with df_iterator?"

ABataev updated this revision to Diff 101222.Jun 2 2017, 8:49 AM

Updates after comments.

anemet accepted this revision.Jun 2 2017, 9:48 AM

Thanks very much for rewriting the loop. This is way more intuitive now.

A few more nits/questions below but feel free to commit either way.

lib/Transforms/Vectorize/SLPVectorizer.cpp
4783

Capitalize sentence.

4784

I would start the level from 0, which is more canonical, and then later check like this:

if (++Level < MaxDepth)

4801–4802

Rather than saying "on the next iteration", isn't the intent that we don't analyze the phi node unless this is the root node? If so, it's better to say so, i.e., "unless this is the root node".

This revision is now accepted and ready to land.Jun 2 2017, 9:48 AM
ABataev updated this revision to Diff 101265.Jun 2 2017, 12:49 PM

Update after review

anemet accepted this revision.Jun 2 2017, 12:52 PM

LGTM.

lib/Transforms/Vectorize/SLPVectorizer.cpp
4782–4783

Something got messed up with upper/lowercase here.

ABataev added inline comments.Jun 2 2017, 1:08 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4782–4783

I tried to keep the original names of the variables. Should I capitalize them too?

anemet added inline comments.Jun 2 2017, 1:15 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4782–4783

Oh, sorry, no. I only meant to capitalize the first letter of the sentence.

Looking at it again, I think I misread this and thought that "subtrees" was the beginning of a sentence. It's not, so just go back to the original version. Sorry about the confusion!

ABataev added inline comments.Jun 2 2017, 1:20 PM
lib/Transforms/Vectorize/SLPVectorizer.cpp
4782–4783

OK, no problem. Then I'll restore it before committing.

This revision was automatically updated to reflect the committed changes.
anna added a subscriber: anna.Jun 29 2017, 2:15 PM

Hi Alexey, Adam,

From what I can see in this algorithm, there is no limit on the actual size of the stack in the loop. The level variable controls just the recursion limit. So, in effect, IIUC, the max total number of operands being processed by the while loop is 2^RecursionLimit (the base is 2 because we avoid phi nodes).

llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
4828 ↗(On Diff #101269)

Could you please explain how this is NFC wrt the previous code?
In the code on the LHS, we are checking that the operand is in the same basic block as the root before placing it on the stack.

Here we are unconditionally placing all operands on the stack.
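
For reference, the check on the LHS amounts to something like the following sketch (names are assumed from context, not copied from the patch; BB is the basic block of the root instruction):

// Only queue operands that are instructions in the root's basic block.
for (Value *Op : reverse(I->operands()))
  if (auto *OpI = dyn_cast<Instruction>(Op))
    if (OpI->getParent() == BB)
      Stack.push_back(OpI);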

In D33320#795834, @anna wrote:

Hi Alexey, Adam,

From what I can see in this algorithm, there is no limit on the actual size of the stack in the loop. The level variable controls just the recursion limit. So, in effect, IIUC, the max total number of operands being processed by the while loop is 2^RecursionLimit (the base is 2 because we avoid phi nodes).

It does not limit the number of processed nodes; it limits the tree height, just like before.

llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
4828 ↗(On Diff #101269)

Yes, I missed these checks; I will add them.

anna added a comment.Jun 30 2017, 7:05 AM
In D33320#795834, @anna wrote:

Hi Alexey, Adam,

From what I can see in this algorithm, there is no limit on the actual size of the stack in the loop. The level variable controls just the recursion limit. So, in effect, IIUC, the max total number of operands being processed by the while loop is 2^RecursionLimit (the base is 2 because we avoid phi nodes).

It does not limit the number of processed nodes; it limits the tree height, just like before.

Yes, but limiting the tree height itself is not enough, right? Now, in the worst case, 2^12 nodes are being processed in tryToVectorizeHorReductionOrInstOperands, whereas earlier it was just a single node (i.e., before this change: https://reviews.llvm.org/D25517).

llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
4828 ↗(On Diff #101269)

Yeah, we saw huge compile-time degradations in tryToVectorizeHorReductionOrInstOperands.

In D33320#796697, @anna wrote:
In D33320#795834, @anna wrote:

Hi Alexey, Adam,

From what I can see in this algorithm, there is no limit on the actual size of the stack in the loop. The level variable controls just the recursion limit. So, in effect, IIUC, the max total number of operands being processed by the while loop is 2^RecursionLimit (the base is 2 because we avoid phi nodes).

It does not limit the number of processed nodes; it limits the tree height, just like before.

Yes, but limiting the tree height itself is not enough, right? Now, in the worst case, 2^12 nodes are being processed in tryToVectorizeHorReductionOrInstOperands, whereas earlier it was just a single node (i.e., before this change: https://reviews.llvm.org/D25517).

@anna, I am confused about whether you're complaining about the additional overhead of the original change (D25517) or the algorithmic change in this refinement (D33320). Your comparison above seems to suggest your baseline is *before* the original changes.

In this refinement change, we can have more nodes on the stack compared to the original change, but the number of nodes processed should remain unchanged.

anna added a comment.Jun 30 2017, 7:52 AM

@anemet: We saw the degradations with respect to this change (D33320) itself, not relative to before the original change. Here's the code change as I see it; please correct me if this is wrong:

  1. We were initially processing exactly one node in the tryToVectorizeHorReductionOrInstOperands equivalent function, before any of the changes below.
  2. After D25517, we are processing at most 2^12 nodes in tryToVectorizeHorReductionOrInstOperands, but they are limited to a single basic block.
  3. In this refinement change (D33320), we are processing at most 2^12 nodes, but they are no longer limited to a single basic block.

@ABataev is fixing #3 by limiting it to the same basic block so that it's actually NFC wrt #2. That might fix the compile-time regressions. However, my concern is: aren't we still exposed to high compile-time impact in #2 when we have, for example, a single large basic block with 2^10 binary instructions (because we are limiting the depth of the tree, not the number of nodes)?
Should we perhaps have a threshold cutoff for the number of stack nodes? Or reduce RecursionMaxDepth from 12 to 6?

I bailed out of this loop (return false in tryToVectorizeHorReductionOrInstOperands) when the stack size grew too large, and that fixed our compile-time regression temporarily.
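
The workaround described above amounts to something like the following sketch (MaxStackSize is an illustrative value, not an existing SLPVectorizer option):

// Bail out when the worklist grows past a fixed threshold, trading
// missed vectorization opportunities for bounded compile time.
constexpr unsigned MaxStackSize = 64;
while (!Stack.empty()) {
  if (Stack.size() > MaxStackSize)
    return false; // Too much work queued; give up on this root.
  // ... rest of the traversal as before ...
}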

I'm working on a patch that stops the analysis if the parent basic block of the instruction is not BB or the instruction was already processed, but it's quite hard to add a test for this change. I can publish it as NFC, because we are just limiting the number of analyzed instructions, if this is acceptable.
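
A sketch of what such a limit could look like (the Visited set and its placement are assumptions for illustration, not the committed follow-up patch):

// Assumes llvm/ADT/SmallPtrSet.h; Visited is declared outside the loop.
SmallPtrSet<Value *, 16> Visited;
// Inside the traversal loop, before descending into operands:
if (auto *U = dyn_cast<User>(V))
  for (Value *Op : reverse(U->operands()))
    if (auto *OpI = dyn_cast<Instruction>(Op))
      // Only queue instructions from the root's block, once each.
      if (OpI->getParent() == BB && Visited.insert(OpI).second)
        Stack.emplace_back(OpI, Level + 1);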

anna added a comment.Jun 30 2017, 9:11 AM

I'm working on a patch that stops the analysis if the parent basic block of the instruction is not BB or the instruction was already processed, but it's quite hard to add a test for this change. I can publish it as NFC, because we are just limiting the number of analyzed instructions, if this is acceptable.

Thanks, Alexey. That should fix the current regressions we are seeing. Could you please add me as a reviewer on the patch?

The second concern I have (as mentioned in the comment above) is that with D25517, we increased the complexity of tryToVectorizeHorReductionOrInstOperands from a single node being processed to an exponential number of nodes (2^12). Is this a correct analysis?

It may not be an issue in practice once we limit the search to the current basic block (the fix you're working on). However, this function is called for every instruction in the IR as part of vectorizeChainsInBlock, so as the basic block size increases, we may see a compile-time impact.

MatzeB added a subscriber: MatzeB.Jul 28 2017, 1:50 PM

We just identified this commit as the cause of a 10x slowdown when compiling shared_sha256.c in the llvm test-suite/CTMark (x86; no special flags, so you should get the default -O3), which showed up on our performance tracking.

Did any of the planned improvements make it to ToT yet?

We just identified this commit as the cause of a 10x slowdown when compiling shared_sha256.c in the llvm test-suite/CTMark (x86; no special flags, so you should get the default -O3), which showed up on our performance tracking.

Did any of the planned improvements make it to ToT yet?

Seems like this has recovered on ToT.

anna added a comment.Jul 28 2017, 6:57 PM

We just identified this commit as the cause of a 10x slowdown when compiling shared_sha256.c in the llvm test-suite/CTMark (x86; no special flags, so you should get the default -O3), which showed up on our performance tracking.

Did any of the planned improvements make it to ToT yet?

Seems like this has recovered on ToT.

This was the review thread: https://reviews.llvm.org/D34881