This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Visualize SLP trees with -view-slp-tree
Closed, Public

Authored by anemet on Mar 7 2017, 11:13 PM.


Details

Reviewers

rengolin
mzolotukhin
ABataev
mkuper
mssimpso
hfinkel

Commits

rG95da05c3f506: [SLP] Visualize SLP trees with -view-slp-tree
rL297303: [SLP] Visualize SLP trees with -view-slp-tree

Summary

Analyzing larger trees is extremely difficult with the current debug output, so
this adds GraphTraits and DOTGraphTraits on top of the VectorizableTree data
structure. We can now display the SLP trees with Graphviz, as in
https://reviews.llvm.org/F3132765.

I decorated the graph to highlight the nodes where a value needs to be gathered
for one reason or another. These are the red nodes.

There are other improvements I am planning to make as I work through my case
here. For example, I would also like to mark nodes that need to be extracted.
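For readers unfamiliar with GraphWriter, the standalone C++ sketch below shows roughly what the DOTGraphTraits specialization contributes to the rendered output: one node per tree entry, a `<splat>` shorthand label when all lanes hold the same scalar, `color=red` on entries that need gathering, and an edge from each entry to each of its users. It writes the DOT text by hand instead of going through LLVM's GraphWriter. `ToyTreeEntry` and `toDot` are hypothetical names, scalars are plain strings rather than `llvm::Value *`, and labels are not escaped.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for BoUpSLP::TreeEntry: each lane is
// the printed IR of a scalar, and user edges are indices into the owning
// vector (like UserTreeIndices in the diff).
struct ToyTreeEntry {
  std::vector<std::string> Scalars;
  bool NeedToGather = false;      // rendered red, like getNodeAttributes
  std::vector<int> UserTreeIndices;
};

// True if every lane holds the same scalar (same idea as isSplat).
static bool isSplat(const std::vector<std::string> &VL) {
  for (const std::string &S : VL)
    if (S != VL[0])
      return false;
  return true;
}

// Emit a Graphviz digraph: one node per entry, one edge entry -> user.
std::string toDot(const std::vector<ToyTreeEntry> &Tree) {
  std::ostringstream OS;
  OS << "digraph SLP {\n";
  for (std::size_t I = 0; I < Tree.size(); ++I) {
    const ToyTreeEntry &E = Tree[I];
    std::string Label;
    if (!E.Scalars.empty() && isSplat(E.Scalars)) {
      Label = "<splat> " + E.Scalars[0];
    } else {
      for (const std::string &S : E.Scalars)
        Label += S + "\\n"; // DOT line break inside a quoted label
    }
    OS << "  N" << I << " [label=\"" << Label << "\"";
    if (E.NeedToGather)
      OS << ",color=red";
    OS << "];\n";
    for (int U : E.UserTreeIndices)
      OS << "  N" << I << " -> N" << U << ";\n";
  }
  OS << "}\n";
  return OS.str();
}
```

Feeding the returned text to Graphviz (for example `dot -Tsvg`) renders it; in the pass itself, GraphWriter handles label escaping and launching the viewer.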


Event Timeline

anemet created this revision. Mar 7 2017, 11:13 PM

Harbormaster completed remote builds in B4615: Diff 90982. Mar 7 2017, 11:13 PM

mkuper added inline comments.

Thank you, this will be incredibly helpful. I don't think it quite works, though - see inline.

(In any case, I can only review the SLP part, never implemented GraphTraits.)

lib/Transforms/Vectorize/SLPVectorizer.cpp
517	tl;dr: I think this is broken, but it's not something you can easily fix here.

Fair warning: I haven't had time to really think this through, so maybe there's something trivial I'm missing here.

Having said that - I just realized yesterday the tree is a lie. It isn't actually a tree, it's a DAG - so "the user" is not well-defined (see line 1095 on the left, and the discussion about the trouble it causes here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435431.html )

I think the whole thing should be rewritten, to be explicit about the graph it contains. But right now, I believe the only correct way to look up which tree nodes feed a node T is the way vectorizeTree() does it - look at the (scalar) operands of T's scalars, and map them back to the nodes through ScalarToTreeEntry. I'm not sure trying to wrap GraphTraits around that is a good idea, though.

We can either:
1. Leave this as is, and be explicit about the fact that the rendered graph may be imprecise.
2. Try to walk the graph "properly".
3. Put this on hold until the underlying "graph" has a saner representation. I'm not sure when I'll have time to do it, though. If anybody else volunteers...

None of those sound appealing, TBH. Not sure what the least evil is.
2122	Any reason not to render this as part of the graph regardless of NDEBUG?
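mkuper's suggested alternative, deriving the entries that feed a node T from the operands of T's scalars through ScalarToTreeEntry, can be sketched in isolation as below. This is a toy model, not the code in vectorizeTree(): `ToyScalar`, `Bundle` and `operandEntries` are hypothetical, values are identified by name, and operands missing from the map are skipped. Note that in the diff only vectorized bundles are recorded in ScalarToTreeEntry (gathered ones go to MustGather), so a walk like this would also miss gathered operands unless they were tracked separately.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: a scalar has a name and the names of its operands.
struct ToyScalar {
  std::string Name;
  std::vector<std::string> Operands;
};

// A bundle of isomorphic scalars, one per lane (one tree entry).
using Bundle = std::vector<ToyScalar>;

// Collect the indices of the entries that feed Tree[T] by scanning the
// operands of every lane and mapping them back through ScalarToTreeEntry.
// Operands that are not in the map (defined outside the tree) are skipped.
std::set<int> operandEntries(const std::vector<Bundle> &Tree,
                             const std::map<std::string, int> &ScalarToTreeEntry,
                             int T) {
  std::set<int> Feeders;
  for (const ToyScalar &S : Tree[T])
    for (const std::string &Op : S.Operands) {
      auto It = ScalarToTreeEntry.find(Op);
      if (It != ScalarToTreeEntry.end())
        Feeders.insert(It->second);
    }
  return Feeders;
}
```

In a diamond where the load bundle feeds both a multiply bundle and the add bundle that uses those multiplies, this walk reports the load entry as an operand of two different entries, which is exactly the case a single user index cannot represent.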

mssimpso added inline comments.

Thanks for working on this, Adam! I have drawn these kinds of graphs by hand more times than I care to admit. This will be very helpful.

lib/Transforms/Vectorize/SLPVectorizer.cpp
517	I'd like to better understand the imprecision here. You're essentially saying that the graph output could be missing some edges because a tree entry could have multiple users? And that these uses need not even be of the same vector value (in the unordered load case)? How difficult would it be to walk the DAG properly?

anemet added inline comments. Mar 8 2017, 8:09 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
517	> tl;dr: I think this is broken, but it's not something you can easily fix here.

Well, not more broken than the entire code then ;). We are badly missing a FIXME somewhere.

> Fair warning: I haven't had time to really think this through, so maybe there's something trivial I'm missing here. Having said that - I just realized yesterday the tree is a lie. It isn't actually a tree, it's a DAG - so "the user" is not well-defined (see line 1095 on the left, and the discussion about the trouble it causes here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170306/435431.html ) I think the whole thing should be rewritten, to be explicit about the graph it contains. But right now, I believe the only correct way to look up which tree nodes feed a node T is the way vectorizeTree() does it - look at the (scalar) operands of T's scalars, and map them back to the nodes through ScalarToTreeEntry. I'm not sure trying to wrap GraphTraits around that is a good idea, though.

I have actually seen that we're trying to recognize existing entries in the tree, but my conclusion was that at that point we just terminate the tree. I am curious to look at your testcase.

Anyhow, it's trivial to make the UsingTreeIdx a SmallVector<int, 1> to be able to represent a DAG without bloating the TreeEntry much. Hopefully we can find a place to easily update it too. It would also remove my C++ magic to make a single int an iterable "container".

> We can either: Leave this as is, and be explicit about the fact that the rendered graph may be imprecise. Try to walk the graph "properly". Put this on hold until the underlying "graph" has a saner representation. I'm not sure when I'll have time to do it, though. If anybody else volunteers... None of those sound appealing, TBH. Not sure what the least evil is.

Let me look at this today. Hopefully the testcase you quote is a non-tree case (please confirm). I can use that to make this more precise.

Either way, I'd like to check this in in one form or another, as it's super-useful even if we make it experimental for now. We can improve it in tree. If we make the user node a container, the data structure is equipped to deal with this.

anemet updated this revision.

This version allows for multiple users for a TreeEntry and hooks up
buildTree_rec to update it when there is a reuse. See
https://reviews.llvm.org/F3133339 for how it looks for Michael's testcase.
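The bookkeeping this version adds can be modelled in a few lines: record the caller's index when an entry is created, hand the new index back through the `int &UserTreeIdx` parameter, and append an extra user when a bundle is reused as a "perfect diamond". The sketch below is a hypothetical simplification, not the pass's code: `Entry`, `ToyTree`, `newEntry` and `addUserIfExisting` are illustrative names, scalars are strings, lookup is keyed only on the first scalar (as `ScalarToTreeEntry[VL[0]]` is in the diff), and a partial match simply returns -1 where the pass would create a gather entry instead.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified TreeEntry: only the scalars and the user edges.
struct Entry {
  std::vector<std::string> Scalars;
  std::vector<int> UserTreeIndices; // more than one user => DAG, not a tree
};

struct ToyTree {
  std::vector<Entry> Entries;
  std::map<std::string, int> ScalarToEntry; // first scalar -> entry index

  // Same shape as newTreeEntry in the diff: record the user edge (if any),
  // then overwrite UserTreeIdx with the new entry's index so the caller can
  // pass it down as the user of the operands it recurses into.
  // Assumes VL is non-empty.
  int newEntry(const std::vector<std::string> &VL, int &UserTreeIdx) {
    Entries.push_back({VL, {}});
    int Idx = static_cast<int>(Entries.size()) - 1;
    if (UserTreeIdx >= 0)
      Entries[Idx].UserTreeIndices.push_back(UserTreeIdx);
    UserTreeIdx = Idx;
    ScalarToEntry[VL[0]] = Idx;
    return Idx;
  }

  // Models the "perfect diamond merge" path: the bundle already has an
  // identical entry, so only an extra user edge is recorded. Returns the
  // reused entry's index, or -1 if there is no identical entry.
  int addUserIfExisting(const std::vector<std::string> &VL, int UserTreeIdx) {
    auto It = ScalarToEntry.find(VL[0]);
    if (It == ScalarToEntry.end() || Entries[It->second].Scalars != VL)
      return -1;
    Entries[It->second].UserTreeIndices.push_back(UserTreeIdx);
    return It->second;
  }
};
```

Since `buildTree_rec` takes `UserTreeIdx` by value in the diff, the index written back by `newTreeEntry` is only visible inside that call, where it can be passed on as the user of that bundle's operands. In the sketch, calling `newEntry` three times with the same `int U = -1` chains a root, a multiply and a load entry, and `addUserIfExisting` on the load bundle then gives it a second user.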

mkuper added inline comments.

The SLP part LGTM - it's less broken than the rest of the code, and we need to fix SLP itself to fix it (and avoid things like the Container hack...)

The GraphTraits implementation generally LGTM as well, but that may be because I don't know enough about it to tell a good one from a bad one...

lib/Transforms/Vectorize/SLPVectorizer.cpp
517	@mssimpso: yes. I think we're going to unroll the unordered load case for now, so they'll always be the same value. And, apparently, not as difficult as I thought, because Adam just did it.

@anemet: Yes, not more broken than the entire code. I was going to write "but it's not your fault". :-) And yes, you're completely right, it is trivial... I didn't consider the fact that the initial walk is correct, even if the results it produces are nonsense, so you can just keep using the walk order as is. Thanks!
1188	I think this is where we want the FIXME. :-) Can you please add it? Not for the entire thing, just that this here is a huge hack that makes the graph used for printing more precise than the graph actually used for vectorization...
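The "Container hack" mentioned in this review exists because GraphTraits builds child iterators from a NodeRef alone, while the user edges are stored as indices into VectorizableTree (see the comment on TreeEntry::Container in the diff). The standalone sketch below illustrates the same idea without LLVM: each hypothetical `Node` keeps a reference to its owning vector, and a hand-rolled `ChildIterator` turns iteration over indices into iteration over `Node *`, roughly what ChildIteratorType does on top of iterator_adaptor_base. `Node`, `ChildIterator`, `child_begin` and `child_end` here are illustrative, not the LLVM types.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Each node stores its edges as indices plus a reference back to the
// owning vector, analogous to TreeEntry::Container in the diff.
struct Node {
  int Id;
  std::vector<int> ChildIndices;
  std::vector<Node> &Container;
  Node(int I, std::vector<Node> &C) : Id(I), Container(C) {}
};

// Wraps an index iterator so that dereferencing yields a Node *.
class ChildIterator {
  std::vector<int>::const_iterator It;
  std::vector<Node> *Container; // pointer, so the iterator is assignable

public:
  using iterator_category = std::forward_iterator_tag;
  using value_type = Node *;
  using difference_type = std::ptrdiff_t;
  using pointer = Node **;
  using reference = Node *;

  ChildIterator(std::vector<int>::const_iterator I, std::vector<Node> &C)
      : It(I), Container(&C) {}
  // Translate the stored index into a pointer into the owning vector.
  Node *operator*() const {
    return &(*Container)[static_cast<std::size_t>(*It)];
  }
  ChildIterator &operator++() { ++It; return *this; }
  ChildIterator operator++(int) { ChildIterator Tmp = *this; ++It; return Tmp; }
  bool operator==(const ChildIterator &O) const { return It == O.It; }
  bool operator!=(const ChildIterator &O) const { return It != O.It; }
};

// Only a Node * is available here, hence the need for N->Container.
ChildIterator child_begin(Node *N) { return {N->ChildIndices.begin(), N->Container}; }
ChildIterator child_end(Node *N) { return {N->ChildIndices.end(), N->Container}; }
```

Storing `Node *` edges directly would avoid the back-reference, but such pointers would dangle whenever the vector reallocates, which is presumably one reason the diff keeps indices and translates them on dereference.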

Closed by commit rL297303: [SLP] Visualize SLP trees with -view-slp-tree (authored by anemet). Mar 8 2017, 10:59 AM

This revision was automatically updated to reflect the committed changes.

anemet added inline comments. Mar 8 2017, 11:01 AM

lib/Transforms/Vectorize/SLPVectorizer.cpp
1188	OK, I added that. It would be good to explain the shuffle-order thing as well, but I will leave that to you or Shahid. Thanks for the review!

Revision Contents

Path: lib/Transforms/Vectorize/SLPVectorizer.cpp (227 lines)

Diff 91039

lib/Transforms/Vectorize/SLPVectorizer.cpp

[33 unchanged lines not shown]
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/NoFolder.h"		#include "llvm/IR/NoFolder.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/IR/Verifier.h"		#include "llvm/IR/Verifier.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
		#include "llvm/Support/GraphWriter.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Vectorize.h"		#include "llvm/Transforms/Vectorize.h"
#include <algorithm>		#include <algorithm>
#include <memory>		#include <memory>

using namespace llvm;		using namespace llvm;
using namespace slpvectorizer;		using namespace slpvectorizer;

[35 unchanged lines not shown]
static cl::opt<unsigned> RecursionMaxDepth(		static cl::opt<unsigned> RecursionMaxDepth(
"slp-recursion-max-depth", cl::init(12), cl::Hidden,		"slp-recursion-max-depth", cl::init(12), cl::Hidden,
cl::desc("Limit the recursion depth when building a vectorizable tree"));		cl::desc("Limit the recursion depth when building a vectorizable tree"));

static cl::opt<unsigned> MinTreeSize(		static cl::opt<unsigned> MinTreeSize(
"slp-min-tree-size", cl::init(3), cl::Hidden,		"slp-min-tree-size", cl::init(3), cl::Hidden,
cl::desc("Only vectorize small trees if they are fully vectorizable"));		cl::desc("Only vectorize small trees if they are fully vectorizable"));

		static cl::opt<bool>
		ViewSLPTree("view-slp-tree", cl::Hidden,
		cl::desc("Display the SLP trees with Graphviz"));

// Limit the number of alias checks. The limit is chosen so that		// Limit the number of alias checks. The limit is chosen so that
// it has no negative effect on the llvm benchmarks.		// it has no negative effect on the llvm benchmarks.
static const unsigned AliasedCheckLimit = 10;		static const unsigned AliasedCheckLimit = 10;

// Another limit for the alias checks: The maximum distance between load/store		// Another limit for the alias checks: The maximum distance between load/store
// instructions where alias checks are done.		// instructions where alias checks are done.
// This limit is useful for very large basic blocks.		// This limit is useful for very large basic blocks.
static const unsigned MaxMemDepDistance = 160;		static const unsigned MaxMemDepDistance = 160;
[313 unchanged lines not shown]

private:		private:
struct TreeEntry;		struct TreeEntry;

/// \returns the cost of the vectorizable entry.		/// \returns the cost of the vectorizable entry.
int getEntryCost(TreeEntry *E);		int getEntryCost(TreeEntry *E);

/// This is the recursive part of buildTree.		/// This is the recursive part of buildTree.
void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth);		void buildTree_rec(ArrayRef<Value *> Roots, unsigned Depth, int);

/// \returns True if the ExtractElement/ExtractValue instructions in VL can		/// \returns True if the ExtractElement/ExtractValue instructions in VL can
/// be vectorized to use the original vector (or aggregate "bitcast" to a vector).		/// be vectorized to use the original vector (or aggregate "bitcast" to a vector).
bool canReuseExtract(ArrayRef<Value *> VL, unsigned Opcode) const;		bool canReuseExtract(ArrayRef<Value *> VL, unsigned Opcode) const;

/// Vectorize a single entry in the tree. VL icontains all isomorphic scalars		/// Vectorize a single entry in the tree. VL icontains all isomorphic scalars
/// in order of its usage in a user program, for example ADD1, ADD2 and so on		/// in order of its usage in a user program, for example ADD1, ADD2 and so on
/// or LOAD1 , LOAD2 etc.		/// or LOAD1 , LOAD2 etc.
[32 unchanged lines not shown; context: void reorderAltShuffleOperands(ArrayRef<Value *> VL, ...]
SmallVectorImpl<Value *> &Left,		SmallVectorImpl<Value *> &Left,
SmallVectorImpl<Value *> &Right);		SmallVectorImpl<Value *> &Right);
/// \reorder commutative operands to get better probability of		/// \reorder commutative operands to get better probability of
/// generating vectorized code.		/// generating vectorized code.
void reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,		void reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,
SmallVectorImpl<Value *> &Left,		SmallVectorImpl<Value *> &Left,
SmallVectorImpl<Value *> &Right);		SmallVectorImpl<Value *> &Right);
struct TreeEntry {		struct TreeEntry {
TreeEntry() : Scalars(), VectorizedValue(nullptr),		TreeEntry(std::vector<TreeEntry> &Container)
NeedToGather(0), NeedToShuffle(0) {}		: Scalars(), VectorizedValue(nullptr), NeedToGather(0),
		NeedToShuffle(0), Container(Container) {}

/// \returns true if the scalars in VL are equal to this entry.		/// \returns true if the scalars in VL are equal to this entry.
bool isSame(ArrayRef<Value *> VL) const {		bool isSame(ArrayRef<Value *> VL) const {
assert(VL.size() == Scalars.size() && "Invalid size");		assert(VL.size() == Scalars.size() && "Invalid size");
return std::equal(VL.begin(), VL.end(), Scalars.begin());		return std::equal(VL.begin(), VL.end(), Scalars.begin());
}		}

/// \returns true if the scalars in VL are found in this tree entry.		/// \returns true if the scalars in VL are found in this tree entry.
[13 unchanged lines not shown; context: struct TreeEntry {]
/// The Scalars are vectorized into this value. It is initialized to Null.		/// The Scalars are vectorized into this value. It is initialized to Null.
Value *VectorizedValue;		Value *VectorizedValue;

/// Do we need to gather this sequence ?		/// Do we need to gather this sequence ?
bool NeedToGather;		bool NeedToGather;

/// Do we need to shuffle the load ?		/// Do we need to shuffle the load ?
bool NeedToShuffle;		bool NeedToShuffle;

		/// Points back to the VectorizableTree.
		///
		/// Only used for Graphviz right now. Unfortunately GraphTrait::NodeRef has
		/// to be a pointer and needs to be able to initialize the child iterator.
		/// Thus we need a reference back to the container to translate the indices
		/// to entries.
		std::vector<TreeEntry> &Container;

		/// The TreeEntry index containing the user of this entry. We can actually
		/// have multiple users so the data structure is not truly a tree.
		SmallVector<int, 1> UserTreeIndices;
};		};

/// Create a new VectorizableTree entry.		/// Create a new VectorizableTree entry.
TreeEntry *newTreeEntry(ArrayRef<Value *> VL, bool Vectorized,		TreeEntry *newTreeEntry(ArrayRef<Value *> VL, bool Vectorized,
bool NeedToShuffle) {		bool NeedToShuffle, int &UserTreeIdx) {
VectorizableTree.emplace_back();		VectorizableTree.emplace_back(VectorizableTree);
int idx = VectorizableTree.size() - 1;		int idx = VectorizableTree.size() - 1;
TreeEntry *Last = &VectorizableTree[idx];		TreeEntry *Last = &VectorizableTree[idx];
Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());		Last->Scalars.insert(Last->Scalars.begin(), VL.begin(), VL.end());
Last->NeedToGather = !Vectorized;		Last->NeedToGather = !Vectorized;
Last->NeedToShuffle = NeedToShuffle;		Last->NeedToShuffle = NeedToShuffle;
if (Vectorized) {		if (Vectorized) {
for (int i = 0, e = VL.size(); i != e; ++i) {		for (int i = 0, e = VL.size(); i != e; ++i) {
assert(!ScalarToTreeEntry.count(VL[i]) && "Scalar already in tree!");		assert(!ScalarToTreeEntry.count(VL[i]) && "Scalar already in tree!");
ScalarToTreeEntry[VL[i]] = idx;		ScalarToTreeEntry[VL[i]] = idx;
}		}
} else {		} else {
MustGather.insert(VL.begin(), VL.end());		MustGather.insert(VL.begin(), VL.end());
}		}

		if (UserTreeIdx >= 0)
		Last->UserTreeIndices.push_back(UserTreeIdx);
		UserTreeIdx = idx;
return Last;		return Last;
}		}

/// -- Vectorization State --		/// -- Vectorization State --
/// Holds all of the tree entries.		/// Holds all of the tree entries.
std::vector<TreeEntry> VectorizableTree;		std::vector<TreeEntry> VectorizableTree;

/// Maps a specific scalar to its tree entry.		/// Maps a specific scalar to its tree entry.
[207 unchanged lines not shown]

#ifndef NDEBUG		#ifndef NDEBUG
friend inline raw_ostream &operator<<(raw_ostream &os,		friend inline raw_ostream &operator<<(raw_ostream &os,
const BoUpSLP::ScheduleData &SD) {		const BoUpSLP::ScheduleData &SD) {
SD.dump(os);		SD.dump(os);
return os;		return os;
}		}
#endif		#endif
		friend struct GraphTraits<BoUpSLP *>;
		friend struct DOTGraphTraits<BoUpSLP *>;

/// Contains all scheduling data for a basic block.		/// Contains all scheduling data for a basic block.
///		///
struct BlockScheduling {		struct BlockScheduling {

BlockScheduling(BasicBlock *BB)		BlockScheduling(BasicBlock *BB)
: BB(BB), ChunkSize(BB->size()), ChunkPos(ChunkSize),		: BB(BB), ChunkSize(BB->size()), ChunkPos(ChunkSize),
ScheduleStart(nullptr), ScheduleEnd(nullptr),		ScheduleStart(nullptr), ScheduleEnd(nullptr),
[194 unchanged lines not shown; context: #endif]

/// A map of scalar integer values to the smallest bit width with which they		/// A map of scalar integer values to the smallest bit width with which they
/// can legally be represented. The values map to (width, signed) pairs,		/// can legally be represented. The values map to (width, signed) pairs,
/// where "width" indicates the minimum bit width and "signed" is True if the		/// where "width" indicates the minimum bit width and "signed" is True if the
/// value must be signed-extended, rather than zero-extended, back to its		/// value must be signed-extended, rather than zero-extended, back to its
/// original width.		/// original width.
MapVector<Value *, std::pair<uint64_t, bool>> MinBWs;		MapVector<Value *, std::pair<uint64_t, bool>> MinBWs;
};		};
		} // end namespace slpvectorizer

		template <> struct GraphTraits<BoUpSLP *> {
		typedef BoUpSLP::TreeEntry TreeEntry;

		/// NodeRef has to be a pointer per the GraphWriter.
		typedef TreeEntry *NodeRef;

		/// \brief Add the VectorizableTree to the index iterator to be able to return
		/// TreeEntry pointers.
		struct ChildIteratorType
		: public iterator_adaptor_base<ChildIteratorType,
		SmallVector<int, 1>::iterator> {

		std::vector<TreeEntry> &VectorizableTree;

		ChildIteratorType(SmallVector<int, 1>::iterator W,
		std::vector<TreeEntry> &VT)
		: ChildIteratorType::iterator_adaptor_base(W), VectorizableTree(VT) {}

		NodeRef operator*() { return &VectorizableTree[*I]; }
		};

		static NodeRef getEntryNode(BoUpSLP &R) { return &R.VectorizableTree[0]; }

		static ChildIteratorType child_begin(NodeRef N) {
		return {N->UserTreeIndices.begin(), N->Container};
		}
		static ChildIteratorType child_end(NodeRef N) {
		return {N->UserTreeIndices.end(), N->Container};
		}

		/// For the node iterator we just need to turn the TreeEntry iterator into a
		/// TreeEntry* iterator so that it dereferences to NodeRef.
		typedef pointer_iterator<std::vector<TreeEntry>::iterator> nodes_iterator;

		static nodes_iterator nodes_begin(BoUpSLP *R) {
		return nodes_iterator(R->VectorizableTree.begin());
		}
		static nodes_iterator nodes_end(BoUpSLP *R) {
		return nodes_iterator(R->VectorizableTree.end());
		}

		static unsigned size(BoUpSLP *R) { return R->VectorizableTree.size(); }
		};

		template <> struct DOTGraphTraits<BoUpSLP *> : public DefaultDOTGraphTraits {
		typedef BoUpSLP::TreeEntry TreeEntry;

		DOTGraphTraits(bool isSimple = false) : DefaultDOTGraphTraits(isSimple) {}

		std::string getNodeLabel(const TreeEntry *Entry, const BoUpSLP *) {
		std::string Str;
		raw_string_ostream OS(Str);
		if (isSplat(Entry->Scalars)) {
		OS << "<splat> " << *Entry->Scalars[0];
		return Str;
		}
		for (auto V : Entry->Scalars)
		OS << *V << "\n";
		return Str;
		}

		static std::string getNodeAttributes(const TreeEntry *Entry,
		const BoUpSLP *) {
		if (Entry->NeedToGather)
		return "color=red";
		return "";
		}
		};

} // end namespace llvm		} // end namespace llvm
} // end namespace slpvectorizer

void BoUpSLP::buildTree(ArrayRef<Value *> Roots,		void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
ArrayRef<Value *> UserIgnoreLst) {		ArrayRef<Value *> UserIgnoreLst) {
ExtraValueToDebugLocsMap ExternallyUsedValues;		ExtraValueToDebugLocsMap ExternallyUsedValues;
buildTree(Roots, ExternallyUsedValues, UserIgnoreLst);		buildTree(Roots, ExternallyUsedValues, UserIgnoreLst);
}		}
void BoUpSLP::buildTree(ArrayRef<Value *> Roots,		void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
ExtraValueToDebugLocsMap &ExternallyUsedValues,		ExtraValueToDebugLocsMap &ExternallyUsedValues,
ArrayRef<Value *> UserIgnoreLst) {		ArrayRef<Value *> UserIgnoreLst) {
deleteTree();		deleteTree();
UserIgnoreList = UserIgnoreLst;		UserIgnoreList = UserIgnoreLst;
if (!allSameType(Roots))		if (!allSameType(Roots))
return;		return;
buildTree_rec(Roots, 0);		buildTree_rec(Roots, 0, -1);

// Collect the values that we need to extract from the tree.		// Collect the values that we need to extract from the tree.
for (TreeEntry &EIdx : VectorizableTree) {		for (TreeEntry &EIdx : VectorizableTree) {
TreeEntry *Entry = &EIdx;		TreeEntry *Entry = &EIdx;

// For each lane:		// For each lane:
for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {		for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
Value *Scalar = Entry->Scalars[Lane];		Value *Scalar = Entry->Scalars[Lane];
[41 unchanged lines not shown; context: for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {]
DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane " <<		DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane " <<
Lane << " from " << *Scalar << ".\n");		Lane << " from " << *Scalar << ".\n");
ExternalUses.push_back(ExternalUser(Scalar, U, Lane));		ExternalUses.push_back(ExternalUser(Scalar, U, Lane));
}		}
}		}
}		}
}		}

		void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth,
void BoUpSLP::buildTree_rec(ArrayRef<Value *> VL, unsigned Depth) {		int UserTreeIdx) {
bool isAltShuffle = false;		bool isAltShuffle = false;
assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");		assert((allConstant(VL) || allSameType(VL)) && "Invalid types!");

if (Depth == RecursionMaxDepth) {		if (Depth == RecursionMaxDepth) {
DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");		DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}

// Don't handle vectors.		// Don't handle vectors.
if (VL[0]->getType()->isVectorTy()) {		if (VL[0]->getType()->isVectorTy()) {
DEBUG(dbgs() << "SLP: Gathering due to vector type.\n");		DEBUG(dbgs() << "SLP: Gathering due to vector type.\n");
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}

if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))		if (StoreInst *SI = dyn_cast<StoreInst>(VL[0]))
if (SI->getValueOperand()->getType()->isVectorTy()) {		if (SI->getValueOperand()->getType()->isVectorTy()) {
DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");		DEBUG(dbgs() << "SLP: Gathering due to store vector type.\n");
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}
unsigned Opcode = getSameOpcode(VL);		unsigned Opcode = getSameOpcode(VL);

// Check that this shuffle vector refers to the alternate		// Check that this shuffle vector refers to the alternate
// sequence of opcodes.		// sequence of opcodes.
if (Opcode == Instruction::ShuffleVector) {		if (Opcode == Instruction::ShuffleVector) {
Instruction *I0 = dyn_cast<Instruction>(VL[0]);		Instruction *I0 = dyn_cast<Instruction>(VL[0]);
unsigned Op = I0->getOpcode();		unsigned Op = I0->getOpcode();
if (Op != Instruction::ShuffleVector)		if (Op != Instruction::ShuffleVector)
isAltShuffle = true;		isAltShuffle = true;
}		}

// If all of the operands are identical or constant we have a simple solution.		// If all of the operands are identical or constant we have a simple solution.
if (allConstant(VL) || isSplat(VL) || !allSameBlock(VL) || !Opcode) {		if (allConstant(VL) || isSplat(VL) || !allSameBlock(VL) || !Opcode) {
DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. \n");		DEBUG(dbgs() << "SLP: Gathering due to C,S,B,O. \n");
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}

// We now know that this is a vector of instructions of the same type from		// We now know that this is a vector of instructions of the same type from
// the same block.		// the same block.

// Don't vectorize ephemeral values.		// Don't vectorize ephemeral values.
for (unsigned i = 0, e = VL.size(); i != e; ++i) {		for (unsigned i = 0, e = VL.size(); i != e; ++i) {
if (EphValues.count(VL[i])) {		if (EphValues.count(VL[i])) {
DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<		DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<
") is ephemeral.\n");		") is ephemeral.\n");
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}
}		}

// Check if this is a duplicate of another entry.		// Check if this is a duplicate of another entry.
if (ScalarToTreeEntry.count(VL[0])) {		if (ScalarToTreeEntry.count(VL[0])) {
int Idx = ScalarToTreeEntry[VL[0]];		int Idx = ScalarToTreeEntry[VL[0]];
TreeEntry *E = &VectorizableTree[Idx];		TreeEntry *E = &VectorizableTree[Idx];
for (unsigned i = 0, e = VL.size(); i != e; ++i) {		for (unsigned i = 0, e = VL.size(); i != e; ++i) {
DEBUG(dbgs() << "SLP: \tChecking bundle: " << *VL[i] << ".\n");		DEBUG(dbgs() << "SLP: \tChecking bundle: " << *VL[i] << ".\n");
if (E->Scalars[i] != VL[i]) {		if (E->Scalars[i] != VL[i]) {
DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");		DEBUG(dbgs() << "SLP: Gathering due to partial overlap.\n");
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}
}		}
		E->UserTreeIndices.push_back(UserTreeIdx);
DEBUG(dbgs() << "SLP: Perfect diamond merge at " << *VL[0] << ".\n");		DEBUG(dbgs() << "SLP: Perfect diamond merge at " << *VL[0] << ".\n");
return;		return;
}		}

// Check that none of the instructions in the bundle are already in the tree.		// Check that none of the instructions in the bundle are already in the tree.
for (unsigned i = 0, e = VL.size(); i != e; ++i) {		for (unsigned i = 0, e = VL.size(); i != e; ++i) {
if (ScalarToTreeEntry.count(VL[i])) {		if (ScalarToTreeEntry.count(VL[i])) {
DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<		DEBUG(dbgs() << "SLP: The instruction (" << *VL[i] <<
") is already in tree.\n");		") is already in tree.\n");
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}
}		}

// If any of the scalars is marked as a value that needs to stay scalar then		// If any of the scalars is marked as a value that needs to stay scalar then
// we need to gather the scalars.		// we need to gather the scalars.
for (unsigned i = 0, e = VL.size(); i != e; ++i) {		for (unsigned i = 0, e = VL.size(); i != e; ++i) {
if (MustGather.count(VL[i])) {		if (MustGather.count(VL[i])) {
DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");		DEBUG(dbgs() << "SLP: Gathering due to gathered scalar.\n");
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}
}		}

// Check that all of the users of the scalars that we want to vectorize are		// Check that all of the users of the scalars that we want to vectorize are
// schedulable.		// schedulable.
Instruction *VL0 = cast<Instruction>(VL[0]);		Instruction *VL0 = cast<Instruction>(VL[0]);
BasicBlock *BB = cast<Instruction>(VL0)->getParent();		BasicBlock *BB = cast<Instruction>(VL0)->getParent();

if (!DT->isReachableFromEntry(BB)) {		if (!DT->isReachableFromEntry(BB)) {
// Don't go into unreachable blocks. They may contain instructions with		// Don't go into unreachable blocks. They may contain instructions with
// dependency cycles which confuse the final scheduling.		// dependency cycles which confuse the final scheduling.
DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");		DEBUG(dbgs() << "SLP: bundle in unreachable block.\n");
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}

// Check that every instructions appears once in this bundle.		// Check that every instructions appears once in this bundle.
for (unsigned i = 0, e = VL.size(); i < e; ++i)		for (unsigned i = 0, e = VL.size(); i < e; ++i)
for (unsigned j = i+1; j < e; ++j)		for (unsigned j = i+1; j < e; ++j)
if (VL[i] == VL[j]) {		if (VL[i] == VL[j]) {
DEBUG(dbgs() << "SLP: Scalar used twice in bundle.\n");		DEBUG(dbgs() << "SLP: Scalar used twice in bundle.\n");
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}

auto &BSRef = BlocksSchedules[BB];		auto &BSRef = BlocksSchedules[BB];
if (!BSRef) {		if (!BSRef) {
BSRef = llvm::make_unique<BlockScheduling>(BB);		BSRef = llvm::make_unique<BlockScheduling>(BB);
}		}
BlockScheduling &BS = *BSRef.get();		BlockScheduling &BS = *BSRef.get();

if (!BS.tryScheduleBundle(VL, this)) {		if (!BS.tryScheduleBundle(VL, this)) {
DEBUG(dbgs() << "SLP: We are not able to schedule this bundle!\n");		DEBUG(dbgs() << "SLP: We are not able to schedule this bundle!\n");
assert((!BS.getScheduleData(VL[0]) ||		assert((!BS.getScheduleData(VL[0]) ||
!BS.getScheduleData(VL[0])->isPartOfBundle()) &&		!BS.getScheduleData(VL[0])->isPartOfBundle()) &&
"tryScheduleBundle should cancelScheduling on failure");		"tryScheduleBundle should cancelScheduling on failure");
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}
DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");		DEBUG(dbgs() << "SLP: We are able to schedule this bundle.\n");

switch (Opcode) {		switch (Opcode) {
case Instruction::PHI: {		case Instruction::PHI: {
PHINode *PH = dyn_cast<PHINode>(VL0);		PHINode *PH = dyn_cast<PHINode>(VL0);

// Check for terminator values (e.g. invoke).		// Check for terminator values (e.g. invoke).
for (unsigned j = 0; j < VL.size(); ++j)		for (unsigned j = 0; j < VL.size(); ++j)
for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {		for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {
TerminatorInst *Term = dyn_cast<TerminatorInst>(		TerminatorInst *Term = dyn_cast<TerminatorInst>(
cast<PHINode>(VL[j])->getIncomingValueForBlock(PH->getIncomingBlock(i)));		cast<PHINode>(VL[j])->getIncomingValueForBlock(PH->getIncomingBlock(i)));
if (Term) {		if (Term) {
DEBUG(dbgs() << "SLP: Need to swizzle PHINodes (TerminatorInst use).\n");		DEBUG(dbgs() << "SLP: Need to swizzle PHINodes (TerminatorInst use).\n");
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}
}		}

newTreeEntry(VL, true, false);		newTreeEntry(VL, true, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");		DEBUG(dbgs() << "SLP: added a vector of PHINodes.\n");

for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {		for (unsigned i = 0, e = PH->getNumIncomingValues(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<PHINode>(j)->getIncomingValueForBlock(		Operands.push_back(cast<PHINode>(j)->getIncomingValueForBlock(
PH->getIncomingBlock(i)));		PH->getIncomingBlock(i)));

buildTree_rec(Operands, Depth + 1);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
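Every call in this function now passes `UserTreeIdx` both to `newTreeEntry` and to the recursive `buildTree_rec` calls, including the separate calls for the left and right operand lists. That reads naturally if `newTreeEntry` overwrites its argument with the index of the entry it just created, while `buildTree_rec` receives the index by value. `newTreeEntry`'s body is not part of this excerpt, so the following C++ sketch assumes that behavior. `Expr`, `Node`, `Builder` and `buildTreeRec` are hypothetical simplifications, not the pass's real types.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical expression node; stands in for a bundle of scalars.
struct Expr {
  std::string Op;
  std::vector<std::shared_ptr<Expr>> Operands;
};

// Hypothetical, simplified tree entry.
struct Node {
  std::string Op;
  std::vector<int> UserTreeIndices;
};

struct Builder {
  std::vector<Node> Tree;

  // Assumed behavior: append an entry, record the caller's index as its user
  // (a negative index means "no user", i.e. the root), then overwrite
  // UserTreeIdx with the new entry's own index.
  void newTreeEntry(const std::string &Op, int &UserTreeIdx) {
    Tree.push_back(Node{Op, {}});
    int Idx = static_cast<int>(Tree.size()) - 1;
    if (UserTreeIdx >= 0)
      Tree[Idx].UserTreeIndices.push_back(UserTreeIdx);
    UserTreeIdx = Idx;
  }

  // UserTreeIdx is a by-value copy: after newTreeEntry updates it, every
  // operand is built with the current entry as its user, while the caller's
  // copy is unaffected, so sibling calls all see the same parent.
  void buildTreeRec(const Expr &E, int UserTreeIdx) {
    newTreeEntry(E.Op, UserTreeIdx);
    for (const auto &Operand : E.Operands)
      buildTreeRec(*Operand, UserTreeIdx);
  }
};
```

Building `add(mul(a, b), c)` from the root with index -1 yields five entries; `c` records entry 0 (`add`) as its user, not `mul`, because the recursion into `mul` only modified its own copy of the index.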
case Instruction::ExtractValue:		case Instruction::ExtractValue:
case Instruction::ExtractElement: {		case Instruction::ExtractElement: {
bool Reuse = canReuseExtract(VL, Opcode);		bool Reuse = canReuseExtract(VL, Opcode);
if (Reuse) {		if (Reuse) {
DEBUG(dbgs() << "SLP: Reusing extract sequence.\n");		DEBUG(dbgs() << "SLP: Reusing extract sequence.\n");
} else {		} else {
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
}		}
newTreeEntry(VL, Reuse, false);		newTreeEntry(VL, Reuse, false, UserTreeIdx);
return;		return;
}		}
case Instruction::Load: {		case Instruction::Load: {
// Check that a vectorized load would load the same memory as a scalar		// Check that a vectorized load would load the same memory as a scalar
// load.		// load.
// For example we don't want vectorize loads that are smaller than 8 bit.		// For example we don't want vectorize loads that are smaller than 8 bit.
// Even though we have a packed struct {<i2, i2, i2, i2>} LLVM treats		// Even though we have a packed struct {<i2, i2, i2, i2>} LLVM treats
// loading/storing it as an i8 struct. If we vectorize loads/stores from		// loading/storing it as an i8 struct. If we vectorize loads/stores from
// such a struct we read/write packed bits disagreeing with the		// such a struct we read/write packed bits disagreeing with the
// unvectorized version.		// unvectorized version.
Type *ScalarTy = VL[0]->getType();		Type *ScalarTy = VL[0]->getType();

if (DL->getTypeSizeInBits(ScalarTy) !=		if (DL->getTypeSizeInBits(ScalarTy) !=
DL->getTypeAllocSizeInBits(ScalarTy)) {		DL->getTypeAllocSizeInBits(ScalarTy)) {
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");		DEBUG(dbgs() << "SLP: Gathering loads of non-packed type.\n");
return;		return;
}		}

// Make sure all loads in the bundle are simple - we can't vectorize		// Make sure all loads in the bundle are simple - we can't vectorize
// atomic or volatile loads.		// atomic or volatile loads.
for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {		for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {
LoadInst *L = cast<LoadInst>(VL[i]);		LoadInst *L = cast<LoadInst>(VL[i]);
if (!L->isSimple()) {		if (!L->isSimple()) {
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");		DEBUG(dbgs() << "SLP: Gathering non-simple loads.\n");
return;		return;
}		}
}		}

// Check if the loads are consecutive, reversed, or neither.		// Check if the loads are consecutive, reversed, or neither.
bool Consecutive = true;		bool Consecutive = true;
bool ReverseConsecutive = true;		bool ReverseConsecutive = true;
for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {		for (unsigned i = 0, e = VL.size() - 1; i < e; ++i) {
if (!isConsecutiveAccess(VL[i], VL[i + 1], DL, SE)) {		if (!isConsecutiveAccess(VL[i], VL[i + 1], DL, SE)) {
Consecutive = false;		Consecutive = false;
break;		break;
} else {		} else {
ReverseConsecutive = false;		ReverseConsecutive = false;
}		}
}		}

if (Consecutive) {		if (Consecutive) {
++NumLoadsWantToKeepOrder;		++NumLoadsWantToKeepOrder;
newTreeEntry(VL, true, false);		newTreeEntry(VL, true, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: added a vector of loads.\n");		DEBUG(dbgs() << "SLP: added a vector of loads.\n");
return;		return;
}		}

// If none of the load pairs were consecutive when checked in order,		// If none of the load pairs were consecutive when checked in order,
// check the reverse order.		// check the reverse order.
if (ReverseConsecutive)		if (ReverseConsecutive)
for (unsigned i = VL.size() - 1; i > 0; --i)		for (unsigned i = VL.size() - 1; i > 0; --i)
[9 lines not shown]	case Instruction::Load: {
auto NewVL = makeArrayRef(Sorted.begin(), Sorted.end());		auto NewVL = makeArrayRef(Sorted.begin(), Sorted.end());
for (unsigned i = 0, e = NewVL.size() - 1; i < e; ++i) {		for (unsigned i = 0, e = NewVL.size() - 1; i < e; ++i) {
if (!isConsecutiveAccess(NewVL[i], NewVL[i + 1], DL, SE)) {		if (!isConsecutiveAccess(NewVL[i], NewVL[i + 1], DL, SE)) {
ShuffledLoads = false;		ShuffledLoads = false;
break;		break;
}		}
}		}
if (ShuffledLoads) {		if (ShuffledLoads) {
newTreeEntry(NewVL, true, true);		newTreeEntry(NewVL, true, true, UserTreeIdx);
return;		return;
}		}
}		}
}		}

BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);

if (ReverseConsecutive) {		if (ReverseConsecutive) {
++NumLoadsWantToChangeOrder;		++NumLoadsWantToChangeOrder;
DEBUG(dbgs() << "SLP: Gathering reversed loads.\n");		DEBUG(dbgs() << "SLP: Gathering reversed loads.\n");
} else {		} else {
DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");		DEBUG(dbgs() << "SLP: Gathering non-consecutive loads.\n");
}		}
return;		return;
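The in-order loop in the Load case breaks at the first pair that is not consecutive, and every consecutive pair seen before that point clears ReverseConsecutive. So ReverseConsecutive survives only when the very first pair already fails, which is exactly what a reversed bundle looks like. The standalone sketch below models the loads as integer element offsets, with a hypothetical `isConsecutive` in place of `isConsecutiveAccess`. The body of the reverse loop is elided in this view, so its check (`VL[i]` immediately followed by `VL[i - 1]`) is an assumption, and the sorted-load (ShuffledLoads) path is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class LoadOrder { Consecutive, Reversed, Neither };

// Hypothetical stand-in for isConsecutiveAccess on element offsets:
// B is consecutive to A when it immediately follows A.
static bool isConsecutive(long A, long B) { return B == A + 1; }

// Mirrors the in-order loop: any consecutive pair clears ReverseConsecutive,
// and the first non-consecutive pair clears Consecutive and stops the scan.
// The reverse walk assumes the elided loop checks Offsets[I] -> Offsets[I - 1]
// and clears ReverseConsecutive on the first failure.
LoadOrder classify(const std::vector<long> &Offsets) {
  assert(Offsets.size() >= 2 && "a bundle has at least two loads");
  bool Consecutive = true, ReverseConsecutive = true;
  for (std::size_t I = 0, E = Offsets.size() - 1; I < E; ++I) {
    if (!isConsecutive(Offsets[I], Offsets[I + 1])) {
      Consecutive = false;
      break;
    }
    ReverseConsecutive = false;
  }
  if (Consecutive)
    return LoadOrder::Consecutive;
  if (ReverseConsecutive)
    for (std::size_t I = Offsets.size() - 1; I > 0; --I)
      if (!isConsecutive(Offsets[I], Offsets[I - 1])) {
        ReverseConsecutive = false;
        break;
      }
  return ReverseConsecutive ? LoadOrder::Reversed : LoadOrder::Neither;
}
```

For offsets `{0, 1, 3, 2}` the first pair is consecutive, which clears ReverseConsecutive before the scan fails at `1 -> 3`, so the bundle is classified as neither and gathered.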
[10 lines not shown]	switch (Opcode) {
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
Type *SrcTy = VL0->getOperand(0)->getType();		Type *SrcTy = VL0->getOperand(0)->getType();
for (Value *Val : VL) {		for (Value *Val : VL) {
Type *Ty = cast<Instruction>(Val)->getOperand(0)->getType();		Type *Ty = cast<Instruction>(Val)->getOperand(0)->getType();
if (Ty != SrcTy || !isValidElementType(Ty)) {		if (Ty != SrcTy || !isValidElementType(Ty)) {
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: Gathering casts with different src types.\n");		DEBUG(dbgs() << "SLP: Gathering casts with different src types.\n");
return;		return;
}		}
}		}
newTreeEntry(VL, true, false);		newTreeEntry(VL, true, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: added a vector of casts.\n");		DEBUG(dbgs() << "SLP: added a vector of casts.\n");

for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<Instruction>(j)->getOperand(i));		Operands.push_back(cast<Instruction>(j)->getOperand(i));

buildTree_rec(Operands, Depth+1);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::FCmp: {		case Instruction::FCmp: {
// Check that all of the compares have the same predicate.		// Check that all of the compares have the same predicate.
CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();		CmpInst::Predicate P0 = cast<CmpInst>(VL0)->getPredicate();
Type *ComparedTy = cast<Instruction>(VL[0])->getOperand(0)->getType();		Type *ComparedTy = cast<Instruction>(VL[0])->getOperand(0)->getType();
for (unsigned i = 1, e = VL.size(); i < e; ++i) {		for (unsigned i = 1, e = VL.size(); i < e; ++i) {
CmpInst *Cmp = cast<CmpInst>(VL[i]);		CmpInst *Cmp = cast<CmpInst>(VL[i]);
if (Cmp->getPredicate() != P0 ||		if (Cmp->getPredicate() != P0 ||
Cmp->getOperand(0)->getType() != ComparedTy) {		Cmp->getOperand(0)->getType() != ComparedTy) {
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: Gathering cmp with different predicate.\n");		DEBUG(dbgs() << "SLP: Gathering cmp with different predicate.\n");
return;		return;
}		}
}		}

newTreeEntry(VL, true, false);		newTreeEntry(VL, true, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: added a vector of compares.\n");		DEBUG(dbgs() << "SLP: added a vector of compares.\n");

for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<Instruction>(j)->getOperand(i));		Operands.push_back(cast<Instruction>(j)->getOperand(i));

buildTree_rec(Operands, Depth+1);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
case Instruction::Select:		case Instruction::Select:
case Instruction::Add:		case Instruction::Add:
case Instruction::FAdd:		case Instruction::FAdd:
case Instruction::Sub:		case Instruction::Sub:
case Instruction::FSub:		case Instruction::FSub:
case Instruction::Mul:		case Instruction::Mul:
case Instruction::FMul:		case Instruction::FMul:
case Instruction::UDiv:		case Instruction::UDiv:
case Instruction::SDiv:		case Instruction::SDiv:
case Instruction::FDiv:		case Instruction::FDiv:
case Instruction::URem:		case Instruction::URem:
case Instruction::SRem:		case Instruction::SRem:
case Instruction::FRem:		case Instruction::FRem:
case Instruction::Shl:		case Instruction::Shl:
case Instruction::LShr:		case Instruction::LShr:
case Instruction::AShr:		case Instruction::AShr:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor: {		case Instruction::Xor: {
newTreeEntry(VL, true, false);		newTreeEntry(VL, true, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: added a vector of bin op.\n");		DEBUG(dbgs() << "SLP: added a vector of bin op.\n");

// Sort operands of the instructions so that each side is more likely to		// Sort operands of the instructions so that each side is more likely to
// have the same opcode.		// have the same opcode.
if (isa<BinaryOperator>(VL0) && VL0->isCommutative()) {		if (isa<BinaryOperator>(VL0) && VL0->isCommutative()) {
ValueList Left, Right;		ValueList Left, Right;
reorderInputsAccordingToOpcode(VL, Left, Right);		reorderInputsAccordingToOpcode(VL, Left, Right);
buildTree_rec(Left, Depth + 1);		buildTree_rec(Left, Depth + 1, UserTreeIdx);
buildTree_rec(Right, Depth + 1);		buildTree_rec(Right, Depth + 1, UserTreeIdx);
return;		return;
}		}

for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<Instruction>(j)->getOperand(i));		Operands.push_back(cast<Instruction>(j)->getOperand(i));

buildTree_rec(Operands, Depth+1);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
// We don't combine GEPs with complicated (nested) indexing.		// We don't combine GEPs with complicated (nested) indexing.
for (Value *Val : VL) {		for (Value *Val : VL) {
if (cast<Instruction>(Val)->getNumOperands() != 2) {		if (cast<Instruction>(Val)->getNumOperands() != 2) {
DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested indexes).\n");		DEBUG(dbgs() << "SLP: not-vectorizable GEP (nested indexes).\n");
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}
}		}

// We can't combine several GEPs into one vector if they operate on		// We can't combine several GEPs into one vector if they operate on
// different types.		// different types.
Type *Ty0 = cast<Instruction>(VL0)->getOperand(0)->getType();		Type *Ty0 = cast<Instruction>(VL0)->getOperand(0)->getType();
for (Value *Val : VL) {		for (Value *Val : VL) {
Type *CurTy = cast<Instruction>(Val)->getOperand(0)->getType();		Type *CurTy = cast<Instruction>(Val)->getOperand(0)->getType();
if (Ty0 != CurTy) {		if (Ty0 != CurTy) {
DEBUG(dbgs() << "SLP: not-vectorizable GEP (different types).\n");		DEBUG(dbgs() << "SLP: not-vectorizable GEP (different types).\n");
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}
}		}

// We don't combine GEPs with non-constant indexes.		// We don't combine GEPs with non-constant indexes.
for (Value *Val : VL) {		for (Value *Val : VL) {
auto Op = cast<Instruction>(Val)->getOperand(1);		auto Op = cast<Instruction>(Val)->getOperand(1);
if (!isa<ConstantInt>(Op)) {		if (!isa<ConstantInt>(Op)) {
DEBUG(		DEBUG(
dbgs() << "SLP: not-vectorizable GEP (non-constant indexes).\n");		dbgs() << "SLP: not-vectorizable GEP (non-constant indexes).\n");
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
return;		return;
}		}
}		}

newTreeEntry(VL, true, false);		newTreeEntry(VL, true, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");		DEBUG(dbgs() << "SLP: added a vector of GEPs.\n");
for (unsigned i = 0, e = 2; i < e; ++i) {		for (unsigned i = 0, e = 2; i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<Instruction>(j)->getOperand(i));		Operands.push_back(cast<Instruction>(j)->getOperand(i));

buildTree_rec(Operands, Depth + 1);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
case Instruction::Store: {		case Instruction::Store: {
// Check if the stores are consecutive or of we need to swizzle them.		// Check if the stores are consecutive or of we need to swizzle them.
for (unsigned i = 0, e = VL.size() - 1; i < e; ++i)		for (unsigned i = 0, e = VL.size() - 1; i < e; ++i)
if (!isConsecutiveAccess(VL[i], VL[i + 1], DL, SE)) {		if (!isConsecutiveAccess(VL[i], VL[i + 1], DL, SE)) {
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: Non-consecutive store.\n");		DEBUG(dbgs() << "SLP: Non-consecutive store.\n");
return;		return;
}		}

newTreeEntry(VL, true, false);		newTreeEntry(VL, true, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: added a vector of stores.\n");		DEBUG(dbgs() << "SLP: added a vector of stores.\n");

ValueList Operands;		ValueList Operands;
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<Instruction>(j)->getOperand(0));		Operands.push_back(cast<Instruction>(j)->getOperand(0));

buildTree_rec(Operands, Depth + 1);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
return;		return;
}		}
case Instruction::Call: {		case Instruction::Call: {
// Check if the calls are all to the same vectorizable intrinsic.		// Check if the calls are all to the same vectorizable intrinsic.
CallInst *CI = cast<CallInst>(VL[0]);		CallInst *CI = cast<CallInst>(VL[0]);
// Check if this is an Intrinsic call or something that can be		// Check if this is an Intrinsic call or something that can be
// represented by an intrinsic call		// represented by an intrinsic call
Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
if (!isTriviallyVectorizable(ID)) {		if (!isTriviallyVectorizable(ID)) {
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");		DEBUG(dbgs() << "SLP: Non-vectorizable call.\n");
return;		return;
}		}
Function *Int = CI->getCalledFunction();		Function *Int = CI->getCalledFunction();
Value *A1I = nullptr;		Value *A1I = nullptr;
if (hasVectorInstrinsicScalarOpd(ID, 1))		if (hasVectorInstrinsicScalarOpd(ID, 1))
A1I = CI->getArgOperand(1);		A1I = CI->getArgOperand(1);
for (unsigned i = 1, e = VL.size(); i != e; ++i) {		for (unsigned i = 1, e = VL.size(); i != e; ++i) {
CallInst *CI2 = dyn_cast<CallInst>(VL[i]);		CallInst *CI2 = dyn_cast<CallInst>(VL[i]);
if (!CI2 || CI2->getCalledFunction() != Int ||		if (!CI2 || CI2->getCalledFunction() != Int ||
getVectorIntrinsicIDForCall(CI2, TLI) != ID ||		getVectorIntrinsicIDForCall(CI2, TLI) != ID ||
!CI->hasIdenticalOperandBundleSchema(*CI2)) {		!CI->hasIdenticalOperandBundleSchema(*CI2)) {
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: mismatched calls:" << CI << "!=" << VL[i]		DEBUG(dbgs() << "SLP: mismatched calls:" << CI << "!=" << VL[i]
<< "\n");		<< "\n");
return;		return;
}		}
// ctlz,cttz and powi are special intrinsics whose second argument		// ctlz,cttz and powi are special intrinsics whose second argument
// should be same in order for them to be vectorized.		// should be same in order for them to be vectorized.
if (hasVectorInstrinsicScalarOpd(ID, 1)) {		if (hasVectorInstrinsicScalarOpd(ID, 1)) {
Value *A1J = CI2->getArgOperand(1);		Value *A1J = CI2->getArgOperand(1);
if (A1I != A1J) {		if (A1I != A1J) {
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: mismatched arguments in call:" << *CI		DEBUG(dbgs() << "SLP: mismatched arguments in call:" << *CI
<< " argument "<< A1I<<"!=" << A1J		<< " argument "<< A1I<<"!=" << A1J
<< "\n");		<< "\n");
return;		return;
}		}
}		}
// Verify that the bundle operands are identical between the two calls.		// Verify that the bundle operands are identical between the two calls.
if (CI->hasOperandBundles() &&		if (CI->hasOperandBundles() &&
!std::equal(CI->op_begin() + CI->getBundleOperandsStartIndex(),		!std::equal(CI->op_begin() + CI->getBundleOperandsStartIndex(),
CI->op_begin() + CI->getBundleOperandsEndIndex(),		CI->op_begin() + CI->getBundleOperandsEndIndex(),
CI2->op_begin() + CI2->getBundleOperandsStartIndex())) {		CI2->op_begin() + CI2->getBundleOperandsStartIndex())) {
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:" << *CI << "!="		DEBUG(dbgs() << "SLP: mismatched bundle operands in calls:" << *CI << "!="
<< *VL[i] << '\n');		<< *VL[i] << '\n');
return;		return;
}		}
}		}

newTreeEntry(VL, true, false);		newTreeEntry(VL, true, false, UserTreeIdx);
for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {		for (unsigned i = 0, e = CI->getNumArgOperands(); i != e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL) {		for (Value *j : VL) {
CallInst *CI2 = dyn_cast<CallInst>(j);		CallInst *CI2 = dyn_cast<CallInst>(j);
Operands.push_back(CI2->getArgOperand(i));		Operands.push_back(CI2->getArgOperand(i));
}		}
buildTree_rec(Operands, Depth + 1);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
// If this is not an alternate sequence of opcode like add-sub		// If this is not an alternate sequence of opcode like add-sub
// then do not vectorize this instruction.		// then do not vectorize this instruction.
if (!isAltShuffle) {		if (!isAltShuffle) {
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");		DEBUG(dbgs() << "SLP: ShuffleVector are not vectorized.\n");
return;		return;
}		}
newTreeEntry(VL, true, false);		newTreeEntry(VL, true, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");		DEBUG(dbgs() << "SLP: added a ShuffleVector op.\n");

// Reorder operands if reordering would enable vectorization.		// Reorder operands if reordering would enable vectorization.
if (isa<BinaryOperator>(VL0)) {		if (isa<BinaryOperator>(VL0)) {
ValueList Left, Right;		ValueList Left, Right;
reorderAltShuffleOperands(VL, Left, Right);		reorderAltShuffleOperands(VL, Left, Right);
buildTree_rec(Left, Depth + 1);		buildTree_rec(Left, Depth + 1, UserTreeIdx);
buildTree_rec(Right, Depth + 1);		buildTree_rec(Right, Depth + 1, UserTreeIdx);
return;		return;
}		}

for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {		for (unsigned i = 0, e = VL0->getNumOperands(); i < e; ++i) {
ValueList Operands;		ValueList Operands;
// Prepare the operand vector.		// Prepare the operand vector.
for (Value *j : VL)		for (Value *j : VL)
Operands.push_back(cast<Instruction>(j)->getOperand(i));		Operands.push_back(cast<Instruction>(j)->getOperand(i));

buildTree_rec(Operands, Depth + 1);		buildTree_rec(Operands, Depth + 1, UserTreeIdx);
}		}
return;		return;
}		}
default:		default:
BS.cancelScheduling(VL);		BS.cancelScheduling(VL);
newTreeEntry(VL, false, false);		newTreeEntry(VL, false, false, UserTreeIdx);
DEBUG(dbgs() << "SLP: Gathering unknown instruction.\n");		DEBUG(dbgs() << "SLP: Gathering unknown instruction.\n");
return;		return;
}		}
}		}

unsigned BoUpSLP::canMapToVector(Type *T, const DataLayout &DL) const {		unsigned BoUpSLP::canMapToVector(Type *T, const DataLayout &DL) const {
unsigned N;		unsigned N;
Type *EltTy;		Type *EltTy;
[450 lines not shown]	if (MinBWs.count(ScalarRoot)) {
ExtractCost +=		ExtractCost +=
TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, EU.Lane);		TTI->getVectorInstrCost(Instruction::ExtractElement, VecTy, EU.Lane);
}		}
}		}

int SpillCost = getSpillCost();		int SpillCost = getSpillCost();
Cost += SpillCost + ExtractCost;		Cost += SpillCost + ExtractCost;

DEBUG(dbgs() << "SLP: Spill Cost = " << SpillCost << ".\n"		std::string Str;
		{
		raw_string_ostream OS(Str);
		OS << "SLP: Spill Cost = " << SpillCost << ".\n"
		mkuper: Any reason not to render this as part of the graph regardless of NDEBUG?
<< "SLP: Extract Cost = " << ExtractCost << ".\n"		<< "SLP: Extract Cost = " << ExtractCost << ".\n"
<< "SLP: Total Cost = " << Cost << ".\n");		<< "SLP: Total Cost = " << Cost << ".\n";
		}
		DEBUG(dbgs() << Str);

		if (ViewSLPTree)
		ViewGraph(this, "SLP" + F->getName(), false, Str);

return Cost;		return Cost;
}		}
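The new code formats the cost summary into `Str` once and then uses it both for the debug stream and as the label passed to `ViewGraph`, so the summary is now computed whether or not NDEBUG is defined. The extra braces around `raw_string_ostream` appear to exist so that the stream is destroyed, and its buffer flushed into `Str`, before `Str` is read. A rough standard-C++ analogue follows. `std::ostringstream::str()` returns a copy of the buffered text, so no extra scope is needed there. `Options`, `formatCostSummary` and `reportCost` are hypothetical, and returning the label stands in for calling a graph viewer.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical stand-ins for the -debug output switch and -view-slp-tree.
struct Options {
  bool Debug = false;
  bool ViewTree = false;
};

// Format the cost summary once so the same text can feed both the debug
// stream and the graph label.
std::string formatCostSummary(int SpillCost, int ExtractCost, int Total) {
  std::ostringstream OS;
  OS << "SLP: Spill Cost = " << SpillCost << ".\n"
     << "SLP: Extract Cost = " << ExtractCost << ".\n"
     << "SLP: Total Cost = " << Total << ".\n";
  return OS.str();
}

// Prints the summary when debugging and returns the label a graph viewer
// would receive, or an empty string when the tree is not being viewed.
std::string reportCost(int SpillCost, int ExtractCost, int Total,
                       const Options &Opts) {
  std::string Str = formatCostSummary(SpillCost, ExtractCost, Total);
  if (Opts.Debug)
    std::cerr << Str;
  return Opts.ViewTree ? Str : std::string();
}
```

Keeping the formatting outside any debug-only macro is what allows the label to appear in the rendered graph in release builds too, at the cost of building the string on every call.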

int BoUpSLP::getGatherCost(Type *Ty) {		int BoUpSLP::getGatherCost(Type *Ty) {
int Cost = 0;		int Cost = 0;
for (unsigned i = 0, e = cast<VectorType>(Ty)->getNumElements(); i < e; ++i)		for (unsigned i = 0, e = cast<VectorType>(Ty)->getNumElements(); i < e; ++i)
Cost += TTI->getVectorInstrCost(Instruction::InsertElement, Ty, i);		Cost += TTI->getVectorInstrCost(Instruction::InsertElement, Ty, i);
return Cost;		return Cost;
[3,124 lines not shown]