This is an archive of the discontinued LLVM Phabricator instance.

[SLPVectorizer] Generalize vectorizeStores to support loads as well NFC.
Abandoned · Public

Authored by fhahn on Sep 12 2017, 5:24 AM.

Details

Reviewers

ABataev
dtemirbulatov
RKSimon

Summary

This is a preparation for D37737.

Event Timeline

fhahn created this revision. (Sep 12 2017, 5:24 AM)

Herald added a subscriber: rengolin. (Sep 12 2017, 5:24 AM)

fhahn added a child revision: D37737: [SLPVectorizer] Merge subsequent gather loads. (Sep 12 2017, 5:24 AM)

RKSimon added reviewers: ABataev, dtemirbulatov, RKSimon. (Sep 12 2017, 5:40 AM)

Tests? Benchmark results?

Hi Florian,

I'm curious to know if you have a motivating example for this change. For a load-rooted tree (and since we build the trees bottom-up), I think the depth would always be 1. I'm surprised the cost model would find this to be profitable. The loaded values would probably need to be extracted before they are used. Also, does the order of vectorization matter with this patch? Say we currently vectorize a store-rooted tree ending in consecutive loads. Is it possible with this patch that we would vectorize the loads first and then no longer be able to vectorize the entire tree? And lastly, if you have a particular case in mind that you're trying to optimize, do you know if the DAG combiner's consecutive load/store optimizations are helpful?

Sorry for not responding sooner. Thanks for the feedback. I am currently swamped with other work, but I hope I can revisit this patch soon!

Not needed anymore; fixed by D36130.

Revision Contents

Path                                                 Size
include/llvm/Transforms/Vectorize/SLPVectorizer.h    4 lines
lib/Transforms/Vectorize/SLPVectorizer.cpp           244 lines

Diff 114804

include/llvm/Transforms/Vectorize/SLPVectorizer.h

@@ 51 lines not shown @@
 /// These are implementation details and should not be used by clients.
 namespace slpvectorizer {

 class BoUpSLP;

 } // end namespace slpvectorizer

 struct SLPVectorizerPass : public PassInfoMixin<SLPVectorizerPass> {
-  using StoreList = SmallVector<StoreInst *, 8>;
+  using StoreList = SmallVector<Instruction *, 8>;
   using StoreListMap = MapVector<Value *, StoreList>;
   using WeakTrackingVHList = SmallVector<WeakTrackingVH, 8>;
   using WeakTrackingVHListMap = MapVector<Value *, WeakTrackingVHList>;

   ScalarEvolution *SE = nullptr;
   TargetTransformInfo *TTI = nullptr;
   TargetLibraryInfo *TLI = nullptr;
   AliasAnalysis *AA = nullptr;
@@ 67 lines not shown @@ private:

   /// \brief Scan the basic block and look for patterns that are likely to start
   /// a vectorization chain.
   bool vectorizeChainsInBlock(BasicBlock *BB, slpvectorizer::BoUpSLP &R);

   bool vectorizeStoreChain(ArrayRef<Value *> Chain, slpvectorizer::BoUpSLP &R,
                            unsigned VecRegSize);

-  bool vectorizeStores(ArrayRef<StoreInst *> Stores, slpvectorizer::BoUpSLP &R);
-
   /// The store instructions in a basic block organized by base pointer.
   StoreListMap Stores;

   /// The getelementptr instructions in a basic block organized by base pointer.
   WeakTrackingVHListMap GEPs;
 };

 } // end namespace llvm

 #endif // LLVM_TRANSFORMS_VECTORIZE_SLPVECTORIZER_H

lib/Transforms/Vectorize/SLPVectorizer.cpp

@@ 3,295 lines not shown @@ #endif
     }
   }

   Builder.ClearInsertionPoint();

   return VectorizableTree[0].VectorizedValue;
 }

+/// \brief Check that the Values in the slice in VL array are still existent in
+/// the WeakTrackingVH array.
+/// Vectorization of part of the VL array may cause later values in the VL array
+/// to become invalid. We track when this has happened in the WeakTrackingVH
+/// array.
+static bool hasValueBeenRAUWed(ArrayRef<Value *> VL,
+                               ArrayRef<WeakTrackingVH> VH, unsigned SliceBegin,
+                               unsigned SliceSize) {
+  VL = VL.slice(SliceBegin, SliceSize);
+  VH = VH.slice(SliceBegin, SliceSize);
+  return !std::equal(VL.begin(), VL.end(), VH.begin());
+}
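
The slice check above works because a WeakTrackingVH stops comparing equal to the original pointer once the value is replaced or erased. A minimal standalone sketch of the same comparison, assuming plain integer IDs in place of Value pointers (the name sliceHasChanged and the -1 "replaced" marker are illustrative and not part of the patch):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for hasValueBeenRAUWed: `Original` plays the role of
// the Value* list, `Tracked` the role of the WeakTrackingVH list that changes
// when a value is replaced or deleted.
bool sliceHasChanged(const std::vector<int> &Original,
                     const std::vector<int> &Tracked, std::size_t SliceBegin,
                     std::size_t SliceSize) {
  assert(Original.size() == Tracked.size() && "lists must be parallel");
  assert(SliceBegin + SliceSize <= Original.size() && "slice out of range");
  auto First = Original.begin() + SliceBegin;
  // Any mismatch inside the slice means at least one entry was invalidated.
  return !std::equal(First, First + SliceSize, Tracked.begin() + SliceBegin);
}
```

With Original = {10, 20, 30, 40} and Tracked = {10, 20, -1, 40}, only slices that cover index 2 are reported as changed.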

+bool vectorizeAccessChain(ArrayRef<Value *> Chain, BoUpSLP &R,
+                          unsigned VecRegSize) {
+  assert(!Chain.empty() &&
+         (isa<LoadInst>(Chain[0]) || isa<StoreInst>(Chain[0])) &&
+         "Chain has to be non-empty and contain load or store instructions");
+
+  unsigned ChainLen = Chain.size();
+  DEBUG(dbgs() << "SLP: Analyzing a "
+               << (isa<StoreInst>(Chain[0]) ? "store" : "load")
+               << " chain of length " << ChainLen << "\n");
+  unsigned Sz = R.getVectorElementSize(Chain[0]);
+  unsigned VF = VecRegSize / Sz;
+
+  if (!isPowerOf2_32(Sz) || VF < 2)
+    return false;
+
+  // Keep track of values that were deleted by vectorizing in the loop below.
+  SmallVector<WeakTrackingVH, 8> TrackValues(Chain.begin(), Chain.end());
+
+  bool Changed = false;
+  // Look for profitable vectorizable trees at all offsets, starting at zero.
+  for (unsigned i = 0, e = ChainLen; i < e; ++i) {
+    if (i + VF > e)
+      break;
+
+    // Check that a previous iteration of this loop did not delete the Value.
+    if (hasValueBeenRAUWed(Chain, TrackValues, i, VF))
+      continue;
+
+    DEBUG(dbgs() << "SLP: Analyzing " << VF << " "
+                 << (isa<StoreInst>(Chain[0]) ? "stores" : "loads")
+                 << " at offset " << i << "\n");
+    ArrayRef<Value *> Operands = Chain.slice(i, VF);
+
+    R.buildTree(Operands);
+    if (R.isTreeTinyAndNotFullyVectorizable())
+      continue;
+
+    R.computeMinimumValueSizes();
+
+    int Cost = R.getTreeCost();
+
+    DEBUG(dbgs() << "SLP: Found cost=" << Cost << " for VF=" << VF << "\n");
+    if (Cost < -SLPCostThreshold) {
+      DEBUG(dbgs() << "SLP: Decided to vectorize cost=" << Cost << "\n");
+
+      using namespace ore;
+      auto *ORE = R.getORE();
+      if (isa<StoreInst>(Chain[i]))
+        ORE->emit(OptimizationRemark(SV_NAME, "StoresVectorized",
+                                     cast<StoreInst>(Chain[i]))
+                  << " Stores SLP vectorized with cost " << NV("Cost", Cost)
+                  << " and with tree size " << NV("TreeSize", R.getTreeSize()));
+      else
+        ORE->emit(OptimizationRemark(SV_NAME, "LoadsVectorized",
+                                     cast<LoadInst>(Chain[i]))
+                  << " Loads SLP vectorized with cost " << NV("Cost", Cost)
+                  << " and with tree size " << NV("TreeSize", R.getTreeSize()));
+
+      R.vectorizeTree();
+
+      // Move to the next bundle.
+      i += VF - 1;
+      Changed = true;
+    }
+  }
+
+  return Changed;
+}
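
The offset scan in vectorizeAccessChain can be modelled without any LLVM types. The sketch below is an illustration under assumptions, not the patch's code: scanChain and the IsProfitable callback are hypothetical names, the callback stands in for buildTree plus the cost-threshold test, and the RAUW check is left out. It returns the offsets at which a window of VF elements would have been vectorized.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Slides a window of VF = VecRegSize / ElemSize elements over a chain of
// ChainLen accesses. When a window is accepted, the scan jumps past it so
// that accepted windows never overlap.
std::vector<std::size_t>
scanChain(std::size_t ChainLen, unsigned VecRegSize, unsigned ElemSize,
          const std::function<bool(std::size_t)> &IsProfitable) {
  assert(ElemSize != 0 && "element size must be non-zero");
  std::vector<std::size_t> Starts;
  const std::size_t VF = VecRegSize / ElemSize;
  // Mirror the early exit: element size must be a power of two and VF >= 2.
  if ((ElemSize & (ElemSize - 1)) != 0 || VF < 2)
    return Starts;
  for (std::size_t I = 0; I + VF <= ChainLen; ++I) {
    if (!IsProfitable(I))
      continue;
    Starts.push_back(I);
    I += VF - 1; // Together with ++I this moves to the next bundle.
  }
  return Starts;
}
```

For a chain of 8 accesses with a 128-bit register and 32-bit elements, VF is 4; if every window is accepted, the scan picks offsets 0 and 4. If only offset 1 is accepted, the remaining three elements after the window are too few to form another bundle.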


+static bool
+vectorizeAccesses(ArrayRef<Instruction *> Accesses, BoUpSLP &R,
+                  const DataLayout *DL, ScalarEvolution *SE) {
+  SetVector<Instruction *> Heads, Tails;
+  SmallDenseMap<Instruction *, Instruction *> ConsecutiveChain;
+
+  // We may run into multiple chains that merge into a single chain. We mark the
+  // stores that we vectorized so that we don't visit the same store twice.
+  BoUpSLP::ValueSet VectorizedAccesses;
+
+  // Do a quadratic search on all of the given loads or stores and find
+  // all of the pairs of loads or stores that follow each other.
+  SmallVector<unsigned, 16> IndexQueue;
+  for (unsigned i = 0, e = Accesses.size(); i < e; ++i) {
+    IndexQueue.clear();
+    // If a load or store has multiple consecutive store candidates, search
+    // array according to the sequence: from i+1 to e, then from i-1 to 0.
+    // This is because usually pairing with immediate succeeding or preceding
+    // candidate create the best chance to find slp vectorization opportunity.
+    unsigned j = 0;
+    for (j = i + 1; j < e; ++j)
+      IndexQueue.push_back(j);
+    for (j = i; j > 0; --j)
+      IndexQueue.push_back(j - 1);
+
+    for (auto &k : IndexQueue) {
+      if (isConsecutiveAccess(Accesses[i], Accesses[k], *DL, *SE)) {
+        Tails.insert(Accesses[k]);
+        Heads.insert(Accesses[i]);
+        ConsecutiveChain[Accesses[i]] = Accesses[k];
+        break;
+      }
+    }
+  }
+
+  bool Changed = false;
+
+  // For loads or stores that start but don't end a link in the chain:
+  for (SetVector<Instruction *>::iterator it = Heads.begin(), e = Heads.end();
+       it != e; ++it) {
+    if (Tails.count(*it))
+      continue;
+
+    // We found a load or store instr that starts a chain. Now follow the chain
+    // and try to vectorize it.
+    BoUpSLP::ValueList Operands;
+    Instruction *I = *it;
+    // Collect the chain into a list.
+    while (Tails.count(I) || Heads.count(I)) {
+      if (VectorizedAccesses.count(I))
+        break;
+      Operands.push_back(I);
+      // Move to the next value in the chain.
+      I = ConsecutiveChain[I];
+    }
+    //
+    // FIXME: Is division-by-2 the correct step? Should we assert that the
+    // register size is a power-of-2?
+    for (unsigned Size = R.getMaxVecRegSize(); Size >= R.getMinVecRegSize();
+         Size /= 2) {
+      if (vectorizeAccessChain(Operands, R, Size)) {
+        // Mark the vectorized stores so that we don't vectorize them again.
+        VectorizedAccesses.insert(Operands.begin(), Operands.end());
+        Changed = true;
+        break;
+      }
+    }
+  }
+
+  return Changed;
+}

 void BoUpSLP::optimizeGatherSequence() {
   DEBUG(dbgs() << "SLP: Optimizing " << GatherSeq.size()
         << " gather sequences instructions.\n");
   // LICM InsertElementInst sequences.

   for (Instruction *it : GatherSeq) {
     InsertElementInst *Insert = dyn_cast<InsertElementInst>(it);

     if (!Insert)
       continue;

     // Check if this block is inside a loop.
     Loop *L = LI->getLoopFor(Insert->getParent());
@@ 61 lines not shown @@ for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e;) {
         }
       }
       if (In) {
         assert(!is_contained(Visited, In));
         Visited.push_back(In);
       }
     }
   }

   CSEBlocks.clear();
   GatherSeq.clear();
 }

 // Groups the instructions to a bundle (which is then a single scheduling entity)
 // and schedules instructions until the bundle gets ready.
 bool BoUpSLP::BlockScheduling::tryScheduleBundle(ArrayRef<Value *> VL,
                                                  BoUpSLP *SLP, Value *OpValue) {
@@ 839 lines not shown @@ bool SLPVectorizerPass::runImpl(Function &F, ScalarEvolution *SE_,
   if (Changed) {
     R.optimizeGatherSequence();
     DEBUG(dbgs() << "SLP: vectorized \"" << F.getName() << "\"\n");
     DEBUG(verifyFunction(F));
   }
   return Changed;
 }

-/// \brief Check that the Values in the slice in VL array are still existent in
-/// the WeakTrackingVH array.
-/// Vectorization of part of the VL array may cause later values in the VL array
-/// to become invalid. We track when this has happened in the WeakTrackingVH
-/// array.
-static bool hasValueBeenRAUWed(ArrayRef<Value *> VL,
-                               ArrayRef<WeakTrackingVH> VH, unsigned SliceBegin,
-                               unsigned SliceSize) {
-  VL = VL.slice(SliceBegin, SliceSize);
-  VH = VH.slice(SliceBegin, SliceSize);
-  return !std::equal(VL.begin(), VL.end(), VH.begin());
-}

 bool SLPVectorizerPass::vectorizeStoreChain(ArrayRef<Value *> Chain, BoUpSLP &R,
                                             unsigned VecRegSize) {
   unsigned ChainLen = Chain.size();
   DEBUG(dbgs() << "SLP: Analyzing a store chain of length " << ChainLen
         << "\n");
   unsigned Sz = R.getVectorElementSize(Chain[0]);
   unsigned VF = VecRegSize / Sz;
@@ 44 lines not shown @@ if (Cost < -SLPCostThreshold) {
       i += VF - 1;
       Changed = true;
     }
   }

   return Changed;
 }

-bool SLPVectorizerPass::vectorizeStores(ArrayRef<StoreInst *> Stores,
-                                        BoUpSLP &R) {
-  SetVector<StoreInst *> Heads, Tails;
-  SmallDenseMap<StoreInst *, StoreInst *> ConsecutiveChain;
-
-  // We may run into multiple chains that merge into a single chain. We mark the
-  // stores that we vectorized so that we don't visit the same store twice.
-  BoUpSLP::ValueSet VectorizedStores;
-  bool Changed = false;
-
-  // Do a quadratic search on all of the given stores and find
-  // all of the pairs of stores that follow each other.
-  SmallVector<unsigned, 16> IndexQueue;
-  for (unsigned i = 0, e = Stores.size(); i < e; ++i) {
-    IndexQueue.clear();
-    // If a store has multiple consecutive store candidates, search Stores
-    // array according to the sequence: from i+1 to e, then from i-1 to 0.
-    // This is because usually pairing with immediate succeeding or preceding
-    // candidate create the best chance to find slp vectorization opportunity.
-    unsigned j = 0;
-    for (j = i + 1; j < e; ++j)
-      IndexQueue.push_back(j);
-    for (j = i; j > 0; --j)
-      IndexQueue.push_back(j - 1);
-
-    for (auto &k : IndexQueue) {
-      if (isConsecutiveAccess(Stores[i], Stores[k], *DL, *SE)) {
-        Tails.insert(Stores[k]);
-        Heads.insert(Stores[i]);
-        ConsecutiveChain[Stores[i]] = Stores[k];
-        break;
-      }
-    }
-  }
-
-  // For stores that start but don't end a link in the chain:
-  for (SetVector<StoreInst *>::iterator it = Heads.begin(), e = Heads.end();
-       it != e; ++it) {
-    if (Tails.count(*it))
-      continue;
-
-    // We found a store instr that starts a chain. Now follow the chain and try
-    // to vectorize it.
-    BoUpSLP::ValueList Operands;
-    StoreInst *I = *it;
-    // Collect the chain into a list.
-    while (Tails.count(I) || Heads.count(I)) {
-      if (VectorizedStores.count(I))
-        break;
-      Operands.push_back(I);
-      // Move to the next value in the chain.
-      I = ConsecutiveChain[I];
-    }
-
-    // FIXME: Is division-by-2 the correct step? Should we assert that the
-    // register size is a power-of-2?
-    for (unsigned Size = R.getMaxVecRegSize(); Size >= R.getMinVecRegSize();
-         Size /= 2) {
-      if (vectorizeStoreChain(Operands, R, Size)) {
-        // Mark the vectorized stores so that we don't vectorize them again.
-        VectorizedStores.insert(Operands.begin(), Operands.end());
-        Changed = true;
-        break;
-      }
-    }
-  }
-
-  return Changed;
-}

 void SLPVectorizerPass::collectSeedInstructions(BasicBlock *BB) {
   // Initialize the collections. We will make a single pass over the block.
   Stores.clear();
   GEPs.clear();

   // Visit the store and getelementptr instructions in BB and organize them in
   // Stores and GEPs according to the underlying objects of their pointer
   // operands.
@@ 1,409 lines not shown @@ DEBUG(dbgs() << "SLP: Analyzing a store chain of length "
           << it->second.size() << ".\n");

     // Process the stores in chunks of 16.
     // TODO: The limit of 16 inhibits greater vectorization factors.
     //       For example, AVX2 supports v32i8. Increasing this limit, however,
     //       may cause a significant compile-time increase.
     for (unsigned CI = 0, CE = it->second.size(); CI < CE; CI+=16) {
       unsigned Len = std::min<unsigned>(CE - CI, 16);
-      Changed |= vectorizeStores(makeArrayRef(&it->second[CI], Len), R);
+      Changed |= vectorizeAccesses(makeArrayRef(&it->second[CI], Len), R, DL,
+                                   SE);
     }
   }
   return Changed;
 }
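
The chunking loop above splits each per-base-pointer list into slices of at most 16 accesses before searching for chains. A small standalone sketch of that slicing, assuming only a count and a chunk size (chunkRanges is an illustrative name, not an LLVM function):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Returns (begin, length) pairs that cover [0, Count) in pieces of at most
// ChunkSize elements; the last piece may be shorter.
std::vector<std::pair<std::size_t, std::size_t>>
chunkRanges(std::size_t Count, std::size_t ChunkSize) {
  assert(ChunkSize != 0 && "chunk size must be non-zero");
  std::vector<std::pair<std::size_t, std::size_t>> Ranges;
  for (std::size_t Begin = 0; Begin < Count; Begin += ChunkSize)
    Ranges.emplace_back(Begin, std::min(Count - Begin, ChunkSize));
  return Ranges;
}
```

With 40 accesses and a chunk size of 16 the slices are (0, 16), (16, 16) and (32, 8). A chain that straddles a slice boundary is never seen as a whole, which is the limitation the TODO in the patch context refers to.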

 char SLPVectorizer::ID = 0;

 static const char lv_name[] = "SLP Vectorizer";
@@ 12 lines not shown @@