This is an archive of the discontinued LLVM Phabricator instance.

[LV] Refactor Cost Model's selectVectorizationFactor(), driven by a LoopVectorizationPlanner
ClosedPublic

Authored by Ayal on Mar 6 2017, 7:46 AM.

Details

Reviewers

rengolin
anemet
mkuper

Commits

rG928ec405843e: [LV] Refactor Cost Model's selectVectorizationFactor(); NFC
rL297737: [LV] Refactor Cost Model's selectVectorizationFactor(); NFC

Summary

Refactoring Cost Model's selectVectorizationFactor() so that it handles only the selection of the best VF from a pre-computed range of candidate VF's, extracting early-exit criteria and the computation of a MaxVF upper-bound to other methods, all driven by a LoopVectorizationPlanner.

Follows https://reviews.llvm.org/D28975 and its tentative breakdown, starting with the first item, "refactor Cost-Model to provide MaxVF and early-exit methods". The refactoring and Planner proposed in this patch are independent of VPlan, though.

No change in output intended.

Joint work with Gil.
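
The division of labor described in this summary can be sketched outside LLVM as follows. This is a minimal, hypothetical model and not the patch's code: `ToyCostModel` and its `LoopCost` callback are invented stand-ins for the real cost model, and `std::optional` takes the place of `llvm::Optional`. Only the order of decisions (early exit, then user VF, then a search up to MaxVF) is meant to mirror the patch.

```cpp
#include <cassert>
#include <functional>
#include <optional>

// Hypothetical stand-ins; the names echo the review but this is not LLVM code.
struct VectorizationFactor {
  unsigned Width; // 1 means "do not vectorize"
  unsigned Cost;  // 0 means "cost not computed"
};

struct ToyCostModel {
  std::optional<unsigned> MaxVF;              // nullopt models an early exit
  std::function<unsigned(unsigned)> LoopCost; // assumed cost of one loop body at a given VF

  std::optional<unsigned> computeMaxVF() const { return MaxVF; }

  // Pick the power-of-two width with the lowest cost per scalar iteration.
  VectorizationFactor selectVectorizationFactor(unsigned Max) const {
    float Best = static_cast<float>(LoopCost(1));
    unsigned Width = 1;
    for (unsigned VF = 2; VF <= Max; VF *= 2) {
      float PerLane = static_cast<float>(LoopCost(VF)) / VF;
      if (PerLane < Best) {
        Best = PerLane;
        Width = VF;
      }
    }
    return {Width, static_cast<unsigned>(Width * Best)};
  }
};

// Planner-style driver: early exit first, then user VF, then the search.
VectorizationFactor plan(const ToyCostModel &CM, unsigned UserVF) {
  const VectorizationFactor NoVectorization = {1U, 0U};
  std::optional<unsigned> MaybeMaxVF = CM.computeMaxVF();
  if (!MaybeMaxVF) // considered too costly to vectorize
    return NoVectorization;
  if (UserVF) {
    assert((UserVF & (UserVF - 1)) == 0 && "VF needs to be a power of two");
    return {UserVF, 0U};
  }
  if (*MaybeMaxVF == 1)
    return NoVectorization;
  return CM.selectVectorizationFactor(*MaybeMaxVF);
}
```

With this split, the selection routine never needs to know why a loop was rejected up front; it only sees a MaxVF that has already been judged feasible.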

Diff Detail

Event Timeline

Ayal created this revision. Mar 6 2017, 7:46 AM

Herald added a subscriber: mzolotukhin. Mar 6 2017, 7:46 AM

mssimpso added a subscriber: mssimpso. Mar 6 2017, 7:56 AM

mkuper added inline comments. Mar 6 2017, 1:12 PM

lib/Transforms/Vectorize/LoopVectorize.cpp

6164–6175

Any reason not to use Optional<> instead?

7395

Maybe:

DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");
if (UserVF != 1)
  CM.selectUserVectorizationFactor(UserVF);
return {UserVF, 0}

(I think isPowerOf2_32(1) is true.)

Ayal added inline comments. Mar 6 2017, 2:28 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
6164–6175	Ahh, sure, done. Any suggestion for a better method name, avoiding having two computeMaxVF()'s?
7395	Right, isPowerOf2_32(1) is true. Will actually simplify this further by dropping the "if (UserVF != 1)", since selectUserVectorizationFactor() knows how to handle a UserVF of 1 (namely, by doing nothing).
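
The observation that a VF of 1 passes the power-of-two assertion can be checked with a stand-alone sketch. The helper below is a hypothetical re-implementation written for illustration; LLVM's own `isPowerOf2_32` lives in `llvm/Support/MathExtras.h`, and `checkedUserVF` is an invented name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical re-implementation for illustration only.
constexpr bool isPowerOfTwo32(std::uint32_t Value) {
  // A power of two has exactly one bit set, so clearing the lowest set bit
  // must leave zero. Zero itself is excluded.
  return Value != 0 && (Value & (Value - 1)) == 0;
}

static_assert(isPowerOfTwo32(1), "1 == 2^0 counts as a power of two");
static_assert(!isPowerOfTwo32(0), "0 is not a power of two");

// Invented wrapper: the assertion accepts a user VF of 1, so no separate
// "UserVF != 1" guard is needed before it.
unsigned checkedUserVF(unsigned UserVF) {
  assert(isPowerOfTwo32(UserVF) && "VF needs to be a power of two");
  return UserVF;
}
```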

mkuper added inline comments. Mar 6 2017, 4:18 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
6164–6175	Not really, since we're sort of mixing VF selection with legality in this function. :-\
6200	I understand this is the way it has always been, and isn't changing in this patch, but now I realize it's fairly odd. Why are we bailing on "TC % MaxVF != 0" instead of trying to reduce MaxVF so that it actually is 0? Am I missing something here? If not, could you add a FIXME, and/or fix it in a follow-up commit?

Ayal/Gil,

I have no suggestions for this patch beyond what Michael has already mentioned. But I really like the way you're splitting up the larger patch into smaller pieces. This will make it much easier to review. Thanks!

Ayal added inline comments. Mar 7 2017, 2:02 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
6164–6175	These checks could potentially move into Legality, but in some sense they are "early pruning" due to excessive cost, rather than "legal" obstacles that cannot be handled. Plus only (Max)VF is considered here, not (Max)UF.
6200	It would indeed be better to search for a smaller MaxVF that does divide TC, instead of giving up. Added a FIXME. We should also check if loop requiresScalarEpilog(), which in turn should be determined more accurately per-VF. Added another FIXME.
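
The first FIXME discussed above (search for a smaller MaxVF that divides TC rather than give up) could look roughly like the sketch below. It is not part of this patch: `largestDividingVF` is a hypothetical helper that assumes MaxVF is a power of two and TC is a known constant trip count, and it does not address the second FIXME about scalar epilogues.

```cpp
#include <cassert>

// Hypothetical helper: halve MaxVF until it divides the trip count.
// Returns 1 when no wider power of two divides TC.
unsigned largestDividingVF(unsigned TC, unsigned MaxVF) {
  assert(MaxVF != 0 && (MaxVF & (MaxVF - 1)) == 0 &&
         "MaxVF must be a power of two");
  unsigned VF = MaxVF;
  while (VF > 1 && TC % VF != 0)
    VF /= 2;
  return VF;
}
```

For a trip count of 12 and a MaxVF of 8, this yields 4, so a tail-free vector loop would still be possible where the current check bails out.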

Addressing review suggestions. Thanks!

LGTM

This revision is now accepted and ready to land. Mar 7 2017, 2:29 PM

Closed by commit rL297737: [LV] Refactor Cost Model's selectVectorizationFactor(); NFC (authored by ayalz). Mar 14 2017, 6:19 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: lib/Transforms/Vectorize/LoopVectorize.cpp
Size: 205 lines

Diff 90933

lib/Transforms/Vectorize/LoopVectorize.cpp

// Vectorizing Compilers.		// Vectorizing Compilers.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Vectorize/LoopVectorize.h"		#include "llvm/Transforms/Vectorize/LoopVectorize.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Hashing.h"		#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/SCCIterator.h"		#include "llvm/ADT/SCCIterator.h"
#include "llvm/ADT/SetVector.h"		#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/Analysis/CodeMetrics.h"		#include "llvm/Analysis/CodeMetrics.h"
[1,789 lines not shown]	LoopVectorizationCostModel(Loop *L, PredicatedScalarEvolution &PSE,
const TargetTransformInfo &TTI,		const TargetTransformInfo &TTI,
const TargetLibraryInfo *TLI, DemandedBits *DB,		const TargetLibraryInfo *TLI, DemandedBits *DB,
AssumptionCache *AC,		AssumptionCache *AC,
OptimizationRemarkEmitter *ORE, const Function *F,		OptimizationRemarkEmitter *ORE, const Function *F,
const LoopVectorizeHints *Hints)		const LoopVectorizeHints *Hints)
: TheLoop(L), PSE(PSE), LI(LI), Legal(Legal), TTI(TTI), TLI(TLI), DB(DB),		: TheLoop(L), PSE(PSE), LI(LI), Legal(Legal), TTI(TTI), TLI(TLI), DB(DB),
AC(AC), ORE(ORE), TheFunction(F), Hints(Hints) {}		AC(AC), ORE(ORE), TheFunction(F), Hints(Hints) {}

		/// \return An upper bound for the vectorization factor, or None if
		/// vectorization should be avoided up front.
		Optional<unsigned> computeMaxVF(bool OptForSize);

/// Information about vectorization costs		/// Information about vectorization costs
struct VectorizationFactor {		struct VectorizationFactor {
unsigned Width; // Vector width with best cost		unsigned Width; // Vector width with best cost
unsigned Cost; // Cost of the loop with that width		unsigned Cost; // Cost of the loop with that width
};		};
/// \return The most profitable vectorization factor and the cost of that VF.		/// \return The most profitable vectorization factor and the cost of that VF.
/// This method checks every power of two up to VF. If UserVF is not ZERO		/// This method checks every power of two up to MaxVF. If UserVF is not ZERO
/// then this vectorization factor will be selected if vectorization is		/// then this vectorization factor will be selected if vectorization is
/// possible.		/// possible.
VectorizationFactor selectVectorizationFactor(bool OptForSize);		VectorizationFactor selectVectorizationFactor(unsigned MaxVF);

		/// Setup cost-based decisions for user vectorization factor.
		void selectUserVectorizationFactor(unsigned UserVF) {
		collectUniformsAndScalars(UserVF);
		collectInstsToScalarize(UserVF);
		}

/// \return The size (in bits) of the smallest and widest types in the code		/// \return The size (in bits) of the smallest and widest types in the code
/// that needs to be vectorized. We ignore values that remain scalar such as		/// that needs to be vectorized. We ignore values that remain scalar such as
/// 64 bit loop indices.		/// 64 bit loop indices.
std::pair<unsigned, unsigned> getSmallestAndWidestTypes();		std::pair<unsigned, unsigned> getSmallestAndWidestTypes();

/// \return The desired interleave count.		/// \return The desired interleave count.
/// If interleave count has been specified by metadata it will be returned.		/// If interleave count has been specified by metadata it will be returned.
[148 lines not shown]	bool isOptimizableIVTruncate(Instruction *I, unsigned VF) {
if (Op != Legal->getPrimaryInduction() && TTI.isTruncateFree(SrcTy, DestTy))		if (Op != Legal->getPrimaryInduction() && TTI.isTruncateFree(SrcTy, DestTy))
return false;		return false;

// If the truncated value is not an induction variable, return false.		// If the truncated value is not an induction variable, return false.
return Legal->isInductionVariable(Op);		return Legal->isInductionVariable(Op);
}		}

private:		private:
		/// \return An upper bound for the vectorization factor, larger than zero.
		/// One is returned if vectorization should best be avoided due to cost.
		unsigned computeFeasibleMaxVF(bool OptForSize);

/// The vectorization cost is a combination of the cost itself and a boolean		/// The vectorization cost is a combination of the cost itself and a boolean
/// indicating whether any of the contributing operations will actually		/// indicating whether any of the contributing operations will actually
/// operate on		/// operate on
/// vector values after type legalization in the backend. If this latter value		/// vector values after type legalization in the backend. If this latter value
/// is		/// is
/// false, then all operations will be scalarized (i.e. no vectorization has		/// false, then all operations will be scalarized (i.e. no vectorization has
/// actually taken place).		/// actually taken place).
typedef std::pair<unsigned, bool> VectorizationCostTy;		typedef std::pair<unsigned, bool> VectorizationCostTy;
[143 lines not shown]	public:
/// Loop Vectorize Hint.		/// Loop Vectorize Hint.
const LoopVectorizeHints *Hints;		const LoopVectorizeHints *Hints;
/// Values to ignore in the cost model.		/// Values to ignore in the cost model.
SmallPtrSet<const Value *, 16> ValuesToIgnore;		SmallPtrSet<const Value *, 16> ValuesToIgnore;
/// Values to ignore in the cost model when VF > 1.		/// Values to ignore in the cost model when VF > 1.
SmallPtrSet<const Value *, 16> VecValuesToIgnore;		SmallPtrSet<const Value *, 16> VecValuesToIgnore;
};		};

		/// LoopVectorizationPlanner - drives the vectorization process after having
		/// passed Legality checks.
		class LoopVectorizationPlanner {
		public:
		LoopVectorizationPlanner(LoopVectorizationCostModel &CM) : CM(CM) {}

		~LoopVectorizationPlanner() {}

		/// Plan how to best vectorize, return the best VF and its cost.
		LoopVectorizationCostModel::VectorizationFactor plan(bool OptForSize,
		unsigned UserVF);

		private:
		/// The profitablity analysis.
		LoopVectorizationCostModel &CM;
		};

/// \brief This holds vectorization requirements that must be verified late in		/// \brief This holds vectorization requirements that must be verified late in
/// the process. The requirements are set by legalize and costmodel. Once		/// the process. The requirements are set by legalize and costmodel. Once
/// vectorization has been determined to be possible and profitable the		/// vectorization has been determined to be possible and profitable the
/// requirements can be verified by looking for metadata or compiler options.		/// requirements can be verified by looking for metadata or compiler options.
/// For example, some loops require FP commutativity which is only allowed if		/// For example, some loops require FP commutativity which is only allowed if
/// vectorization is explicitly specified or if the fast-math compiler option		/// vectorization is explicitly specified or if the fast-math compiler option
/// has been provided.		/// has been provided.
/// Late evaluation of these requirements allows helpful diagnostics to be		/// Late evaluation of these requirements allows helpful diagnostics to be
[119 lines not shown]	void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addPreserved<GlobalsAAWrapperPass>();		AU.addPreserved<GlobalsAAWrapperPass>();
}		}
};		};

} // end anonymous namespace		} // end anonymous namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Implementation of LoopVectorizationLegality, InnerLoopVectorizer and		// Implementation of LoopVectorizationLegality, InnerLoopVectorizer and
// LoopVectorizationCostModel.		// LoopVectorizationCostModel and LoopVectorizationPlanner.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

Value *InnerLoopVectorizer::getBroadcastInstrs(Value *V) {		Value *InnerLoopVectorizer::getBroadcastInstrs(Value *V) {
// We need to place the broadcast of invariant variables outside the loop.		// We need to place the broadcast of invariant variables outside the loop.
Instruction *Instr = dyn_cast<Instruction>(V);		Instruction *Instr = dyn_cast<Instruction>(V);
bool NewInstr = (Instr && Instr->getParent() == LoopVectorBody);		bool NewInstr = (Instr && Instr->getParent() == LoopVectorBody);
bool Invariant = OrigLoop->isLoopInvariant(V) && !NewInstr;		bool Invariant = OrigLoop->isLoopInvariant(V) && !NewInstr;

[3,789 lines not shown]	if (LastMember) {
continue;		continue;
}		}
DEBUG(dbgs() << "LV: Interleaved group requires epilogue iteration.\n");		DEBUG(dbgs() << "LV: Interleaved group requires epilogue iteration.\n");
RequiresScalarEpilogue = true;		RequiresScalarEpilogue = true;
}		}
}		}
}		}

LoopVectorizationCostModel::VectorizationFactor		Optional<unsigned> LoopVectorizationCostModel::computeMaxVF(bool OptForSize) {
LoopVectorizationCostModel::selectVectorizationFactor(bool OptForSize) {		if (!EnableCondStoresVectorization && Legal->getNumPredStores()) {
// Width 1 means no vectorize		ORE->emit(createMissedAnalysis("ConditionalStore")
VectorizationFactor Factor = {1U, 0U};		<< "store that is conditionally executed prevents vectorization");
if (OptForSize && Legal->getRuntimePointerChecking()->Need) {		DEBUG(dbgs() << "LV: No vectorization. There are conditional stores.\n");
		return None;
		}

		if (!OptForSize) // Remaining checks deal with scalar loop when OptForSize.
		return computeFeasibleMaxVF(OptForSize);

		if (Legal->getRuntimePointerChecking()->Need) {
ORE->emit(createMissedAnalysis("CantVersionLoopWithOptForSize")		ORE->emit(createMissedAnalysis("CantVersionLoopWithOptForSize")
<< "runtime pointer checks needed. Enable vectorization of this "		<< "runtime pointer checks needed. Enable vectorization of this "
"loop with '#pragma clang loop vectorize(enable)' when "		"loop with '#pragma clang loop vectorize(enable)' when "
"compiling with -Os/-Oz");		"compiling with -Os/-Oz");
DEBUG(dbgs()		DEBUG(dbgs()
<< "LV: Aborting. Runtime ptr check is required with -Os/-Oz.\n");		<< "LV: Aborting. Runtime ptr check is required with -Os/-Oz.\n");
return Factor;		return None;
}		}

if (!EnableCondStoresVectorization && Legal->getNumPredStores()) {		// If we optimize the program for size, avoid creating the tail loop.
ORE->emit(createMissedAnalysis("ConditionalStore")		unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
<< "store that is conditionally executed prevents vectorization");		DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');
DEBUG(dbgs() << "LV: No vectorization. There are conditional stores.\n");
return Factor;		// If we don't know the precise trip count, don't try to vectorize.
		if (TC < 2) {
		ORE->emit(
		createMissedAnalysis("UnknownLoopCountComplexCFG")
		<< "unable to calculate the loop count due to complex control flow");
		DEBUG(dbgs() << "LV: Aborting. A tail loop is required with -Os/-Oz.\n");
		return None;
}		}

		unsigned MaxVF = computeFeasibleMaxVF(OptForSize);

		if (TC % MaxVF != 0) {
		// If the trip count that we found modulo the vectorization factor is not
		// zero then we require a tail.
		// FIXME: look for a smaller MaxVF that does divide TC rather than give up.
		// FIXME: return None if loop requiresScalarEpilog(<MaxVF>), or look for a
		// smaller MaxVF that does not require a scalar epilog.

		ORE->emit(createMissedAnalysis("NoTailLoopWithOptForSize")
		<< "cannot optimize for size and vectorize at the "
		"same time. Enable vectorization of this loop "
		"with '#pragma clang loop vectorize(enable)' "
		"when compiling with -Os/-Oz");
		DEBUG(dbgs() << "LV: Aborting. A tail loop is required with -Os/-Oz.\n");
		return None;
		}

		return MaxVF;
		}

		unsigned LoopVectorizationCostModel::computeFeasibleMaxVF(bool OptForSize) {
MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);		MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);
unsigned SmallestType, WidestType;		unsigned SmallestType, WidestType;
std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();		std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();
unsigned WidestRegister = TTI.getRegisterBitWidth(true);		unsigned WidestRegister = TTI.getRegisterBitWidth(true);
unsigned MaxSafeDepDist = -1U;		unsigned MaxSafeDepDist = -1U;

// Get the maximum safe dependence distance in bits computed by LAA. If the		// Get the maximum safe dependence distance in bits computed by LAA. If the
// loop contains any interleaved accesses, we divide the dependence distance		// loop contains any interleaved accesses, we divide the dependence distance
[17 lines not shown]	unsigned LoopVectorizationCostModel::computeFeasibleMaxVF(bool OptForSize) {
if (MaxVectorSize == 0) {		if (MaxVectorSize == 0) {
DEBUG(dbgs() << "LV: The target has no vector registers.\n");		DEBUG(dbgs() << "LV: The target has no vector registers.\n");
MaxVectorSize = 1;		MaxVectorSize = 1;
}		}

assert(MaxVectorSize <= 64 && "Did not expect to pack so many elements"		assert(MaxVectorSize <= 64 && "Did not expect to pack so many elements"
" into one vector!");		" into one vector!");

unsigned VF = MaxVectorSize;		unsigned MaxVF = MaxVectorSize;

if (MaximizeBandwidth && !OptForSize) {		if (MaximizeBandwidth && !OptForSize) {
// Collect all viable vectorization factors.		// Collect all viable vectorization factors.
SmallVector<unsigned, 8> VFs;		SmallVector<unsigned, 8> VFs;
unsigned NewMaxVectorSize = WidestRegister / SmallestType;		unsigned NewMaxVectorSize = WidestRegister / SmallestType;
for (unsigned VS = MaxVectorSize; VS <= NewMaxVectorSize; VS *= 2)		for (unsigned VS = MaxVectorSize; VS <= NewMaxVectorSize; VS *= 2)
VFs.push_back(VS);		VFs.push_back(VS);

// For each VF calculate its register usage.		// For each VF calculate its register usage.
auto RUs = calculateRegisterUsage(VFs);		auto RUs = calculateRegisterUsage(VFs);

// Select the largest VF which doesn't require more registers than existing		// Select the largest VF which doesn't require more registers than existing
// ones.		// ones.
unsigned TargetNumRegisters = TTI.getNumberOfRegisters(true);		unsigned TargetNumRegisters = TTI.getNumberOfRegisters(true);
for (int i = RUs.size() - 1; i >= 0; --i) {		for (int i = RUs.size() - 1; i >= 0; --i) {
if (RUs[i].MaxLocalUsers <= TargetNumRegisters) {		if (RUs[i].MaxLocalUsers <= TargetNumRegisters) {
VF = VFs[i];		MaxVF = VFs[i];
break;		break;
}		}
}		}
}		}
		return MaxVF;
// If we optimize the program for size, avoid creating the tail loop.
if (OptForSize) {
unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');

// If we don't know the precise trip count, don't try to vectorize.
if (TC < 2) {
ORE->emit(
createMissedAnalysis("UnknownLoopCountComplexCFG")
<< "unable to calculate the loop count due to complex control flow");
DEBUG(dbgs() << "LV: Aborting. A tail loop is required with -Os/-Oz.\n");
return Factor;
}

// Find the maximum SIMD width that can fit within the trip count.
VF = TC % MaxVectorSize;

if (VF == 0)
VF = MaxVectorSize;
else {
// If the trip count that we found modulo the vectorization factor is not
// zero then we require a tail.
ORE->emit(createMissedAnalysis("NoTailLoopWithOptForSize")
<< "cannot optimize for size and vectorize at the "
"same time. Enable vectorization of this loop "
"with '#pragma clang loop vectorize(enable)' "
"when compiling with -Os/-Oz");
DEBUG(dbgs() << "LV: Aborting. A tail loop is required with -Os/-Oz.\n");
return Factor;
}
}

int UserVF = Hints->getWidth();
if (UserVF != 0) {
assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");
DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");

Factor.Width = UserVF;

collectUniformsAndScalars(UserVF);
collectInstsToScalarize(UserVF);
return Factor;
}		}

		LoopVectorizationCostModel::VectorizationFactor
		LoopVectorizationCostModel::selectVectorizationFactor(unsigned MaxVF) {
float Cost = expectedCost(1).first;		float Cost = expectedCost(1).first;
#ifndef NDEBUG		#ifndef NDEBUG
const float ScalarCost = Cost;		const float ScalarCost = Cost;
#endif /* NDEBUG */		#endif /* NDEBUG */
unsigned Width = 1;		unsigned Width = 1;
DEBUG(dbgs() << "LV: Scalar loop costs: " << (int)ScalarCost << ".\n");		DEBUG(dbgs() << "LV: Scalar loop costs: " << (int)ScalarCost << ".\n");

bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled;		bool ForceVectorization = Hints->getForce() == LoopVectorizeHints::FK_Enabled;
// Ignore scalar width, because the user explicitly wants vectorization.		// Ignore scalar width, because the user explicitly wants vectorization.
if (ForceVectorization && VF > 1) {		if (ForceVectorization && MaxVF > 1) {
Width = 2;		Width = 2;
Cost = expectedCost(Width).first / (float)Width;		Cost = expectedCost(Width).first / (float)Width;
}		}

for (unsigned i = 2; i <= VF; i *= 2) {		for (unsigned i = 2; i <= MaxVF; i *= 2) {
// Notice that the vector loop needs to be executed less times, so		// Notice that the vector loop needs to be executed less times, so
// we need to divide the cost of the vector loops by the width of		// we need to divide the cost of the vector loops by the width of
// the vector elements.		// the vector elements.
VectorizationCostTy C = expectedCost(i);		VectorizationCostTy C = expectedCost(i);
float VectorCost = C.first / (float)i;		float VectorCost = C.first / (float)i;
DEBUG(dbgs() << "LV: Vector loop of width " << i		DEBUG(dbgs() << "LV: Vector loop of width " << i
<< " costs: " << (int)VectorCost << ".\n");		<< " costs: " << (int)VectorCost << ".\n");
if (!C.second && !ForceVectorization) {		if (!C.second && !ForceVectorization) {
DEBUG(		DEBUG(
dbgs() << "LV: Not considering vector loop of width " << i		dbgs() << "LV: Not considering vector loop of width " << i
<< " because it will not generate any vector instructions.\n");		<< " because it will not generate any vector instructions.\n");
continue;		continue;
}		}
if (VectorCost < Cost) {		if (VectorCost < Cost) {
Cost = VectorCost;		Cost = VectorCost;
Width = i;		Width = i;
}		}
}		}

DEBUG(if (ForceVectorization && Width > 1 && Cost >= ScalarCost) dbgs()		DEBUG(if (ForceVectorization && Width > 1 && Cost >= ScalarCost) dbgs()
<< "LV: Vectorization seems to be not beneficial, "		<< "LV: Vectorization seems to be not beneficial, "
<< "but was forced by a user.\n");		<< "but was forced by a user.\n");
DEBUG(dbgs() << "LV: Selecting VF: " << Width << ".\n");		DEBUG(dbgs() << "LV: Selecting VF: " << Width << ".\n");
Factor.Width = Width;		VectorizationFactor Factor = {Width, (unsigned)(Width * Cost)};
Factor.Cost = Width * Cost;
return Factor;		return Factor;
}		}

std::pair<unsigned, unsigned>		std::pair<unsigned, unsigned>
LoopVectorizationCostModel::getSmallestAndWidestTypes() {		LoopVectorizationCostModel::getSmallestAndWidestTypes() {
unsigned MinWidth = -1U;		unsigned MinWidth = -1U;
unsigned MaxWidth = 8;		unsigned MaxWidth = 8;
const DataLayout &DL = TheFunction->getParent()->getDataLayout();		const DataLayout &DL = TheFunction->getParent()->getDataLayout();
[1,048 lines not shown]	void LoopVectorizationCostModel::collectValuesToIgnore() {
// detection.		// detection.
for (auto &Reduction : *Legal->getReductionVars()) {		for (auto &Reduction : *Legal->getReductionVars()) {
RecurrenceDescriptor &RedDes = Reduction.second;		RecurrenceDescriptor &RedDes = Reduction.second;
SmallPtrSetImpl<Instruction *> &Casts = RedDes.getCastInsts();		SmallPtrSetImpl<Instruction *> &Casts = RedDes.getCastInsts();
VecValuesToIgnore.insert(Casts.begin(), Casts.end());		VecValuesToIgnore.insert(Casts.begin(), Casts.end());
}		}
}		}

		LoopVectorizationCostModel::VectorizationFactor
		LoopVectorizationPlanner::plan(bool OptForSize, unsigned UserVF) {

		// Width 1 means no vectorize, cost 0 means uncomputed cost.
		const LoopVectorizationCostModel::VectorizationFactor NoVectorization = {1U,
		0U};
		Optional<unsigned> MaybeMaxVF = CM.computeMaxVF(OptForSize);
		if (!MaybeMaxVF.hasValue()) // Cases considered too costly to vectorize.
		return NoVectorization;

		if (UserVF) {
		DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
		assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");
		// Collect the instructions (and their associated costs) that will be more
		// profitable to scalarize.
		CM.selectUserVectorizationFactor(UserVF);
		return {UserVF, 0};
		}

		unsigned MaxVF = MaybeMaxVF.getValue();
		assert(MaxVF != 0 && "MaxVF is zero.");
		if (MaxVF == 1)
		return NoVectorization;

		// Select the optimal vectorization factor.
		return CM.selectVectorizationFactor(MaxVF);
		}

void InnerLoopUnroller::scalarizeInstruction(Instruction *Instr,		void InnerLoopUnroller::scalarizeInstruction(Instruction *Instr,
bool IfPredicateInstr) {		bool IfPredicateInstr) {
assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");		assert(!Instr->getType()->isAggregateType() && "Can't handle vectors");
// Holds vector parameters or scalars, in case of uniform vals.		// Holds vector parameters or scalars, in case of uniform vals.
SmallVector<VectorParts, 4> Params;		SmallVector<VectorParts, 4> Params;

setDebugLocFromInst(Builder, Instr);		setDebugLocFromInst(Builder, Instr);

[177 lines not shown]	#endif /* NDEBUG */
LoopVectorizationLegality LVL(L, PSE, DT, TLI, AA, F, TTI, GetLAA, LI, ORE,		LoopVectorizationLegality LVL(L, PSE, DT, TLI, AA, F, TTI, GetLAA, LI, ORE,
&Requirements, &Hints);		&Requirements, &Hints);
if (!LVL.canVectorize()) {		if (!LVL.canVectorize()) {
DEBUG(dbgs() << "LV: Not vectorizing: Cannot prove legality.\n");		DEBUG(dbgs() << "LV: Not vectorizing: Cannot prove legality.\n");
emitMissedWarning(F, L, Hints, ORE);		emitMissedWarning(F, L, Hints, ORE);
return false;		return false;
}		}

// Use the cost model.
LoopVectorizationCostModel CM(L, PSE, LI, &LVL, *TTI, TLI, DB, AC, ORE, F,
&Hints);
CM.collectValuesToIgnore();

// Check the function attributes to find out if this function should be		// Check the function attributes to find out if this function should be
// optimized for size.		// optimized for size.
bool OptForSize =		bool OptForSize =
Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize();		Hints.getForce() != LoopVectorizeHints::FK_Enabled && F->optForSize();

// Compute the weighted frequency of this loop being executed and see if it		// Compute the weighted frequency of this loop being executed and see if it
// is less than 20% of the function entry baseline frequency. Note that we		// is less than 20% of the function entry baseline frequency. Note that we
// always have a canonical loop here because we think we can vectorize.		// always have a canonical loop here because we think we can vectorize.
[29 lines not shown]	if (Hints.isPotentiallyUnsafe() &&
DEBUG(dbgs() << "LV: Potentially unsafe FP op prevents vectorization.\n");		DEBUG(dbgs() << "LV: Potentially unsafe FP op prevents vectorization.\n");
ORE->emit(		ORE->emit(
createMissedAnalysis(Hints.vectorizeAnalysisPassName(), "UnsafeFP", L)		createMissedAnalysis(Hints.vectorizeAnalysisPassName(), "UnsafeFP", L)
<< "loop not vectorized due to unsafe FP support.");		<< "loop not vectorized due to unsafe FP support.");
emitMissedWarning(F, L, Hints, ORE);		emitMissedWarning(F, L, Hints, ORE);
return false;		return false;
}		}

// Select the optimal vectorization factor.		// Use the cost model.
const LoopVectorizationCostModel::VectorizationFactor VF =		LoopVectorizationCostModel CM(L, PSE, LI, &LVL, *TTI, TLI, DB, AC, ORE, F,
CM.selectVectorizationFactor(OptForSize);		&Hints);
		CM.collectValuesToIgnore();

		// Use the planner for vectorization.
		LoopVectorizationPlanner LVP(CM);

		// Get user vectorization factor.
		unsigned UserVF = Hints.getWidth();

		// Plan how to best vectorize, return the best VF and its cost.
		LoopVectorizationCostModel::VectorizationFactor VF =
		LVP.plan(OptForSize, UserVF);

// Select the interleave count.		// Select the interleave count.
unsigned IC = CM.selectInterleaveCount(OptForSize, VF.Width, VF.Cost);		unsigned IC = CM.selectInterleaveCount(OptForSize, VF.Width, VF.Cost);

// Get user interleave count.		// Get user interleave count.
unsigned UserIC = Hints.getInterleave();		unsigned UserIC = Hints.getInterleave();

// Identify the diagnostic messages that should be produced.		// Identify the diagnostic messages that should be produced.
[213 lines not shown]