This is an archive of the discontinued LLVM Phabricator instance.

Differential D9966

[Unroll] Rework the naming and structure of the new unroll heuristics.
ClosedPublic

Authored by chandlerc on May 24 2015, 7:47 PM.

Details

Reviewers

chandlerc
mzolotukhin
hfinkel

Commits

rG9dabd14d59ef: [Unroll] Rework the naming and structure of the new unroll heuristics.
rL239164: [Unroll] Rework the naming and structure of the new unroll heuristics.

Summary

The new naming is (to me) much easier to understand. Here is a summary
of the new state of the world:

'*Threshold' is the baseline threshold for full unrolling. It is measured against the estimated unrolled cost as computed by getUserCost in TTI (or CodeMetrics, etc.). We will exceed this threshold when unrolling loops where unrolling exposes a significant degree of simplification of the logic within the loop.
'*MaxThreshold' is an absolute cap computed the same way as '*Threshold' but is never exceeded, even in profitable circumstances.
'*PercentCostSavedThreshold' is the percentage of the loop's estimated dynamic execution cost which needs to be saved by unrolling to allow full unrolling in excess of the '*Threshold'.

When actually analyzing the loop, we now produce both an estimated
unrolled cost, and an estimated rolled cost. The rolled cost is notably
a dynamic estimate based on our analysis of the expected execution of
each iteration.
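
As a rough illustration of the two estimates, here is a minimal self-contained sketch. `ToyInst`, `SimplifiesAt`, and `estimateCosts` are hypothetical names invented for this example; only `EstimatedUnrollCost`, `UnrolledCost`, and `RolledDynamicCost` come from the patch. The sketch assumes a straight-line loop body and leaves out the size bail-out and the early exit that the real analysis performs.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for an IR instruction: a cost, plus a predicate that
// says whether the instruction would simplify away in a given iteration once
// the loop is fully unrolled.
struct ToyInst {
  unsigned Cost;
  std::function<bool(unsigned)> SimplifiesAt;
};

struct EstimatedUnrollCost {
  unsigned UnrolledCost;      // cost left after simulated simplification
  unsigned RolledDynamicCost; // cost of executing every iteration while rolled
};

// Simulate each iteration. Every instruction counts toward the rolled dynamic
// cost; only instructions that do not simplify count toward the unrolled cost.
EstimatedUnrollCost estimateCosts(const std::vector<ToyInst> &Body,
                                  unsigned TripCount) {
  EstimatedUnrollCost Est{0, 0};
  for (unsigned Iter = 0; Iter < TripCount; ++Iter) {
    for (const ToyInst &I : Body) {
      if (!I.SimplifiesAt(Iter))
        Est.UnrolledCost += I.Cost;
      Est.RolledDynamicCost += I.Cost;
    }
  }
  assert(Est.RolledDynamicCost >= Est.UnrolledCost &&
         "Simplification can only remove cost in this model");
  return Est;
}
```

With a three-instruction body where one instruction never simplifies, one always does, and one of cost 4 simplifies on even iterations, four iterations give a rolled dynamic cost of 24 and an unrolled cost of 12.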

These estimates are pretty bad still, but we can make them much better,
and (to me) it is much more clear *how* to make them better when they
have reasonably descriptive names. For example, we may want to apply
estimated (from heuristics or profiles) dynamic execution weights to the
*dynamic* cost estimates. If we start doing that, we would also need to
track the static unrolled cost and the dynamic unrolled cost, as only
the latter could reasonably be weighted by profile information.

This patch is sadly not without functionality change for the new unroll
analysis logic. Buried in the heuristic management were several things
that surprised me. For example, we never subtracted the optimized
instruction count off when comparing against the unroll heuristics!
I don't know if this just got lost somewhere along the way or what, but
with the new accounting of things, this is much easier to keep track of
and we use the post-simplification cost estimate to compare to the
thresholds, and use the dynamic cost reduction ratio to select whether
we can exceed the baseline threshold.
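
The accounting change described above can be sketched as follows. This is an illustrative reduction, not the pass's code: `fitsThresholdOld` and `fitsThresholdNew` are hypothetical helpers, and the parameter names reuse the pre-patch field names only for readability.

```cpp
#include <cassert>

// Old accounting, as the summary describes it: the raw unrolled size is
// compared to the threshold and the optimized count is never subtracted.
bool fitsThresholdOld(unsigned UnrolledLoopSize,
                      unsigned NumberOfOptimizedInstructions,
                      unsigned Threshold) {
  (void)NumberOfOptimizedInstructions; // deliberately ignored
  return UnrolledLoopSize <= Threshold;
}

// New accounting: compare the post-simplification cost to the threshold.
bool fitsThresholdNew(unsigned UnrolledLoopSize,
                      unsigned NumberOfOptimizedInstructions,
                      unsigned Threshold) {
  assert(NumberOfOptimizedInstructions <= UnrolledLoopSize);
  return UnrolledLoopSize - NumberOfOptimizedInstructions <= Threshold;
}
```

For a loop with a raw unrolled size of 200 of which 80 is expected to simplify away, a threshold of 150 rejects the loop under the old comparison and accepts it under the new one.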

My next series of patches will significantly improve the cost estimation
by handling dead control flows, weighting dynamic cost by the profile
(which will both amplify simplifications of nested loops and discount
simplifications of heavily predicated paths), and trying to track
instructions that are unnecessary after unrolling.

Diff Detail

Event Timeline

chandlerc updated this revision to Diff 26392. May 24 2015, 7:47 PM

chandlerc retitled this revision to [Unroll] Rework the naming and structure of the new unroll heuristics.

chandlerc updated this object.

chandlerc edited the test plan for this revision.

chandlerc added reviewers: mzolotukhin, hfinkel.

chandlerc added a subscriber: Unknown Object (MLST).

echristo added a subscriber: echristo. May 24 2015, 8:00 PM

Hi Chandler,

In general I'm in favor of such changes, but my feeling is that MaxThreshold and Threshold are still pretty confusing. What about something like TinyLoopsThreshold and BigLoopsThreshold? The idea is that TinyLoopsThreshold defines which loops are considered tiny (and are unrolled unconditionally), and BigLoopsThreshold defines huge loops, which will never be unrolled.

Other than that, the changes look good to me.

And by the way, thanks for your recent patches here!

Michael

Switched to a single threshold and a discount after long discussions with
Michael on the best naming pattern here.
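
A minimal standalone sketch of the single-threshold-plus-discount decision, modeled on the `canUnrollCompletely` logic in the diff below. The function name `canUnrollCompletelySketch` is hypothetical, the `Loop *` parameter, debug output, and `NoThreshold` shortcut are dropped, and the values used in the usage note (150, 20, 2000) are the `cl::init` defaults shown in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Decide whether a loop may be fully unrolled, given the estimated costs.
bool canUnrollCompletelySketch(unsigned Threshold,
                               unsigned PercentDynamicCostSavedThreshold,
                               unsigned DynamicCostSavingsDiscount,
                               unsigned UnrolledCost,
                               unsigned RolledDynamicCost) {
  // Small enough after simplification: unroll without further checks.
  if (UnrolledCost <= Threshold)
    return true;

  assert(UnrolledCost && "UnrolledCost can't be 0 at this point.");
  assert(RolledDynamicCost >= UnrolledCost &&
         "Cannot have a higher unrolled cost than a rolled cost!");

  // Percentage of the rolled form's dynamic cost that unrolling saves.
  unsigned PercentDynamicCostSaved =
      (uint64_t)(RolledDynamicCost - UnrolledCost) * 100ull /
      RolledDynamicCost;

  // If enough dynamic cost is saved, apply the discount to the unrolled cost
  // before comparing it to the baseline threshold. Signed arithmetic keeps
  // the subtraction from wrapping when the discount exceeds the cost.
  return PercentDynamicCostSaved >= PercentDynamicCostSavedThreshold &&
         (int64_t)UnrolledCost - (int64_t)DynamicCostSavingsDiscount <=
             (int64_t)Threshold;
}
```

With the defaults, an unrolled cost of 1000 against a rolled dynamic cost of 2000 saves 50% and is accepted after the discount, while an unrolled cost of 1000 against a rolled dynamic cost of 1100 saves only 9% and is rejected.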

LGTM.

test/Transforms/LoopUnroll/full-unroll-heuristics.ll
4	Please update the comment.

Thanks, submitting!

test/Transforms/LoopUnroll/full-unroll-heuristics.ll
4	Done.

This revision is now accepted and ready to land. Jun 5 2015, 10:04 AM

Closed by commit rL239164: [Unroll] Rework the naming and structure of the new unroll heuristics. (authored by chandlerc). Jun 5 2015, 10:05 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/Analysis/TargetTransformInfo.h	24 lines
lib/Transforms/Scalar/LoopUnrollPass.cpp	216 lines
test/Transforms/LoopUnroll/full-unroll-bad-geps.ll	2 lines
test/Transforms/LoopUnroll/full-unroll-heuristics.ll	12 lines

Diff 27205

include/llvm/Analysis/TargetTransformInfo.h

Show First 20 Lines • Show All 215 Lines • ▼ Show 20 Lines	public:
/// Alternatively, we could split the cost interface into distinct code-size		/// Alternatively, we could split the cost interface into distinct code-size
/// and execution-speed costs. This would allow modelling the core of this		/// and execution-speed costs. This would allow modelling the core of this
/// query more accurately as a call is a single small instruction, but		/// query more accurately as a call is a single small instruction, but
/// incurs significant execution cost.		/// incurs significant execution cost.
bool isLoweredToCall(const Function *F) const;		bool isLoweredToCall(const Function *F) const;

/// Parameters that control the generic loop unrolling transformation.		/// Parameters that control the generic loop unrolling transformation.
struct UnrollingPreferences {		struct UnrollingPreferences {
/// The cost threshold for the unrolled loop, compared to		/// The cost threshold for the unrolled loop. Should be relative to the
/// CodeMetrics.NumInsts aggregated over all basic blocks in the loop body.		/// getUserCost values returned by this API, and the expectation is that
/// The unrolling factor is set such that the unrolled loop body does not		/// the unrolled loop's instructions when run through that interface should
/// exceed this cost. Set this to UINT_MAX to disable the loop body cost		/// not exceed this cost. However, this is only an estimate. Also, specific
		/// loops may be unrolled even with a cost above this threshold if deemed
		/// profitable. Set this to UINT_MAX to disable the loop body cost
/// restriction.		/// restriction.
unsigned Threshold;		unsigned Threshold;
/// If complete unrolling could help other optimizations (e.g. InstSimplify)		/// If complete unrolling will reduce the cost of the loop below its
/// to remove N% of instructions, then we can go beyond unroll threshold.		/// expected dynamic cost while rolled by this percentage, apply a discount
/// This value set the minimal percent for allowing that.		/// (below) to its unrolled cost.
unsigned MinPercentOfOptimized;		unsigned PercentDynamicCostSavedThreshold;
/// The absolute cost threshold. We won't go beyond this even if complete		/// The discount applied to the unrolled cost when the dynamic cost
/// unrolling could result in optimizing out 90% of instructions.		/// savings of unrolling exceed the \c PercentDynamicCostSavedThreshold.
unsigned AbsoluteThreshold;		unsigned DynamicCostSavingsDiscount;
/// The cost threshold for the unrolled loop when optimizing for size (set		/// The cost threshold for the unrolled loop when optimizing for size (set
/// to UINT_MAX to disable).		/// to UINT_MAX to disable).
unsigned OptSizeThreshold;		unsigned OptSizeThreshold;
/// The cost threshold for the unrolled loop, like Threshold, but used		/// The cost threshold for the unrolled loop, like Threshold, but used
/// for partial/runtime unrolling (set to UINT_MAX to disable).		/// for partial/runtime unrolling (set to UINT_MAX to disable).
unsigned PartialThreshold;		unsigned PartialThreshold;
/// The cost threshold for the unrolled loop when optimizing for size, like		/// The cost threshold for the unrolled loop when optimizing for size, like
/// OptSizeThreshold, but used for partial/runtime unrolling (set to		/// OptSizeThreshold, but used for partial/runtime unrolling (set to
▲ Show 20 Lines • Show All 634 Lines • Show Last 20 Lines

lib/Transforms/Scalar/LoopUnrollPass.cpp

Show All 32 Lines
#include "llvm/Transforms/Utils/UnrollLoop.h"		#include "llvm/Transforms/Utils/UnrollLoop.h"
#include <climits>		#include <climits>

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "loop-unroll"		#define DEBUG_TYPE "loop-unroll"

static cl::opt<unsigned>		static cl::opt<unsigned>
UnrollThreshold("unroll-threshold", cl::init(150), cl::Hidden,		UnrollThreshold("unroll-threshold", cl::init(150), cl::Hidden,
cl::desc("The cut-off point for automatic loop unrolling"));		cl::desc("The baseline cost threshold for loop unrolling"));

		static cl::opt<unsigned> UnrollPercentDynamicCostSavedThreshold(
		"unroll-percent-dynamic-cost-saved-threshold", cl::init(20), cl::Hidden,
		cl::desc("The percentage of estimated dynamic cost which must be saved by "
		"unrolling to allow unrolling up to the max threshold."));

		static cl::opt<unsigned> UnrollDynamicCostSavingsDiscount(
		"unroll-dynamic-cost-savings-discount", cl::init(2000), cl::Hidden,
		cl::desc("This is the amount discounted from the total unroll cost when "
		"the unrolled form has a high dynamic cost savings (triggered by "
		"the '-unroll-perecent-dynamic-cost-saved-threshold' flag)."));

static cl::opt<unsigned> UnrollMaxIterationsCountToAnalyze(		static cl::opt<unsigned> UnrollMaxIterationsCountToAnalyze(
"unroll-max-iteration-count-to-analyze", cl::init(0), cl::Hidden,		"unroll-max-iteration-count-to-analyze", cl::init(0), cl::Hidden,
cl::desc("Don't allow loop unrolling to simulate more than this number of"		cl::desc("Don't allow loop unrolling to simulate more than this number of"
"iterations when checking full unroll profitability"));		"iterations when checking full unroll profitability"));

static cl::opt<unsigned> UnrollMinPercentOfOptimized(
"unroll-percent-of-optimized-for-complete-unroll", cl::init(20), cl::Hidden,
cl::desc("If complete unrolling could trigger further optimizations, and, "
"by that, remove the given percent of instructions, perform the "
"complete unroll even if it's beyond the threshold"));

static cl::opt<unsigned> UnrollAbsoluteThreshold(
"unroll-absolute-threshold", cl::init(2000), cl::Hidden,
cl::desc("Don't unroll if the unrolled size is bigger than this threshold,"
" even if we can remove big portion of instructions later."));

static cl::opt<unsigned>		static cl::opt<unsigned>
UnrollCount("unroll-count", cl::init(0), cl::Hidden,		UnrollCount("unroll-count", cl::init(0), cl::Hidden,
cl::desc("Use this unroll count for all loops including those with "		cl::desc("Use this unroll count for all loops including those with "
"unroll_count pragma values, for testing purposes"));		"unroll_count pragma values, for testing purposes"));

static cl::opt<bool>		static cl::opt<bool>
UnrollAllowPartial("unroll-allow-partial", cl::init(false), cl::Hidden,		UnrollAllowPartial("unroll-allow-partial", cl::init(false), cl::Hidden,
cl::desc("Allows loops to be partially unrolled until "		cl::desc("Allows loops to be partially unrolled until "
Show All 9 Lines	cl::desc("Unrolled size limit for loops with an unroll(full) or "
"unroll_count pragma."));		"unroll_count pragma."));

namespace {		namespace {
class LoopUnroll : public LoopPass {		class LoopUnroll : public LoopPass {
public:		public:
static char ID; // Pass ID, replacement for typeid		static char ID; // Pass ID, replacement for typeid
LoopUnroll(int T = -1, int C = -1, int P = -1, int R = -1) : LoopPass(ID) {		LoopUnroll(int T = -1, int C = -1, int P = -1, int R = -1) : LoopPass(ID) {
CurrentThreshold = (T == -1) ? UnrollThreshold : unsigned(T);		CurrentThreshold = (T == -1) ? UnrollThreshold : unsigned(T);
CurrentAbsoluteThreshold = UnrollAbsoluteThreshold;		CurrentPercentDynamicCostSavedThreshold =
CurrentMinPercentOfOptimized = UnrollMinPercentOfOptimized;		UnrollPercentDynamicCostSavedThreshold;
		CurrentDynamicCostSavingsDiscount = UnrollDynamicCostSavingsDiscount;
CurrentCount = (C == -1) ? UnrollCount : unsigned(C);		CurrentCount = (C == -1) ? UnrollCount : unsigned(C);
CurrentAllowPartial = (P == -1) ? UnrollAllowPartial : (bool)P;		CurrentAllowPartial = (P == -1) ? UnrollAllowPartial : (bool)P;
CurrentRuntime = (R == -1) ? UnrollRuntime : (bool)R;		CurrentRuntime = (R == -1) ? UnrollRuntime : (bool)R;

UserThreshold = (T != -1) || (UnrollThreshold.getNumOccurrences() > 0);		UserThreshold = (T != -1) || (UnrollThreshold.getNumOccurrences() > 0);
UserAbsoluteThreshold = (UnrollAbsoluteThreshold.getNumOccurrences() > 0);		UserPercentDynamicCostSavedThreshold =
UserPercentOfOptimized =		(UnrollPercentDynamicCostSavedThreshold.getNumOccurrences() > 0);
(UnrollMinPercentOfOptimized.getNumOccurrences() > 0);		UserDynamicCostSavingsDiscount =
		(UnrollDynamicCostSavingsDiscount.getNumOccurrences() > 0);
UserAllowPartial = (P != -1) ||		UserAllowPartial = (P != -1) ||
(UnrollAllowPartial.getNumOccurrences() > 0);		(UnrollAllowPartial.getNumOccurrences() > 0);
UserRuntime = (R != -1) || (UnrollRuntime.getNumOccurrences() > 0);		UserRuntime = (R != -1) || (UnrollRuntime.getNumOccurrences() > 0);
UserCount = (C != -1) || (UnrollCount.getNumOccurrences() > 0);		UserCount = (C != -1) || (UnrollCount.getNumOccurrences() > 0);

initializeLoopUnrollPass(*PassRegistry::getPassRegistry());		initializeLoopUnrollPass(*PassRegistry::getPassRegistry());
}		}

/// A magic value for use with the Threshold parameter to indicate		/// A magic value for use with the Threshold parameter to indicate
/// that the loop unroll should be performed regardless of how much		/// that the loop unroll should be performed regardless of how much
/// code expansion would result.		/// code expansion would result.
static const unsigned NoThreshold = UINT_MAX;		static const unsigned NoThreshold = UINT_MAX;

// Threshold to use when optsize is specified (and there is no		// Threshold to use when optsize is specified (and there is no
// explicit -unroll-threshold).		// explicit -unroll-threshold).
static const unsigned OptSizeUnrollThreshold = 50;		static const unsigned OptSizeUnrollThreshold = 50;

// Default unroll count for loops with run-time trip count if		// Default unroll count for loops with run-time trip count if
// -unroll-count is not set		// -unroll-count is not set
static const unsigned UnrollRuntimeCount = 8;		static const unsigned UnrollRuntimeCount = 8;

unsigned CurrentCount;		unsigned CurrentCount;
unsigned CurrentThreshold;		unsigned CurrentThreshold;
unsigned CurrentAbsoluteThreshold;		unsigned CurrentPercentDynamicCostSavedThreshold;
unsigned CurrentMinPercentOfOptimized;		unsigned CurrentDynamicCostSavingsDiscount;
bool CurrentAllowPartial;		bool CurrentAllowPartial;
bool CurrentRuntime;		bool CurrentRuntime;
bool UserCount; // CurrentCount is user-specified.
bool UserThreshold; // CurrentThreshold is user-specified.		// Flags for whether the 'current' settings are user-specified.
bool UserAbsoluteThreshold; // CurrentAbsoluteThreshold is		bool UserCount;
// user-specified.		bool UserThreshold;
bool UserPercentOfOptimized; // CurrentMinPercentOfOptimized is		bool UserPercentDynamicCostSavedThreshold;
// user-specified.		bool UserDynamicCostSavingsDiscount;
bool UserAllowPartial; // CurrentAllowPartial is user-specified.		bool UserAllowPartial;
bool UserRuntime; // CurrentRuntime is user-specified.		bool UserRuntime;

bool runOnLoop(Loop *L, LPPassManager &LPM) override;		bool runOnLoop(Loop *L, LPPassManager &LPM) override;

/// This transformation requires natural loop information & requires that		/// This transformation requires natural loop information & requires that
/// loop preheaders be inserted into the CFG...		/// loop preheaders be inserted into the CFG...
///		///
void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
Show All 13 Lines	void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addPreserved<DominatorTreeWrapperPass>();		AU.addPreserved<DominatorTreeWrapperPass>();
}		}

// Fill in the UnrollingPreferences parameter with values from the		// Fill in the UnrollingPreferences parameter with values from the
// TargetTransformationInfo.		// TargetTransformationInfo.
void getUnrollingPreferences(Loop *L, const TargetTransformInfo &TTI,		void getUnrollingPreferences(Loop *L, const TargetTransformInfo &TTI,
TargetTransformInfo::UnrollingPreferences &UP) {		TargetTransformInfo::UnrollingPreferences &UP) {
UP.Threshold = CurrentThreshold;		UP.Threshold = CurrentThreshold;
UP.AbsoluteThreshold = CurrentAbsoluteThreshold;		UP.PercentDynamicCostSavedThreshold =
UP.MinPercentOfOptimized = CurrentMinPercentOfOptimized;		CurrentPercentDynamicCostSavedThreshold;
		UP.DynamicCostSavingsDiscount = CurrentDynamicCostSavingsDiscount;
UP.OptSizeThreshold = OptSizeUnrollThreshold;		UP.OptSizeThreshold = OptSizeUnrollThreshold;
UP.PartialThreshold = CurrentThreshold;		UP.PartialThreshold = CurrentThreshold;
UP.PartialOptSizeThreshold = OptSizeUnrollThreshold;		UP.PartialOptSizeThreshold = OptSizeUnrollThreshold;
UP.Count = CurrentCount;		UP.Count = CurrentCount;
UP.MaxCount = UINT_MAX;		UP.MaxCount = UINT_MAX;
UP.Partial = CurrentAllowPartial;		UP.Partial = CurrentAllowPartial;
UP.Runtime = CurrentRuntime;		UP.Runtime = CurrentRuntime;
UP.AllowExpensiveTripCount = false;		UP.AllowExpensiveTripCount = false;
Show All 12 Lines	public:

// Select threshold values used to limit unrolling based on a		// Select threshold values used to limit unrolling based on a
// total unrolled size. Parameters Threshold and PartialThreshold		// total unrolled size. Parameters Threshold and PartialThreshold
// are set to the maximum unrolled size for fully and partially		// are set to the maximum unrolled size for fully and partially
// unrolled loops respectively.		// unrolled loops respectively.
void selectThresholds(const Loop *L, bool HasPragma,		void selectThresholds(const Loop *L, bool HasPragma,
const TargetTransformInfo::UnrollingPreferences &UP,		const TargetTransformInfo::UnrollingPreferences &UP,
unsigned &Threshold, unsigned &PartialThreshold,		unsigned &Threshold, unsigned &PartialThreshold,
unsigned &AbsoluteThreshold,		unsigned &PercentDynamicCostSavedThreshold,
unsigned &PercentOfOptimizedForCompleteUnroll) {		unsigned &DynamicCostSavingsDiscount) {
// Determine the current unrolling threshold. While this is		// Determine the current unrolling threshold. While this is
// normally set from UnrollThreshold, it is overridden to a		// normally set from UnrollThreshold, it is overridden to a
// smaller value if the current function is marked as		// smaller value if the current function is marked as
// optimize-for-size, and the unroll threshold was not user		// optimize-for-size, and the unroll threshold was not user
// specified.		// specified.
Threshold = UserThreshold ? CurrentThreshold : UP.Threshold;		Threshold = UserThreshold ? CurrentThreshold : UP.Threshold;
PartialThreshold = UserThreshold ? CurrentThreshold : UP.PartialThreshold;		PartialThreshold = UserThreshold ? CurrentThreshold : UP.PartialThreshold;
AbsoluteThreshold = UserAbsoluteThreshold ? CurrentAbsoluteThreshold		PercentDynamicCostSavedThreshold =
: UP.AbsoluteThreshold;		UserPercentDynamicCostSavedThreshold
PercentOfOptimizedForCompleteUnroll = UserPercentOfOptimized		? CurrentPercentDynamicCostSavedThreshold
? CurrentMinPercentOfOptimized		: UP.PercentDynamicCostSavedThreshold;
: UP.MinPercentOfOptimized;		DynamicCostSavingsDiscount = UserDynamicCostSavingsDiscount
		? CurrentDynamicCostSavingsDiscount
		: UP.DynamicCostSavingsDiscount;

if (!UserThreshold &&		if (!UserThreshold &&
L->getHeader()->getParent()->hasFnAttribute(		L->getHeader()->getParent()->hasFnAttribute(
Attribute::OptimizeForSize)) {		Attribute::OptimizeForSize)) {
Threshold = UP.OptSizeThreshold;		Threshold = UP.OptSizeThreshold;
PartialThreshold = UP.PartialOptSizeThreshold;		PartialThreshold = UP.PartialOptSizeThreshold;
}		}
if (HasPragma) {		if (HasPragma) {
// If the loop has an unrolling pragma, we want to be more		// If the loop has an unrolling pragma, we want to be more
// aggressive with unrolling limits. Set thresholds to at		// aggressive with unrolling limits. Set thresholds to at
// least the PragmaTheshold value which is larger than the		// least the PragmaTheshold value which is larger than the
// default limits.		// default limits.
if (Threshold != NoThreshold)		if (Threshold != NoThreshold)
Threshold = std::max<unsigned>(Threshold, PragmaUnrollThreshold);		Threshold = std::max<unsigned>(Threshold, PragmaUnrollThreshold);
if (PartialThreshold != NoThreshold)		if (PartialThreshold != NoThreshold)
PartialThreshold =		PartialThreshold =
std::max<unsigned>(PartialThreshold, PragmaUnrollThreshold);		std::max<unsigned>(PartialThreshold, PragmaUnrollThreshold);
}		}
}		}
bool canUnrollCompletely(Loop *L, unsigned Threshold,		bool canUnrollCompletely(Loop *L, unsigned Threshold,
unsigned AbsoluteThreshold, uint64_t UnrolledSize,		unsigned PercentDynamicCostSavedThreshold,
unsigned NumberOfOptimizedInstructions,		unsigned DynamicCostSavingsDiscount,
unsigned PercentOfOptimizedForCompleteUnroll);		unsigned UnrolledCost, unsigned RolledDynamicCost);
};		};
}		}

char LoopUnroll::ID = 0;		char LoopUnroll::ID = 0;
INITIALIZE_PASS_BEGIN(LoopUnroll, "loop-unroll", "Unroll loops", false, false)		INITIALIZE_PASS_BEGIN(LoopUnroll, "loop-unroll", "Unroll loops", false, false)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
▲ Show 20 Lines • Show All 317 Lines • ▼ Show 20 Lines	bool visitLoad(LoadInst &I) {
return true;		return true;
}		}
};		};
} // namespace		} // namespace


namespace {		namespace {
struct EstimatedUnrollCost {		struct EstimatedUnrollCost {
/// \brief Count the number of optimized instructions.		/// \brief The estimated cost after unrolling.
unsigned NumberOfOptimizedInstructions;		unsigned UnrolledCost;

/// \brief Count the total number of instructions.		/// \brief The estimated dynamic cost of executing the instructions in the
unsigned UnrolledLoopSize;		/// rolled form.
		unsigned RolledDynamicCost;
};		};
}		}

/// \brief Figure out if the loop is worth full unrolling.		/// \brief Figure out if the loop is worth full unrolling.
///		///
/// Complete loop unrolling can make some loads constant, and we need to know		/// Complete loop unrolling can make some loads constant, and we need to know
/// if that would expose any further optimization opportunities. This routine		/// if that would expose any further optimization opportunities. This routine
/// estimates this optimization. It assigns computed number of instructions,		/// estimates this optimization. It assigns computed number of instructions,
Show All 20 Lines	analyzeLoopUnrollCost(const Loop *L, unsigned TripCount, ScalarEvolution &SE,

SmallSetVector<BasicBlock *, 16> BBWorklist;		SmallSetVector<BasicBlock *, 16> BBWorklist;
DenseMap<Value *, Constant *> SimplifiedValues;		DenseMap<Value *, Constant *> SimplifiedValues;

// Use a cache to access SCEV expressions so that we don't pay the cost on		// Use a cache to access SCEV expressions so that we don't pay the cost on
// each iteration. This cache is lazily self-populating.		// each iteration. This cache is lazily self-populating.
SCEVCache SC(*L, SE);		SCEVCache SC(*L, SE);

unsigned NumberOfOptimizedInstructions = 0;		// The estimated cost of the unrolled form of the loop. We try to estimate
unsigned UnrolledLoopSize = 0;		// this by simplifying as much as we can while computing the estimate.
		unsigned UnrolledCost = 0;
		// We also track the estimated dynamic (that is, actually executed) cost in
		// the rolled form. This helps identify cases when the savings from unrolling
		// aren't just exposing dead control flows, but actual reduced dynamic
		// instructions due to the simplifications which we expect to occur after
		// unrolling.
		unsigned RolledDynamicCost = 0;

// Simulate execution of each iteration of the loop counting instructions,		// Simulate execution of each iteration of the loop counting instructions,
// which would be simplified.		// which would be simplified.
// Since the same load will take different values on different iterations,		// Since the same load will take different values on different iterations,
// we literally have to go through all loop's iterations.		// we literally have to go through all loop's iterations.
for (unsigned Iteration = 0; Iteration < TripCount; ++Iteration) {		for (unsigned Iteration = 0; Iteration < TripCount; ++Iteration) {
SimplifiedValues.clear();		SimplifiedValues.clear();
UnrolledInstAnalyzer Analyzer(Iteration, SimplifiedValues, SC);		UnrolledInstAnalyzer Analyzer(Iteration, SimplifiedValues, SC);

BBWorklist.clear();		BBWorklist.clear();
BBWorklist.insert(L->getHeader());		BBWorklist.insert(L->getHeader());
// Note that we must not cache the size, this loop grows the worklist.		// Note that we must not cache the size, this loop grows the worklist.
for (unsigned Idx = 0; Idx != BBWorklist.size(); ++Idx) {		for (unsigned Idx = 0; Idx != BBWorklist.size(); ++Idx) {
BasicBlock *BB = BBWorklist[Idx];		BasicBlock *BB = BBWorklist[Idx];

// Visit all instructions in the given basic block and try to simplify		// Visit all instructions in the given basic block and try to simplify
// it. We don't change the actual IR, just count optimization		// it. We don't change the actual IR, just count optimization
// opportunities.		// opportunities.
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
UnrolledLoopSize += TTI.getUserCost(&I);		unsigned InstCost = TTI.getUserCost(&I);

// Visit the instruction to analyze its loop cost after unrolling,		// Visit the instruction to analyze its loop cost after unrolling,
// and if the visitor returns true, then we can optimize this		// and if the visitor returns false, include this instruction in the
// instruction away.		// unrolled cost.
if (Analyzer.visit(I))		if (!Analyzer.visit(I))
NumberOfOptimizedInstructions += TTI.getUserCost(&I);		UnrolledCost += InstCost;

		// Also track this instructions expected cost when executing the rolled
		// loop form.
		RolledDynamicCost += InstCost;

// If unrolled body turns out to be too big, bail out.		// If unrolled body turns out to be too big, bail out.
if (UnrolledLoopSize - NumberOfOptimizedInstructions >		if (UnrolledCost > MaxUnrolledLoopSize)
MaxUnrolledLoopSize)
return None;		return None;
}		}

// Add BB's successors to the worklist.		// Add BB's successors to the worklist.
for (BasicBlock *Succ : successors(BB))		for (BasicBlock *Succ : successors(BB))
if (L->contains(Succ))		if (L->contains(Succ))
BBWorklist.insert(Succ);		BBWorklist.insert(Succ);
}		}

// If we found no optimization opportunities on the first iteration, we		// If we found no optimization opportunities on the first iteration, we
// won't find them on later ones too.		// won't find them on later ones too.
if (!NumberOfOptimizedInstructions)		if (UnrolledCost == RolledDynamicCost)
return None;		return None;
}		}
return {{NumberOfOptimizedInstructions, UnrolledLoopSize}};		return {{UnrolledCost, RolledDynamicCost}};
}		}

/// ApproximateLoopSize - Approximate the size of the loop.		/// ApproximateLoopSize - Approximate the size of the loop.
static unsigned ApproximateLoopSize(const Loop *L, unsigned &NumCalls,		static unsigned ApproximateLoopSize(const Loop *L, unsigned &NumCalls,
bool &NotDuplicatable,		bool &NotDuplicatable,
const TargetTransformInfo &TTI,		const TargetTransformInfo &TTI,
AssumptionCache *AC) {		AssumptionCache *AC) {
SmallPtrSet<const Value *, 32> EphValues;		SmallPtrSet<const Value *, 32> EphValues;
▲ Show 20 Lines • Show All 89 Lines • ▼ Show 20 Lines	static void SetLoopAlreadyUnrolled(Loop *L) {
MDs.push_back(DisableNode);		MDs.push_back(DisableNode);

MDNode *NewLoopID = MDNode::get(Context, MDs);		MDNode *NewLoopID = MDNode::get(Context, MDs);
// Set operand 0 to refer to the loop id itself.		// Set operand 0 to refer to the loop id itself.
NewLoopID->replaceOperandWith(0, NewLoopID);		NewLoopID->replaceOperandWith(0, NewLoopID);
L->setLoopID(NewLoopID);		L->setLoopID(NewLoopID);
}		}

bool LoopUnroll::canUnrollCompletely(		bool LoopUnroll::canUnrollCompletely(Loop *L, unsigned Threshold,
Loop *L, unsigned Threshold, unsigned AbsoluteThreshold,		unsigned PercentDynamicCostSavedThreshold,
uint64_t UnrolledSize, unsigned NumberOfOptimizedInstructions,		unsigned DynamicCostSavingsDiscount,
unsigned PercentOfOptimizedForCompleteUnroll) {		unsigned UnrolledCost,
		unsigned RolledDynamicCost) {

if (Threshold == NoThreshold) {		if (Threshold == NoThreshold) {
DEBUG(dbgs() << " Can fully unroll, because no threshold is set.\n");		DEBUG(dbgs() << " Can fully unroll, because no threshold is set.\n");
return true;		return true;
}		}

if (UnrolledSize <= Threshold) {		if (UnrolledCost <= Threshold) {
DEBUG(dbgs() << " Can fully unroll, because unrolled size: "		DEBUG(dbgs() << " Can fully unroll, because unrolled cost: "
<< UnrolledSize << "<" << Threshold << "\n");		<< UnrolledCost << "<" << Threshold << "\n");
return true;		return true;
}		}

assert(UnrolledSize && "UnrolledSize can't be 0 at this point.");		assert(UnrolledCost && "UnrolledCost can't be 0 at this point.");
unsigned PercentOfOptimizedInstructions =		assert(RolledDynamicCost >= UnrolledCost &&
(uint64_t)NumberOfOptimizedInstructions * 100ull / UnrolledSize;		"Cannot have a higher unrolled cost than a rolled cost!");

if (UnrolledSize <= AbsoluteThreshold &&		// Compute the percentage of the dynamic cost in the rolled form that is
PercentOfOptimizedInstructions >= PercentOfOptimizedForCompleteUnroll) {		// saved when unrolled. If unrolling dramatically reduces the estimated
DEBUG(dbgs() << " Can fully unroll, because unrolling will help removing "		// dynamic cost of the loop, we use a higher threshold to allow more
<< PercentOfOptimizedInstructions		// unrolling.
<< "% instructions (threshold: "		unsigned PercentDynamicCostSaved =
<< PercentOfOptimizedForCompleteUnroll << "%)\n");		(uint64_t)(RolledDynamicCost - UnrolledCost) * 100ull / RolledDynamicCost;
DEBUG(dbgs() << " Unrolled size (" << UnrolledSize
<< ") is less than the threshold (" << AbsoluteThreshold		if (PercentDynamicCostSaved >= PercentDynamicCostSavedThreshold &&
<< ").\n");		(int64_t)UnrolledCost - (int64_t)DynamicCostSavingsDiscount <=
		(int64_t)Threshold) {
		DEBUG(dbgs() << " Can fully unroll, because unrolling will reduce the "
		"expected dynamic cost by " << PercentDynamicCostSaved
		<< "% (threshold: " << PercentDynamicCostSavedThreshold
		<< "%)\n"
		<< " and the unrolled cost (" << UnrolledCost
		<< ") is less than the max threshold ("
		<< DynamicCostSavingsDiscount << ").\n");
return true;		return true;
}		}

DEBUG(dbgs() << " Too large to fully unroll:\n");		DEBUG(dbgs() << " Too large to fully unroll:\n");
DEBUG(dbgs() << " Unrolled size: " << UnrolledSize << "\n");		DEBUG(dbgs() << " Threshold: " << Threshold << "\n");
DEBUG(dbgs() << " Estimated number of optimized instructions: "		DEBUG(dbgs() << " Max threshold: " << DynamicCostSavingsDiscount << "\n");
<< NumberOfOptimizedInstructions << "\n");		DEBUG(dbgs() << " Percent cost saved threshold: "
DEBUG(dbgs() << " Absolute threshold: " << AbsoluteThreshold << "\n");		<< PercentDynamicCostSavedThreshold << "%\n");
DEBUG(dbgs() << " Minimum percent of removed instructions: "		DEBUG(dbgs() << " Unrolled cost: " << UnrolledCost << "\n");
<< PercentOfOptimizedForCompleteUnroll << "\n");		DEBUG(dbgs() << " Rolled dynamic cost: " << RolledDynamicCost << "\n");
DEBUG(dbgs() << " Threshold for small loops: " << Threshold << "\n");		DEBUG(dbgs() << " Percent cost saved: " << PercentDynamicCostSaved
		<< "\n");
return false;		return false;
}		}

unsigned LoopUnroll::selectUnrollCount(		unsigned LoopUnroll::selectUnrollCount(
const Loop *L, unsigned TripCount, bool PragmaFullUnroll,		const Loop *L, unsigned TripCount, bool PragmaFullUnroll,
unsigned PragmaCount, const TargetTransformInfo::UnrollingPreferences &UP,		unsigned PragmaCount, const TargetTransformInfo::UnrollingPreferences &UP,
bool &SetExplicitly) {		bool &SetExplicitly) {
SetExplicitly = true;		SetExplicitly = true;
[94 lines not shown]
if (notDuplicatable) {		if (notDuplicatable) {
return false;		return false;
}		}
if (NumInlineCandidates != 0) {		if (NumInlineCandidates != 0) {
DEBUG(dbgs() << " Not unrolling loop with inlinable calls.\n");		DEBUG(dbgs() << " Not unrolling loop with inlinable calls.\n");
return false;		return false;
}		}

unsigned Threshold, PartialThreshold;		unsigned Threshold, PartialThreshold;
unsigned AbsoluteThreshold, PercentOfOptimizedForCompleteUnroll;		unsigned PercentDynamicCostSavedThreshold;
		unsigned DynamicCostSavingsDiscount;
selectThresholds(L, HasPragma, UP, Threshold, PartialThreshold,		selectThresholds(L, HasPragma, UP, Threshold, PartialThreshold,
AbsoluteThreshold, PercentOfOptimizedForCompleteUnroll);		PercentDynamicCostSavedThreshold,
		DynamicCostSavingsDiscount);

// Given Count, TripCount and thresholds determine the type of		// Given Count, TripCount and thresholds determine the type of
// unrolling which is to be performed.		// unrolling which is to be performed.
enum { Full = 0, Partial = 1, Runtime = 2 };		enum { Full = 0, Partial = 1, Runtime = 2 };
int Unrolling;		int Unrolling;
if (TripCount && Count == TripCount) {		if (TripCount && Count == TripCount) {
Unrolling = Partial;		Unrolling = Partial;
// If the loop is really small, we don't need to run an expensive analysis.		// If the loop is really small, we don't need to run an expensive analysis.
if (canUnrollCompletely(		if (canUnrollCompletely(L, Threshold, 100, DynamicCostSavingsDiscount,
L, Threshold, AbsoluteThreshold,		UnrolledSize, UnrolledSize)) {
UnrolledSize, 0, 100)) {
Unrolling = Full;		Unrolling = Full;
} else {		} else {
// The loop isn't that small, but we still can fully unroll it if that		// The loop isn't that small, but we still can fully unroll it if that
// helps to remove a significant number of instructions.		// helps to remove a significant number of instructions.
// To check that, run additional analysis on the loop.		// To check that, run additional analysis on the loop.
if (Optional<EstimatedUnrollCost> Cost =		if (Optional<EstimatedUnrollCost> Cost = analyzeLoopUnrollCost(
analyzeLoopUnrollCost(L, TripCount, *SE, TTI, AbsoluteThreshold))		L, TripCount, *SE, TTI, Threshold + DynamicCostSavingsDiscount))
if (canUnrollCompletely(L, Threshold, AbsoluteThreshold,		if (canUnrollCompletely(L, Threshold, PercentDynamicCostSavedThreshold,
Cost->UnrolledLoopSize,		DynamicCostSavingsDiscount, Cost->UnrolledCost,
Cost->NumberOfOptimizedInstructions,		Cost->RolledDynamicCost)) {
PercentOfOptimizedForCompleteUnroll)) {
Unrolling = Full;		Unrolling = Full;
}		}
}		}
} else if (TripCount && Count < TripCount) {		} else if (TripCount && Count < TripCount) {
Unrolling = Partial;		Unrolling = Partial;
} else {		} else {
Unrolling = Runtime;		Unrolling = Runtime;
}		}
[81 lines not shown]
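
The acceptance test in the new canUnrollCompletely hunk above can be read as a standalone predicate. The following is a minimal sketch, not the code from the patch: the function name allowsFullUnrollByDynamicSavings is hypothetical, the two asserts from the hunk are turned into early returns so the function is total, and any checks that canUnrollCompletely performs before the lines shown in this diff are omitted.

```cpp
#include <cstdint>

// Hypothetical standalone restatement of the "dynamic cost saved" check
// visible in the new side of the diff. Parameter names follow the patch.
bool allowsFullUnrollByDynamicSavings(unsigned Threshold,
                                      unsigned PercentDynamicCostSavedThreshold,
                                      unsigned DynamicCostSavingsDiscount,
                                      uint64_t UnrolledCost,
                                      uint64_t RolledDynamicCost) {
  // The patch asserts these two conditions; this sketch rejects instead.
  if (UnrolledCost == 0 || RolledDynamicCost < UnrolledCost)
    return false;

  // Percentage of the rolled loop's estimated dynamic cost that unrolling
  // is expected to save.
  unsigned PercentDynamicCostSaved = static_cast<unsigned>(
      (RolledDynamicCost - UnrolledCost) * 100ull / RolledDynamicCost);

  // Signed arithmetic: the discount may be larger than the unrolled cost.
  return PercentDynamicCostSaved >= PercentDynamicCostSavedThreshold &&
         static_cast<int64_t>(UnrolledCost) -
                 static_cast<int64_t>(DynamicCostSavingsDiscount) <=
             static_cast<int64_t>(Threshold);
}
```

For illustration only (the rolled dynamic cost of 130 is a made-up number): with Threshold 10, a percent threshold of 20, and an unrolled cost of 65, a discount of 90 lets the loop through, because 50% of the dynamic cost is saved and 65 - 90 is below 10. A discount of 0 rejects it, because 65 exceeds 10.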

test/Transforms/LoopUnroll/full-unroll-bad-geps.ll

	; Check that we don't crash on corner cases.			; Check that we don't crash on corner cases.
	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-absolute-threshold=10 -unroll-threshold=10 -unroll-percent-of-optimized-for-complete-unroll=20 -o /dev/null			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=20 -o /dev/null
	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	define void @foo1() {			define void @foo1() {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%phi = phi i64 [ 0, %entry ], [ %inc, %for.body ]			%phi = phi i64 [ 0, %entry ], [ %inc, %for.body ]
	[24 lines not shown]

test/Transforms/LoopUnroll/full-unroll-heuristics.ll

	; In this test we check how heuristics for complete unrolling work. We have			; In this test we check how heuristics for complete unrolling work. We have
	; three knobs:			; three knobs:
	; 1) -unroll-threshold			; 1) -unroll-threshold
	; 2) -unroll-absolute-threshold and			; 2) -unroll-max-threshold and
				mzolotukhin: Please update the comment.
				chandlerc: Done.
	; 3) -unroll-percent-of-optimized-for-complete-unroll			; 3) -unroll-percent-cost-saved-threshold
	;			;
	; They control loop-unrolling according to the following rules:			; They control loop-unrolling according to the following rules:
	; * If size of unrolled loop exceeds the absoulte threshold, we don't unroll			; * If size of unrolled loop exceeds the absoulte threshold, we don't unroll
	; this loop under any circumstances.			; this loop under any circumstances.
	; * If size of unrolled loop is below the '-unroll-threshold', then we'll			; * If size of unrolled loop is below the '-unroll-threshold', then we'll
	; consider this loop as a very small one, and completely unroll it.			; consider this loop as a very small one, and completely unroll it.
	; * If a loop size is between these two tresholds, we only do complete unroll			; * If a loop size is between these two tresholds, we only do complete unroll
	; it if estimated number of potentially optimized instructions is high (we			; it if estimated number of potentially optimized instructions is high (we
	; specify the minimal percent of such instructions).			; specify the minimal percent of such instructions).

	; In this particular test-case, complete unrolling will allow later			; In this particular test-case, complete unrolling will allow later
	; optimizations to remove ~55% of the instructions, the loop body size is 9,			; optimizations to remove ~55% of the instructions, the loop body size is 9,
	; and unrolled size is 65.			; and unrolled size is 65.

	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-absolute-threshold=10 -unroll-threshold=10 -unroll-percent-of-optimized-for-complete-unroll=20 | FileCheck %s -check-prefix=TEST1			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=20 -unroll-dynamic-cost-savings-discount=0 | FileCheck %s -check-prefix=TEST1
	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-absolute-threshold=100 -unroll-threshold=10 -unroll-percent-of-optimized-for-complete-unroll=20 | FileCheck %s -check-prefix=TEST2			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=20 -unroll-dynamic-cost-savings-discount=90 | FileCheck %s -check-prefix=TEST2
	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-absolute-threshold=100 -unroll-threshold=10 -unroll-percent-of-optimized-for-complete-unroll=80 | FileCheck %s -check-prefix=TEST3			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=80 -unroll-dynamic-cost-savings-discount=90 | FileCheck %s -check-prefix=TEST3
	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-absolute-threshold=100 -unroll-threshold=100 -unroll-percent-of-optimized-for-complete-unroll=80 | FileCheck %s -check-prefix=TEST4			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=100 -unroll-percent-dynamic-cost-saved-threshold=80 -unroll-dynamic-cost-savings-discount=0 | FileCheck %s -check-prefix=TEST4

	; If the absolute threshold is too low, or if we can't optimize away requested			; If the absolute threshold is too low, or if we can't optimize away requested
	; percent of instructions, we shouldn't unroll:			; percent of instructions, we shouldn't unroll:
	; TEST1: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv			; TEST1: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv
	; TEST3: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv			; TEST3: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv

	; Otherwise, we should:			; Otherwise, we should:
	; TEST2-NOT: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv			; TEST2-NOT: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv
	[31 lines not shown]