This is an archive of the discontinued LLVM Phabricator instance.


Differential D26989

Use continuous boosting factor for complete unroll.
ClosedPublic

Authored by danielcdh on Nov 22 2016, 1:32 PM.

Details

Reviewers

chandlerc
mzolotukhin

Commits

rGcc76344ef5f2: Use continuous boosting factor for complete unroll.
rL290737: Use continuous boosting factor for complete unroll.

Summary

The current complete loop unroll algorithm checks whether complete unrolling will reduce the runtime by a certain percentage. If so, it applies a fixed boosting factor to the threshold (by discounting the cost). The problem with this approach is that the threshold changes abruptly. This patch makes the boosting factor a function of the runtime reduction percentage, capped by a fixed threshold. In this way, the threshold changes continuously.

The patch also simplified the code by reducing one parameter in UP.

The patch only affects the code generation of two speccpu2006 benchmarks:

445.gobmk binary size decreases 0.08%, no performance change.
464.h264ref binary size increases 0.24%, no performance change.
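
The difference between the fixed discount and the continuous boost described in the summary can be sketched as follows. This is a simplified illustration rather than the code in the patch: the function names `oldCanFullyUnroll` and `newCanFullyUnroll` are hypothetical, the parameters mirror the defaults visible in the diff (threshold 150, cutoff 50%, discount 100, maximum boost 400%), and the debug output and trip-count handling are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Pre-patch decision, simplified: a fixed discount applies only once the
// percentage of dynamic cost saved reaches a fixed cutoff.
bool oldCanFullyUnroll(uint64_t UnrolledCost, uint64_t RolledDynamicCost,
                       uint64_t Threshold, uint64_t PercentSavedCutoff,
                       uint64_t Discount) {
  if (UnrolledCost <= Threshold)
    return true;
  assert(RolledDynamicCost >= UnrolledCost && "rolled cost must dominate");
  uint64_t PercentSaved =
      (RolledDynamicCost - UnrolledCost) * 100 / RolledDynamicCost;
  return PercentSaved >= PercentSavedCutoff &&
         static_cast<int64_t>(UnrolledCost) - static_cast<int64_t>(Discount) <=
             static_cast<int64_t>(Threshold);
}

// Post-patch decision, simplified: the threshold is scaled by the ratio
// RolledDynamicCost / UnrolledCost (kept as a percentage) and capped at
// MaxPercentBoost, so it grows gradually instead of jumping at a cutoff.
bool newCanFullyUnroll(uint64_t UnrolledCost, uint64_t RolledDynamicCost,
                       uint64_t Threshold, uint64_t MaxPercentBoost) {
  if (UnrolledCost == 0 || UnrolledCost < Threshold)
    return true;
  uint64_t BoostPercent =
      std::min(100 * RolledDynamicCost / UnrolledCost, MaxPercentBoost);
  return UnrolledCost < Threshold * BoostPercent / 100;
}
```

With a threshold of 150, a cutoff of 50% and a discount of 100, the old check accepts an unrolled cost of 240 when the rolled cost is 480 but rejects it when the rolled cost is 470. The new check accepts both, because the boosted threshold only moves from 300 to 292.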

Diff Detail

Build Status

Buildable 2469
Build 2469: arc lint + arc unit

Event Timeline

danielcdh updated this revision to Diff 78933.Nov 22 2016, 1:32 PM

danielcdh retitled this revision to "Use continuous boosting factor for complete unroll."

danielcdh updated this object.

danielcdh added a reviewer: mzolotukhin.

danielcdh added a subscriber: llvm-commits.

ping...

Hi Dehao,

Sorry for the delay, I missed the patch for some reason.

A couple of initial remarks:

Could you split unrelated changes into a separate (probably, NFC) patch? For instance, changing int to unsigned.
Did you check compile-time impact of this change? LLVM-testsuite exposed a couple of problems with the original version, so I wonder if it's ok with the new one.

Thanks,
Michael

include/llvm/Analysis/TargetTransformInfo.h
246–257	What is the formula for the threshold boost? Could you include it into the comment?
lib/Transforms/InstCombine/InstCombinePHI.cpp
570 ↗	(On Diff #78933)	Why do we need to remove it?

rebase

update comment

In D26989#611319, @mzolotukhin wrote:

> Hi Dehao,
>
> Sorry for the delay, I missed the patch for some reason.
>
> A couple of initial remarks:
>
> Could you split unrelated changes into a separate (probably, NFC) patch? For instance, changing int to unsigned.

Done

> Did you check compile-time impact of this change? LLVM-testsuite exposed a couple of problems with the original version, so I wonder if it's ok with the new one.

I checked the compile time impact on speccpu2006, no noticeable compile time change is observed.

Thanks,
Dehao

> Thanks,
> Michael

lib/Transforms/InstCombine/InstCombinePHI.cpp
570 ↗	(On Diff #78933)	This is irrelevant, removed the change.

ping...

Hi Dehao,

Please find some comments inline.

Thanks,
Michael

include/llvm/Analysis/TargetTransformInfo.h
246–257	I would also add a couple of examples to clarify the intention of this boost. Like if unrolling reduces the loop (in terms of execution time) by a factor of 4x, then we boost the threshold by the factor of 4. If unrolling isn't expected to reduce the running time, then we don't increase the threshold.
lib/Transforms/Scalar/LoopUnrollPass.cpp
49–57	I think this description might be unclear for future users who are not familiar with this patch. We need to reflect what, when, and how this parameter boosts.
766–767	Why do we use `Benefit*Benefit` here? The result is `unsigned`, which means that the values we can get here are very limited (1,2,3, or 4 with `PercentMaxThresholdBoost` = 400). I'd suggest computing the upper bound for the cost, not the benefit to work around it. Also, this code needs some comments, it's not obvious what we're doing here.
test/Transforms/LoopUnroll/full-unroll-crashers.ll
2	Don't we need a corresponding `-unroll-max-percent-threshold-boost` argument here? We need to explicitly pass it so that the test is immune to future changes in default thresholds.
test/Transforms/LoopUnroll/full-unroll-heuristics-2.ll
1	Same here and in other tests.
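
The truncation concern raised for lines 766–767 can be illustrated with a small sketch. The function names below are hypothetical and not taken from the patch; the point is only that an integer ratio collapses to a handful of values, while the same ratio kept as a percentage retains two more digits of resolution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Ratio kept as a whole number: with a cap of 4 the only possible results
// are 0, 1, 2, 3 and 4, because integer division discards the fraction.
unsigned boostAsWholeFactor(uint64_t Rolled, uint64_t Unrolled,
                            unsigned MaxFactor) {
  assert(Unrolled != 0 && "caller must rule out a zero unrolled cost");
  return static_cast<unsigned>(std::min<uint64_t>(Rolled / Unrolled, MaxFactor));
}

// Same ratio scaled by 100 before dividing: a cap of 400 now allows any
// value up to 400, so 2.5x is representable as 250.
unsigned boostAsPercent(uint64_t Rolled, uint64_t Unrolled,
                        unsigned MaxPercent) {
  assert(Unrolled != 0 && "caller must rule out a zero unrolled cost");
  return static_cast<unsigned>(
      std::min<uint64_t>(100 * Rolled / Unrolled, MaxPercent));
}
```

For a rolled cost of 250 and an unrolled cost of 100, the whole-number version yields 2 while the percentage version yields 250.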

update

lib/Transforms/Scalar/LoopUnrollPass.cpp
766–767	Updated the code and comments, hopefully makes it clear.

mzolotukhin added inline comments.Dec 14 2016, 5:02 PM

include/llvm/Analysis/TargetTransformInfo.h
252–253	Shouldn't it be 2x instead of 4x?
lib/Transforms/Scalar/LoopUnrollPass.cpp
54	I didn't realize that we use this formula - I thought we're using a linear function. Doesn't it contradict this?
/// BoostedThreshold = Threshold * min(RolledCost / UnrolledCost,
///                                    PercentMaxThresholdBoost)
test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll
1	I think `unroll-max-percent-threshold-boost` should be `100` to correspond to `unroll-dynamic-cost-savings-discount=0`. BTW, it looks a bit weird that value `100` for the boost means that we are actually not boosting anything.
test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll
1	Same here.

update

include/llvm/Analysis/TargetTransformInfo.h
252–253	Updated the comment to make it consistent.
lib/Transforms/Scalar/LoopUnrollPass.cpp
54	Updated the comment to make it consistent.
test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll
1	Yes, 100 means 100%, i.e. no boosting. Maybe we need to change the naming to make it more accurate. Suggestions?

mzolotukhin added inline comments.Dec 15 2016, 12:21 PM

lib/Transforms/Scalar/LoopUnrollPass.cpp
54	Where does this 1/(1-X^2) come from? To me, from the code it looks like:
NewThreshold = DefaultThreshold * X^2
Also, is there any compelling reason not to use a simple linear formula for this? E.g.
NewThreshold = DefaultThreshold * Y, Y = min(RolledCost/UnrolledCost, BoostLimit)

mzolotukhin added a reviewer: chandlerc.Dec 15 2016, 12:22 PM

Update comment

change to linear boosting factor computation.

danielcdh added inline comments.Dec 15 2016, 2:12 PM

lib/Transforms/Scalar/LoopUnrollPass.cpp
54	Thanks for the reviews! Updated the comment to make it clearer. No compelling reason, I just thought it would be helpful to promote more complete inline when beneficial. Performance experiments do not justify the non-linear equation, so I changed to a linear formula instead.

Hi Dehao,

Thanks, the patch mostly looks good to me (see minor remarks inline). I also would like Chandler to take a look at this too, as we discussed this with him in the past - I've added him to reviewers.

Thanks,
Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
54	The comment is still about quadratic formula:)
765	This computation potentially may overflow. We probably need some checks against it.
test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll
1	Minor: if possible, please don't change `-unroll-threshold=12`. We're replacing `-dynamic-cost-savings-discount` and `-unroll-percent-dynamic-cost-saved-threshold`, so the changes should touch only them.
test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll
24	Can we avoid this change? If no, the comment before the test needs to be updated.
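
The overflow concern for line 765 comes from multiplying a cost by 100 in 32-bit unsigned arithmetic. The sketch below uses hypothetical helper names to show the wraparound and one possible guard. It saturates on overflow purely for illustration; the final revision of the patch instead falls back to a boost of 100 (no boost) when the rolled cost is too large.

```cpp
#include <cstdint>
#include <limits>

// Unchecked scaling: unsigned arithmetic wraps modulo 2^32, so a large cost
// silently turns into a small, meaningless value.
uint32_t wrappingTimes100(uint32_t Cost) {
  return Cost * 100u;
}

// Guarded scaling: compare against max / 100 before multiplying and clamp
// to the largest representable value when the product would not fit.
uint32_t saturatingTimes100(uint32_t Cost) {
  const uint32_t Max = std::numeric_limits<uint32_t>::max();
  if (Cost > Max / 100)
    return Max;
  return Cost * 100u;
}
```

A cost of 50,000,000 scaled by 100 needs 5,000,000,000, which exceeds the 32-bit range; the unchecked version returns 705,032,704 while the guarded version clamps to the maximum.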

This revision is now accepted and ready to land.Dec 15 2016, 4:04 PM

update

Thanks for the review!

Chandler, could you help take a look at this patch too?

Thanks,
Dehao

test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll
1	Because we are now using linear equation to compute boosting factor, it is not enough to boost 10 to the required threshold. That's the original reason I used non-linear equation.

chandlerc added inline comments.Dec 28 2016, 6:11 PM

lib/Transforms/Scalar/LoopUnrollPass.cpp
49–56	Here and above, if the value is a limit or max, start with that. So I would say `MaxPercentThresholdBoost`. In fact, the flag string below is already that order. Also, I liked having these be different from the names in the TTI struct. Can you keep the `Unroll` prefix that used to be here? That also matches the flag text.
51–56	The text here doesn't really parse for me. It starts off talking about something other than what it is. How about: The maximum 'boost' (represented as a percentage >= 100) applied to the threshold when aggressively unrolling a loop due to the dynamic cost savings. If completely unrolling a loop will reduce the total runtime from X to Y, we boost the lop unroll threshold to DefaultThreshold*std::min(MaxPercentThresholdBoost, X/Y). This limit avoids excessive code bloat.
762–768	How about factoring the logic to compute this BoostFactorPercent into a helper function? I think that would make it more clear -- you could use early return to bail out in error conditions. I think it would also be a good place to explain some of the reasoning behind the specific formula (runtime cost / unrolled cost) used to scale the threshold.
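
One way to read the helper-function suggestion for lines 762–768 is sketched below. `CostEstimate` is a hypothetical stand-in for the pass's `EstimatedUnrollCost`, and `boostPercent` is an illustrative name; the early returns follow the suggestion in the comment rather than the exact control flow that landed.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>

// Hypothetical stand-in for the pass's EstimatedUnrollCost.
struct CostEstimate {
  unsigned UnrolledCost;
  unsigned RolledDynamicCost;
};

// Returns the threshold boost as a percentage (100 means "no boost").
unsigned boostPercent(const CostEstimate &Cost, unsigned MaxPercentBoost) {
  // 100 * RolledDynamicCost would not fit in unsigned: do not boost at all.
  if (Cost.RolledDynamicCost >= UINT_MAX / 100)
    return 100;
  // Nothing is left after unrolling: allow the maximum boost.
  if (Cost.UnrolledCost == 0)
    return MaxPercentBoost;
  // Model the benefit as RolledDynamicCost / UnrolledCost, capped so that
  // the boosted threshold cannot cause excessive code growth.
  return std::min(100 * Cost.RolledDynamicCost / Cost.UnrolledCost,
                  MaxPercentBoost);
}
```

A caller would then compare the unrolled cost against `Threshold * boostPercent(...) / 100`, which keeps the decision itself to a single line.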

integrate Chandler's comments

Thanks for the reviews! PTAL

I'm good with the patch now, thanks (see nit picks below, but no need to refresh the patch, just fix prior to submitting).

I would like to see at least code size numbers for the llvm test suite benchmarks before you submit. (I'm interested if there are runtime changes, but not really worried about them.) If the code size doesn't regress significantly (as SPEC doesn't), LGTM.

lib/Transforms/Scalar/LoopUnrollPass.cpp
54	lop -> loop
762–763	I think this will format in an easier to read way with a variable like `Boost`.

update

In D26989#632450, @chandlerc wrote:

> I'm good with the patch now, thanks (see nit picks below, but no need to refresh the patch, just fix prior to submitting).
>
> I would like to see at least code size numbers for the llvm test suite benchmarks before you submit. (I'm interested if there are runtime changes, but not really worried about them.) If the code size doesn't regress significantly (as SPEC doesn't), LGTM.

The code size does not change except for the following 2 binaries:

CMakeFiles/CheckTypeSize/CMAKE_SIZEOF_UNSIGNED_SHORT.bin 8112->8104 (0.1%)
CMakeFiles/TestEndianess.bin 8096->8088 (0.1%)

The run time did not change.

danielcdh closed this revision.Dec 29 2016, 5:01 PM

Revision Contents

Path	Size
include/llvm/Analysis/TargetTransformInfo.h	18 lines
lib/Transforms/Scalar/LoopUnrollPass.cpp	107 lines
test/Transforms/LoopUnroll/full-unroll-crashers.ll	2 lines
test/Transforms/LoopUnroll/full-unroll-heuristics-2.ll	2 lines
test/Transforms/LoopUnroll/full-unroll-heuristics-cmp.ll	2 lines
test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll	2 lines
test/Transforms/LoopUnroll/full-unroll-heuristics-geps.ll	2 lines
test/Transforms/LoopUnroll/full-unroll-heuristics-phi-prop.ll	2 lines
test/Transforms/LoopUnroll/full-unroll-heuristics.ll	15 lines
test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll	4 lines
test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll	2 lines

Diff 82704

include/llvm/Analysis/TargetTransformInfo.h

@@ 237 lines not shown @@ struct UnrollingPreferences {
     /// The cost threshold for the unrolled loop. Should be relative to the
     /// getUserCost values returned by this API, and the expectation is that
     /// the unrolled loop's instructions when run through that interface should
     /// not exceed this cost. However, this is only an estimate. Also, specific
     /// loops may be unrolled even with a cost above this threshold if deemed
     /// profitable. Set this to UINT_MAX to disable the loop body cost
     /// restriction.
     unsigned Threshold;
-    /// If complete unrolling will reduce the cost of the loop below its
-    /// expected dynamic cost while rolled by this percentage, apply a discount
-    /// (below) to its unrolled cost.
-    unsigned PercentDynamicCostSavedThreshold;
-    /// The discount applied to the unrolled cost when the dynamic cost
-    /// savings of unrolling exceed the \c PercentDynamicCostSavedThreshold.
-    unsigned DynamicCostSavingsDiscount;
+    /// If complete unrolling will reduce the cost of the loop, we will boost
+    /// the Threshold by a certain percent to allow more aggressive complete
+    /// unrolling. This value provides the maximum boost percentage that we
+    /// can apply to Threshold (The value should be no less than 100).
+    /// BoostedThreshold = Threshold * min(RolledCost / UnrolledCost,
+    ///                                    MaxPercentThresholdBoost / 100)
+    /// E.g. if complete unrolling reduces the loop execution time by 50%
+    /// then we boost the threshold by the factor of 2x. If unrolling is not
+    /// expected to reduce the running time, then we do not increase the
+    /// threshold.
+    unsigned MaxPercentThresholdBoost;
     /// The cost threshold for the unrolled loop when optimizing for size (set
     /// to UINT_MAX to disable).
     unsigned OptSizeThreshold;
     /// The cost threshold for the unrolled loop, like Threshold, but used
     /// for partial/runtime unrolling (set to UINT_MAX to disable).
     unsigned PartialThreshold;
     /// The cost threshold for the unrolled loop when optimizing for size, like
     /// OptSizeThreshold, but used for partial/runtime unrolling (set to
     /// UINT_MAX to disable).
@@ 928 lines not shown @@

lib/Transforms/Scalar/LoopUnrollPass.cpp

Show All 40 Lines
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "loop-unroll"		#define DEBUG_TYPE "loop-unroll"

static cl::opt<unsigned>		static cl::opt<unsigned>
UnrollThreshold("unroll-threshold", cl::Hidden,		UnrollThreshold("unroll-threshold", cl::Hidden,
cl::desc("The baseline cost threshold for loop unrolling"));		cl::desc("The baseline cost threshold for loop unrolling"));

static cl::opt<unsigned> UnrollPercentDynamicCostSavedThreshold(		static cl::opt<unsigned> UnrollMaxPercentThresholdBoost(
"unroll-percent-dynamic-cost-saved-threshold", cl::init(50), cl::Hidden,		"unroll-max-percent-threshold-boost", cl::init(400), cl::Hidden,
cl::desc("The percentage of estimated dynamic cost which must be saved by "		cl::desc("The maximum 'boost' (represented as a percentage >= 100) applied "
"unrolling to allow unrolling up to the max threshold."));		"to the threshold when aggressively unrolling a loop due to the "
		"dynamic cost savings. If completely unrolling a loop will reduce "
static cl::opt<unsigned> UnrollDynamicCostSavingsDiscount(		"the total runtime from X to Y, we boost the loop unroll "
		mzolotukhinUnsubmitted Not Done Reply Inline Actions I didn't realize that we use this formula - I thought we're using a linear function. Doesn't it contradict to this? /// BoostedThreshold = Threshold * min(RolledCost / UnrolledCost, /// PercentMaxThresholdBoost) mzolotukhin: I didn't realize that we use this formula - I thought we're using a linear function. Doesn't it…
		danielcdhAuthorUnsubmitted Not Done Reply Inline Actions Updated the comment to make it consistent. danielcdh: Updated the comment to make it consistent.
		mzolotukhinUnsubmitted Not Done Reply Inline Actions Where is this 1/(1-X^2) come from? To me from the code it looks like: NewThreshold = DefaultThreshold * X^2 Also, is there any compelling reason not use a simple linear formula for this? E.g. NewThreshold = DefaultThreshold * Y, Y = min(RolledCost/UnrolledCost, BoostLimit) mzolotukhin: Where is this 1/(1-X^2) come from? To me from the code it looks like: ``` NewThreshold =…
		danielcdhAuthorUnsubmitted Not Done Reply Inline Actions Thanks for the reviews! Updated the comment to make it clearer. No compelling reason, just thought it would be helpful to promote more complete inline when beneficial. Performance experiments does not justify the non-linear equation, so I changed to linear formula instead. danielcdh: Thanks for the reviews! Updated the comment to make it clearer. No compelling reason, just…
		mzolotukhinUnsubmitted Done Reply Inline Actions The comment is still about quadratic formula:) mzolotukhin: The comment is still about quadratic formula:)
		chandlercUnsubmitted Not Done Reply Inline Actions lop -> loop chandlerc: lop -> loop
"unroll-dynamic-cost-savings-discount", cl::init(100), cl::Hidden,		"threshold to DefaultThreshold*std::min(MaxPercentThresholdBoost, "
cl::desc("This is the amount discounted from the total unroll cost when "		"X/Y). This limit avoids excessive code bloat."));
		chandlercUnsubmitted Done Reply Inline Actions Here and above, if the value is a limit or max, start with that. So I would say `MaxPercentThresholdBoost`. In fact, the flag string below is already that order. Also, I liked having these be different from the names in the TTI struct. Can you keep the `Unroll` prefix that used to be here? That also matches the flag text. chandlerc: Here and above, if the value is a limit or max, start with that. So I would say…
		chandlercUnsubmitted Done Reply Inline Actions The text here doesn't really parse for me. It starts off talking about something other than what it is. How about: The maximum 'boost' (represented as a percentage >= 100) applied to the threshold when aggressively unrolling a loop due to the dynamic cost savings. If completely unrolling a loop will reduce the total runtime from X to Y, we boost the lop unroll threshold to DefaultThresholdstd::min(MaxPercentThresholdBoost, X/Y). This limit avoids excessive code bloat. chandlerc:* The text here doesn't really parse for me. It starts off talking about something other than…
"the unrolled form has a high dynamic cost savings (triggered by "
"the '-unroll-perecent-dynamic-cost-saved-threshold' flag)."));

		mzolotukhinUnsubmitted Done Reply Inline Actions I think this description might be unclear for future users who are not familiar with this patch. We need to reflect what, when, and how this parameter boosts. mzolotukhin: I think this description might be unclear for future users who are not familiar with this patch.
static cl::opt<unsigned> UnrollMaxIterationsCountToAnalyze(		static cl::opt<unsigned> UnrollMaxIterationsCountToAnalyze(
"unroll-max-iteration-count-to-analyze", cl::init(10), cl::Hidden,		"unroll-max-iteration-count-to-analyze", cl::init(10), cl::Hidden,
cl::desc("Don't allow loop unrolling to simulate more than this number of"		cl::desc("Don't allow loop unrolling to simulate more than this number of"
"iterations when checking full unroll profitability"));		"iterations when checking full unroll profitability"));

static cl::opt<unsigned> UnrollCount(		static cl::opt<unsigned> UnrollCount(
"unroll-count", cl::Hidden,		"unroll-count", cl::Hidden,
cl::desc("Use this unroll count for all loops including those with "		cl::desc("Use this unroll count for all loops including those with "
▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines
static TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences(		static TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences(
Loop *L, const TargetTransformInfo &TTI, Optional<unsigned> UserThreshold,		Loop *L, const TargetTransformInfo &TTI, Optional<unsigned> UserThreshold,
Optional<unsigned> UserCount, Optional<bool> UserAllowPartial,		Optional<unsigned> UserCount, Optional<bool> UserAllowPartial,
Optional<bool> UserRuntime, Optional<bool> UserUpperBound) {		Optional<bool> UserRuntime, Optional<bool> UserUpperBound) {
TargetTransformInfo::UnrollingPreferences UP;		TargetTransformInfo::UnrollingPreferences UP;

// Set up the defaults		// Set up the defaults
UP.Threshold = 150;		UP.Threshold = 150;
UP.PercentDynamicCostSavedThreshold = 50;		UP.MaxPercentThresholdBoost = 400;
UP.DynamicCostSavingsDiscount = 100;
UP.OptSizeThreshold = 0;		UP.OptSizeThreshold = 0;
UP.PartialThreshold = UP.Threshold;		UP.PartialThreshold = UP.Threshold;
UP.PartialOptSizeThreshold = 0;		UP.PartialOptSizeThreshold = 0;
UP.Count = 0;		UP.Count = 0;
UP.PeelCount = 0;		UP.PeelCount = 0;
UP.DefaultUnrollRuntimeCount = 8;		UP.DefaultUnrollRuntimeCount = 8;
UP.MaxCount = UINT_MAX;		UP.MaxCount = UINT_MAX;
UP.FullUnrollMaxCount = UINT_MAX;		UP.FullUnrollMaxCount = UINT_MAX;
Show All 15 Lines	if (L->getHeader()->getParent()->optForSize()) {
UP.PartialThreshold = UP.PartialOptSizeThreshold;		UP.PartialThreshold = UP.PartialOptSizeThreshold;
}		}

// Apply any user values specified by cl::opt		// Apply any user values specified by cl::opt
if (UnrollThreshold.getNumOccurrences() > 0) {		if (UnrollThreshold.getNumOccurrences() > 0) {
UP.Threshold = UnrollThreshold;		UP.Threshold = UnrollThreshold;
UP.PartialThreshold = UnrollThreshold;		UP.PartialThreshold = UnrollThreshold;
}		}
if (UnrollPercentDynamicCostSavedThreshold.getNumOccurrences() > 0)		if (UnrollMaxPercentThresholdBoost.getNumOccurrences() > 0)
UP.PercentDynamicCostSavedThreshold =		UP.MaxPercentThresholdBoost = UnrollMaxPercentThresholdBoost;
UnrollPercentDynamicCostSavedThreshold;
if (UnrollDynamicCostSavingsDiscount.getNumOccurrences() > 0)
UP.DynamicCostSavingsDiscount = UnrollDynamicCostSavingsDiscount;
if (UnrollMaxCount.getNumOccurrences() > 0)		if (UnrollMaxCount.getNumOccurrences() > 0)
UP.MaxCount = UnrollMaxCount;		UP.MaxCount = UnrollMaxCount;
if (UnrollFullMaxCount.getNumOccurrences() > 0)		if (UnrollFullMaxCount.getNumOccurrences() > 0)
UP.FullUnrollMaxCount = UnrollFullMaxCount;		UP.FullUnrollMaxCount = UnrollFullMaxCount;
if (UnrollAllowPartial.getNumOccurrences() > 0)		if (UnrollAllowPartial.getNumOccurrences() > 0)
UP.Partial = UnrollAllowPartial;		UP.Partial = UnrollAllowPartial;
if (UnrollAllowRemainder.getNumOccurrences() > 0)		if (UnrollAllowRemainder.getNumOccurrences() > 0)
UP.AllowRemainder = UnrollAllowRemainder;		UP.AllowRemainder = UnrollAllowRemainder;
▲ Show 20 Lines • Show All 481 Lines • ▼ Show 20 Lines	static void SetLoopAlreadyUnrolled(Loop *L) {
MDs.push_back(DisableNode);		MDs.push_back(DisableNode);

MDNode *NewLoopID = MDNode::get(Context, MDs);		MDNode *NewLoopID = MDNode::get(Context, MDs);
// Set operand 0 to refer to the loop id itself.		// Set operand 0 to refer to the loop id itself.
NewLoopID->replaceOperandWith(0, NewLoopID);		NewLoopID->replaceOperandWith(0, NewLoopID);
L->setLoopID(NewLoopID);		L->setLoopID(NewLoopID);
}		}

static bool canUnrollCompletely(Loop *L, unsigned Threshold,		// Computes the boosting factor for complete unrolling.
unsigned PercentDynamicCostSavedThreshold,		// If fully unrolling the loop would save a lot of RolledDynamicCost, it would
unsigned DynamicCostSavingsDiscount,		// be beneficial to fully unroll the loop even if unrolledcost is large. We
uint64_t UnrolledCost,		// use (RolledDynamicCost / UnrolledCost) to model the unroll benefits to adjust
uint64_t RolledDynamicCost) {		// the unroll threshold.
if (Threshold == NoThreshold) {		static unsigned getFullUnrollBoostingFactor(const EstimatedUnrollCost &Cost,
DEBUG(dbgs() << " Can fully unroll, because no threshold is set.\n");		unsigned MaxPercentThresholdBoost) {
return true;		if (Cost.RolledDynamicCost >= UINT_MAX / 100)
}		return 100;
		else if (Cost.UnrolledCost != 0)
if (UnrolledCost <= Threshold) {		// The boosting factor is RolledDynamicCost / UnrolledCost
DEBUG(dbgs() << " Can fully unroll, because unrolled cost: "		return std::min(100 * Cost.RolledDynamicCost / Cost.UnrolledCost,
<< UnrolledCost << "<=" << Threshold << "\n");		MaxPercentThresholdBoost);
return true;		else
}		return MaxPercentThresholdBoost;

assert(UnrolledCost && "UnrolledCost can't be 0 at this point.");
assert(RolledDynamicCost >= UnrolledCost &&
"Cannot have a higher unrolled cost than a rolled cost!");

// Compute the percentage of the dynamic cost in the rolled form that is
// saved when unrolled. If unrolling dramatically reduces the estimated
// dynamic cost of the loop, we use a higher threshold to allow more
// unrolling.
unsigned PercentDynamicCostSaved =
(uint64_t)(RolledDynamicCost - UnrolledCost) * 100ull / RolledDynamicCost;

if (PercentDynamicCostSaved >= PercentDynamicCostSavedThreshold &&
(int64_t)UnrolledCost - (int64_t)DynamicCostSavingsDiscount <=
(int64_t)Threshold) {
DEBUG(dbgs() << " Can fully unroll, because unrolling will reduce the "
"expected dynamic cost by "
<< PercentDynamicCostSaved << "% (threshold: "
<< PercentDynamicCostSavedThreshold << "%)\n"
<< " and the unrolled cost (" << UnrolledCost
<< ") is less than the max threshold ("
<< DynamicCostSavingsDiscount << ").\n");
return true;
}

DEBUG(dbgs() << " Too large to fully unroll:\n");
DEBUG(dbgs() << " Threshold: " << Threshold << "\n");
DEBUG(dbgs() << " Max threshold: " << DynamicCostSavingsDiscount << "\n");
DEBUG(dbgs() << " Percent cost saved threshold: "
<< PercentDynamicCostSavedThreshold << "%\n");
DEBUG(dbgs() << " Unrolled cost: " << UnrolledCost << "\n");
DEBUG(dbgs() << " Rolled dynamic cost: " << RolledDynamicCost << "\n");
DEBUG(dbgs() << " Percent cost saved: " << PercentDynamicCostSaved
<< "\n");
return false;
}		}

// Returns loop size estimation for unrolled loop.		// Returns loop size estimation for unrolled loop.
static uint64_t getUnrolledLoopSize(		static uint64_t getUnrolledLoopSize(
unsigned LoopSize,		unsigned LoopSize,
TargetTransformInfo::UnrollingPreferences &UP) {		TargetTransformInfo::UnrollingPreferences &UP) {
assert(LoopSize >= UP.BEInsns && "LoopSize should not be less than BEInsns!");		assert(LoopSize >= UP.BEInsns && "LoopSize should not be less than BEInsns!");
return (uint64_t)(LoopSize - UP.BEInsns) * UP.Count + UP.BEInsns;		return (uint64_t)(LoopSize - UP.BEInsns) * UP.Count + UP.BEInsns;
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	static bool computeUnrollCount(
unsigned ExactTripCount = TripCount;		unsigned ExactTripCount = TripCount;
assert((ExactTripCount == 0 \|\| MaxTripCount == 0) &&		assert((ExactTripCount == 0 \|\| MaxTripCount == 0) &&
"ExtractTripCound and MaxTripCount cannot both be non zero.");		"ExtractTripCound and MaxTripCount cannot both be non zero.");
unsigned FullUnrollTripCount = ExactTripCount ? ExactTripCount : MaxTripCount;		unsigned FullUnrollTripCount = ExactTripCount ? ExactTripCount : MaxTripCount;
UP.Count = FullUnrollTripCount;		UP.Count = FullUnrollTripCount;
if (FullUnrollTripCount && FullUnrollTripCount <= UP.FullUnrollMaxCount) {		if (FullUnrollTripCount && FullUnrollTripCount <= UP.FullUnrollMaxCount) {
// When computing the unrolled size, note that BEInsns are not replicated		// When computing the unrolled size, note that BEInsns are not replicated
// like the rest of the loop body.		// like the rest of the loop body.
if (canUnrollCompletely(L, UP.Threshold, 100, UP.DynamicCostSavingsDiscount,		if (getUnrolledLoopSize(LoopSize, UP) < UP.Threshold) {
getUnrolledLoopSize(LoopSize, UP),
getUnrolledLoopSize(LoopSize, UP))) {
UseUpperBound = (MaxTripCount == FullUnrollTripCount);		UseUpperBound = (MaxTripCount == FullUnrollTripCount);
TripCount = FullUnrollTripCount;		TripCount = FullUnrollTripCount;
TripMultiple = UP.UpperBound ? 1 : TripMultiple;		TripMultiple = UP.UpperBound ? 1 : TripMultiple;
return ExplicitUnroll;		return ExplicitUnroll;
} else {		} else {
// The loop isn't that small, but we still can fully unroll it if that		// The loop isn't that small, but we still can fully unroll it if that
// helps to remove a significant number of instructions.		// helps to remove a significant number of instructions.
// To check that, run additional analysis on the loop.		// To check that, run additional analysis on the loop.
if (Optional<EstimatedUnrollCost> Cost = analyzeLoopUnrollCost(		if (Optional<EstimatedUnrollCost> Cost = analyzeLoopUnrollCost(
L, FullUnrollTripCount, DT, *SE, TTI,		L, FullUnrollTripCount, DT, *SE, TTI,
UP.Threshold + UP.DynamicCostSavingsDiscount))		UP.Threshold * UP.MaxPercentThresholdBoost / 100)) {
if (canUnrollCompletely(L, UP.Threshold,		unsigned Boost =
UP.PercentDynamicCostSavedThreshold,		getFullUnrollBoostingFactor(*Cost, UP.MaxPercentThresholdBoost);
UP.DynamicCostSavingsDiscount,		if (Cost->UnrolledCost < UP.Threshold * Boost / 100) {
		chandlercUnsubmitted Done Reply Inline Actions I think this will format in an easier to read way with a variable like `Boost`. chandlerc: I think this will format in an easier to read way with a variable like `Boost`.
Cost->UnrolledCost, Cost->RolledDynamicCost)) {
UseUpperBound = (MaxTripCount == FullUnrollTripCount);		UseUpperBound = (MaxTripCount == FullUnrollTripCount);
TripCount = FullUnrollTripCount;		TripCount = FullUnrollTripCount;
		mzolotukhinUnsubmitted Done Reply Inline Actions This computation potentially may overflow. We probably need some checks against it. mzolotukhin: This computation potentially may overflow. We probably need some checks against it.
TripMultiple = UP.UpperBound ? 1 : TripMultiple;		TripMultiple = UP.UpperBound ? 1 : TripMultiple;
return ExplicitUnroll;		return ExplicitUnroll;
		mzolotukhinUnsubmitted Not Done Reply Inline Actions Why do we use `BenefitBenefit` here? The result is `unsigned`, which means that the values we can get here are very limited (1,2,3, or 4 with `PercentMaxThresholdBoost` = 400). I'd suggest computing the upper bound for the cost, not the benefit to work around it. Also, this code needs some comments, it's not obvious what we're doing here. mzolotukhin:* 1) Why do we use `Benefit*Benefit` here? 2) The result is `unsigned`, which means that the…
		danielcdhAuthorUnsubmitted Not Done Reply Inline Actions Updated the code and comments, hopefully makes it clear. danielcdh: Updated the code and comments, hopefully makes it clear.
}		}
		chandlercUnsubmitted Done Reply Inline Actions How about factoring the logic to compute this BoostFactorPercent into a helper function? I think that would make it more clear -- you could use early return to bail out in error conditions. I think it would also be a good place to explain some of the reasoning behind the specific formula (runtime cost / unrolled cost) used to scale the threshold. chandlerc: How about factoring the logic to compute this BoostFactorPercent into a helper function? I…
}		}
}		}
		}

// 4rd priority is partial unrolling.		// 4rd priority is partial unrolling.
// Try partial unroll only when TripCount could be staticaly calculated.		// Try partial unroll only when TripCount could be staticaly calculated.
if (TripCount) {		if (TripCount) {
UP.Partial |= ExplicitUnroll;		UP.Partial |= ExplicitUnroll;
if (!UP.Partial) {		if (!UP.Partial) {
DEBUG(dbgs() << " will not try to unroll partially because "		DEBUG(dbgs() << " will not try to unroll partially because "
<< "-unroll-allow-partial not given\n");		<< "-unroll-allow-partial not given\n");
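The review comments above ask for three things: an overflow check, a cap on the boost, and a helper that isolates the (runtime cost / unrolled cost) formula. The following is a minimal sketch of such a helper, not the code from this revision. The names `boostFactorPercent` and `boostedThreshold` and the plain `unsigned` parameters are assumptions made for illustration; the actual patch works on the pass's own cost structures.

```cpp
#include <algorithm>
#include <climits>
#include <cstdint>

// Hypothetical helper (name and signature are illustrative only).
// Returns the boost as a percentage of the base threshold: 100 means
// "no boosting", 200 means "double the threshold".
unsigned boostFactorPercent(unsigned RolledDynamicCost, unsigned UnrolledCost,
                            unsigned MaxPercentThresholdBoost) {
  // 100 * RolledDynamicCost would overflow: bail out with no boost.
  if (RolledDynamicCost >= UINT_MAX / 100)
    return 100;
  // Unrolling is estimated to cost nothing: grant the maximum boost.
  if (UnrolledCost == 0)
    return MaxPercentThresholdBoost;
  // Continuous factor: rolled dynamic cost relative to unrolled cost,
  // expressed in percent and capped by the maximum boost.
  return std::min(100 * RolledDynamicCost / UnrolledCost,
                  MaxPercentThresholdBoost);
}

// Applies a percentage boost to a base threshold. The multiplication is
// done in 64 bits so that it cannot overflow for 32-bit inputs.
std::uint64_t boostedThreshold(unsigned Threshold, unsigned BoostPercent) {
  return static_cast<std::uint64_t>(Threshold) * BoostPercent / 100;
}
```

Under these assumptions, a loop whose rolled dynamic cost is twice its unrolled cost gets a factor of 200, so a base threshold of 10 becomes 20, while a cap of 100 leaves the threshold unchanged. Note that this sketch does not clamp from below: a ratio under 1 yields a factor under 100 and therefore lowers the threshold.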

test/Transforms/LoopUnroll/full-unroll-crashers.ll

	; Check that we don't crash on corner cases.			; Check that we don't crash on corner cases.
	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=1 -unroll-percent-dynamic-cost-saved-threshold=20 -o /dev/null			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=1 -unroll-max-percent-threshold-boost=200 -o /dev/null
				mzolotukhin [Not Done]: Don't we need a corresponding `-unroll-max-percent-threshold-boost` argument here? We need to explicitly pass it so that the test is immune to future changes in default thresholds.
	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	@known_constant = internal unnamed_addr constant [10 x i32] [i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1], align 16			@known_constant = internal unnamed_addr constant [10 x i32] [i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1], align 16

	define void @foo1() {			define void @foo1() {
	entry:			entry:
	br label %for.body			br label %for.body


test/Transforms/LoopUnroll/full-unroll-heuristics-2.ll

	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=70 -unroll-dynamic-cost-savings-discount=90 | FileCheck %s			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10 -unroll-max-percent-threshold-boost=200 | FileCheck %s
				mzolotukhin [Not Done]: Same here and in other tests.
	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	@unknown_global = internal unnamed_addr global [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16			@unknown_global = internal unnamed_addr global [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16
	@weak_constant = weak unnamed_addr constant [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16			@weak_constant = weak unnamed_addr constant [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16

	; Though @unknown_global is initialized with constant values, we can't consider			; Though @unknown_global is initialized with constant values, we can't consider
	; it as a constant, so we shouldn't unroll the loop.			; it as a constant, so we shouldn't unroll the loop.
	; CHECK-LABEL: @foo			; CHECK-LABEL: @foo

test/Transforms/LoopUnroll/full-unroll-heuristics-cmp.ll

	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=100 -unroll-dynamic-cost-savings-discount=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=40 | FileCheck %s			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=100 -unroll-threshold=10 -unroll-max-percent-threshold-boost=200 | FileCheck %s
	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	@known_constant = internal unnamed_addr constant [10 x i32] [i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1], align 16			@known_constant = internal unnamed_addr constant [10 x i32] [i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1], align 16

	; If we can figure out result of comparison on each iteration, we can resolve			; If we can figure out result of comparison on each iteration, we can resolve
	; the depending branch. That means, that the unrolled version of the loop would			; the depending branch. That means, that the unrolled version of the loop would
	; have less code, because we don't need not-taken basic blocks there.			; have less code, because we don't need not-taken basic blocks there.
	; This test checks that this is taken into consideration.			; This test checks that this is taken into consideration.

test/Transforms/LoopUnroll/full-unroll-heuristics-dce.ll

	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=100 -unroll-dynamic-cost-savings-discount=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=60 | FileCheck %s			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=100 -unroll-threshold=12 -unroll-max-percent-threshold-boost=400 | FileCheck %s
				mzolotukhin [Not Done]: Minor: if possible, please don't change `-unroll-threshold=12`. We're replacing `-dynamic-cost-savings-discount` and `-unroll-percent-dynamic-cost-saved-threshold`, so the changes should touch only them.
				danielcdh (author) [Not Done]: Because we are now using a linear equation to compute the boosting factor, it is not enough to boost 10 to the required threshold. That's the original reason I used a non-linear equation.
	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	@known_constant = internal unnamed_addr constant [10 x i32] [i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0], align 16			@known_constant = internal unnamed_addr constant [10 x i32] [i32 0, i32 0, i32 0, i32 0, i32 1, i32 0, i32 0, i32 0, i32 0, i32 0], align 16

	; If a load becomes a constant after loop unrolling, we sometimes can simplify			; If a load becomes a constant after loop unrolling, we sometimes can simplify
	; CFG. This test verifies that we handle such cases.			; CFG. This test verifies that we handle such cases.
	; After one operand in an instruction is constant-folded and the			; After one operand in an instruction is constant-folded and the
	; instruction is simplified, the other operand might become dead.			; instruction is simplified, the other operand might become dead.

test/Transforms/LoopUnroll/full-unroll-heuristics-geps.ll

	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=100 -unroll-dynamic-cost-savings-discount=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=60 | FileCheck %s			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=100 -unroll-threshold=10 -unroll-max-percent-threshold-boost=200 | FileCheck %s
	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	; When examining gep-instructions we shouldn't consider them simplified if the			; When examining gep-instructions we shouldn't consider them simplified if the
	; corresponding memory access isn't simplified. Doing the opposite might bias			; corresponding memory access isn't simplified. Doing the opposite might bias
	; our estimate, so that we might decide to unroll even a simple memcpy loop.			; our estimate, so that we might decide to unroll even a simple memcpy loop.
	;			;
	; Thus, the following loop shouldn't be unrolled:			; Thus, the following loop shouldn't be unrolled:
	; CHECK-LABEL: @not_simplified_geps			; CHECK-LABEL: @not_simplified_geps

test/Transforms/LoopUnroll/full-unroll-heuristics-phi-prop.ll

	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=100 -unroll-dynamic-cost-savings-discount=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=50 | FileCheck %s			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=100 -unroll-threshold=10 -unroll-max-percent-threshold-boost=200 | FileCheck %s
	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	define i64 @propagate_loop_phis() {			define i64 @propagate_loop_phis() {
	; CHECK-LABEL: @propagate_loop_phis(			; CHECK-LABEL: @propagate_loop_phis(
	; CHECK-NOT: br i1			; CHECK-NOT: br i1
	; CHECK: ret i64 3			; CHECK: ret i64 3
	entry:			entry:
	br label %loop			br label %loop

test/Transforms/LoopUnroll/full-unroll-heuristics.ll

	; * If a loop size is between these two tresholds, we only do complete unroll			; * If a loop size is between these two tresholds, we only do complete unroll
	; it if estimated number of potentially optimized instructions is high (we			; it if estimated number of potentially optimized instructions is high (we
	; specify the minimal percent of such instructions).			; specify the minimal percent of such instructions).

	; In this particular test-case, complete unrolling will allow later			; In this particular test-case, complete unrolling will allow later
	; optimizations to remove ~55% of the instructions, the loop body size is 9,			; optimizations to remove ~55% of the instructions, the loop body size is 9,
	; and unrolled size is 65.			; and unrolled size is 65.

	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=20 -unroll-dynamic-cost-savings-discount=0 | FileCheck %s -check-prefix=TEST1			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10 -unroll-max-percent-threshold-boost=100 | FileCheck %s -check-prefix=TEST1
	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=20 -unroll-dynamic-cost-savings-discount=90 | FileCheck %s -check-prefix=TEST2			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=20 -unroll-max-percent-threshold-boost=200 | FileCheck %s -check-prefix=TEST2
	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=10 -unroll-percent-dynamic-cost-saved-threshold=80 -unroll-dynamic-cost-savings-discount=90 | FileCheck %s -check-prefix=TEST3			; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=20 -unroll-max-percent-threshold-boost=100 | FileCheck %s -check-prefix=TEST3
	; RUN: opt < %s -S -loop-unroll -unroll-max-iteration-count-to-analyze=1000 -unroll-threshold=100 -unroll-percent-dynamic-cost-saved-threshold=80 -unroll-dynamic-cost-savings-discount=0 | FileCheck %s -check-prefix=TEST4

	; If the absolute threshold is too low, or if we can't optimize away requested			; If the absolute threshold is too low, we should not unroll:
	; percent of instructions, we shouldn't unroll:
	; TEST1: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv			; TEST1: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv
	; TEST3: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv

	; Otherwise, we should:			; Otherwise, we should:
	; TEST2-NOT: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv			; TEST2-NOT: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv

	; Also, we should unroll if the 'unroll-threshold' is big enough:			; If we do not boost threshold, the unroll will not happen:
	; TEST4-NOT: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv			; TEST3: %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv

	; And check that we don't crash when we're not allowed to do any analysis.			; And check that we don't crash when we're not allowed to do any analysis.
	; RUN: opt < %s -loop-unroll -unroll-max-iteration-count-to-analyze=0 -disable-output			; RUN: opt < %s -loop-unroll -unroll-max-iteration-count-to-analyze=0 -disable-output
	target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

	@known_constant = internal unnamed_addr constant [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16			@known_constant = internal unnamed_addr constant [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16

	define i32 @foo(i32* noalias nocapture readonly %src) {			define i32 @foo(i32* noalias nocapture readonly %src) {

test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll

	; RUN: opt < %s -S -unroll-threshold=20 -loop-unroll -unroll-allow-partial -unroll-runtime -unroll-allow-remainder -unroll-dynamic-cost-savings-discount=0 | FileCheck %s			; RUN: opt < %s -S -unroll-threshold=20 -loop-unroll -unroll-allow-partial -unroll-runtime -unroll-allow-remainder -unroll-max-percent-threshold-boost=100 | FileCheck %s
				mzolotukhin [Not Done]: I think `unroll-max-percent-threshold-boost` should be `100` to correspond to `unroll-dynamic-cost-savings-discount=0`. BTW, it looks a bit weird that value `100` for the boost means that we are actually not boosting anything.
				danielcdh (author) [Not Done]: Yes, 100 means 100%, i.e. no boosting. Maybe we need to change the naming to make it more accurate. Suggestions?

	; The Loop TripCount is 9. However unroll factors 3 or 9 exceed given threshold.			; The Loop TripCount is 9. However unroll factors 3 or 9 exceed given threshold.
	; The test checks that we choose a smaller, power-of-two, unroll count and do not give up on unrolling.			; The test checks that we choose a smaller, power-of-two, unroll count and do not give up on unrolling.

	; CHECK: for.body:			; CHECK: for.body:
	; CHECK: store			; CHECK: store
	; CHECK: for.body.1:			; CHECK: for.body.1:
	; CHECK: store			; CHECK: store

	define void @foo(i32* nocapture %a, i32* nocapture readonly %b) nounwind uwtable {			define void @foo(i32* nocapture %a, i32* nocapture readonly %b) nounwind uwtable {
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ 1, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 1, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
	%ld = load i32, i32* %arrayidx, align 4			%ld = load i32, i32* %arrayidx, align 4
	%idxprom1 = sext i32 %ld to i64			%idxprom1 = sext i32 %ld to i64
	%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %idxprom1			%arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %idxprom1
	%st = trunc i64 %indvars.iv to i32			%st = trunc i64 %indvars.iv to i32
	store i32 %st, i32* %arrayidx2, align 4			store i32 %st, i32* %arrayidx2, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 10			%exitcond = icmp eq i64 %indvars.iv.next, 20
				mzolotukhin [Not Done]: Can we avoid this change? If not, the comment before the test needs to be updated.
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll

	; RUN: opt < %s -S -loop-unroll -unroll-runtime -unroll-threshold=40 -unroll-dynamic-cost-savings-discount=0 | FileCheck %s			; RUN: opt < %s -S -loop-unroll -unroll-runtime -unroll-threshold=40 -unroll-max-percent-threshold-boost=100 | FileCheck %s
				mzolotukhin [Not Done]: Same here.

	@known_constant = internal unnamed_addr constant [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16			@known_constant = internal unnamed_addr constant [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16

	; CHECK-LABEL: @bar_prof			; CHECK-LABEL: @bar_prof
	; CHECK: loop.prol:			; CHECK: loop.prol:
	; CHECK: loop:			; CHECK: loop:
	; CHECK: %mul = mul			; CHECK: %mul = mul
	; CHECK: %mul.1 = mul			; CHECK: %mul.1 = mul

Revision Contents

Diff 82704