This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

include/llvm/Analysis/TargetTransformInfo.h
include/llvm/Transforms/Utils/UnrollLoop.h
lib/Transforms/Scalar/LoopUnrollPass.cpp (24 inline comments)
lib/Transforms/Utils/LoopUnroll.cpp (4 inline comments)
test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll

Differential D19553

Unroll pass restructure.
ClosedPublic

Authored by evstupac on Apr 26 2016, 12:54 PM.


Details

Reviewers

mzolotukhin
sanjoy
escha
resistor
hfinkel

Commits

rGea2aef4a1d4b: The patch refactors unroll pass. Summary: Unroll factor (Count) calculations…

Summary

The patch restructures the unroll pass (the part where the decision on the unroll Count is made).
It adds some comments and early exits for cases where the unroll Count decision is obvious.
The patch does not affect trunk, but can affect architectures where an unroll remainder is not allowed. However, it is easy to disable it in the Unroll Preferences structure.

I got identical builds on all SPEC benchmarks for x86.

Diff Detail

Repository: rL LLVM

Event Timeline

evstupac updated this revision to Diff 55065.Apr 26 2016, 12:54 PM

evstupac retitled this revision to Unroll pass restructure.

evstupac updated this object.

evstupac added reviewers: mzolotukhin, sanjoy, escha, resistor.

evstupac set the repository for this revision to rL LLVM.

evstupac added subscribers: llvm-commits, zansari.

Herald added subscribers: mzolotukhin, sanjoy. Apr 26 2016, 12:54 PM

zzheng added a subscriber: zzheng.Apr 26 2016, 4:06 PM

Hi Evgeny,

Thanks for doing this, I agree that this place is like a big mess now. Please find some comments inline.

Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
89–93	Do we need `unroll-runtime` at all if we have this option? Also, I'd suggest to separate code restructuring from adding new stuff, like new command line options.
529	I would rename it to `computeUnrollCount`, `findUnrollCount`, or something like this. It looks to me that we're doing much more work in this function than just 'get'.
537–539	We set these parameters here, but we're not guaranteed to early exit. Is this intentional? Maybe set them only if we're going to return? Also, do we need to set `UP.Runtime = true` as we do below?
654–655	Is this comment relevant now? And where did the convergence checks go?
662–666	Can we `assert(UP.Count == TripCount)` here? If yes, it would emphasize the logic, if not, the logic seems flawed.
762	s/isCountSetExplicitly/IsCountSetExplicitly/
lib/Transforms/Utils/LoopUnroll.cpp
202	This change might also be in a separate patch, right?

Hi Michael,

Thanks for looking into the proposed new structure.
I'll submit a set of patches for the restructure. Eventually I want to move all checks preventing unroll from "lib/Transforms/Utils/LoopUnroll.cpp" to "lib/Transforms/Scalar/LoopUnrollPass.cpp".

See my answers inline.

Thanks,
Evgeny

lib/Transforms/Scalar/LoopUnrollPass.cpp
89–93	This is to structure the case with a "Convergent" operation in a loop and to meet the needs of architectures that suffer from a generated remainder. However, I agree that the option itself could be removed from the patch if UP.AllowRemainder remains "true" by default.
537–539	> We set these parameters here, but we're not guaranteed to early exit. Is this intentional?
Yes, since the user requested unroll. Only an exceeded pragma threshold or an unsafe transformation can prevent unroll here. When the pragma threshold is exceeded we still allow all kinds of unroll, but with a smaller Count.
> Also, do we need to set UP.Runtime = true as we do below?
Yes, UP.Runtime would be reasonable here. However, that will cause some tests to fail. I'll request this in the next patch. Right now I tried to keep the current logic to highlight places like this.
654–655	lines 781 - 782. I agree it is better to move the comment (with corresponding changes) there.
662–666	assert(TripCount) will be more accurate as UP.Count could be 0
lib/Transforms/Utils/LoopUnroll.cpp
202	Yes. But I'd like to keep it in this patch. It is related to the current "-unroll-count" and "#pragma unroll" behavior. Now:
"-unroll-count" forces "full unroll" and "partial" and skips "runtime", so it goes directly to forced unroll when TripCount is runtime.
"#pragma unroll" forces "full unroll", "partial" and "runtime", and skips forced unroll even if runtime has failed.
So to make the current behavior clear I'd like to keep this. However, you are right that this could be done in a separate patch.

Updated according to the latest comments: variable renames, comment fixes.

Can we have consolidated unrolled-size computation or threshold enforcement?

Also, please elaborate on forced unroll of runtime loops.

Thanks

lib/Transforms/Scalar/LoopUnrollPass.cpp
540	This formula appears several times... It'll be better to have a consolidated function that computes the unrolled size or enforces the threshold.
566	This feels a little awkward. I would use bool UserUnrollCount = UnrollCount.getNumOccurrences() > 0; ... if (HasPragma \|\| UserUnrollCount)
lib/Transforms/Utils/LoopUnroll.cpp
305	I don't understand this part. If UnrollRuntimeLoopRemainder() returned false, the remainder loop is not generated. How do we ensure correctness if we 'force' a loop that has a runtime tripcount of 6 to be unrolled by 4?

Thanks for taking a look at the patch.
Please find my answers inline.

lib/Transforms/Scalar/LoopUnrollPass.cpp
540	Good point. We can create something like IsCountMeetThreshold(UP, LoopSize)
566	Agree. This is better.
lib/Transforms/Utils/LoopUnroll.cpp
305	Actually, that is how -unroll-count works now when runtime unroll is disabled. For this type of unrolling we do not remove conditional branches from the unrolled loop. For example, unroll by 2:

for.body
  ...
  cmp
  br for.body1, exit
for.body1
  ...
  cmp
  br for.body, exit
exit

My changes do not change the current behavior. In my next patch I'll request runtime unroll to be enabled when -unroll-count is passed. There could be positive effects of this type of unrolling (like a reduced number of executed backward branches). So if the user wants a loop to be unrolled, the compiler should do this (when it is safe).
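The forced unroll described in this reply can be illustrated at source level. The following is only a sketch, not LLVM code: `sumRolled` and `sumUnrolledBy2KeepExits` are hypothetical names chosen for illustration. Each copy of the body keeps its own exit test, so any trip count stays correct without a remainder loop, and there is one backward branch per two iterations.

```cpp
#include <cassert>
#include <cstddef>

// Reference: the original (rolled) loop.
int sumRolled(const int *A, std::size_t N) {
  int S = 0;
  for (std::size_t I = 0; I < N; ++I)
    S += A[I];
  return S;
}

// Hypothetical source-level picture of "forced" unroll by 2:
// the exit test is kept before every copy of the body, so no
// remainder loop is needed even when N is odd or unknown.
int sumUnrolledBy2KeepExits(const int *A, std::size_t N) {
  assert((A != nullptr || N == 0) && "non-empty input needs a valid pointer");
  int S = 0;
  std::size_t I = 0;
  while (true) {
    if (I >= N) // exit test of the first copy
      break;
    S += A[I++];
    if (I >= N) // exit test of the second copy (not removed)
      break;
    S += A[I++];
    // Only here do we take the backward branch.
  }
  return S;
}
```

Because no exit test is dropped, the result matches the rolled loop for every trip count, including ones that are not a multiple of the unroll factor.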

rebased after recent changes in unroll, fixed some var names.

evstupac added inline comments.May 5 2016, 3:41 PM

lib/Transforms/Scalar/LoopUnrollPass.cpp
540	Currently the formula has different parameters at different places. I'm planning to unify this, but this will change current behavior. So this is still a good point, but I'd like to fix this in a separate patch.

PING

mzolotukhin added inline comments.May 13 2016, 3:14 PM

lib/Transforms/Scalar/LoopUnrollPass.cpp

537–544

I find it a bit confusing that we change values in UP even if we don't exit the function. Can we rewrite it to something like:

if (condition1) {
  UP = ...;
  return true; // We only change UP when we are going to return
}
if (condition2) {
  UP = ...;
  return true;
}

as opposed to

if (condition1) {
  UP = ...
  if (condition)
    return true;
  // Here we changed UP, but didn't return
}

evstupac added inline comments.May 13 2016, 7:28 PM

lib/Transforms/Scalar/LoopUnrollPass.cpp
537–544	We can do so, but we still need to enable all the properties somehow. For "condition1" it will look like:

if (condition1) {
  UP = ...;
  return true; // We only change UP when we are going to return
}
if (condition1)
  UP = ...;

That is because if the user specified a count that exceeds the threshold we reduce it, but still allow a certain level of unrolling (Runtime, Full, Force, ...). Personally, I'd rather disable unroll and warn the user in that case (telling them how to enable the specified unroll by increasing the pragma threshold or reducing the count). That would resolve the issue, but change current behavior:

if (condition1) {
  if (condition) {
    UP = ...
  } else {
    UP.Count = 1;
    Warn();
  }
  return true;
}

Now it is very annoying when I tell the compiler to unroll by 8, but it unrolls by 4 just because of some heuristics, without any warning.
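The "disable and warn" alternative proposed in this reply could look roughly like the sketch below. It is not the patch's code: `UnrollPrefs` is a hypothetical stand-in for the real preferences structure, `applyUserCountOrWarn` is an invented name, and the size estimate simply reuses the formula quoted elsewhere in this review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two fields this sketch needs.
struct UnrollPrefs {
  unsigned Count = 0;
  unsigned Threshold = 0;
};

// Returns true when the requested count was rejected, i.e. when the
// caller is expected to warn the user. UP is always left in a final
// state, so the caller can return right after this call.
bool applyUserCountOrWarn(UnrollPrefs &UP, unsigned UserCount,
                          unsigned LoopSize, unsigned BEInsns) {
  assert(LoopSize > BEInsns && "sketch assumes LoopSize > BEInsns");
  uint64_t UnrolledSize =
      static_cast<uint64_t>(LoopSize - BEInsns) * UserCount + BEInsns;
  if (UnrolledSize < UP.Threshold) {
    UP.Count = UserCount; // honor the request exactly
    return false;
  }
  UP.Count = 1; // do not unroll instead of silently choosing a smaller count
  return true;
}
```

With this shape the user either gets the requested count or an explicit diagnostic, which is the behavior change the reply argues for.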

mzolotukhin added inline comments.May 16 2016, 12:16 PM

lib/Transforms/Scalar/LoopUnrollPass.cpp
537–544	Personally, I like what you suggested as well. I think when pragmas are used, the user wants full control over what happens and currently we don't provide that. Also, I really like that the code will be much simpler in this case. Others might have different views on that though.

mzolotukhin added a subscriber: hfinkel.May 16 2016, 12:34 PM

Let's wait for other suggestions/comments. However, the patch has been here for 20 days without significant changes,
so further comments may only come after commit.

lib/Transforms/Scalar/LoopUnrollPass.cpp
537–544	Anyway, to stress the current logic and ask for changes, the code needs to be refactored first.

PING.

This certainly looks cleaner, thanks!

My one request, since we're refactoring anyway, could we make some named constant for the number of latch-associated instructions and use it instead of putting '2' everywhere?

This revision is now accepted and ready to land.May 25 2016, 5:00 PM

Thank you for the review and accept.
The patch is important for further improvements.

> could we make some named constant for the number of latch-associated instructions and use it instead of putting '2' everywhere?

That's a good point. I'll add a variable for this. However, generally it is not a constant value. It depends on loop instructions and unroll factor. We could have more recurrences to optimize. Say, "s ^= 1" will become constant for every unroll factor which is multiple of 2. I'm going to address this in one of further patches.
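The `s ^= 1` remark can be checked with a small standalone example. This is a sketch under the assumption of an even trip count; `toggleRolled` and `toggleUnrolledBy2` are hypothetical names, not part of the patch.

```cpp
#include <cassert>

// Rolled loop: S toggles its low bit on every iteration.
unsigned toggleRolled(unsigned S, unsigned Iterations) {
  for (unsigned I = 0; I < Iterations; ++I)
    S ^= 1;
  return S;
}

// Unrolled by 2: the pair "S ^= 1; S ^= 1;" folds to a no-op, so the
// recurrence contributes no instructions to the unrolled body.
unsigned toggleUnrolledBy2(unsigned S, unsigned Iterations) {
  assert(Iterations % 2 == 0 && "sketch assumes an even trip count");
  for (unsigned I = 0; I < Iterations; I += 2) {
    // (S ^ 1) ^ 1 == S, nothing left to execute here.
  }
  return S;
}
```

This is why the number of instructions saved by unrolling is not a fixed constant: it depends on which recurrences in the loop body simplify at a given unroll factor.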

Here is the committed version.
Added BEInsns var which represents number of optimized instructions when "back edge" becomes "fall through". It replaced constant "2" used for this in all calculations.

committed revision 271071.

Gerolf added a subscriber: Gerolf.May 27 2016, 4:44 PM

Gerolf added inline comments.

lib/Transforms/Scalar/LoopUnrollPass.cpp
547	Please move (LoopSize - BEInsns) * UP.Count + BEInsns to a function and comment properly. It will be invoked many times as the estimated size of the unrolled loop gets checked repeatedly.
639	Isn't that simply UP.Count = max(UP.PartialThreshold - BEInsns/ LoopSize -BEInsns -1, 0)? Probably it also needs a check that LoopSize > BEInsns, but that should probably be an assertion anyway. And for UP.PartialThreshold - BEInsns/LoopSize - BEInsns should probably be fit into a function since there is a similar instance above. It makes sense to do all this now since the intent of this patch is a code cleanup.
700	see above
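A consolidated helper along the lines requested here might look like the sketch below. The names `estimateUnrolledSize` and `fitsThreshold` are hypothetical and the parameters are passed explicitly rather than through the preferences structure; the formula itself is the one quoted in the comment above.

```cpp
#include <cassert>
#include <cstdint>

// Estimated size of the loop after unrolling by Count. BEInsns is the
// number of instructions that go away in every copy but one when the
// back edge becomes a fall-through.
uint64_t estimateUnrolledSize(unsigned LoopSize, unsigned BEInsns,
                              unsigned Count) {
  assert(LoopSize > BEInsns && "LoopSize is expected to exceed BEInsns");
  return static_cast<uint64_t>(LoopSize - BEInsns) * Count + BEInsns;
}

// Single place to enforce a size threshold.
bool fitsThreshold(unsigned LoopSize, unsigned BEInsns, unsigned Count,
                   unsigned Threshold) {
  return estimateUnrolledSize(LoopSize, BEInsns, Count) < Threshold;
}
```

Keeping the arithmetic in one function also gives a natural home for the assertion on `LoopSize` and `BEInsns` mentioned in the comment.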

evstupac added inline comments.May 27 2016, 5:51 PM

lib/Transforms/Scalar/LoopUnrollPass.cpp
639	This code was introduced previously, not in this patch. I'm only refactoring places I've noticed. And yes, there are a lot more places to refactor. It is not just the max you've mentioned. Currently LoopSize is defined as max(3, "estimated loop size") and BEInsns is just "2", so LoopSize is always greater than BEInsns. However, I agree that moving the UnrolledSize calculation into a function and adding the assert to that function is a good point. What I want to do is to move the whole threshold check to a function. That requires modifications that would change current behavior, so it will go into the next patch. You can look at the previous comment for the discussion.

mehdi_amini added a subscriber: mehdi_amini.May 31 2016, 3:16 PM

mehdi_amini added inline comments.

lib/Transforms/Scalar/LoopUnrollPass.cpp
547	I believe this hasn't been done. And there is a discrepancy, since some places are doing the arithmetic in 64 bits while others are doing it in 32 bits:
`UnrolledSize = (uint64_t)(LoopSize - BEInsns) * TripCount + BEInsns;`
vs
`UnrolledSize = (LoopSize - BEInsns) * UP.Count + BEInsns;`
Coverity reports possible overflows:
CID 1356133: (OVERFLOW_BEFORE_WIDEN) Potentially overflowing expression "(LoopSize - BEInsns) * UP.Count" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned).
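The difference between the two expressions can be reproduced in isolation. The sketch below uses hypothetical function names and `uint32_t` parameters to model the 32-bit `unsigned` case that Coverity describes; it assumes a platform where `int` is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Multiplies in 32 bits and only then widens: the product wraps
// modulo 2^32 before it is stored into the 64-bit result.
uint64_t unrolledSize32(uint32_t LoopSize, uint32_t BEInsns, uint32_t Count) {
  assert(LoopSize > BEInsns && "sketch assumes LoopSize > BEInsns");
  return (LoopSize - BEInsns) * Count + BEInsns;
}

// Widens one operand first, so the whole expression is evaluated in
// 64 bits and cannot wrap for 32-bit inputs.
uint64_t unrolledSize64(uint32_t LoopSize, uint32_t BEInsns, uint32_t Count) {
  assert(LoopSize > BEInsns && "sketch assumes LoopSize > BEInsns");
  return static_cast<uint64_t>(LoopSize - BEInsns) * Count + BEInsns;
}
```

For example, with a body of 65536 instructions after subtracting BEInsns and a count of 65536, the 32-bit product wraps to zero, so a threshold comparison on the narrow result would wrongly accept a huge unroll.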

evstupac mentioned this in D21719: Unroll restructure.Jun 27 2016, 9:50 AM

Eugene.Zelenko closed this revision.Sep 23 2016, 3:13 PM

Revision Contents

Path	Size
include/llvm/Analysis/TargetTransformInfo.h	5 lines
include/llvm/Transforms/Utils/UnrollLoop.h	8 lines
lib/Transforms/Scalar/LoopUnrollPass.cpp	437 lines
lib/Transforms/Utils/LoopUnroll.cpp	10 lines
test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll	2 lines

Diff 56357

include/llvm/Analysis/TargetTransformInfo.h

[...]
 struct UnrollingPreferences {
[...]
   unsigned FullUnrollMaxCount;
   /// Allow partial unrolling (unrolling of loops to expand the size of the
   /// loop body, not only to eliminate small constant-trip-count loops).
   bool Partial;
   /// Allow runtime unrolling (unrolling of loops to expand the size of the
   /// loop body even when the number of loop iterations is not known at
   /// compile time).
   bool Runtime;
+  /// Allow generation of a loop remainder (extra iterations after unroll).
+  bool AllowRemainder;
   /// Allow emitting expensive instructions (such as divisions) when computing
   /// the trip count of a loop for runtime unrolling.
   bool AllowExpensiveTripCount;
+  /// Apply loop unroll on any kind of loop
+  /// (mainly to loops that fail runtime unrolling).
+  bool Force;
 };

 /// \brief Get target-customized preferences for the generic loop unrolling
 /// transformation. The caller will initialize UP with the current
 /// target-independent defaults.
 void getUnrollingPreferences(Loop *L, UnrollingPreferences &UP) const;

 /// @}
[...]

include/llvm/Transforms/Utils/UnrollLoop.h

[...]
 class DominatorTree;
 class Loop;
 class LoopInfo;
 class LPPassManager;
 class MDNode;
 class Pass;
 class ScalarEvolution;

-bool UnrollLoop(Loop *L, unsigned Count, unsigned TripCount, bool AllowRuntime,
-                bool AllowExpensiveTripCount, unsigned TripMultiple,
-                LoopInfo *LI, ScalarEvolution *SE, DominatorTree *DT,
-                AssumptionCache *AC, bool PreserveLCSSA);
+bool UnrollLoop(Loop *L, unsigned Count, unsigned TripCount, bool Force,
+                bool AllowRuntime, bool AllowExpensiveTripCount,
+                unsigned TripMultiple, LoopInfo *LI, ScalarEvolution *SE,
+                DominatorTree *DT, AssumptionCache *AC, bool PreserveLCSSA);

 bool UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
                                 bool AllowExpensiveTripCount,
                                 bool UseEpilogRemainder, LoopInfo *LI,
                                 ScalarEvolution *SE, DominatorTree *DT,
                                 bool PreserveLCSSA);

 MDNode *GetUnrollMetadata(MDNode *LoopID, StringRef Name);
 }

 #endif

lib/Transforms/Scalar/LoopUnrollPass.cpp

[...]
 static cl::opt<unsigned> UnrollFullMaxCount(
[...]
     cl::desc(
         "Set the max unroll count for full unrolling, for testing purposes"));

 static cl::opt<bool>
     UnrollAllowPartial("unroll-allow-partial", cl::Hidden,
                        cl::desc("Allows loops to be partially unrolled until "
                                 "-unroll-threshold loop size is reached."));

+static cl::opt<bool> UnrollAllowRemainder(
+    "unroll-allow-remainder", cl::Hidden,
+    cl::desc("Allow generation of a loop remainder (extra iterations) "
+             "when unrolling a loop."));
+
 static cl::opt<bool>
     UnrollRuntime("unroll-runtime", cl::ZeroOrMore, cl::Hidden,
                   cl::desc("Unroll loops with run-time trip counts"));

 static cl::opt<unsigned> PragmaUnrollThreshold(
     "pragma-unroll-threshold", cl::init(16 * 1024), cl::Hidden,
     cl::desc("Unrolled size limit for loops with an unroll(full) or "
              "unroll_count pragma."));

 /// A magic value for use with the Threshold parameter to indicate
 /// that the loop unroll should be performed regardless of how much
 /// code expansion would result.
 static const unsigned NoThreshold = UINT_MAX;

 /// Default unroll count for loops with run-time trip count if
 /// -unroll-count is not set
 static const unsigned DefaultUnrollRuntimeCount = 8;

 /// Gather the various unrolling parameters based on the defaults, compiler
-/// flags, TTI overrides, pragmas, and user specified parameters.
+/// flags, TTI overrides and user specified parameters.
 static TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences(
     Loop *L, const TargetTransformInfo &TTI, Optional<unsigned> UserThreshold,
     Optional<unsigned> UserCount, Optional<bool> UserAllowPartial,
-    Optional<bool> UserRuntime, unsigned PragmaCount, bool PragmaFullUnroll,
-    bool PragmaEnableUnroll, unsigned TripCount) {
+    Optional<bool> UserRuntime) {
   TargetTransformInfo::UnrollingPreferences UP;

   // Set up the defaults
   UP.Threshold = 150;
   UP.PercentDynamicCostSavedThreshold = 20;
   UP.DynamicCostSavingsDiscount = 2000;
   UP.OptSizeThreshold = 50;
   UP.PartialThreshold = UP.Threshold;
   UP.PartialOptSizeThreshold = UP.OptSizeThreshold;
   UP.Count = 0;
   UP.MaxCount = UINT_MAX;
   UP.FullUnrollMaxCount = UINT_MAX;
   UP.Partial = false;
   UP.Runtime = false;
+  UP.AllowRemainder = true;
   UP.AllowExpensiveTripCount = false;
+  UP.Force = false;

   // Override with any target specific settings
   TTI.getUnrollingPreferences(L, UP);

   // Apply size attributes
   if (L->getHeader()->getParent()->optForSize()) {
     UP.Threshold = UP.OptSizeThreshold;
     UP.PartialThreshold = UP.PartialOptSizeThreshold;
   }

-  // Apply unroll count pragmas
-  if (PragmaCount)
-    UP.Count = PragmaCount;
-  else if (PragmaFullUnroll)
-    UP.Count = TripCount;
-
   // Apply any user values specified by cl::opt
   if (UnrollThreshold.getNumOccurrences() > 0) {
     UP.Threshold = UnrollThreshold;
     UP.PartialThreshold = UnrollThreshold;
   }
   if (UnrollPercentDynamicCostSavedThreshold.getNumOccurrences() > 0)
     UP.PercentDynamicCostSavedThreshold =
         UnrollPercentDynamicCostSavedThreshold;
   if (UnrollDynamicCostSavingsDiscount.getNumOccurrences() > 0)
     UP.DynamicCostSavingsDiscount = UnrollDynamicCostSavingsDiscount;
-  if (UnrollCount.getNumOccurrences() > 0)
-    UP.Count = UnrollCount;
   if (UnrollMaxCount.getNumOccurrences() > 0)
     UP.MaxCount = UnrollMaxCount;
   if (UnrollFullMaxCount.getNumOccurrences() > 0)
     UP.FullUnrollMaxCount = UnrollFullMaxCount;
   if (UnrollAllowPartial.getNumOccurrences() > 0)
     UP.Partial = UnrollAllowPartial;
+  if (UnrollAllowRemainder.getNumOccurrences() > 0)
+    UP.AllowRemainder = UnrollAllowRemainder;
   if (UnrollRuntime.getNumOccurrences() > 0)
     UP.Runtime = UnrollRuntime;

   // Apply user values provided by argument
   if (UserThreshold.hasValue()) {
     UP.Threshold = *UserThreshold;
     UP.PartialThreshold = *UserThreshold;
   }
   if (UserCount.hasValue())
     UP.Count = *UserCount;
   if (UserAllowPartial.hasValue())
     UP.Partial = *UserAllowPartial;
   if (UserRuntime.hasValue())
     UP.Runtime = *UserRuntime;

-  if (PragmaCount > 0 ||
-      ((PragmaFullUnroll || PragmaEnableUnroll) && TripCount != 0)) {
-    // If the loop has an unrolling pragma, we want to be more aggressive with
-    // unrolling limits. Set thresholds to at least the PragmaTheshold value
-    // which is larger than the default limits.
-    if (UP.Threshold != NoThreshold)
-      UP.Threshold = std::max<unsigned>(UP.Threshold, PragmaUnrollThreshold);
-    if (UP.PartialThreshold != NoThreshold)
-      UP.PartialThreshold =
-          std::max<unsigned>(UP.PartialThreshold, PragmaUnrollThreshold);
-  }
-
   return UP;
 }

 namespace {
 struct EstimatedUnrollCost {
   /// \brief The estimated cost after unrolling.
   int UnrolledCost;

▲ Show 20 Lines • Show All 339 Lines • ▼ Show 20 Lines	DEBUG(dbgs() << " Percent cost saved threshold: "
<< PercentDynamicCostSavedThreshold << "%\n");		<< PercentDynamicCostSavedThreshold << "%\n");
DEBUG(dbgs() << " Unrolled cost: " << UnrolledCost << "\n");		DEBUG(dbgs() << " Unrolled cost: " << UnrolledCost << "\n");
DEBUG(dbgs() << " Rolled dynamic cost: " << RolledDynamicCost << "\n");		DEBUG(dbgs() << " Rolled dynamic cost: " << RolledDynamicCost << "\n");
DEBUG(dbgs() << " Percent cost saved: " << PercentDynamicCostSaved		DEBUG(dbgs() << " Percent cost saved: " << PercentDynamicCostSaved
<< "\n");		<< "\n");
return false;		return false;
}		}

static bool tryToUnrollLoop(Loop L, DominatorTree &DT, LoopInfo LI,		// Returns true if unroll count was set explicitly.
ScalarEvolution *SE, const TargetTransformInfo &TTI,		// Calculates unroll count and writes it to UP.Count.
AssumptionCache &AC, bool PreserveLCSSA,		static bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI,
		mzolotukhinUnsubmitted Not Done Reply Inline Actions I would rename it to `computeUnrollCount`, `findUnrollCount`, or something like this. It looks to me that we're doing much more work in this function than just 'get'. mzolotukhin: I would rename it to `computeUnrollCount`, `findUnrollCount`, or something like this. It looks…
Optional<unsigned> ProvidedCount,		DominatorTree &DT, LoopInfo *LI,
Optional<unsigned> ProvidedThreshold,		ScalarEvolution *SE, unsigned TripCount,
Optional<bool> ProvidedAllowPartial,		unsigned TripMultiple, unsigned LoopSize,
Optional<bool> ProvidedRuntime) {		TargetTransformInfo::UnrollingPreferences &UP) {
BasicBlock *Header = L->getHeader();		// Check for explicit Count.
DEBUG(dbgs() << "Loop Unroll: F[" << Header->getParent()->getName()		// 1st priority is unroll count set by "unroll-count" option.
<< "] Loop %" << Header->getName() << "\n");		bool UserUnrollCount = UnrollCount.getNumOccurrences() > 0;
		if (UserUnrollCount) {
		UP.Count = UnrollCount;
		UP.AllowExpensiveTripCount = true;
		mzolotukhinUnsubmitted Not Done Reply Inline Actions We set these parameters here, but we're not guaranteed to early exit. Is this intentional? Maybe set them only if we're going to return? Also, do we need to set `UP.Runtime = true` as we do below? mzolotukhin: We set these parameters here, but we're not guaranteed to early exit. Is this intentional?
		evstupacAuthorUnsubmitted Not Done Reply Inline Actions We set these parameters here, but we're not guaranteed to early exit. Is this intentional? Yes, since user requested unroll. Only exceeded pragma threshold or unsafe transformation can prevent unroll here. When pragma threshold is exceeded we still allow all kinds of unroll, but with less Count. Also, do we need to set UP.Runtime = true as we do below? Yes UP.Runtime whold be reasonable here. However that will cause some tests fail. I'll request this in next patch. Right now I tried to keep current logic to highlight places like this. evstupac: > We set these parameters here, but we're not guaranteed to early exit. Is this intentional? >…
		UP.Force = true;
		zzhengUnsubmitted Not Done Reply Inline Actions This formula appears several times... It'll be better to have a consolidated function that computes the unrolled size or enforces the threshold. zzheng: This formula appears several times... It'll be better to have a consolidated function that…
		evstupacAuthorUnsubmitted Not Done Reply Inline Actions Good point. We can create something like IsCountMeetThreshold(UP, LoopSize) evstupac: Good point. We can create something like IsCountMeetThreshold(UP, LoopSize)
		evstupacAuthorUnsubmitted Not Done Reply Inline Actions Currently the formula have different parameters at different places. I'm planing to unify this, but this will change current behavior. So this is still a good point, but I'd like to fix this in separate patch. evstupac: Currently the formula have different parameters at different places. I'm planing to unify this…
		if (UP.AllowRemainder &&
		(LoopSize - 2) * UP.Count + 2 < UP.Threshold)
		return true;
		}
		mzolotukhinUnsubmitted Not Done Reply Inline Actions I find it a bit confusing that we change values in `UP` even if we don't exit the function. Can we rewrite it to something like: If (condition1) { UP = ...; return true; // We only change UP when we are going to return } if (condition2) { UP = ...; return true; } as opposed to if (condition1) { UP = ... if (condition) return true; // Here we changed UP, but didn't return } mzolotukhin: I find it a bit confusing that we change values in `UP` even if we don't exit the function. Can…
		evstupacAuthorUnsubmitted Not Done Reply Inline Actions We can do so, but still we need to enable all the properties somehow. For "condition1" it will look like: If (condition1) { UP = ...; return true; // We only change UP when we are going to return } if (condition1) UP = ...; That is because if user specified count that exceed threshold we are reducing it, but still allow some certain level of unrolling (Runtime, Full, Force....) Personally I'd better disable unroll and warn user in that case (telling how to enable specified unroll by increasing pragma threshold or reducing count). That will resolve the issue, but change current behavior: if (condition1) { if (condition) { UP = ... } else { UP.Count = 1; Warn(); } return true; } Now it is very annoying when I tell compiler to unroll by 8, but it unrolls by 4 just because of some heuristics without any warnings. evstupac: We can do so, but still we need to enable all the properties somehow. For "condition1" it will…
		mzolotukhinUnsubmitted Not Done Reply Inline Actions Personally, I like what you suggested as well. I think when pragmas are used, the user wants full control over what happens and currently we don't provide that. Also, I really like that the code will be much simpler in this case. Others might have different views on that though. mzolotukhin: Personally, I like what you suggested as well. I think when pragmas are used, the user wants…
		evstupacAuthorUnsubmitted Not Done Reply Inline Actions Anyway to stress current logic and ask for changes the code need to be refactored first. evstupac: Anyway to stress current logic and ask for changes the code need to be refactored first.

if (HasUnrollDisablePragma(L)) {		// 2nd priority is unroll count set by pragma.
return false;		unsigned PragmaCount = UnrollCountPragmaValue(L);
		GerolfUnsubmitted Not Done Reply Inline Actions Please move (LoopSize - BEInsns) * UP.Count + BEInsns to a function and comment properly. It will be invoked many times as the estimated size of the unrolled loop gets checked repeatedly. Gerolf: Please move (LoopSize - BEInsns) * UP.Count + BEInsns to a function and comment properly. It…
		mehdi_aminiUnsubmitted Not Done Reply Inline Actions I believe this hasn't been done. And there is a discrepancy since some places are doing the arithmetic in 64 bits while others are doing it in 32 bits: `UnrolledSize = (uint64_t)(LoopSize - BEInsns) * TripCount + BEInsns;` vs `UnrolledSize = (LoopSize - BEInsns) * UP.Count + BEInsns;` Coverity reports possible overflows: CID 1356133: (OVERFLOW_BEFORE_WIDEN) Potentially overflowing expression "(LoopSize - BEInsns) * UP.Count" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned). mehdi_amini: I believe this hasn't been done. And there is a discrepancy since some places are doing the…
		if (PragmaCount > 0) {
		UP.Count = PragmaCount;
		UP.Runtime = true;
		UP.AllowExpensiveTripCount = true;
		UP.Force = true;
		if (UP.AllowRemainder &&
		(LoopSize - 2) * UP.Count + 2 < PragmaUnrollThreshold)
		return true;
}		}
bool PragmaFullUnroll = HasUnrollFullPragma(L);		bool PragmaFullUnroll = HasUnrollFullPragma(L);
bool PragmaEnableUnroll = HasUnrollEnablePragma(L);		if (PragmaFullUnroll && TripCount != 0) {
unsigned PragmaCount = UnrollCountPragmaValue(L);		UP.Count = TripCount;
bool HasPragma = PragmaFullUnroll || PragmaEnableUnroll || PragmaCount > 0;		if ((LoopSize - 2) * UP.Count + 2 < PragmaUnrollThreshold)
		return false;
// Find trip count and trip multiple if count is not available
unsigned TripCount = 0;
unsigned TripMultiple = 1;
// If there are multiple exiting blocks but one of them is the latch, use the
// latch for the trip count estimation. Otherwise insist on a single exiting
// block for the trip count estimation.
BasicBlock *ExitingBlock = L->getLoopLatch();
if (!ExitingBlock || !L->isLoopExiting(ExitingBlock))
ExitingBlock = L->getExitingBlock();
if (ExitingBlock) {
TripCount = SE->getSmallConstantTripCount(L, ExitingBlock);
TripMultiple = SE->getSmallConstantTripMultiple(L, ExitingBlock);
}		}

TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences(		bool PragmaEnableUnroll = HasUnrollEnablePragma(L);
L, TTI, ProvidedThreshold, ProvidedCount, ProvidedAllowPartial,		bool ExplicitUnroll = PragmaCount > 0 || PragmaFullUnroll || PragmaEnableUnroll ||
ProvidedRuntime, PragmaCount, PragmaFullUnroll, PragmaEnableUnroll,		UserUnrollCount;
		zzheng: This feels a little awkward. I would use `bool UserUnrollCount = UnrollCount.getNumOccurrences() > 0;` ... `if (HasPragma || UserUnrollCount)`.
		evstupac (Author): Agreed. This is better.
TripCount);

unsigned Count = UP.Count;		uint64_t UnrolledSize;
bool CountSetExplicitly = Count != 0;		DebugLoc LoopLoc = L->getStartLoc();
// Use a heuristic count if we didn't set anything explicitly.		Function *F = L->getHeader()->getParent();
if (!CountSetExplicitly)		LLVMContext &Ctx = F->getContext();
Count = TripCount == 0 ? DefaultUnrollRuntimeCount : TripCount;
if (TripCount && Count > TripCount)
Count = TripCount;
Count = std::min(Count, UP.FullUnrollMaxCount);

unsigned NumInlineCandidates;		if (ExplicitUnroll && TripCount != 0) {
bool NotDuplicatable;		// If the loop has an unrolling pragma, we want to be more aggressive with
bool Convergent;		// unrolling limits. Set thresholds to at least the PragmaThreshold value
unsigned LoopSize = ApproximateLoopSize(		// which is larger than the default limits.
L, NumInlineCandidates, NotDuplicatable, Convergent, TTI, &AC);		UP.Threshold = std::max<unsigned>(UP.Threshold, PragmaUnrollThreshold);
DEBUG(dbgs() << " Loop Size = " << LoopSize << "\n");		UP.PartialThreshold =
		std::max<unsigned>(UP.PartialThreshold, PragmaUnrollThreshold);
		}

		// 3rd priority is full unroll count.
		// Full unroll makes sense only when TripCount could be statically calculated.
		// Also we need to check if we exceed FullUnrollMaxCount.
		if (TripCount && TripCount <= UP.FullUnrollMaxCount ) {
// When computing the unrolled size, note that the conditional branch on the		// When computing the unrolled size, note that the conditional branch on the
// backedge and the comparison feeding it are not replicated like the rest of		// backedge and the comparison feeding it are not replicated like the rest of
// the loop body (which is why 2 is subtracted).		// the loop body (which is why 2 is subtracted).
uint64_t UnrolledSize = (uint64_t)(LoopSize - 2) * Count + 2;		UnrolledSize = (uint64_t)(LoopSize - 2) * TripCount + 2;
if (NotDuplicatable) {
DEBUG(dbgs() << " Not unrolling loop which contains non-duplicatable"
<< " instructions.\n");
return false;
}
if (NumInlineCandidates != 0) {
DEBUG(dbgs() << " Not unrolling loop with inlinable calls.\n");
return false;
}

// Given Count, TripCount and thresholds determine the type of
// unrolling which is to be performed.
enum { Full = 0, Partial = 1, Runtime = 2 };
int Unrolling;
if (TripCount && Count == TripCount) {
Unrolling = Partial;
// If the loop is really small, we don't need to run an expensive analysis.
if (canUnrollCompletely(L, UP.Threshold, 100, UP.DynamicCostSavingsDiscount,		if (canUnrollCompletely(L, UP.Threshold, 100, UP.DynamicCostSavingsDiscount,
UnrolledSize, UnrolledSize)) {		UnrolledSize, UnrolledSize)) {
Unrolling = Full;		UP.Count = TripCount;
		return ExplicitUnroll;
} else {		} else {
// The loop isn't that small, but we still can fully unroll it if that		// The loop isn't that small, but we still can fully unroll it if that
// helps to remove a significant number of instructions.		// helps to remove a significant number of instructions.
// To check that, run additional analysis on the loop.		// To check that, run additional analysis on the loop.
if (Optional<EstimatedUnrollCost> Cost = analyzeLoopUnrollCost(		if (Optional<EstimatedUnrollCost> Cost = analyzeLoopUnrollCost(
L, TripCount, DT, *SE, TTI,		L, TripCount, DT, *SE, TTI,
UP.Threshold + UP.DynamicCostSavingsDiscount))		UP.Threshold + UP.DynamicCostSavingsDiscount))
if (canUnrollCompletely(L, UP.Threshold,		if (canUnrollCompletely(L, UP.Threshold,
UP.PercentDynamicCostSavedThreshold,		UP.PercentDynamicCostSavedThreshold,
UP.DynamicCostSavingsDiscount,		UP.DynamicCostSavingsDiscount,
Cost->UnrolledCost, Cost->RolledDynamicCost)) {		Cost->UnrolledCost, Cost->RolledDynamicCost)) {
Unrolling = Full;		UP.Count = TripCount;
		return ExplicitUnroll;
}		}
}		}
} else if (TripCount && Count < TripCount) {
Unrolling = Partial;
} else {
Unrolling = Runtime;
}		}

// Reduce count based on the type of unrolling and the threshold values.		// 4th priority is partial unrolling.
unsigned OriginalCount = Count;		// Try partial unroll only when TripCount could be statically calculated.
bool AllowRuntime = PragmaEnableUnroll || (PragmaCount > 0) || UP.Runtime;		if (TripCount) {
// Don't unroll a runtime trip count loop with unroll full pragma.		if (UP.Count == 0)
if (HasRuntimeUnrollDisablePragma(L) || PragmaFullUnroll) {		UP.Count = TripCount;
AllowRuntime = false;		UP.Partial |= ExplicitUnroll;
}		if (!UP.Partial) {
bool DecreasedCountDueToConvergence = false;
if (Unrolling == Partial) {
bool AllowPartial = PragmaEnableUnroll || UP.Partial;
if (!AllowPartial && !CountSetExplicitly) {
DEBUG(dbgs() << " will not try to unroll partially because "		DEBUG(dbgs() << " will not try to unroll partially because "
<< "-unroll-allow-partial not given\n");		<< "-unroll-allow-partial not given\n");
		UP.Count = 0;
return false;		return false;
}		}
if (UP.PartialThreshold != NoThreshold && Count > 1) {		if (UP.PartialThreshold != NoThreshold) {
// Reduce unroll count to be modulo of TripCount for partial unrolling.		// Reduce unroll count to be modulo of TripCount for partial unrolling.
		UnrolledSize = (uint64_t)(LoopSize - 2) * UP.Count + 2;
if (UnrolledSize > UP.PartialThreshold)		if (UnrolledSize > UP.PartialThreshold)
Count = (std::max(UP.PartialThreshold, 3u) - 2) / (LoopSize - 2);		UP.Count = (std::max(UP.PartialThreshold, 3u) - 2) / (LoopSize - 2);
if (Count > UP.MaxCount)		if (UP.Count > UP.MaxCount)
Count = UP.MaxCount;		UP.Count = UP.MaxCount;
while (Count != 0 && TripCount % Count != 0)		while (UP.Count != 0 && TripCount % UP.Count != 0)
Count--;		UP.Count--;
if (AllowRuntime && Count <= 1) {		if (UP.AllowRemainder && UP.Count <= 1) {
// If there is no Count that is modulo of TripCount, set Count to		// If there is no Count that is modulo of TripCount, set Count to
// largest power-of-two factor that satisfies the threshold limit.		// largest power-of-two factor that satisfies the threshold limit.
// As we'll create fixup loop, do the type of unrolling only if		// As we'll create fixup loop, do the type of unrolling only if
// runtime unrolling is allowed.		// remainder loop is allowed.
Count = DefaultUnrollRuntimeCount;		UP.Count = DefaultUnrollRuntimeCount;
UnrolledSize = (LoopSize - 2) * Count + 2;		UnrolledSize = (LoopSize - 2) * UP.Count + 2;
while (Count != 0 && UnrolledSize > UP.PartialThreshold) {		while (UP.Count != 0 && UnrolledSize > UP.PartialThreshold) {
		Gerolf: Isn't that simply `UP.Count = max(UP.PartialThreshold - BEInsns / LoopSize - BEInsns - 1, 0)`? It probably also needs a check that LoopSize > BEInsns, but that should probably be an assertion anyway. And `UP.PartialThreshold - BEInsns / LoopSize - BEInsns` should probably be put into a function, since there is a similar instance above. It makes sense to do all this now, since the intent of this patch is a code cleanup.
		evstupac (Author): This code was introduced previously, not in this patch. I'm only refactoring the places I've noticed. And yes, there are a lot more places to refactor. That is not just the max you've mentioned. Currently LoopSize is defined as max(3, "estimated loop size"), and BEInsns is just "2". So LoopSize is always greater than BEInsns. However, I agree that moving the UnrolledSize calculation into a function and adding the assert to that function is a good point. What I want to do is to move the whole threshold check to a function. That requires modifications that would change current behavior, so it will go into the next patch. You can look at the previous comment for the discussion.
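		To make the difference discussed in this thread concrete, here is a minimal sketch that contrasts a closed-form bound with the halving loop from the diff. Both function names are hypothetical. The closed form returns the largest count that fits the threshold, while the halving loop only visits the starting count divided by successive powers of two, so the two can disagree.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: largest Count such that
// (LoopSize - BEInsns) * Count + BEInsns <= Threshold.
static unsigned maxCountUnderThreshold(unsigned LoopSize, unsigned BEInsns,
                                       unsigned Threshold) {
  assert(LoopSize > BEInsns && "loop body must be larger than the backedge");
  if (Threshold < BEInsns)
    return 0;
  return (Threshold - BEInsns) / (LoopSize - BEInsns);
}

// Hypothetical helper mirroring the loop in the diff: halve Count until the
// estimated unrolled size no longer exceeds Threshold (or Count reaches 0).
static unsigned halveUntilFits(unsigned LoopSize, unsigned BEInsns,
                               unsigned Threshold, unsigned Count) {
  assert(LoopSize > BEInsns && "loop body must be larger than the backedge");
  while (Count != 0 &&
         static_cast<uint64_t>(LoopSize - BEInsns) * Count + BEInsns >
             Threshold)
    Count >>= 1;
  return Count;
}
```

		For example, with a loop size of 10, two backedge instructions, and a threshold of 30, the closed form allows a count of 3, whereas halving from 8 settles on 2.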
Count >>= 1;		UP.Count >>= 1;
UnrolledSize = (LoopSize - 2) * Count + 2;		UnrolledSize = (LoopSize - 2) * UP.Count + 2;
}		}
}		}
		if (UP.Count < 2) {
		if (PragmaEnableUnroll)
		emitOptimizationRemarkMissed(
		Ctx, DEBUG_TYPE, *F, LoopLoc,
		"Unable to unroll loop as directed by unroll(enable) pragma "
		"because unrolled size is too large.");
		UP.Count = 0;
		}
		} else {
		UP.Count = TripCount;
		}
		if ((PragmaFullUnroll || PragmaEnableUnroll) && TripCount &&
		mzolotukhin: Is this comment relevant now? And where did the convergence checks go?
		evstupac (Author): Lines 781-782. I agree it is better to move the comment (with corresponding changes) there.
		UP.Count != TripCount)
		emitOptimizationRemarkMissed(
		Ctx, DEBUG_TYPE, *F, LoopLoc,
		"Unable to fully unroll loop as directed by unroll pragma because "
		"unrolled size is too large.");
		return ExplicitUnroll;
		}
		assert(TripCount == 0 &&
		"All cases when TripCount is constant should be covered here.");
		if (PragmaFullUnroll)
		emitOptimizationRemarkMissed(
		mzolotukhin: Can we `assert(UP.Count == TripCount)` here? If yes, it would emphasize the logic; if not, the logic seems flawed.
		evstupac (Author): `assert(TripCount)` would be more accurate, as UP.Count could be 0.
		Ctx, DEBUG_TYPE, *F, LoopLoc,
		"Unable to fully unroll loop as directed by unroll(full) pragma "
		"because loop has a runtime trip count.");

		// 5th priority is runtime unrolling.
		// Don't unroll a runtime trip count loop when it is disabled.
		if (HasRuntimeUnrollDisablePragma(L)) {
		UP.Count = 0;
		return false;
}		}
} else if (Unrolling == Runtime) {		// Reduce count based on the type of unrolling and the threshold values.
if (!AllowRuntime && !CountSetExplicitly) {		UP.Runtime |= PragmaEnableUnroll || PragmaCount > 0 || UserUnrollCount;
		if (!UP.Runtime) {
DEBUG(dbgs() << " will not try to unroll loop with runtime trip count "		DEBUG(dbgs() << " will not try to unroll loop with runtime trip count "
<< "-unroll-runtime not given\n");		<< "-unroll-runtime not given\n");
		UP.Count = 0;
return false;		return false;
}		}
		if (UP.Count == 0)
		UP.Count = DefaultUnrollRuntimeCount;
		UnrolledSize = (LoopSize - 2) * UP.Count + 2;

// Reduce unroll count to be the largest power-of-two factor of		// Reduce unroll count to be the largest power-of-two factor of
// the original count which satisfies the threshold limit.		// the original count which satisfies the threshold limit.
while (Count != 0 && UnrolledSize > UP.PartialThreshold) {		while (UP.Count != 0 && UnrolledSize > UP.PartialThreshold) {
Count >>= 1;		UP.Count >>= 1;
UnrolledSize = (LoopSize - 2) * Count + 2;		UnrolledSize = (LoopSize - 2) * UP.Count + 2;
		}

		unsigned OrigCount = UP.Count;

		if (!UP.AllowRemainder && UP.Count != 0 && (TripMultiple % UP.Count) != 0) {
		while (UP.Count != 0 && TripMultiple % UP.Count != 0)
		UP.Count >>= 1;
		Gerolf: See above.
		DEBUG(dbgs() << "Remainder loop is restricted (that could architecture "
		"specific or because the loop contains a convergent "
		"instruction), so unroll count must divide the trip "
		"multiple, "
		<< TripMultiple << ". Reducing unroll count from "
		<< OrigCount << " to " << UP.Count << ".\n");
		if (PragmaCount > 0 && !UP.AllowRemainder)
		emitOptimizationRemarkMissed(
		Ctx, DEBUG_TYPE, *F, LoopLoc,
		Twine("Unable to unroll loop the number of times directed by "
		"unroll_count pragma because remainder loop is restricted "
		"(that could architecture specific or because the loop "
		"contains a convergent instruction) and so must have an unroll "
		"count that divides the loop trip multiple of ") +
		Twine(TripMultiple) + ". Unrolling instead " + Twine(UP.Count) +
		" time(s).");

}		}

if (Count > UP.MaxCount)		if (UP.Count > UP.MaxCount)
Count = UP.MaxCount;		UP.Count = UP.MaxCount;
		DEBUG(dbgs() << " partially unrolling with count: " << UP.Count << "\n");
		if (UP.Count < 2)
		UP.Count = 0;
		return ExplicitUnroll;
		}

		static bool tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI,
		ScalarEvolution *SE, const TargetTransformInfo &TTI,
		AssumptionCache &AC, bool PreserveLCSSA,
		Optional<unsigned> ProvidedCount,
		Optional<unsigned> ProvidedThreshold,
		Optional<bool> ProvidedAllowPartial,
		Optional<bool> ProvidedRuntime) {
		BasicBlock *Header = L->getHeader();
		DEBUG(dbgs() << "Loop Unroll: F[" << Header->getParent()->getName()
		<< "] Loop %" << Header->getName() << "\n");
		if (HasUnrollDisablePragma(L)) {
		return false;
		}

		unsigned NumInlineCandidates;
		bool NotDuplicatable;
		bool Convergent;
		unsigned LoopSize = ApproximateLoopSize(
		L, NumInlineCandidates, NotDuplicatable, Convergent, TTI, &AC);
		DEBUG(dbgs() << " Loop Size = " << LoopSize << "\n");
		if (NotDuplicatable) {
		DEBUG(dbgs() << " Not unrolling loop which contains non-duplicatable"
		<< " instructions.\n");
		return false;
		}
		if (NumInlineCandidates != 0) {
		DEBUG(dbgs() << " Not unrolling loop with inlinable calls.\n");
		return false;
		}

		// Find trip count and trip multiple if count is not available
		unsigned TripCount = 0;
		unsigned TripMultiple = 1;
		// If there are multiple exiting blocks but one of them is the latch, use the
		// latch for the trip count estimation. Otherwise insist on a single exiting
		mzolotukhin: s/isCountSetExplicitly/IsCountSetExplicitly/
		// block for the trip count estimation.
		BasicBlock *ExitingBlock = L->getLoopLatch();
		if (!ExitingBlock || !L->isLoopExiting(ExitingBlock))
		ExitingBlock = L->getExitingBlock();
		if (ExitingBlock) {
		TripCount = SE->getSmallConstantTripCount(L, ExitingBlock);
		TripMultiple = SE->getSmallConstantTripMultiple(L, ExitingBlock);
		}

		TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences(
		L, TTI, ProvidedThreshold, ProvidedCount, ProvidedAllowPartial,
		ProvidedRuntime);

// If the loop contains a convergent operation, the prelude we'd add		// If the loop contains a convergent operation, the prelude we'd add
// to do the first few instructions before we hit the unrolled loop		// to do the first few instructions before we hit the unrolled loop
// is unsafe -- it adds a control-flow dependency to the convergent		// is unsafe -- it adds a control-flow dependency to the convergent
// operation. Therefore Count must divide TripMultiple.		// operation. Therefore restrict remainder loop (try unrolling without).
//		//
// TODO: This is quite conservative. In practice, convergent_op()		// TODO: This is quite conservative. In practice, convergent_op()
// is likely to be called unconditionally in the loop. In this		// is likely to be called unconditionally in the loop. In this
// case, the program would be ill-formed (on most architectures)		// case, the program would be ill-formed (on most architectures)
// unless n were the same on all threads in a thread group.		// unless n were the same on all threads in a thread group.
// Assuming n is the same on all threads, any kind of unrolling is		// Assuming n is the same on all threads, any kind of unrolling is
// safe. But currently llvm's notion of convergence isn't powerful		// safe. But currently llvm's notion of convergence isn't powerful
// enough to express this.		// enough to express this.
unsigned OrigCount = Count;		if (Convergent)
while (Convergent && Count != 0 && TripMultiple % Count != 0) {		UP.AllowRemainder = false;
DecreasedCountDueToConvergence = true;
Count >>= 1;
}
if (OrigCount > Count) {
DEBUG(dbgs() << " loop contains a convergent instruction, so unroll "
"count must divide the trip multiple, "
<< TripMultiple << ". Reducing unroll count from "
<< OrigCount << " to " << Count << ".\n");
}
DEBUG(dbgs() << " partially unrolling with count: " << Count << "\n");
}

if (HasPragma) {
// Emit optimization remarks if we are unable to unroll the loop
// as directed by a pragma.
DebugLoc LoopLoc = L->getStartLoc();
Function *F = Header->getParent();
LLVMContext &Ctx = F->getContext();
if (PragmaCount > 0 && DecreasedCountDueToConvergence) {
emitOptimizationRemarkMissed(
Ctx, DEBUG_TYPE, *F, LoopLoc,
Twine("Unable to unroll loop the number of times directed by "
"unroll_count pragma because the loop contains a convergent "
"instruction, and so must have an unroll count that divides "
"the loop trip multiple of ") +
Twine(TripMultiple) + ". Unrolling instead " + Twine(Count) +
" time(s).");
} else if ((PragmaCount > 0) && Count != OriginalCount) {
emitOptimizationRemarkMissed(
Ctx, DEBUG_TYPE, *F, LoopLoc,
"Unable to unroll loop the number of times directed by "
"unroll_count pragma because unrolled size is too large.");
} else if (PragmaFullUnroll && !TripCount) {
emitOptimizationRemarkMissed(
Ctx, DEBUG_TYPE, *F, LoopLoc,
"Unable to fully unroll loop as directed by unroll(full) pragma "
"because loop has a runtime trip count.");
} else if (PragmaEnableUnroll && Count != TripCount && Count < 2) {
emitOptimizationRemarkMissed(
Ctx, DEBUG_TYPE, *F, LoopLoc,
"Unable to unroll loop as directed by unroll(enable) pragma because "
"unrolled size is too large.");
} else if ((PragmaFullUnroll || PragmaEnableUnroll) && TripCount &&
Count != TripCount) {
emitOptimizationRemarkMissed(
Ctx, DEBUG_TYPE, *F, LoopLoc,
"Unable to fully unroll loop as directed by unroll pragma because "
"unrolled size is too large.");
}
}

if (Unrolling != Full && Count < 2) {		bool IsCountSetExplicitly = computeUnrollCount(L, TTI, DT, LI, SE, TripCount,
// Partial unrolling by 1 is a nop. For full unrolling, a factor		TripMultiple, LoopSize, UP);
// of 1 makes sense because loop control can be eliminated.		if (!UP.Count)
return false;		return false;
}		// Unroll factor (Count) must be less than or equal to TripCount.
		if (TripCount && UP.Count > TripCount)
		UP.Count = TripCount;

// Unroll the loop.		// Unroll the loop.
if (!UnrollLoop(L, Count, TripCount, AllowRuntime, UP.AllowExpensiveTripCount,		if (!UnrollLoop(L, UP.Count, TripCount, UP.Force, UP.Runtime,
TripMultiple, LI, SE, &DT, &AC, PreserveLCSSA))		UP.AllowExpensiveTripCount, TripMultiple, LI, SE, &DT, &AC,
		PreserveLCSSA))
return false;		return false;

// If loop has an unroll count pragma mark loop as unrolled to prevent		// If loop has an unroll count pragma or unrolled by explicitly set count
// unrolling beyond that requested by the pragma.		// mark loop as unrolled to prevent unrolling beyond that requested.
if (HasPragma && PragmaCount != 0)		if (IsCountSetExplicitly)
SetLoopAlreadyUnrolled(L);		SetLoopAlreadyUnrolled(L);
return true;		return true;
}		}

namespace {		namespace {
class LoopUnroll : public LoopPass {		class LoopUnroll : public LoopPass {
public:		public:
static char ID; // Pass ID, replacement for typeid		static char ID; // Pass ID, replacement for typeid
[67 lines not shown]

lib/Transforms/Utils/LoopUnroll.cpp

[193 lines not shown]
/// iterations before branching into the unrolled loop. UnrollLoop will not		/// iterations before branching into the unrolled loop. UnrollLoop will not
/// runtime-unroll the loop if computing RuntimeTripCount will be expensive and		/// runtime-unroll the loop if computing RuntimeTripCount will be expensive and
/// AllowExpensiveTripCount is false.		/// AllowExpensiveTripCount is false.
///		///
/// The LoopInfo Analysis that is passed will be kept consistent.		/// The LoopInfo Analysis that is passed will be kept consistent.
///		///
/// This utility preserves LoopInfo. It will also preserve ScalarEvolution and		/// This utility preserves LoopInfo. It will also preserve ScalarEvolution and
/// DominatorTree if they are non-null.		/// DominatorTree if they are non-null.
bool llvm::UnrollLoop(Loop *L, unsigned Count, unsigned TripCount,		bool llvm::UnrollLoop(Loop *L, unsigned Count, unsigned TripCount, bool Force,
		mzolotukhin: This change might also be in a separate patch, right?
		evstupac (Author): Yes. But I'd like to keep it in this patch. It is related to the current "-unroll-count" and "#pragma unroll" behavior. Now: "-unroll-count" forces "full unroll" and "partial" and skips "runtime", so it goes directly to forced unroll when TripCount is runtime. "#pragma unroll" forces "full unroll", "partial" and "runtime", and skips forced unroll even if runtime has failed. So, to make the current behavior clear, I'd like to keep this. However, you are right that this could be done in a separate patch.
bool AllowRuntime, bool AllowExpensiveTripCount,		bool AllowRuntime, bool AllowExpensiveTripCount,
unsigned TripMultiple, LoopInfo *LI, ScalarEvolution *SE,		unsigned TripMultiple, LoopInfo *LI, ScalarEvolution *SE,
DominatorTree *DT, AssumptionCache *AC,		DominatorTree *DT, AssumptionCache *AC,
bool PreserveLCSSA) {		bool PreserveLCSSA) {
BasicBlock *Preheader = L->getLoopPreheader();		BasicBlock *Preheader = L->getLoopPreheader();
if (!Preheader) {		if (!Preheader) {
DEBUG(dbgs() << " Can't unroll; loop preheader-insertion failed.\n");		DEBUG(dbgs() << " Can't unroll; loop preheader-insertion failed.\n");
return false;		return false;
[84 lines not shown]
DEBUG(
"operation.");		"operation.");
});		});
// Don't output the runtime loop remainder if Count is a multiple of		// Don't output the runtime loop remainder if Count is a multiple of
// TripMultiple. Such a remainder is never needed, and is unsafe if the loop		// TripMultiple. Such a remainder is never needed, and is unsafe if the loop
// contains a convergent instruction.		// contains a convergent instruction.
if (RuntimeTripCount && TripMultiple % Count != 0 &&		if (RuntimeTripCount && TripMultiple % Count != 0 &&
!UnrollRuntimeLoopRemainder(L, Count, AllowExpensiveTripCount,		!UnrollRuntimeLoopRemainder(L, Count, AllowExpensiveTripCount,
UnrollRuntimeEpilog, LI, SE, DT,		UnrollRuntimeEpilog, LI, SE, DT,
PreserveLCSSA))		PreserveLCSSA)) {
		if (Force)
		RuntimeTripCount = false;
		zzheng: I don't understand this part. If UnrollRuntimeLoopRemainder() returns false, the remainder loop is not generated. How do we ensure correctness if we 'force' a loop that has a runtime trip count of 6 to be unrolled by 4?
		evstupac (Author): Actually, that is how -unroll-count works now when runtime unroll is disabled. For this type of unrolling we do not remove the conditional branches from the unrolled loop. For example, unroll by 2: `for.body: ... cmp; br for.body1, exit; for.body1: ... cmp; br for.body, exit; exit:`. My changes do not change current behavior. In my next patch I'll request that runtime unroll be enabled when -unroll-count is passed. There could be positive effects from this type of unrolling (like a reduced number of executed backward branches). So if the user wants a loop to be unrolled, the compiler should do it (when it is safe).
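		The answer above can be illustrated with a small source-level simulation. This is only a sketch of the idea, not what the pass emits: the function names are hypothetical, and the "loop body" is reduced to a counter. Each unrolled copy keeps its own exit test, so the loop leaves at the right iteration even when the trip count (6 here) is not a multiple of the unroll factor (4).

```cpp
#include <cassert>

// Reference loop: executes the body N times (bottom-tested, guarded for 0).
static unsigned runRolled(unsigned N) {
  unsigned Executed = 0;
  if (N == 0)
    return Executed;
  unsigned I = 0;
  do {
    ++Executed; // loop body
  } while (++I != N);
  return Executed;
}

// The same loop unrolled by 4 with no remainder loop. The exit test after
// every copy of the body is kept, so any trip count remains correct.
static unsigned runUnrolledBy4KeepingExits(unsigned N) {
  unsigned Executed = 0;
  if (N == 0)
    return Executed;
  unsigned I = 0;
  for (;;) {
    ++Executed;          // for.body
    if (++I == N) break; // cmp; br for.body.1, exit
    ++Executed;          // for.body.1
    if (++I == N) break; // cmp; br for.body.2, exit
    ++Executed;          // for.body.2
    if (++I == N) break; // cmp; br for.body.3, exit
    ++Executed;          // for.body.3
    if (++I == N) break; // cmp; br for.body, exit (the only backward branch)
  }
  return Executed;
}
```

		Only one of the four tests is a backward branch, which is the potential benefit mentioned in the reply: fewer executed backedges, at the cost of keeping all the comparisons.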
		else
return false;		return false;
		}

// Notify ScalarEvolution that the loop will be substantially changed,		// Notify ScalarEvolution that the loop will be substantially changed,
// if not outright eliminated.		// if not outright eliminated.
if (SE)		if (SE)
SE->forgetLoop(L);		SE->forgetLoop(L);

// If we know the trip count, we know the multiple...		// If we know the trip count, we know the multiple...
unsigned BreakoutTrip = 0;		unsigned BreakoutTrip = 0;
[394 lines not shown]

test/Transforms/LoopUnroll/partial-unroll-const-bounds.ll

	; RUN: opt < %s -S -unroll-threshold=20 -loop-unroll -unroll-allow-partial -unroll-runtime | FileCheck %s			; RUN: opt < %s -S -unroll-threshold=20 -loop-unroll -unroll-allow-partial -unroll-allow-remainder | FileCheck %s

	; The Loop TripCount is 9. However unroll factors 3 or 9 exceed given threshold.			; The Loop TripCount is 9. However unroll factors 3 or 9 exceed given threshold.
	; The test checks that we choose a smaller, power-of-two, unroll count and do not give up on unrolling.			; The test checks that we choose a smaller, power-of-two, unroll count and do not give up on unrolling.

	; CHECK: for.body:			; CHECK: for.body:
	; CHECK: store			; CHECK: store
	; CHECK: for.body.1:			; CHECK: for.body.1:
	; CHECK: store			; CHECK: store
	[20 lines not shown]
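The count selection that this test exercises can be sketched in isolation as follows. This is a simplified model of the partial-unrolling branch in the diff, not the pass itself: `choosePartialCount` is a hypothetical name, the default runtime count of 8 is an assumption about `DefaultUnrollRuntimeCount`, and the loop size of 9 used in the example is made up (the real size of the test loop is not shown on this page).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of the partial-unroll count choice. Returns 0 when no
// useful count (2 or more) is found.
static unsigned choosePartialCount(unsigned LoopSize, unsigned TripCount,
                                   unsigned PartialThreshold,
                                   unsigned MaxCount, bool AllowRemainder,
                                   unsigned DefaultRuntimeCount = 8) {
  const unsigned BEInsns = 2; // compare + branch on the backedge
  assert(LoopSize > BEInsns && "loop body must be larger than the backedge");
  auto UnrolledSize = [&](unsigned C) {
    return static_cast<uint64_t>(LoopSize - BEInsns) * C + BEInsns;
  };

  // Start from a full unroll and shrink to fit the threshold.
  unsigned Count = TripCount;
  if (UnrolledSize(Count) > PartialThreshold)
    Count = (std::max(PartialThreshold, 3u) - BEInsns) / (LoopSize - BEInsns);
  Count = std::min(Count, MaxCount);
  // Prefer a count that divides TripCount, so no remainder loop is needed.
  while (Count != 0 && TripCount % Count != 0)
    --Count;
  // Otherwise fall back to a power-of-two count, if a remainder is allowed.
  if (AllowRemainder && Count <= 1) {
    Count = DefaultRuntimeCount;
    while (Count != 0 && UnrolledSize(Count) > PartialThreshold)
      Count >>= 1;
  }
  return Count < 2 ? 0 : Count;
}
```

Under these assumptions, a trip count of 9 with a threshold of 20 rejects factors 9 and 3 (estimated sizes 65 and 23) and falls back to 2 when the remainder is allowed, which is consistent with the `for.body` and `for.body.1` blocks the test checks for. With the remainder disallowed, the same inputs yield 0, meaning no unrolling.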

Revision Contents

Diff 56357
