This is an archive of the discontinued LLVM Phabricator instance.

Differential D24790

[LoopUnroll] Use the upper bound of the loop trip count to completely unroll loops
ClosedPublic

Authored by haicheng on Sep 20 2016, 9:52 PM.

Details

Reviewers

mzolotukhin
tstellarAMD
mssimpso
mcrosier

Commits

rG6cac34fd41e8: [LoopUnroll] Use the upper bound of the loop trip count to fullly unroll a loop
rL284044: [LoopUnroll] Use the upper bound of the loop trip count to fullly unroll a loop

Summary

This patch tries to fully unroll loops that contain a break statement, such as:

for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
        found = true;
        break;
    }
}

GCC can fully unroll such loops, but LLVM currently cannot, because it only fully unrolls loops with an exact constant trip count.

The upper bound of the trip count can be obtained by calling ScalarEvolution::getMaxBackedgeTakenCount(). Part of the patch refactors SCEV to avoid duplicating code.

Using the upper bound is enabled under the same circumstances as runtime unrolling, since both unroll loops without knowing the exact constant trip count.

The modified test/CodeGen/AMDGPU/tti-unroll-prefs.ll can be used as the test case of this patch.
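
To make the goal concrete, here is a minimal hand-written sketch of what full unrolling by the upper bound amounts to for the loop above. The names `findLoop`, `findUnrolled`, and `checkSameResult` are illustrative assumptions and not part of the patch; the real transformation works on LLVM IR, not on C++ source.

```cpp
#include <cassert>

// Original form: the early exit makes the exact trip count unknown,
// but the loop can never run more than 8 iterations.
bool findLoop(const int (&a)[8], int value) {
  bool found = false;
  for (int i = 0; i < 8; i++) {
    if (a[i] == value) {
      found = true;
      break;
    }
  }
  return found;
}

// Hand-unrolled form: 8 copies of the body, each keeping its own
// conditional exit. The backedge and the `i < 8` test are gone.
bool findUnrolled(const int (&a)[8], int value) {
  if (a[0] == value) return true;
  if (a[1] == value) return true;
  if (a[2] == value) return true;
  if (a[3] == value) return true;
  if (a[4] == value) return true;
  if (a[5] == value) return true;
  if (a[6] == value) return true;
  if (a[7] == value) return true;
  return false;
}

// Both forms must agree for any input.
void checkSameResult(const int (&a)[8], int value) {
  assert(findLoop(a, value) == findUnrolled(a, value));
}
```

Each copy of the body keeps its conditional exit, so the unrolled code is a chain of small blocks rather than one large block.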

Diff Detail

Repository: rL LLVM

Event Timeline

haicheng updated this revision to Diff 72006. Sep 20 2016, 9:52 PM

haicheng retitled this revision to [LoopUnroll] Use the upper bound of the loop trip count to completely unroll loops.

haicheng updated this object.

haicheng added reviewers: mzolotukhin, mcrosier, mssimpso.

haicheng set the repository for this revision to rL LLVM.

haicheng added a subscriber: llvm-commits.

Herald added a reviewer: tstellarAMD. Sep 20 2016, 9:52 PM

Herald added subscribers: nhaehnle, mzolotukhin, mcrosier, sanjoy.

Hi,

This makes total sense to me. But I don't think there is a need to introduce yet another knob in the loop unroller: have you measured the effect of the change on benchmarks (LLVM testsuite/SPEC/others)? I think it should be correct to unroll a loop with a small trip count even if it's not exactly known: the effect of this transformation should be no worse than that of unrolling a similar loop whose constant trip count equals the upper bound. Benchmark results would help confirm or reject this assumption.

Thanks,
Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
934	Unnecessary change.

ABataev added a subscriber: ABataev. Sep 23 2016, 1:23 AM

ABataev added inline comments.

lib/Analysis/ScalarEvolution.cpp
5349	const auto *MaxExitCount
lib/Transforms/Utils/LoopUnroll.cpp
543	Maybe just `NeedConditional = (UseUpperBound && j)`?

In D24790#550261, @mzolotukhin wrote:

Hi,

This makes total sense to me. But I don't think there is a need to introduce yet another knob in the loop unroller: have you measured the effect of the change on benchmarks (LLVM testsuite/SPEC/others)? I think it should be correct to unroll a loop with a small trip count even if it's not exactly known: the effect of this transformation should be no worse than that of unrolling a similar loop whose constant trip count equals the upper bound. Benchmark results would help confirm or reject this assumption.

Thanks,
Michael

Thank you for reviewing my patch, Michael.

My initial design was to completely unroll all loops with small trip count upper bounds whenever the exact trip counts are unknown, but I saw several regressions in spec200x and internal benchmarks (e.g. spec2000/eon -1.8%, spec2000/gcc -1.2%) running on an AArch64 device. One of the major reasons was that I unrolled many loops with calls. As you may already know, the cost model for calls is not that awesome. BasicTTIImplBase::getUnrollingPreferences() can check for call instructions in the loop, and I need a boolean to pass the check result anyway, so I created a new entry in UnrollingPreferences.

If the exact trip count is used to unroll, the unrolled loop usually becomes one giant basic block, which is preferable. However, if the upper bound is used, the unrolled loop usually becomes a sequence of small basic blocks, because it is not safe to merge loop blocks belonging to different iterations. Some of these blocks may not be executed at runtime. This is another reason I think we may need to be more conservative when using the upper bound to unroll loops.

I tried several different configurations and the patch I uploaded was the best I found. There are no noticeable regressions in the benchmarks I tested, several small wins in spec200x (e.g. spec2000/perlbmk +0.8%, spec2006/xalancbmk +0.7%), and many larger wins in our internal benchmarks, which are much larger than spec2006.

Haicheng

Address the comments from Michael and Alexey. Thank you very much.

Herald added a subscriber: wdng. Sep 23 2016, 9:04 AM

Hi Haicheng,

Thanks for working on this, please find my answers below and some more remarks/nit-picks inline.

One of the major reasons was that I unrolled many loops with calls. As you may already know, the cost model for calls is not that awesome.

Maybe we just fix the cost model for calls instead :-) But I know, it might not be actually possible at all.

If the exact trip count is used to unroll, the unrolled loop usually becomes one giant basic block, which is preferable. However, if the upper bound is used, the unrolled loop usually becomes a sequence of small basic blocks, because it is not safe to merge loop blocks belonging to different iterations. Some of these blocks may not be executed at runtime. This is another reason I think we may need to be more conservative when using the upper bound to unroll loops.

I see, this makes perfect sense to me. Indeed, having separate thresholds might be reasonable.

I tried several different configurations and the patch I uploaded was the best I found.

Have you tested it on any other architecture except AArch64?

Thanks,
Michael

lib/Analysis/ScalarEvolution.cpp
5316	The first letter shouldn't be capitalized. I think the routine name also might be improved, but I don't have any better suggestions at the moment.
lib/Transforms/Scalar/LoopUnrollPass.cpp
766	s/make/makes/
767	s/staticaly/statically/
771	This gets confusing. Can we somehow rename these variables so that it's clearer what's the difference between them?
779	What if `TripCount` matches `MaxTripCount`? Will we think that we're using upper bound instead of the exact trip count in this case?
1007	I'm not a big fan of reusing this variable. Initially it was supposed to come from `UP` and show if we're allowed to use upper-bound instead of exactly-known trip-count, but now we're also using it to communicate between various routines (and the interface was not specified). Can it be refactored somehow?
lib/Transforms/Utils/LoopUnroll.cpp
206	Please add a comment about `UseUpperBound`. Actually, I don't think that's a good name for this argument, because this functions doesn't use upper bound for anything. What it needs is a flag indicating that conditional branches must be preserved - I'd suggest to reflect that in the argument name.
543	Is it intentional that we can make `NeedConditional` true after it was set to false before that? From semantics it looks like this never happens (in this case we can assert it), but I'd like to make sure it was not overlooked. Did you intend to replace this `if` with `else if` as well?
test/CodeGen/AMDGPU/tti-unroll-prefs.ll
14–16	We could've just passed `-unroll-threshold` to overcome this. It might make sense to do it even now, so that the test doesn't break if UnrollPreferences are changed.

Address Michael's comments. Thank you.

Herald edited edge metadata. Oct 3 2016, 7:34 AM

haicheng added inline comments. Oct 3 2016, 7:58 AM

lib/Transforms/Scalar/LoopUnrollPass.cpp
779	`TripCount` and `MaxTripCount` cannot both be nonzero, because `MaxTripCount` is computed only if `TripCount` is zero. If one is nonzero, the other must be zero. If both are zero, `FullUnrollTripCount` is zero and we cannot enter here.
lib/Transforms/Utils/LoopUnroll.cpp
543	I intended to replace `if` with `else if`. Using complete unroll should not enter the `else if` part. I think the change improves the readability.
test/CodeGen/AMDGPU/tti-unroll-prefs.ll
14–16	This loop has a break statement, so we cannot get the exact trip count. I think we cannot fully unroll the loop without my change. Passing `-unroll-threshold` cannot overcome this.

haicheng added inline comments. Oct 3 2016, 9:51 AM

lib/Transforms/Scalar/LoopUnrollPass.cpp
779	I also added a sentence in the comment to describe this.
lib/Transforms/Utils/LoopUnroll.cpp
543	Without the change of `else if`, the logic of my patch is also wrong.

Hi Michael,

Please see my inlined response. Thank you.

In D24790#554730, @mzolotukhin wrote:

Hi Haicheng,

Thanks for working on this, please find my answers below and some more remarks/nit-picks inline.

One of the major reasons was that I unrolled many loops with calls. As you may already know, the cost model for calls is not that awesome.

Maybe we just fix the cost model for calls instead :-) But I know, it might not be actually possible at all.

It should be fixed, but I have no clue how to do that now. Can I leave it for the future?

If the exact trip count is used to unroll, the unrolled loop usually becomes one giant basic block, which is preferable. However, if the upper bound is used, the unrolled loop usually becomes a sequence of small basic blocks, because it is not safe to merge loop blocks belonging to different iterations. Some of these blocks may not be executed at runtime. This is another reason I think we may need to be more conservative when using the upper bound to unroll loops.

I see, this makes perfect sense to me. Indeed, having separate thresholds might be reasonable.

I tried a threshold of 100 for upper-bound unrolling instead of the current default threshold of 150, but I saw some regressions.

I have some other thoughts about the unroll threshold that are not directly related to this patch and not yet tested with any benchmarks. I think we may want to encourage the unroller to unroll loops with larger trip counts, to remove more loop overhead. For example, given one loop of size 75 that can be unrolled twice and another loop of size 21 that can be unrolled 8 times, we may prefer unrolling the latter. This occurred to me because I noticed that GCC unrolls more small loops with high trip counts than LLVM does, while LLVM unrolls more large loops with small trip counts than GCC does. Maybe we can use an equation like 100+10*trip_count to calculate the threshold. What do you think?
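
As a rough illustration of the equation proposed above, the following sketch applies it to the two example loops. This is only a sketch: the helper names are hypothetical, the size model (unrolled size = loop size * trip count) and the `<=` comparison are simplifying assumptions, and nothing like this is part of the patch.

```cpp
#include <cassert>

// Hypothetical threshold from the comment above: grows with the trip count.
unsigned scaledThreshold(unsigned tripCount) { return 100 + 10 * tripCount; }

// Simplified model (assumption): unrolled size is loop size times trip count,
// and a loop is fully unrolled when that size does not exceed the threshold.
bool fitsThreshold(unsigned loopSize, unsigned tripCount, unsigned threshold) {
  return loopSize * tripCount <= threshold;
}

// The two loops from the example, under the fixed default and the scaled rule.
void demoThresholds() {
  assert(fitsThreshold(75, 2, 150));                  // 150 <= 150
  assert(!fitsThreshold(21, 8, 150));                 // 168 >  150
  assert(!fitsThreshold(75, 2, scaledThreshold(2)));  // 150 >  120
  assert(fitsThreshold(21, 8, scaledThreshold(8)));   // 168 <= 180
}
```

Under this model the fixed threshold of 150 favors the large loop with the small trip count, while the scaled threshold favors the small loop with the high trip count.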

I tried several different configurations and the patch I uploaded was the best I found.

Have you tested it on any other architecture except AArch64?

I ran spec2000/2006 on x86 last week and saw no noticeable regressions. There were some small improvements: spec2000/bzip2 +0.7%, spec2006/dealII +2.1%, gcc +0.8%.

Thanks,
Michael

Hi Michael,

Would you please take another look at my modified patch? Thank you in advance.

Haicheng

Hi Haicheng,

Please find a couple of minor comments inline.

Thanks,
Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
95–102	Do we really need two options here? Can the first one be replaced with `(UnrollMaxUpperBound == 0)`?
779	Could you add an assert for this please?

Address Michael's comments. Thank you.

Hi Haicheng,

The patch looks mostly good to me modulo a couple of small remarks, please feel free to commit once they are addressed.

Thanks,
Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
774–775	The assert might fail to catch a bug if the multiplication result overflows and wraps around to 0.
test/Transforms/LoopUnroll/AArch64/full-unroll-trip-count-upper-bound.ll
1–2 ↗	(On Diff #74193)	Shouldn't we have `-unroll-upper` here? Actually, I think we need to check that a) a loop is not unrolled without this flag, and b) it's unrolled with it. Also, what happened to the changes in the other (existing) test?
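
The overflow concern raised on lines 774–775 can be reproduced in isolation. The sketch below assumes the assert tested a product of the two counts against zero; the exact expression in the patch is not shown on this page, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Product-based check (assumed form): true when at least one count is zero,
// but also true whenever the 32-bit product wraps around to 0.
bool productIsZero(uint32_t tripCount, uint32_t maxTripCount) {
  return static_cast<uint32_t>(tripCount * maxTripCount) == 0;
}

// Overflow-free alternative: test each operand directly.
bool oneIsZero(uint32_t tripCount, uint32_t maxTripCount) {
  return tripCount == 0 || maxTripCount == 0;
}

// The two checks disagree exactly when the product wraps.
void demoOverflow() {
  assert(productIsZero(8u, 0u) && oneIsZero(8u, 0u));
  assert(productIsZero(65536u, 65536u));  // 2^16 * 2^16 wraps to 0
  assert(!oneIsZero(65536u, 65536u));     // both counts are nonzero
}
```

Testing each operand separately avoids the wrap-around entirely.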

This revision is now accepted and ready to land. Oct 10 2016, 6:38 PM

Addressed more comments from Michael. I will commit tomorrow morning if no new comments come in today.

Closed by commit rL284044: [LoopUnroll] Use the upper bound of the loop trip count to fullly unroll a loop (authored by haicheng). Oct 12 2016, 1:33 PM

This revision was automatically updated to reflect the committed changes.

evstupac added a subscriber: evstupac. Nov 14 2016, 6:47 PM

evstupac added inline comments.

llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp
1027 ↗	(On Diff #74432)	Sorry for commenting after approval. What I see here is that TripCount could implicitly become MaxTripCount (as TripCount was passed to computeUnrollCount by reference). Here it is passed to UnrollLoop as TripCount. Could you please add a comment to UnrollLoop saying that the TripCount parameter is not always the real loop trip count, but sometimes MaxTripCount? Or rename TripCount to MaxTripCount in the UnrollLoop parameters. There could be future errors if someone assumes that TripCount is the exact loop trip count.

haicheng added inline comments. Nov 14 2016, 9:49 PM

llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp
1027 ↗	(On Diff #74432)	I will modify the comments in a separate patch to make this clearer. However, note that even when the trip count is calculated by SCEV::getSmallConstantTripCount(), it may not be the exact number of times the loop header gets executed. Please see the comments before llvm::UnrollLoop() in LoopUnroll.cpp.

Revision Contents

Path                                                  Size
include/llvm/Analysis/ScalarEvolution.h               5 lines
include/llvm/Analysis/TargetTransformInfo.h           2 lines
include/llvm/CodeGen/BasicTTIImpl.h                   3 lines
include/llvm/Transforms/Scalar.h                      5 lines
include/llvm/Transforms/Scalar/LoopUnrollPass.h       1 line
include/llvm/Transforms/Utils/UnrollLoop.h            4 lines
lib/Analysis/ScalarEvolution.cpp                      30 lines
lib/Transforms/Scalar/LoopUnrollPass.cpp              90 lines
lib/Transforms/Utils/LoopUnroll.cpp                   20 lines
test/CodeGen/AMDGPU/tti-unroll-prefs.ll               9 lines

Diff 72293

include/llvm/Analysis/ScalarEvolution.h

[1,386 lines not shown] public:
   /// value. Returns 0 if the trip count is unknown or not constant. This
   /// "trip count" assumes that control exits via ExitingBlock. More
   /// precisely, it is the number of times that control may reach ExitingBlock
   /// before taking the branch. For loops with multiple exits, it may not be
   /// the number times that the loop header executes if the loop exits
   /// prematurely via another branch.
   unsigned getSmallConstantTripCount(Loop *L, BasicBlock *ExitingBlock);

+  /// Returns the upper bound of the loop trip count as a normal unsigned
+  /// value.
+  /// Returns 0 if the trip count is unknown or not constant.
+  unsigned getSmallConstantMaxTripCount(Loop *L);
+
   /// Returns the largest constant divisor of the trip count of the
   /// loop if it is a single-exit loop and we can compute a small maximum for
   /// that loop.
   ///
   /// Implemented in terms of the \c getSmallConstantTripMultiple overload with
   /// the single exiting block passed to it. See that routine for details.
   unsigned getSmallConstantTripMultiple(Loop *L);

[432 lines not shown]

include/llvm/Analysis/TargetTransformInfo.h

[282 lines not shown] struct UnrollingPreferences {
     /// Allow generation of a loop remainder (extra iterations after unroll).
     bool AllowRemainder;
     /// Allow emitting expensive instructions (such as divisions) when computing
     /// the trip count of a loop for runtime unrolling.
     bool AllowExpensiveTripCount;
     /// Apply loop unroll on any kind of loop
     /// (mainly to loops that fail runtime unrolling).
     bool Force;
+    /// Allow using trip count upper bound to unroll loops.
+    bool UpperBound;
   };

   /// \brief Get target-customized preferences for the generic loop unrolling
   /// transformation. The caller will initialize UP with the current
   /// target-independent defaults.
   void getUnrollingPreferences(Loop *L, UnrollingPreferences &UP) const;

   /// @}
[797 lines not shown]

include/llvm/CodeGen/BasicTTIImpl.h

[278 lines not shown] for (Loop::block_iterator I = L->block_begin(), E = L->block_end(); I != E;
               continue;
           }

           return;
         }
     }

     // Enable runtime and partial unrolling up to the specified size.
-    UP.Partial = UP.Runtime = true;
+    // Enable using trip count upper bound to unroll loops.
+    UP.Partial = UP.Runtime = UP.UpperBound = true;
     UP.PartialThreshold = MaxOps;

     // Avoid unrolling when optimizing for size.
     UP.OptSizeThreshold = 0;
     UP.PartialOptSizeThreshold = 0;
   }

   /// @}
[679 lines not shown]

include/llvm/Transforms/Scalar.h

[163 lines not shown]
 //
 Pass *createLoopInstSimplifyPass();

 //===----------------------------------------------------------------------===//
 //
 // LoopUnroll - This pass is a simple loop unrolling pass.
 //
 Pass *createLoopUnrollPass(int Threshold = -1, int Count = -1,
-                           int AllowPartial = -1, int Runtime = -1);
-// Create an unrolling pass for full unrolling only.
+                           int AllowPartial = -1, int Runtime = -1,
+                           int UpperBound = -1);
+// Create an unrolling pass for full unrolling that uses exact trip count only.
 Pass *createSimpleLoopUnrollPass();

 //===----------------------------------------------------------------------===//
 //
 // LoopReroll - This pass is a simple loop rerolling pass.
 //
 Pass *createLoopRerollPass();

[342 lines not shown]

include/llvm/Transforms/Scalar/LoopUnrollPass.h

[15 lines not shown]

 namespace llvm {

 struct LoopUnrollPass : public PassInfoMixin<LoopUnrollPass> {
   Optional<unsigned> ProvidedCount;
   Optional<unsigned> ProvidedThreshold;
   Optional<bool> ProvidedAllowPartial;
   Optional<bool> ProvidedRuntime;
+  Optional<bool> ProvidedUpperBound;

   PreservedAnalyses run(Loop &L, LoopAnalysisManager &AM);
 };
 } // end namespace llvm

 #endif // LLVM_TRANSFORMS_SCALAR_LOOPUNROLLPASS_H

include/llvm/Transforms/Utils/UnrollLoop.h

[26 lines not shown]
 class LPPassManager;
 class MDNode;
 class Pass;
 class OptimizationRemarkEmitter;
 class ScalarEvolution;

 bool UnrollLoop(Loop *L, unsigned Count, unsigned TripCount, bool Force,
                 bool AllowRuntime, bool AllowExpensiveTripCount,
-                unsigned TripMultiple, LoopInfo *LI, ScalarEvolution *SE,
-                DominatorTree *DT, AssumptionCache *AC,
+                bool UseUpperBound, unsigned TripMultiple, LoopInfo *LI,
+                ScalarEvolution *SE, DominatorTree *DT, AssumptionCache *AC,
                 OptimizationRemarkEmitter *ORE, bool PreserveLCSSA);

 bool UnrollRuntimeLoopRemainder(Loop *L, unsigned Count,
                                 bool AllowExpensiveTripCount,
                                 bool UseEpilogRemainder, LoopInfo *LI,
                                 ScalarEvolution *SE, DominatorTree *DT,
                                 bool PreserveLCSSA);

 MDNode *GetUnrollMetadata(MDNode *LoopID, StringRef Name);
 }

 #endif

lib/Analysis/ScalarEvolution.cpp

[5,307 lines not shown]
 }

 //===----------------------------------------------------------------------===//
 // Iteration Count Computation Code
 //

+static unsigned TranslateToConstantTripCount(const SCEVConstant *ExitCount) {
+  if (!ExitCount)
+    return 0;
+
+  ConstantInt *ExitConst = ExitCount->getValue();
+
+  // Guard against huge trip counts.
+  if (ExitConst->getValue().getActiveBits() > 32)
+    return 0;
+
+  // In case of integer overflow, this returns 0, which is correct.
+  return ((unsigned)ExitConst->getZExtValue()) + 1;
+}
+
 unsigned ScalarEvolution::getSmallConstantTripCount(Loop *L) {
   if (BasicBlock *ExitingBB = L->getExitingBlock())
     return getSmallConstantTripCount(L, ExitingBB);

   // No trip count information for multiple exits.
   return 0;
 }

 unsigned ScalarEvolution::getSmallConstantTripCount(Loop *L,
                                                     BasicBlock *ExitingBlock) {
   assert(ExitingBlock && "Must pass a non-null exiting block!");
   assert(L->isLoopExiting(ExitingBlock) &&
          "Exiting block must actually branch out of the loop!");
   const SCEVConstant *ExitCount =
       dyn_cast<SCEVConstant>(getExitCount(L, ExitingBlock));
-  if (!ExitCount)
-    return 0;
-
-  ConstantInt *ExitConst = ExitCount->getValue();
-
-  // Guard against huge trip counts.
-  if (ExitConst->getValue().getActiveBits() > 32)
-    return 0;
-
-  // In case of integer overflow, this returns 0, which is correct.
-  return ((unsigned)ExitConst->getZExtValue()) + 1;
+  return TranslateToConstantTripCount(ExitCount);
+}
+
+unsigned ScalarEvolution::getSmallConstantMaxTripCount(Loop *L) {
+  const auto *MaxExitCount =
+      dyn_cast<SCEVConstant>(getMaxBackedgeTakenCount(L));
+  return TranslateToConstantTripCount(MaxExitCount);
 }

 unsigned ScalarEvolution::getSmallConstantTripMultiple(Loop *L) {
   if (BasicBlock *ExitingBB = L->getExitingBlock())
     return getSmallConstantTripMultiple(L, ExitingBB);

   // No trip multiple information for multiple exits.
   return 0;
[5,238 lines not shown]
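
The conversion performed by the new helper in this hunk can be tried outside LLVM. The sketch below is a stand-in under stated assumptions: `toSmallTripCount` is a hypothetical name, and it takes a `known` flag plus a plain 64-bit backedge-taken count instead of a `SCEVConstant`. It mirrors the three cases in the diff: unknown count, a count needing more than 32 bits, and the `+ 1` that turns a backedge-taken count into a trip count.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the helper in the diff (assumption: plain integers replace
// the SCEVConstant argument; `known == false` models a null pointer).
uint32_t toSmallTripCount(bool known, uint64_t backedgeTakenCount) {
  if (!known)
    return 0;

  // Guard against huge trip counts: more than 32 active bits.
  if (backedgeTakenCount > UINT32_MAX)
    return 0;

  // Trip count is backedge-taken count plus one; at UINT32_MAX the 32-bit
  // addition wraps to 0, which callers read as "unknown".
  return static_cast<uint32_t>(static_cast<uint32_t>(backedgeTakenCount) + 1u);
}

// A loop whose backedge is taken at most 7 times runs at most 8 iterations.
void demoTripCount() {
  assert(toSmallTripCount(true, 7) == 8);
  assert(toSmallTripCount(true, UINT32_MAX) == 0);
  assert(toSmallTripCount(true, uint64_t{1} << 32) == 0);
  assert(toSmallTripCount(false, 7) == 0);
}
```

In all three failure cases the result is 0, so a caller only has to test for a nonzero value before using the count.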

lib/Transforms/Scalar/LoopUnrollPass.cpp

Show First 20 Lines • Show All 86 Lines • ▼ Show 20 Lines	static cl::opt<bool> UnrollAllowRemainder(
"unroll-allow-remainder", cl::Hidden,		"unroll-allow-remainder", cl::Hidden,
cl::desc("Allow generation of a loop remainder (extra iterations) "		cl::desc("Allow generation of a loop remainder (extra iterations) "
"when unrolling a loop."));		"when unrolling a loop."));

static cl::opt<bool>		static cl::opt<bool>
UnrollRuntime("unroll-runtime", cl::ZeroOrMore, cl::Hidden,		UnrollRuntime("unroll-runtime", cl::ZeroOrMore, cl::Hidden,
cl::desc("Unroll loops with run-time trip counts"));		cl::desc("Unroll loops with run-time trip counts"));

		static cl::opt<bool> UnrollUpperBound(
		"unroll-upperbound", cl::ZeroOrMore, cl::Hidden,
		cl::desc("Unroll loops with the upper bounds of the trip counts"));

		static cl::opt<unsigned> UnrollMaxUpperBound(
		"unroll-max-upperbound", cl::init(8), cl::Hidden,
		cl::desc(
		"The max of trip count upper bound that is considered in unrolling"));
		mzolotukhinUnsubmitted Done Reply Inline Actions Do we really need two options here? Can the first one be replaced with `(UnrollMaxUpperBound == 0)`? mzolotukhin: Do we really need two options here? Can the first one be replaced with `(UnrollMaxUpperBound ==…

static cl::opt<unsigned> PragmaUnrollThreshold(		static cl::opt<unsigned> PragmaUnrollThreshold(
"pragma-unroll-threshold", cl::init(16 * 1024), cl::Hidden,		"pragma-unroll-threshold", cl::init(16 * 1024), cl::Hidden,
cl::desc("Unrolled size limit for loops with an unroll(full) or "		cl::desc("Unrolled size limit for loops with an unroll(full) or "
"unroll_count pragma."));		"unroll_count pragma."));

/// A magic value for use with the Threshold parameter to indicate		/// A magic value for use with the Threshold parameter to indicate
/// that the loop unroll should be performed regardless of how much		/// that the loop unroll should be performed regardless of how much
/// code expansion would result.		/// code expansion would result.
static const unsigned NoThreshold = UINT_MAX;		static const unsigned NoThreshold = UINT_MAX;

/// Default unroll count for loops with run-time trip count if		/// Default unroll count for loops with run-time trip count if
/// -unroll-count is not set		/// -unroll-count is not set
static const unsigned DefaultUnrollRuntimeCount = 8;		static const unsigned DefaultUnrollRuntimeCount = 8;

/// Gather the various unrolling parameters based on the defaults, compiler		/// Gather the various unrolling parameters based on the defaults, compiler
/// flags, TTI overrides and user specified parameters.		/// flags, TTI overrides and user specified parameters.
static TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences(		static TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences(
Loop *L, const TargetTransformInfo &TTI, Optional<unsigned> UserThreshold,		Loop *L, const TargetTransformInfo &TTI, Optional<unsigned> UserThreshold,
Optional<unsigned> UserCount, Optional<bool> UserAllowPartial,		Optional<unsigned> UserCount, Optional<bool> UserAllowPartial,
Optional<bool> UserRuntime) {		Optional<bool> UserRuntime, Optional<bool> UserUpperBound) {
TargetTransformInfo::UnrollingPreferences UP;		TargetTransformInfo::UnrollingPreferences UP;

// Set up the defaults		// Set up the defaults
UP.Threshold = 150;		UP.Threshold = 150;
UP.PercentDynamicCostSavedThreshold = 50;		UP.PercentDynamicCostSavedThreshold = 50;
UP.DynamicCostSavingsDiscount = 100;		UP.DynamicCostSavingsDiscount = 100;
UP.OptSizeThreshold = 0;		UP.OptSizeThreshold = 0;
UP.PartialThreshold = UP.Threshold;		UP.PartialThreshold = UP.Threshold;
UP.PartialOptSizeThreshold = 0;		UP.PartialOptSizeThreshold = 0;
UP.Count = 0;		UP.Count = 0;
UP.MaxCount = UINT_MAX;		UP.MaxCount = UINT_MAX;
UP.FullUnrollMaxCount = UINT_MAX;		UP.FullUnrollMaxCount = UINT_MAX;
UP.Partial = false;		UP.Partial = false;
UP.Runtime = false;		UP.Runtime = false;
UP.AllowRemainder = true;		UP.AllowRemainder = true;
UP.AllowExpensiveTripCount = false;		UP.AllowExpensiveTripCount = false;
UP.Force = false;		UP.Force = false;
		UP.UpperBound = false;

// Override with any target specific settings		// Override with any target specific settings
TTI.getUnrollingPreferences(L, UP);		TTI.getUnrollingPreferences(L, UP);

// Apply size attributes		// Apply size attributes
if (L->getHeader()->getParent()->optForSize()) {		if (L->getHeader()->getParent()->optForSize()) {
UP.Threshold = UP.OptSizeThreshold;		UP.Threshold = UP.OptSizeThreshold;
UP.PartialThreshold = UP.PartialOptSizeThreshold;		UP.PartialThreshold = UP.PartialOptSizeThreshold;
Show All 14 Lines	static TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences(
if (UnrollFullMaxCount.getNumOccurrences() > 0)		if (UnrollFullMaxCount.getNumOccurrences() > 0)
UP.FullUnrollMaxCount = UnrollFullMaxCount;		UP.FullUnrollMaxCount = UnrollFullMaxCount;
if (UnrollAllowPartial.getNumOccurrences() > 0)		if (UnrollAllowPartial.getNumOccurrences() > 0)
UP.Partial = UnrollAllowPartial;		UP.Partial = UnrollAllowPartial;
if (UnrollAllowRemainder.getNumOccurrences() > 0)		if (UnrollAllowRemainder.getNumOccurrences() > 0)
UP.AllowRemainder = UnrollAllowRemainder;		UP.AllowRemainder = UnrollAllowRemainder;
if (UnrollRuntime.getNumOccurrences() > 0)		if (UnrollRuntime.getNumOccurrences() > 0)
UP.Runtime = UnrollRuntime;		UP.Runtime = UnrollRuntime;
		if (UnrollUpperBound.getNumOccurrences() > 0)
		UP.UpperBound = UnrollUpperBound;

// Apply user values provided by argument		// Apply user values provided by argument
if (UserThreshold.hasValue()) {		if (UserThreshold.hasValue()) {
UP.Threshold = *UserThreshold;		UP.Threshold = *UserThreshold;
UP.PartialThreshold = *UserThreshold;		UP.PartialThreshold = *UserThreshold;
}		}
if (UserCount.hasValue())		if (UserCount.hasValue())
UP.Count = *UserCount;		UP.Count = *UserCount;
if (UserAllowPartial.hasValue())		if (UserAllowPartial.hasValue())
UP.Partial = *UserAllowPartial;		UP.Partial = *UserAllowPartial;
if (UserRuntime.hasValue())		if (UserRuntime.hasValue())
UP.Runtime = *UserRuntime;		UP.Runtime = *UserRuntime;
		if (UserUpperBound.hasValue())
		UP.UpperBound = *UserUpperBound;

return UP;		return UP;
}		}
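
The function above layers its settings: built-in defaults first, then the target hook, then command-line options that were actually given, then values passed to the pass. A minimal sketch of that precedence for a single flag, assuming C++17 `std::optional` in place of LLVM's `Optional`; `resolveFlag` and its parameter names are hypothetical and not part of the patch:

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: resolve one boolean preference such as UP.UpperBound.
// Each later source overrides the earlier ones only when it is present.
inline bool resolveFlag(bool Default, std::optional<bool> Target,
                        std::optional<bool> CommandLine,
                        std::optional<bool> PassArg) {
  bool Value = Default;    // e.g. UP.UpperBound = false
  if (Target)              // TTI.getUnrollingPreferences(L, UP)
    Value = *Target;
  if (CommandLine)         // option seen: getNumOccurrences() > 0
    Value = *CommandLine;
  if (PassArg)             // UserUpperBound.hasValue()
    Value = *PassArg;
  return Value;
}
```

With this ordering, a target that enables the upper bound can still be overridden from the command line, and an explicit pass argument wins over both.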

namespace {		namespace {
/// A struct to densely store the state of an instruction after unrolling at		/// A struct to densely store the state of an instruction after unrolling at
/// each iteration.		/// each iteration.
///		///
[508 lines not shown]
}		}

// Returns true if unroll count was set explicitly.		// Returns true if unroll count was set explicitly.
// Calculates unroll count and writes it to UP.Count.		// Calculates unroll count and writes it to UP.Count.
static bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI,		static bool computeUnrollCount(Loop *L, const TargetTransformInfo &TTI,
DominatorTree &DT, LoopInfo *LI,		DominatorTree &DT, LoopInfo *LI,
ScalarEvolution *SE,		ScalarEvolution *SE,
OptimizationRemarkEmitter *ORE,		OptimizationRemarkEmitter *ORE,
unsigned TripCount, unsigned TripMultiple,		unsigned &TripCount, unsigned MaxTripCount,
unsigned LoopSize,		unsigned &TripMultiple, unsigned LoopSize,
TargetTransformInfo::UnrollingPreferences &UP) {		TargetTransformInfo::UnrollingPreferences &UP) {
// BEInsns represents number of instructions optimized when "back edge"		// BEInsns represents number of instructions optimized when "back edge"
// becomes "fall through" in unrolled loop.		// becomes "fall through" in unrolled loop.
// For now we count a conditional branch on a backedge and a comparison		// For now we count a conditional branch on a backedge and a comparison
// feeding it.		// feeding it.
unsigned BEInsns = 2;		unsigned BEInsns = 2;
// Check for explicit Count.		// Check for explicit Count.
// 1st priority is unroll count set by "unroll-count" option.		// 1st priority is unroll count set by "unroll-count" option.
[36 lines not shown]	if (ExplicitUnroll && TripCount != 0) {
// unrolling limits. Set thresholds to at least the PragmaThreshold value		// unrolling limits. Set thresholds to at least the PragmaThreshold value
// which is larger than the default limits.		// which is larger than the default limits.
UP.Threshold = std::max<unsigned>(UP.Threshold, PragmaUnrollThreshold);		UP.Threshold = std::max<unsigned>(UP.Threshold, PragmaUnrollThreshold);
UP.PartialThreshold =		UP.PartialThreshold =
std::max<unsigned>(UP.PartialThreshold, PragmaUnrollThreshold);		std::max<unsigned>(UP.PartialThreshold, PragmaUnrollThreshold);
}		}

// 3rd priority is full unroll count.		// 3rd priority is full unroll count.
// Full unroll make sense only when TripCount could be staticaly calculated.		// Full unroll make sense only when TripCount or its upper bound could be
		mzolotukhin (Done): s/make/makes/
		// staticaly calculated.
		mzolotukhin (Done): s/staticaly/statically/
// Also we need to check if we exceed FullUnrollMaxCount.		// Also we need to check if we exceed FullUnrollMaxCount.
if (TripCount && TripCount <= UP.FullUnrollMaxCount) {		// If using the upper bound to unroll, TripMultiple should be set to 1 because
		// we do not know when loop may exit.
		unsigned FullUnrollTripCount = TripCount ? TripCount : MaxTripCount;
		mzolotukhin (Done): This gets confusing. Can we somehow rename these variables so that it's clearer what's the difference between them?
		if (FullUnrollTripCount && FullUnrollTripCount <= UP.FullUnrollMaxCount) {
// When computing the unrolled size, note that BEInsns are not replicated		// When computing the unrolled size, note that BEInsns are not replicated
// like the rest of the loop body.		// like the rest of the loop body.
UnrolledSize = (uint64_t)(LoopSize - BEInsns) * TripCount + BEInsns;		UnrolledSize =
		mzolotukhin (Done): The assert might fail to catch a bug if the multiplication result overflows and wraps around to 0.
		(uint64_t)(LoopSize - BEInsns) * FullUnrollTripCount + BEInsns;
if (canUnrollCompletely(L, UP.Threshold, 100, UP.DynamicCostSavingsDiscount,		if (canUnrollCompletely(L, UP.Threshold, 100, UP.DynamicCostSavingsDiscount,
UnrolledSize, UnrolledSize)) {		UnrolledSize, UnrolledSize)) {
		UP.UpperBound = (MaxTripCount == FullUnrollTripCount);
		mzolotukhin (Done): What if `TripCount` matches `MaxTripCount`? Will we think that we're using upper bound instead of the exact trip count in this case?
		haicheng (Author, Done): `TripCount` and `MaxTripCount` cannot both be nonzero because `MaxTripCount` is computed only if `TripCount` is zero. If one is nonzero, the other one must be zero. If they are both zero, `FullUnrollTripCount` is zero and then we cannot enter here.
		haicheng (Author, Done): I also added a sentence in the comment to describe this.
		mzolotukhin (Done): Could you add an assert for this please?
		TripCount = FullUnrollTripCount;
		TripMultiple = UP.UpperBound ? 1 : TripMultiple;
UP.Count = TripCount;		UP.Count = TripCount;
return ExplicitUnroll;		return ExplicitUnroll;
} else {		} else {
// The loop isn't that small, but we still can fully unroll it if that		// The loop isn't that small, but we still can fully unroll it if that
// helps to remove a significant number of instructions.		// helps to remove a significant number of instructions.
// To check that, run additional analysis on the loop.		// To check that, run additional analysis on the loop.
if (Optional<EstimatedUnrollCost> Cost = analyzeLoopUnrollCost(		if (Optional<EstimatedUnrollCost> Cost = analyzeLoopUnrollCost(
L, TripCount, DT, *SE, TTI,		L, FullUnrollTripCount, DT, *SE, TTI,
UP.Threshold + UP.DynamicCostSavingsDiscount))		UP.Threshold + UP.DynamicCostSavingsDiscount))
if (canUnrollCompletely(L, UP.Threshold,		if (canUnrollCompletely(L, UP.Threshold,
UP.PercentDynamicCostSavedThreshold,		UP.PercentDynamicCostSavedThreshold,
UP.DynamicCostSavingsDiscount,		UP.DynamicCostSavingsDiscount,
Cost->UnrolledCost, Cost->RolledDynamicCost)) {		Cost->UnrolledCost, Cost->RolledDynamicCost)) {
		UP.UpperBound = (MaxTripCount == FullUnrollTripCount);
		TripCount = FullUnrollTripCount;
		TripMultiple = UP.UpperBound ? 1 : TripMultiple;
UP.Count = TripCount;		UP.Count = TripCount;
return ExplicitUnroll;		return ExplicitUnroll;
}		}
}		}
}		}

// 4rd priority is partial unrolling.		// 4rd priority is partial unrolling.
// Try partial unroll only when TripCount could be staticaly calculated.		// Try partial unroll only when TripCount could be staticaly calculated.
[116 lines not shown]
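
The 3rd-priority block above picks either the exact trip count or its upper bound, and the review asks for an assert that the two are never both set. A minimal standalone sketch of that selection, assuming the invariant haicheng describes; `FullUnrollChoice` and `chooseFullUnrollCount` are hypothetical names, and the threshold checks done by `canUnrollCompletely` are deliberately left out:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type. The patch itself writes into UP.UpperBound,
// TripCount and TripMultiple instead of returning a struct.
struct FullUnrollChoice {
  unsigned Count;        // 0 means "no full unroll candidate"
  bool UsesUpperBound;   // true when Count came from MaxTripCount
  unsigned TripMultiple; // reset to 1 when the upper bound is used
  uint64_t UnrolledSize; // estimated size after full unrolling
};

inline FullUnrollChoice chooseFullUnrollCount(unsigned TripCount,
                                              unsigned MaxTripCount,
                                              unsigned TripMultiple,
                                              unsigned LoopSize,
                                              unsigned BEInsns,
                                              unsigned FullUnrollMaxCount) {
  // Invariant from the review: the upper bound is only computed when the
  // exact trip count is unknown, so at most one of the two is nonzero.
  assert((TripCount == 0 || MaxTripCount == 0) &&
         "exact trip count and upper bound must not both be set");
  assert(LoopSize >= BEInsns && "loop size must include back-edge insns");

  unsigned Count = TripCount ? TripCount : MaxTripCount;
  if (Count == 0 || Count > FullUnrollMaxCount)
    return {0, false, TripMultiple, 0};

  bool UsesUpperBound = (TripCount == 0);
  // Widen before multiplying so the product cannot wrap in 32 bits.
  uint64_t Size = static_cast<uint64_t>(LoopSize - BEInsns) * Count + BEInsns;
  // With only an upper bound we do not know when the loop exits, so no
  // multiple of the trip count can be assumed.
  return {Count, UsesUpperBound, UsesUpperBound ? 1u : TripMultiple, Size};
}
```

Deriving `UsesUpperBound` from `TripCount == 0` rather than from `MaxTripCount == Count` makes the answer to "what if both match?" explicit, at the cost of relying on the asserted invariant.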

static bool tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI,		static bool tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI,
ScalarEvolution *SE, const TargetTransformInfo &TTI,		ScalarEvolution *SE, const TargetTransformInfo &TTI,
AssumptionCache &AC, OptimizationRemarkEmitter &ORE,		AssumptionCache &AC, OptimizationRemarkEmitter &ORE,
bool PreserveLCSSA,		bool PreserveLCSSA,
Optional<unsigned> ProvidedCount,		Optional<unsigned> ProvidedCount,
Optional<unsigned> ProvidedThreshold,		Optional<unsigned> ProvidedThreshold,
Optional<bool> ProvidedAllowPartial,		Optional<bool> ProvidedAllowPartial,
Optional<bool> ProvidedRuntime) {		Optional<bool> ProvidedRuntime,
		Optional<bool> ProvidedUpperBound) {
DEBUG(dbgs() << "Loop Unroll: F[" << L->getHeader()->getParent()->getName()		DEBUG(dbgs() << "Loop Unroll: F[" << L->getHeader()->getParent()->getName()
<< "] Loop %" << L->getHeader()->getName() << "\n");		<< "] Loop %" << L->getHeader()->getName() << "\n");
if (HasUnrollDisablePragma(L)) {		if (HasUnrollDisablePragma(L)) {
		mzolotukhin (Done): Unnecessary change.
return false;		return false;
}		}

unsigned NumInlineCandidates;		unsigned NumInlineCandidates;
bool NotDuplicatable;		bool NotDuplicatable;
bool Convergent;		bool Convergent;
unsigned LoopSize = ApproximateLoopSize(		unsigned LoopSize = ApproximateLoopSize(
L, NumInlineCandidates, NotDuplicatable, Convergent, TTI, &AC);		L, NumInlineCandidates, NotDuplicatable, Convergent, TTI, &AC);
[10 lines not shown]	static bool tryToUnrollLoop(Loop *L, DominatorTree &DT, LoopInfo *LI,
if (!L->isLoopSimplifyForm()) {		if (!L->isLoopSimplifyForm()) {
DEBUG(		DEBUG(
dbgs() << " Not unrolling loop which is not in loop-simplify form.\n");		dbgs() << " Not unrolling loop which is not in loop-simplify form.\n");
return false;		return false;
}		}

// Find trip count and trip multiple if count is not available		// Find trip count and trip multiple if count is not available
unsigned TripCount = 0;		unsigned TripCount = 0;
		unsigned MaxTripCount = 0;
unsigned TripMultiple = 1;		unsigned TripMultiple = 1;
// If there are multiple exiting blocks but one of them is the latch, use the		// If there are multiple exiting blocks but one of them is the latch, use the
// latch for the trip count estimation. Otherwise insist on a single exiting		// latch for the trip count estimation. Otherwise insist on a single exiting
// block for the trip count estimation.		// block for the trip count estimation.
BasicBlock *ExitingBlock = L->getLoopLatch();		BasicBlock *ExitingBlock = L->getLoopLatch();
if (!ExitingBlock || !L->isLoopExiting(ExitingBlock))		if (!ExitingBlock || !L->isLoopExiting(ExitingBlock))
ExitingBlock = L->getExitingBlock();		ExitingBlock = L->getExitingBlock();
if (ExitingBlock) {		if (ExitingBlock) {
TripCount = SE->getSmallConstantTripCount(L, ExitingBlock);		TripCount = SE->getSmallConstantTripCount(L, ExitingBlock);
TripMultiple = SE->getSmallConstantTripMultiple(L, ExitingBlock);		TripMultiple = SE->getSmallConstantTripMultiple(L, ExitingBlock);
}		}

TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences(		TargetTransformInfo::UnrollingPreferences UP = gatherUnrollingPreferences(
L, TTI, ProvidedThreshold, ProvidedCount, ProvidedAllowPartial,		L, TTI, ProvidedThreshold, ProvidedCount, ProvidedAllowPartial,
ProvidedRuntime);		ProvidedRuntime, ProvidedUpperBound);

// Exit early if unrolling is disabled.		// Exit early if unrolling is disabled.
if (UP.Threshold == 0 && (!UP.Partial || UP.PartialThreshold == 0))		if (UP.Threshold == 0 && (!UP.Partial || UP.PartialThreshold == 0))
return false;		return false;

// If the loop contains a convergent operation, the prelude we'd add		// If the loop contains a convergent operation, the prelude we'd add
// to do the first few instructions before we hit the unrolled loop		// to do the first few instructions before we hit the unrolled loop
// is unsafe -- it adds a control-flow dependency to the convergent		// is unsafe -- it adds a control-flow dependency to the convergent
// operation. Therefore restrict remainder loop (try unrollig without).		// operation. Therefore restrict remainder loop (try unrollig without).
//		//
// TODO: This is quite conservative. In practice, convergent_op()		// TODO: This is quite conservative. In practice, convergent_op()
// is likely to be called unconditionally in the loop. In this		// is likely to be called unconditionally in the loop. In this
// case, the program would be ill-formed (on most architectures)		// case, the program would be ill-formed (on most architectures)
// unless n were the same on all threads in a thread group.		// unless n were the same on all threads in a thread group.
// Assuming n is the same on all threads, any kind of unrolling is		// Assuming n is the same on all threads, any kind of unrolling is
// safe. But currently llvm's notion of convergence isn't powerful		// safe. But currently llvm's notion of convergence isn't powerful
// enough to express this.		// enough to express this.
if (Convergent)		if (Convergent)
UP.AllowRemainder = false;		UP.AllowRemainder = false;

bool IsCountSetExplicitly = computeUnrollCount(		// Try to find the trip count upper bound if it is allowed.
L, TTI, DT, LI, SE, &ORE, TripCount, TripMultiple, LoopSize, UP);		if (UP.UpperBound) {
		if (!TripCount) {
		MaxTripCount = SE->getSmallConstantMaxTripCount(L);
		// Only unroll with small upper bound.
		if (MaxTripCount > UnrollMaxUpperBound)
		MaxTripCount = 0;
		}
		// computeUnrollCount() will set UP.UpperBound to true later if using the
		// upper bound to unroll meets the heuristics.
		UP.UpperBound = false;
		mzolotukhin (Done): I'm not a big fan of reusing this variable. Initially it was supposed to come from `UP` and show if we're allowed to use upper-bound instead of exactly-known trip-count, but now we're also using it to communicate between various routines (and the interface was not specified). Can it be refactored somehow?
		}

		bool IsCountSetExplicitly =
		computeUnrollCount(L, TTI, DT, LI, SE, &ORE, TripCount, MaxTripCount,
		TripMultiple, LoopSize, UP);
if (!UP.Count)		if (!UP.Count)
return false;		return false;
// Unroll factor (Count) must be less or equal to TripCount.		// Unroll factor (Count) must be less or equal to TripCount.
if (TripCount && UP.Count > TripCount)		if (TripCount && UP.Count > TripCount)
UP.Count = TripCount;		UP.Count = TripCount;

// Unroll the loop.		// Unroll the loop.
if (!UnrollLoop(L, UP.Count, TripCount, UP.Force, UP.Runtime,		if (!UnrollLoop(L, UP.Count, TripCount, UP.Force, UP.Runtime,
UP.AllowExpensiveTripCount, TripMultiple, LI, SE, &DT, &AC,		UP.AllowExpensiveTripCount, UP.UpperBound, TripMultiple, LI,
&ORE, PreserveLCSSA))		SE, &DT, &AC, &ORE, PreserveLCSSA))
return false;		return false;

// If loop has an unroll count pragma or unrolled by explicitly set count		// If loop has an unroll count pragma or unrolled by explicitly set count
// mark loop as unrolled to prevent unrolling beyond that requested.		// mark loop as unrolled to prevent unrolling beyond that requested.
if (IsCountSetExplicitly)		if (IsCountSetExplicitly)
SetLoopAlreadyUnrolled(L);		SetLoopAlreadyUnrolled(L);
return true;		return true;
}		}
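
The inline comment in `tryToUnrollLoop` questions reusing `UP.UpperBound` both as a permission ("may we use the upper bound?") and as a result ("did we use it?"). One possible way to keep the two apart, sketched under the assumption that the permission stays read-only and the usable bound is returned as a value; `usableMaxTripCount` and `MaxUpperBound` are hypothetical names (the latter stands in for the `UnrollMaxUpperBound` option), and this is not necessarily the refactoring the patch ended up with:

```cpp
#include <cassert>

// Hypothetical helper: returns the trip-count upper bound that full
// unrolling may consider, or 0 when the upper bound must not be used.
inline unsigned usableMaxTripCount(bool AllowUpperBound, unsigned TripCount,
                                   unsigned SmallConstantMaxTripCount,
                                   unsigned MaxUpperBound) {
  // The upper bound is only interesting when it is permitted and the exact
  // trip count is unknown.
  if (!AllowUpperBound || TripCount != 0)
    return 0;
  // Only unroll with a small upper bound.
  if (SmallConstantMaxTripCount > MaxUpperBound)
    return 0;
  return SmallConstantMaxTripCount;
}
```

A caller could then treat a nonzero return value as the candidate for `MaxTripCount` and record separately whether the heuristics accepted it, leaving the original preference untouched.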

namespace {		namespace {
class LoopUnroll : public LoopPass {		class LoopUnroll : public LoopPass {
public:		public:
static char ID; // Pass ID, replacement for typeid		static char ID; // Pass ID, replacement for typeid
LoopUnroll(Optional<unsigned> Threshold = None,		LoopUnroll(Optional<unsigned> Threshold = None,
Optional<unsigned> Count = None,		Optional<unsigned> Count = None,
Optional<bool> AllowPartial = None, Optional<bool> Runtime = None)		Optional<bool> AllowPartial = None, Optional<bool> Runtime = None,
		Optional<bool> UpperBound = None)
: LoopPass(ID), ProvidedCount(std::move(Count)),		: LoopPass(ID), ProvidedCount(std::move(Count)),
ProvidedThreshold(Threshold), ProvidedAllowPartial(AllowPartial),		ProvidedThreshold(Threshold), ProvidedAllowPartial(AllowPartial),
ProvidedRuntime(Runtime) {		ProvidedRuntime(Runtime), ProvidedUpperBound(UpperBound) {
initializeLoopUnrollPass(*PassRegistry::getPassRegistry());		initializeLoopUnrollPass(*PassRegistry::getPassRegistry());
}		}

Optional<unsigned> ProvidedCount;		Optional<unsigned> ProvidedCount;
Optional<unsigned> ProvidedThreshold;		Optional<unsigned> ProvidedThreshold;
Optional<bool> ProvidedAllowPartial;		Optional<bool> ProvidedAllowPartial;
Optional<bool> ProvidedRuntime;		Optional<bool> ProvidedRuntime;
		Optional<bool> ProvidedUpperBound;

bool runOnLoop(Loop *L, LPPassManager &) override {		bool runOnLoop(Loop *L, LPPassManager &) override {
if (skipLoop(L))		if (skipLoop(L))
return false;		return false;

Function &F = *L->getHeader()->getParent();		Function &F = *L->getHeader()->getParent();

auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();		auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();		LoopInfo *LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
ScalarEvolution *SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();		ScalarEvolution *SE = &getAnalysis<ScalarEvolutionWrapperPass>().getSE();
const TargetTransformInfo &TTI =		const TargetTransformInfo &TTI =
getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);		getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);
auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		auto &AC = getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
// For the old PM, we can't use OptimizationRemarkEmitter as an analysis		// For the old PM, we can't use OptimizationRemarkEmitter as an analysis
// pass. Function analyses need to be preserved across loop transformations		// pass. Function analyses need to be preserved across loop transformations
// but ORE cannot be preserved (see comment before the pass definition).		// but ORE cannot be preserved (see comment before the pass definition).
OptimizationRemarkEmitter ORE(&F);		OptimizationRemarkEmitter ORE(&F);
bool PreserveLCSSA = mustPreserveAnalysisID(LCSSAID);		bool PreserveLCSSA = mustPreserveAnalysisID(LCSSAID);

return tryToUnrollLoop(L, DT, LI, SE, TTI, AC, ORE, PreserveLCSSA,		return tryToUnrollLoop(L, DT, LI, SE, TTI, AC, ORE, PreserveLCSSA,
ProvidedCount, ProvidedThreshold,		ProvidedCount, ProvidedThreshold,
ProvidedAllowPartial, ProvidedRuntime);		ProvidedAllowPartial, ProvidedRuntime,
		ProvidedUpperBound);
}		}

/// This transformation requires natural loop information & requires that		/// This transformation requires natural loop information & requires that
/// loop preheaders be inserted into the CFG...		/// loop preheaders be inserted into the CFG...
///		///
void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
// FIXME: Loop passes are required to preserve domtree, and for now we just		// FIXME: Loop passes are required to preserve domtree, and for now we just
// recreate dom info if anything gets unrolled.		// recreate dom info if anything gets unrolled.
getLoopAnalysisUsage(AU);		getLoopAnalysisUsage(AU);
}		}
};		};
}		}

char LoopUnroll::ID = 0;		char LoopUnroll::ID = 0;
INITIALIZE_PASS_BEGIN(LoopUnroll, "loop-unroll", "Unroll loops", false, false)		INITIALIZE_PASS_BEGIN(LoopUnroll, "loop-unroll", "Unroll loops", false, false)
INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)		INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
INITIALIZE_PASS_DEPENDENCY(LoopPass)		INITIALIZE_PASS_DEPENDENCY(LoopPass)
INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)		INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
INITIALIZE_PASS_END(LoopUnroll, "loop-unroll", "Unroll loops", false, false)		INITIALIZE_PASS_END(LoopUnroll, "loop-unroll", "Unroll loops", false, false)

Pass *llvm::createLoopUnrollPass(int Threshold, int Count, int AllowPartial,		Pass *llvm::createLoopUnrollPass(int Threshold, int Count, int AllowPartial,
int Runtime) {		int Runtime, int UpperBound) {
// TODO: It would make more sense for this function to take the optionals		// TODO: It would make more sense for this function to take the optionals
// directly, but that's dangerous since it would silently break out of tree		// directly, but that's dangerous since it would silently break out of tree
// callers.		// callers.
return new LoopUnroll(Threshold == -1 ? None : Optional<unsigned>(Threshold),		return new LoopUnroll(Threshold == -1 ? None : Optional<unsigned>(Threshold),
Count == -1 ? None : Optional<unsigned>(Count),		Count == -1 ? None : Optional<unsigned>(Count),
AllowPartial == -1 ? None		AllowPartial == -1 ? None
: Optional<bool>(AllowPartial),		: Optional<bool>(AllowPartial),
Runtime == -1 ? None : Optional<bool>(Runtime));		Runtime == -1 ? None : Optional<bool>(Runtime),
		UpperBound == -1 ? None : Optional<bool>(UpperBound));
}		}

Pass *llvm::createSimpleLoopUnrollPass() {		Pass *llvm::createSimpleLoopUnrollPass() {
return llvm::createLoopUnrollPass(-1, -1, 0, 0);		return llvm::createLoopUnrollPass(-1, -1, 0, 0, 0);
}		}
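
`createLoopUnrollPass` uses `-1` as a "not provided" sentinel for each `int` argument and turns every other value into an optional. A small sketch of that mapping for the boolean arguments, assuming C++17 `std::optional` in place of LLVM's `Optional`; `flagFromSentinel` is a hypothetical name:

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper mirroring `X == -1 ? None : Optional<bool>(X)`.
inline std::optional<bool> flagFromSentinel(int Value) {
  if (Value == -1)
    return std::nullopt; // caller did not provide the setting
  return Value != 0;     // any other value is converted to bool
}
```

Under this mapping, `createSimpleLoopUnrollPass()` passing `0` for the new `UpperBound` argument explicitly disables the feature instead of leaving it to the target or the command line.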

PreservedAnalyses LoopUnrollPass::run(Loop &L, LoopAnalysisManager &AM) {		PreservedAnalyses LoopUnrollPass::run(Loop &L, LoopAnalysisManager &AM) {
const auto &FAM =		const auto &FAM =
AM.getResult<FunctionAnalysisManagerLoopProxy>(L).getManager();		AM.getResult<FunctionAnalysisManagerLoopProxy>(L).getManager();
Function *F = L.getHeader()->getParent();		Function *F = L.getHeader()->getParent();


[12 lines not shown]	PreservedAnalyses LoopUnrollPass::run(Loop &L, LoopAnalysisManager &AM) {
if (!TTI)		if (!TTI)
report_fatal_error("LoopUnrollPass: TargetIRAnalysis not cached at a higher level");		report_fatal_error("LoopUnrollPass: TargetIRAnalysis not cached at a higher level");
if (!AC)		if (!AC)
report_fatal_error("LoopUnrollPass: AssumptionAnalysis not cached at a higher level");		report_fatal_error("LoopUnrollPass: AssumptionAnalysis not cached at a higher level");
if (!ORE)		if (!ORE)
report_fatal_error("LoopUnrollPass: OptimizationRemarkEmitterAnalysis not "		report_fatal_error("LoopUnrollPass: OptimizationRemarkEmitterAnalysis not "
"cached at a higher level");		"cached at a higher level");

bool Changed = tryToUnrollLoop(		bool Changed =
&L, *DT, LI, SE, *TTI, *AC, *ORE, /*PreserveLCSSA*/ true, ProvidedCount,		tryToUnrollLoop(&L, *DT, LI, SE, *TTI, *AC, *ORE, /*PreserveLCSSA*/ true,
ProvidedThreshold, ProvidedAllowPartial, ProvidedRuntime);		ProvidedCount, ProvidedThreshold, ProvidedAllowPartial,
		ProvidedRuntime, ProvidedUpperBound);

if (!Changed)		if (!Changed)
return PreservedAnalyses::all();		return PreservedAnalyses::all();
return getLoopPassPreservedAnalyses();		return getLoopPassPreservedAnalyses();
}		}

lib/Transforms/Utils/LoopUnroll.cpp

[197 lines not shown]
/// AllowExpensiveTripCount is false.		/// AllowExpensiveTripCount is false.
///		///
/// The LoopInfo Analysis that is passed will be kept consistent.		/// The LoopInfo Analysis that is passed will be kept consistent.
///		///
/// This utility preserves LoopInfo. It will also preserve ScalarEvolution and		/// This utility preserves LoopInfo. It will also preserve ScalarEvolution and
/// DominatorTree if they are non-null.		/// DominatorTree if they are non-null.
bool llvm::UnrollLoop(Loop *L, unsigned Count, unsigned TripCount, bool Force,		bool llvm::UnrollLoop(Loop *L, unsigned Count, unsigned TripCount, bool Force,
bool AllowRuntime, bool AllowExpensiveTripCount,		bool AllowRuntime, bool AllowExpensiveTripCount,
unsigned TripMultiple, LoopInfo *LI, ScalarEvolution *SE,		bool UseUpperBound, unsigned TripMultiple, LoopInfo *LI,
		mzolotukhin (Done): Please add a comment about `UseUpperBound`. Actually, I don't think that's a good name for this argument, because this function doesn't use upper bound for anything. What it needs is a flag indicating that conditional branches must be preserved - I'd suggest to reflect that in the argument name.
DominatorTree *DT, AssumptionCache *AC,		ScalarEvolution *SE, DominatorTree *DT,
OptimizationRemarkEmitter *ORE, bool PreserveLCSSA) {		AssumptionCache *AC, OptimizationRemarkEmitter *ORE,
		bool PreserveLCSSA) {
BasicBlock *Preheader = L->getLoopPreheader();		BasicBlock *Preheader = L->getLoopPreheader();
if (!Preheader) {		if (!Preheader) {
DEBUG(dbgs() << " Can't unroll; loop preheader-insertion failed.\n");		DEBUG(dbgs() << " Can't unroll; loop preheader-insertion failed.\n");
return false;		return false;
}		}

BasicBlock *LatchBlock = L->getLoopLatch();		BasicBlock *LatchBlock = L->getLoopLatch();
if (!LatchBlock) {		if (!LatchBlock) {
[314 lines not shown]	if (RuntimeTripCount && j != 0) {
NeedConditional = false;		NeedConditional = false;
}		}

// For a complete unroll, make the last iteration end with a branch		// For a complete unroll, make the last iteration end with a branch
// to the exit block.		// to the exit block.
if (CompletelyUnroll) {		if (CompletelyUnroll) {
if (j == 0)		if (j == 0)
Dest = LoopExit;		Dest = LoopExit;
NeedConditional = false;		// If using trip count upper bound to completely unroll, we need to keep
}		// the conditional branch except the last one because the loop may exit
		// after any iteration.
		NeedConditional = (UseUpperBound && j);
		ABataev (Done): Maybe just `NeedConditional = (UseUpperBound && j)`?
		mzolotukhin (Not Done): Is it intentional that we can make `NeedConditional` true after it was set to false before that? From semantics it looks like this never happens (in this case we can assert it), but I'd like to make sure it was not overlooked. Did you intend to replace this `if` with `else if` as well?
		haicheng (Author, Not Done): I intended to replace `if` with `else if`. Using complete unroll should not enter the `else if` part. I think the change improves the readability.
		haicheng (Author, Not Done): Without the change of `else if`, the logic of my patch is also wrong.
		} else if (j != BreakoutTrip && (TripMultiple == 0 || j % TripMultiple != 0)) {
// If we know the trip count or a multiple of it, we can safely use an		// If we know the trip count or a multiple of it, we can safely use an
// unconditional branch for some iterations.		// unconditional branch for some iterations.
if (j != BreakoutTrip && (TripMultiple == 0 || j % TripMultiple != 0)) {
NeedConditional = false;		NeedConditional = false;
}		}

if (NeedConditional) {		if (NeedConditional) {
// Update the conditional branch's successor for the following		// Update the conditional branch's successor for the following
// iteration.		// iteration.
Term->setSuccessor(!ContinueOnTrue, Dest);		Term->setSuccessor(!ContinueOnTrue, Dest);
} else {		} else {
[173 lines not shown]
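
The branch-selection logic on the right-hand side of this hunk can be read as a pure function of the iteration index and a few flags. A minimal sketch of it follows; `needConditional` is a hypothetical name, `J == 0` is taken to identify the last unrolled copy (as `if (j == 0) Dest = LoopExit;` suggests), and the initial value `true` is an assumption because the initialisation sits in a hidden part of the diff:

```cpp
#include <cassert>

// Hypothetical standalone version of the NeedConditional computation.
inline bool needConditional(unsigned J, bool RuntimeTripCount,
                            bool CompletelyUnroll, bool UseUpperBound,
                            unsigned BreakoutTrip, unsigned TripMultiple) {
  bool NeedConditional = true; // assumed starting value (not visible above)

  if (RuntimeTripCount && J != 0)
    NeedConditional = false;

  if (CompletelyUnroll) {
    // With only an upper bound the loop may exit after any iteration, so
    // every copy except the last keeps its conditional branch.
    NeedConditional = UseUpperBound && J != 0;
  } else if (J != BreakoutTrip &&
             (TripMultiple == 0 || J % TripMultiple != 0)) {
    // A known trip count or trip multiple lets some copies fall through
    // unconditionally.
    NeedConditional = false;
  }
  return NeedConditional;
}
```

The sketch also makes mzolotukhin's question concrete: when `RuntimeTripCount`, `CompletelyUnroll` and `UseUpperBound` are all true and `J != 0`, the value is first cleared and then set again, so whether that combination can occur is worth asserting.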

test/CodeGen/AMDGPU/tti-unroll-prefs.ll

	; RUN: opt -loop-unroll -S -mtriple=amdgcn-- -mcpu=SI %s | FileCheck %s			; RUN: opt -loop-unroll -unroll-upperbound -S -mtriple=amdgcn-- -mcpu=SI %s | FileCheck %s

	; This IR comes from this OpenCL C code:			; This IR comes from this OpenCL C code:
	;			;
	; if (b + 4 > a) {			; if (b + 4 > a) {
	; for (int i = 0; i < 4; i++, b++) {			; for (int i = 0; i < 4; i++, b++) {
	; if (b + 1 <= a)			; if (b + 1 <= a)
	; *(dst + c + b) = 0;			; *(dst + c + b) = 0;
	; else			; else
	; break;			; break;
	; }			; }
	; }			; }
	;			;
	; This test is meant to check that this loop isn't unrolled into more than			; This test is meant to check that this loop isn't unrolled into more than
	; four iterations. The loop unrolling preferences we currently use cause this			; four iterations.
	; loop to not be unrolled at all, but that may change in the future.
	mzolotukhin (Not Done): We could've just passed `-unroll-threshold` to overcome this. It might make sense to do it even now, so that the test doesn't break if UnrollPreferences are changed.
	haicheng (Author, Not Done): This loop has a break statement so that we cannot get the exact trip count. I think we cannot fully unroll the loop without my change. Passing `-unroll-threshold` cannot overcome this.

	; CHECK-LABEL: @test			; CHECK-LABEL: @test
	; CHECK: store i8 0, i8 addrspace(1)*			; CHECK: store i8 0, i8 addrspace(1)*
				; CHECK: store i8 0, i8 addrspace(1)*
				; CHECK: store i8 0, i8 addrspace(1)*
				; CHECK: store i8 0, i8 addrspace(1)*
	; CHECK-NOT: store i8 0, i8 addrspace(1)*			; CHECK-NOT: store i8 0, i8 addrspace(1)*
	; CHECK: ret void
	define void @test(i8 addrspace(1)* nocapture %dst, i32 %a, i32 %b, i32 %c) {			define void @test(i8 addrspace(1)* nocapture %dst, i32 %a, i32 %b, i32 %c) {
	entry:			entry:
	%add = add nsw i32 %b, 4			%add = add nsw i32 %b, 4
	%cmp = icmp sgt i32 %add, %a			%cmp = icmp sgt i32 %add, %a
	br i1 %cmp, label %for.cond.preheader, label %if.end7			br i1 %cmp, label %for.cond.preheader, label %if.end7

	for.cond.preheader: ; preds = %entry			for.cond.preheader: ; preds = %entry
	%cmp313 = icmp slt i32 %b, %a			%cmp313 = icmp slt i32 %b, %a
	[29 lines not shown]
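
The updated CHECK lines expect four stores and no fifth one. A hand-written C++ rendition of the OpenCL loop from the test comment, next to a version unrolled by its upper bound of four, may make that expectation easier to see. Both functions are illustrative sketches with hypothetical names; they are not output of the pass:

```cpp
#include <cassert>
#include <cstdint>

// The loop from the test comment: the exit depends on data, so the exact
// trip count is unknown, but it can never exceed 4.
inline void storeRolled(int8_t *Dst, int A, int B, int C) {
  if (B + 4 > A) {
    for (int I = 0; I < 4; I++, B++) {
      if (B + 1 <= A)
        Dst[C + B] = 0;
      else
        break;
    }
  }
}

// Same behaviour, unrolled by the upper bound. Every copy keeps its exit
// check because the loop may leave after any iteration.
inline void storeUnrolled(int8_t *Dst, int A, int B, int C) {
  if (B + 4 <= A)
    return;
  if (B + 1 > A)
    return;
  Dst[C + B] = 0;
  if (B + 2 > A)
    return;
  Dst[C + B + 1] = 0;
  if (B + 3 > A)
    return;
  Dst[C + B + 2] = 0;
  if (B + 4 > A)
    return;
  Dst[C + B + 3] = 0;
}
```

The unrolled form contains exactly four store statements, one per possible iteration, which is the shape the four `CHECK: store` lines followed by `CHECK-NOT: store` look for.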

Revision Contents

Diff 72293
