This is an archive of the discontinued LLVM Phabricator instance.

Differential D26527

Use profile info to adjust loop unroll threshold.
ClosedPublic

Authored by danielcdh on Nov 10 2016, 2:37 PM.

Details

Reviewers

mzolotukhin
davidxl

Commits

rG41d72a863260: Use profile info to adjust loop unroll threshold.
rL287186: Use profile info to adjust loop unroll threshold.

Summary

For a flat loop, even if it is hot, runtime unrolling is not a good idea, so we set a lower partial unroll threshold.
For a hot loop, we set a higher unroll threshold and allow expensive trip count computation to enable more aggressive unrolling.

Diff Detail

Build Status

Buildable 1331
Build 1331: arc lint + arc unit

Event Timeline

danielcdh updated this revision to Diff 77556. Nov 10 2016, 2:37 PM

danielcdh retitled this revision to "Use profile info to adjust loop unroll threshold."

danielcdh updated this object.

danielcdh added reviewers: davidxl, mzolotukhin.

danielcdh added a subscriber: llvm-commits.

Herald added subscribers: mehdi_amini, sanjoy. Nov 10 2016, 2:37 PM

Hi,

Thanks for working on this! Please find some comments inline.

Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
762	This looks like a magic number to me. Can we use some parameter for it (or maybe separate thresholds for 'hot' and 'cold' loops)?
test/Other/pass-pipelines.ll
49–50 ↗	(On Diff #77556)	Hmm, is loop-unroll in a separate instance of loop pass manager now?

Update the patch to remove the dependency on BFI/PSI and use only the trip count to decide whether to unroll the loop.

Also steal Michael's getLoopEstimatedTripCount implementation.

lib/Transforms/Scalar/LoopUnrollPass.cpp
762	Logic removed from the patch
test/Other/pass-pipelines.ll
49–50 ↗	(On Diff #77556)	Removed dependency to BFI/PSI

The change looks good to me, thank you! I'm assuming you and Michael will figure out which version of getLoopEstimatedTripCount you want to use; other than that, I have mostly nitpicky comments below.

BTW, do you have performance testing results for this patch? I'd expect some improvements in code-size and compile-time with these changes.

Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
757	Please add some comment here.
758–759	`if (auto ProfileTripCount = getLoopEstimatedTripCount(L))` ?
lib/Transforms/Utils/LoopUtils.cpp
1071	This version and the one from D25963 should eventually become the same, right?
1077–1078	Probably we also need to check that the latch is exiting (i.e. the branch is conditional).
test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll
5	Please add `@` to the name.
6	Is it enough to just check presence of the prologue? Maybe explicitly check that we have several copies of some instruction?

This revision is now accepted and ready to land. Nov 16 2016, 2:24 PM

update

lib/Transforms/Utils/LoopUtils.cpp
1071	Yes, I stole the code from D25963 ;-)

The perf/size impact of this patch on SPEC CPU is small, as flat loops are rare in most of the benchmarks.

spec/2006/fp/C++/444.namd 25.47 +0.30%
spec/2006/fp/C++/447.dealII 45.46 +0.23%
spec/2006/fp/C++/450.soplex 43.38 +0.58%
spec/2006/fp/C++/453.povray 37.88 -0.78%
spec/2006/fp/C/433.milc 23.75 -0.13%
spec/2006/fp/C/470.lbm 41.53 -0.09%
spec/2006/fp/C/482.sphinx3 48.97 -0.11%
spec/2006/int/C++/471.omnetpp 22.79 -0.22%
spec/2006/int/C++/473.astar 22.99 +0.17%
spec/2006/int/C++/483.xalancbmk 38.55 -0.42%
spec/2006/int/C/400.perlbench 37.06 +1.14%
spec/2006/int/C/401.bzip2 23.38 +0.98%
spec/2006/int/C/403.gcc 34.52 -0.25%
spec/2006/int/C/429.mcf 42.28 -0.05%
spec/2006/int/C/445.gobmk 27.98 +0.60%
spec/2006/int/C/456.hmmer 26.01 -0.06%
spec/2006/int/C/458.sjeng 30.42 +0.50%
spec/2006/int/C/462.libquantum 57.48 +0.37%
spec/2006/int/C/464.h264ref 47.6 -0.70%

geometric mean +0.11%
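As a rough illustration of how a summary figure like the one above can be derived, the sketch below computes a geometric mean over percentage changes. It assumes the mean is taken over the ratios 1 + p/100, which the thread does not state; `geomeanPercent` is a hypothetical helper name.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of percentage changes, taken over the ratios 1 + p/100.
// Assumption: the input is non-empty and every ratio is positive.
double geomeanPercent(const std::vector<double> &percentChanges) {
  double logSum = 0.0;
  for (double p : percentChanges)
    logSum += std::log(1.0 + p / 100.0);
  // Average in log space, then convert the mean ratio back to a percentage.
  return (std::exp(logSum / percentChanges.size()) - 1.0) * 100.0;
}
```

Passing the per-benchmark deltas listed above as a vector would give the aggregate under that assumption. For small deltas like these, the result stays close to the arithmetic mean.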

danielcdh closed this revision. Nov 16 2016, 5:26 PM

Revision Contents

Path                                                   Size
include/llvm/Transforms/Utils/LoopUtils.h              5 lines
lib/Transforms/Scalar/LoopUnrollPass.cpp               16 lines
lib/Transforms/Utils/LoopUtils.cpp                     36 lines
test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll    54 lines

Diff 78260

include/llvm/Transforms/Utils/LoopUtils.h

 /// Optional's not-a-value.
 Optional<const MDOperand *> findStringMetadataForLoop(Loop *TheLoop,
                                                       StringRef Name);

 /// \brief Set input string into loop metadata by keeping other values intact.
 void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
                              unsigned V = 0);

+/// \brief Get a loop's estimated trip count based on branch weight metadata.
+/// Returns 0 when the count is estimated to be 0, or None when a meaningful
+/// estimate can not be made.
+Optional<unsigned> getLoopEstimatedTripCount(Loop *L);
+
 /// Helper to consistently add the set of standard passes to a loop pass's \c
 /// AnalysisUsage.
 ///
 /// All loop passes should call this as part of implementing their \c
 /// getAnalysisUsage.
 void getLoopAnalysisUsage(AnalysisUsage &AU);

 /// Returns true if the hoister and sinker can handle this instruction.

lib/Transforms/Scalar/LoopUnrollPass.cpp

@@ ... @@ static cl::opt<unsigned> UnrollMaxUpperBound(
     cl::desc(
         "The max of trip count upper bound that is considered in unrolling"));

 static cl::opt<unsigned> PragmaUnrollThreshold(
     "pragma-unroll-threshold", cl::init(16 * 1024), cl::Hidden,
     cl::desc("Unrolled size limit for loops with an unroll(full) or "
              "unroll_count pragma."));

+static cl::opt<unsigned> FlatLoopTripCountThreshold(
+    "flat-loop-tripcount-threshold", cl::init(5), cl::Hidden,
+    cl::desc("If the runtime tripcount for the loop is lower than the "
+             "threshold, the loop is considered as flat and will be less "
+             "aggressively unrolled."));
+
 /// A magic value for use with the Threshold parameter to indicate
 /// that the loop unroll should be performed regardless of how much
 /// code expansion would result.
 static const unsigned NoThreshold = UINT_MAX;

 /// Gather the various unrolling parameters based on the defaults, compiler
 /// flags, TTI overrides and user specified parameters.
 static TargetTransformInfo::UnrollingPreferences gatherUnrollingPreferences(
@@ ... @@ if (PragmaFullUnroll && TripCount != 0) {
     if (getUnrolledLoopSize(LoopSize, UP) < PragmaUnrollThreshold)
       return false;
   }

   bool PragmaEnableUnroll = HasUnrollEnablePragma(L);
   bool ExplicitUnroll = PragmaCount > 0 || PragmaFullUnroll ||
                         PragmaEnableUnroll || UserUnrollCount;

+  if (L->getHeader()->getParent()->getEntryCount() && TripCount == 0) {
+    auto ProfileTripCount = getLoopEstimatedTripCount(L);
+    if (ProfileTripCount) {
+      if (*ProfileTripCount < FlatLoopTripCountThreshold)
+        return false;
+      else
+        UP.AllowExpensiveTripCount = true;
+    }
+  }
+
   if (ExplicitUnroll && TripCount != 0) {
     // If the loop has an unrolling pragma, we want to be more aggressive with
     // unrolling limits. Set thresholds to at least the PragmaThreshold value
     // which is larger than the default limits.
     UP.Threshold = std::max<unsigned>(UP.Threshold, PragmaUnrollThreshold);
     UP.PartialThreshold =
         std::max<unsigned>(UP.PartialThreshold, PragmaUnrollThreshold);
   }
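The check added to LoopUnrollPass.cpp above can be modeled outside LLVM as a small decision function. This is only a sketch of the control flow in this diff: `ProfileDecision`, `classifyLoop`, and the parameter names are hypothetical, and the default threshold of 5 mirrors the `cl::init(5)` value of `flat-loop-tripcount-threshold`.

```cpp
#include <optional>

// Hypothetical names; a standalone model of the added check, not LLVM code.
enum class ProfileDecision {
  NoChange,               // the check does not apply or has no estimate
  TreatAsFlat,            // the diff returns false at this point
  AllowExpensiveTripCount // the diff sets UP.AllowExpensiveTripCount = true
};

ProfileDecision classifyLoop(bool hasEntryCount, unsigned staticTripCount,
                             std::optional<unsigned> profileTripCount,
                             unsigned flatThreshold = 5) {
  // The check only runs when the function carries an entry count and the
  // trip count is not known at compile time (TripCount == 0).
  if (!hasEntryCount || staticTripCount != 0)
    return ProfileDecision::NoChange;
  // No usable estimate from the branch weights: leave the preferences alone.
  if (!profileTripCount)
    return ProfileDecision::NoChange;
  // Below the threshold the loop is considered flat.
  if (*profileTripCount < flatThreshold)
    return ProfileDecision::TreatAsFlat;
  return ProfileDecision::AllowExpensiveTripCount;
}
```

For instance, `classifyLoop(true, 0, 3u)` gives `TreatAsFlat`, while an estimate equal to the threshold (`5u`) already counts as non-flat because the comparison is strict.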

lib/Transforms/Utils/LoopUtils.cpp

@@ ... @@ bool llvm::isGuaranteedToExecute(const Instruction &Inst,
   if (ExitBlocks.empty())
     return false;

   // FIXME: In general, we have to prove that the loop isn't an infinite loop.
   // See http::llvm.org/PR24078 . (The "ExitBlocks.empty()" check above is
   // just a special case of this.)
   return true;
 }

+Optional<unsigned> llvm::getLoopEstimatedTripCount(Loop *L) {
+  // Only support loops with a unique exiting block, and a latch.
+  if (!L->getExitingBlock())
+    return None;
+
+  // Get the branch weights for the the loop's backedge.
+  BranchInst *LatchBR =
+      dyn_cast<BranchInst>(L->getLoopLatch()->getTerminator());
+  if (!LatchBR)
+    return None;
+
+  assert((LatchBR->getSuccessor(0) == L->getHeader() ||
+          LatchBR->getSuccessor(1) == L->getHeader()) &&
+         "At least one edge out of the latch must go to the header");
+
+  // To estimate the number of times the loop body was executed, we want to
+  // know the number of times the backedge was taken, vs. the number of times
+  // we exited the loop.
+  // The branch weights give us almost what we want, since they were adjusted
+  // from the raw counts to provide a better probability estimate. Remove
+  // the adjustment by subtracting 1 from both weights.
+  uint64_t TrueVal, FalseVal;
+  if (!LatchBR->extractProfMetadata(TrueVal, FalseVal) || (TrueVal <= 1) ||
+      (FalseVal <= 1))
+    return None;
+
+  TrueVal -= 1;
+  FalseVal -= 1;
+
+  // Divide the count of the backedge by the count of the edge exiting the loop.
+  if (LatchBR->getSuccessor(0) == L->getHeader())
+    return TrueVal / FalseVal;
+  else
+    return FalseVal / TrueVal;
+}
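To make the arithmetic in getLoopEstimatedTripCount easier to follow, here is a standalone sketch of the same computation on plain integers. It is not LLVM code: `estimateTripCount` and its parameters are hypothetical stand-ins for the latch branch's weights and for which successor is the loop header, and `std::optional` replaces `llvm::Optional`.

```cpp
#include <cstdint>
#include <optional>

// Standalone model of the estimate in this diff; not LLVM code.
// trueWeight / falseWeight stand in for the latch branch's !prof weights;
// backedgeIsTrueSuccessor says whether successor 0 is the loop header.
std::optional<uint64_t> estimateTripCount(uint64_t trueWeight,
                                          uint64_t falseWeight,
                                          bool backedgeIsTrueSuccessor) {
  // As in the diff, weights of 0 or 1 yield no estimate; this also avoids a
  // division by zero once the adjustment below is removed.
  if (trueWeight <= 1 || falseWeight <= 1)
    return std::nullopt;

  // Undo the +1 adjustment described in the source comment.
  trueWeight -= 1;
  falseWeight -= 1;

  // Backedge count divided by exit count, using truncating integer division.
  return backedgeIsTrueSuccessor ? trueWeight / falseWeight
                                 : falseWeight / trueWeight;
}
```

With weights 1001 (backedge) and 11 (exit), the model yields 1000 / 10 = 100. Under the same model, a weight pair containing 1, such as the 1 and 1000 used in unroll-heuristics-pgo.ll, hits the `<= 1` guard and produces no estimate.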

test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll

This file was added.

+; RUN: opt < %s -S -loop-unroll -unroll-runtime -unroll-threshold=40 -unroll-dynamic-cost-savings-discount=0 | FileCheck %s
+
+@known_constant = internal unnamed_addr constant [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16
+
+; CHECK-LABEL: bar_prof
+; CHECK: loop.prol
+define i32 @bar_prof(i32* noalias nocapture readonly %src, i64 %c) !prof !1 {
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 0, %entry ], [ %inc, %loop ]
+  %r = phi i32 [ 0, %entry ], [ %add, %loop ]
+  %arrayidx = getelementptr inbounds i32, i32* %src, i64 %iv
+  %src_element = load i32, i32* %arrayidx, align 4
+  %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv
+  %const_array_element = load i32, i32* %array_const_idx, align 4
+  %mul = mul nsw i32 %src_element, %const_array_element
+  %add = add nsw i32 %mul, %r
+  %inc = add nuw nsw i64 %iv, 1
+  %exitcond86.i = icmp eq i64 %inc, %c
+  br i1 %exitcond86.i, label %loop.end, label %loop, !prof !2
+
+loop.end:
+  %r.lcssa = phi i32 [ %r, %loop ]
+  ret i32 %r.lcssa
+}
+
+; CHECK-LABEL: bar_prof_flat
+; CHECK-NOT: loop.prol
+define i32 @bar_prof_flat(i32* noalias nocapture readonly %src, i64 %c) !prof !1 {
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 0, %entry ], [ %inc, %loop ]
+  %r = phi i32 [ 0, %entry ], [ %add, %loop ]
+  %arrayidx = getelementptr inbounds i32, i32* %src, i64 %iv
+  %src_element = load i32, i32* %arrayidx, align 4
+  %array_const_idx = getelementptr inbounds [9 x i32], [9 x i32]* @known_constant, i64 0, i64 %iv
+  %const_array_element = load i32, i32* %array_const_idx, align 4
+  %mul = mul nsw i32 %src_element, %const_array_element
+  %add = add nsw i32 %mul, %r
+  %inc = add nuw nsw i64 %iv, 1
+  %exitcond86.i = icmp eq i64 %inc, %c
+  br i1 %exitcond86.i, label %loop, label %loop.end, !prof !2
+
+loop.end:
+  %r.lcssa = phi i32 [ %r, %loop ]
+  ret i32 %r.lcssa
+}
+
+!1 = !{!"function_entry_count", i64 1}
+!2 = !{!"branch_weights", i32 1, i32 1000}
