This is an archive of the discontinued LLVM Phabricator instance.

Use profile summary to disable peeling for huge working sets
Closed, Public

Authored by tejohnson on Aug 3 2017, 2:27 PM.


Commits

rG8482e5692078: Use profile summary to disable peeling for huge working sets
rL310005: Use profile summary to disable peeling for huge working sets

Summary

Detect when the working set size of a profiled application is huge,
by comparing the number of counts required to reach the hot percentile
in the profile summary to a large threshold*.

When the working set size is determined to be huge, disable peeling
to avoid bloating the working set further.

*Note that the selected threshold (15K) is significantly larger than the
largest working set value in SPEC cpu2006 (which is gcc at around 11K).
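The detection step described above can be sketched in standalone C++. This is a minimal illustration, not the LLVM code: `SummaryEntry`, `entryForPercentile` and `hasHugeWorkingSetSize` are hypothetical names, cutoffs are assumed to be in parts per million (as in the 10000/999000/999999 cutoffs of the peel-loop-pgo.ll test), entries are assumed sorted by cutoff, and an exception stands in for `report_fatal_error`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical mirror of one detailed-summary entry: covering `Cutoff`
// (parts per million) of all counts takes `NumCounts` counters, the
// smallest of which is `MinCount`.
struct SummaryEntry {
  uint32_t Cutoff;
  uint64_t MinCount;
  uint64_t NumCounts;
};

// First entry whose cutoff is >= Percentile, found with lower_bound just
// like the lookup in the patch.
const SummaryEntry &entryForPercentile(const std::vector<SummaryEntry> &DS,
                                       uint64_t Percentile) {
  auto It = std::lower_bound(
      DS.begin(), DS.end(), Percentile,
      [](const SummaryEntry &E, uint64_t P) { return E.Cutoff < P; });
  if (It == DS.end())
    throw std::out_of_range("Desired percentile exceeds the maximum cutoff");
  return *It;
}

// The working set counts as "huge" when reaching the hot percentile needs
// more counters than the threshold (15000 by default in this patch).
bool hasHugeWorkingSetSize(const std::vector<SummaryEntry> &DS,
                           uint64_t HotCutoff, uint64_t Threshold = 15000) {
  bool Sorted = std::is_sorted(
      DS.begin(), DS.end(), [](const SummaryEntry &A, const SummaryEntry &B) {
        return A.Cutoff < B.Cutoff;
      });
  assert(Sorted && "summary entries must be sorted by cutoff");
  (void)Sorted;
  return entryForPercentile(DS, HotCutoff).NumCounts > Threshold;
}
```

For example, with made-up entries {10000, 3, 2}, {990000, 5, 12000} and {999999, 1, 20000} and a hot cutoff of 990000, the lookup lands on the middle entry; 12000 does not exceed 15000, so peeling would stay enabled. Returning the whole entry, rather than only the minimum count, is what lets one lookup supply both the hot count threshold and the number of counts.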

Diff Detail

Repository: rL LLVM

Event Timeline

tejohnson created this revision. Aug 3 2017, 2:27 PM

Herald added subscribers: eraman, mzolotukhin, mehdi_amini. Aug 3 2017, 2:27 PM

eraman added inline comments. Aug 3 2017, 2:50 PM

lib/Analysis/ProfileSummaryInfo.cpp
Line 69 (On Diff #109636): The getMinCountForPercentile and getNumCountsForPercentile could be merged into a getEntryForPercentile.

davidxl added a comment.

Have we considered limiting the max number of peeled iterations instead of disabling it?

tejohnson added a comment.

In D36288#831144, @davidxl wrote:

Have we considered limiting the max number of peeled iterations instead of disabling it?

I am planning to do some follow-on tuning that would limit the factor based on additional profile info, but as a first heuristic, disabling it helps quite a bit.

lib/Analysis/ProfileSummaryInfo.cpp
Line 69 (On Diff #109636): That's a good idea, will do so.

Implement Easwaran's suggestion

Harbormaster completed remote builds in B9004: Diff 109646. Aug 3 2017, 4:05 PM

lgtm

This revision is now accepted and ready to land. Aug 3 2017, 4:36 PM

Closed by commit rL310005: Use profile summary to disable peeling for huge working sets (authored by tejohnson). Aug 3 2017, 4:43 PM

This revision was automatically updated to reflect the committed changes.

When the working set size is determined to be huge, disable peeling to avoid bloating the working set further.

Just lurking and curious: how is "working set" defined first? And then how is peeling bloating it?

Revision Contents

Path                                                      Size
llvm/trunk/include/llvm/Analysis/ProfileSummaryInfo.h     6 lines
llvm/trunk/lib/Analysis/ProfileSummaryInfo.cpp            34 lines
llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp       24 lines
llvm/trunk/test/Other/new-pm-defaults.ll                  1 line
llvm/trunk/test/Other/new-pm-thinlto-defaults.ll          1 line
llvm/trunk/test/Transforms/LoopUnroll/peel-loop-pgo.ll    53 lines

Diff 109656

llvm/trunk/include/llvm/Analysis/ProfileSummaryInfo.h

@@ ... @@
 class ProfileSummaryInfo {
 private:
   Module &M;
   std::unique_ptr<ProfileSummary> Summary;
   bool computeSummary();
   void computeThresholds();
   // Count thresholds to answer isHotCount and isColdCount queries.
   Optional<uint64_t> HotCountThreshold, ColdCountThreshold;
+  // True if the working set size of the code is considered huge,
+  // because the number of profile counts required to reach the hot
+  // percentile is above a huge threshold.
+  Optional<bool> HasHugeWorkingSetSize;
 
 public:
   ProfileSummaryInfo(Module &M) : M(M) {}
   ProfileSummaryInfo(ProfileSummaryInfo &&Arg)
       : M(Arg.M), Summary(std::move(Arg.Summary)) {}
 
   /// \brief Returns true if profile summary is available.
   bool hasProfileSummary() { return computeSummary(); }
@@ ... @@ public:
   bool invalidate(Module &, const PreservedAnalyses &,
                   ModuleAnalysisManager::Invalidator &) {
     return false;
   }
 
   /// Returns the profile count for \p CallInst.
   Optional<uint64_t> getProfileCount(const Instruction *CallInst,
                                      BlockFrequencyInfo *BFI);
+  /// Returns true if the working set size of the code is considered huge.
+  bool hasHugeWorkingSetSize();
   /// \brief Returns true if \p F has hot function entry.
   bool isFunctionEntryHot(const Function *F);
   /// Returns true if \p F has hot function entry or hot call edge.
   bool isFunctionHotInCallGraph(const Function *F);
   /// \brief Returns true if \p F has cold function entry.
   bool isFunctionEntryCold(const Function *F);
   /// Returns true if \p F has cold function entry or cold call edge.
   bool isFunctionColdInCallGraph(const Function *F);

llvm/trunk/lib/Analysis/ProfileSummaryInfo.cpp

@@ ... @@ static cl::opt<int> ProfileSummaryCutoffCold(
     cl::desc("A count is cold if it is below the minimum count"
              " to reach this percentile of total counts."));
 
 static cl::opt<bool> AccurateSampleProfile(
     "accurate-sample-profile", cl::Hidden, cl::init(false),
     cl::desc("If the sample profile is accurate, we will mark all un-sampled "
              "callsite as cold. Otherwise, treat un-sampled callsites as if "
              "we have no profile."));
+static cl::opt<unsigned> ProfileSummaryHugeWorkingSetSizeThreshold(
+    "profile-summary-huge-working-set-size-threshold", cl::Hidden,
+    cl::init(15000), cl::ZeroOrMore,
+    cl::desc("The code working set size is considered huge if the number of"
+             " blocks required to reach the -profile-summary-cutoff-hot"
+             " percentile exceeds this count."));
 
-// Find the minimum count to reach a desired percentile of counts.
-static uint64_t getMinCountForPercentile(SummaryEntryVector &DS,
-                                         uint64_t Percentile) {
+// Find the summary entry for a desired percentile of counts.
+static const ProfileSummaryEntry &getEntryForPercentile(SummaryEntryVector &DS,
+                                                        uint64_t Percentile) {
   auto Compare = [](const ProfileSummaryEntry &Entry, uint64_t Percentile) {
     return Entry.Cutoff < Percentile;
   };
   auto It = std::lower_bound(DS.begin(), DS.end(), Percentile, Compare);
   // The required percentile has to be <= one of the percentiles in the
   // detailed summary.
   if (It == DS.end())
     report_fatal_error("Desired percentile exceeds the maximum cutoff");
-  return It->MinCount;
+  return *It;
 }
 
 // The profile summary metadata may be attached either by the frontend or by
 // any backend passes (IR level instrumentation, for example). This method
 // checks if the Summary is null and if so checks if the summary metadata is now
 // available in the module and parses it to get the Summary object. Returns true
 // if a valid Summary is available.
 bool ProfileSummaryInfo::computeSummary() {
@@ ... @@ bool ProfileSummaryInfo::isFunctionEntryCold(const Function *F) {
   return FunctionCount && isColdCount(FunctionCount.getValue());
 }
 
 /// Compute the hot and cold thresholds.
 void ProfileSummaryInfo::computeThresholds() {
   if (!computeSummary())
     return;
   auto &DetailedSummary = Summary->getDetailedSummary();
-  HotCountThreshold =
-      getMinCountForPercentile(DetailedSummary, ProfileSummaryCutoffHot);
-  ColdCountThreshold =
-      getMinCountForPercentile(DetailedSummary, ProfileSummaryCutoffCold);
+  auto &HotEntry =
+      getEntryForPercentile(DetailedSummary, ProfileSummaryCutoffHot);
+  HotCountThreshold = HotEntry.MinCount;
+  auto &ColdEntry =
+      getEntryForPercentile(DetailedSummary, ProfileSummaryCutoffCold);
+  ColdCountThreshold = ColdEntry.MinCount;
+  HasHugeWorkingSetSize =
+      HotEntry.NumCounts > ProfileSummaryHugeWorkingSetSizeThreshold;
+}
+
+bool ProfileSummaryInfo::hasHugeWorkingSetSize() {
+  if (!HasHugeWorkingSetSize)
+    computeThresholds();
+  return HasHugeWorkingSetSize && HasHugeWorkingSetSize.getValue();
 }
 
 bool ProfileSummaryInfo::isHotCount(uint64_t C) {
   if (!HotCountThreshold)
     computeThresholds();
   return HotCountThreshold && C >= HotCountThreshold.getValue();
 }
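hasHugeWorkingSetSize() follows the same lazy pattern as isHotCount(): the Optional<bool> member stays empty until computeThresholds() runs, and stays empty when no summary is available because computeThresholds() returns early. A standalone sketch of that pattern, using std::optional in place of llvm::Optional; `WorkingSetInfo` and `NumComputations` are hypothetical names, and the whole summary is reduced to a single optional count.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for ProfileSummaryInfo, reduced to one query.
// HotNumCounts models HotEntry.NumCounts; an empty optional models a module
// that has no profile summary metadata.
class WorkingSetInfo {
public:
  explicit WorkingSetInfo(std::optional<uint64_t> HotNumCounts,
                          uint64_t Threshold = 15000)
      : HotNumCounts(HotNumCounts), Threshold(Threshold) {}

  bool hasHugeWorkingSetSize() {
    // An empty optional means "not computed yet"; an engaged `false` is a
    // real answer and is not recomputed.
    if (!HasHugeWorkingSetSize)
      computeThresholds();
    // Still empty when no summary exists: report "not huge".
    return HasHugeWorkingSetSize && *HasHugeWorkingSetSize;
  }

  // Counts calls to computeThresholds(); for demonstration only.
  unsigned NumComputations = 0;

private:
  void computeThresholds() {
    ++NumComputations;
    if (!HotNumCounts)
      return; // Mirrors the early return when computeSummary() fails.
    HasHugeWorkingSetSize = *HotNumCounts > Threshold;
  }

  std::optional<uint64_t> HotNumCounts;
  uint64_t Threshold;
  std::optional<bool> HasHugeWorkingSetSize;
};
```

Note the distinction this relies on: `!HasHugeWorkingSetSize` tests whether a value is present, not whether it is true, so a computed "not huge" is cached while a missing summary is retried on every query.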

llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp

@@ ... @@
 #include "llvm/ADT/SetVector.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/CodeMetrics.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/InstructionSimplify.h"
 #include "llvm/Analysis/LoopPass.h"
 #include "llvm/Analysis/LoopUnrollAnalyzer.h"
 #include "llvm/Analysis/OptimizationDiagnosticInfo.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
 #include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/ScalarEvolutionExpressions.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/InstVisitor.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Metadata.h"
 #include "llvm/Support/CommandLine.h"
@@ ... @@ PreservedAnalyses LoopUnrollPass::run(Function &F,
                                       FunctionAnalysisManager &AM) {
   auto &SE = AM.getResult<ScalarEvolutionAnalysis>(F);
   auto &LI = AM.getResult<LoopAnalysis>(F);
   auto &TTI = AM.getResult<TargetIRAnalysis>(F);
   auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
   auto &AC = AM.getResult<AssumptionAnalysis>(F);
   auto &ORE = AM.getResult<OptimizationRemarkEmitterAnalysis>(F);
 
+  const ModuleAnalysisManager &MAM =
+      AM.getResult<ModuleAnalysisManagerFunctionProxy>(F).getManager();
+  ProfileSummaryInfo *PSI =
+      MAM.getCachedResult<ProfileSummaryAnalysis>(*F.getParent());
+
   bool Changed = false;
 
   // The unroller requires loops to be in simplified form, and also needs LCSSA.
   // Since simplification may add new inner loops, it has to run before the
   // legality and profitability checks. This means running the loop unroller
   // will simplify all loops, regardless of whether anything end up being
   // unrolled.
   for (auto &L : LI) {
@@ ... @@
 #ifndef NDEBUG
     Loop *ParentL = L.getParentLoop();
 #endif
 
     // The API here is quite complex to call, but there are only two interesting
     // states we support: partial and full (or "simple") unrolling. However, to
     // enable these things we actually pass "None" in for the optional to avoid
     // providing an explicit choice.
-    Optional<bool> AllowPartialParam, RuntimeParam, UpperBoundParam;
-    bool CurChanged = tryToUnrollLoop(
-        &L, DT, &LI, SE, TTI, AC, ORE,
-        /*PreserveLCSSA*/ true, OptLevel, /*Count*/ None,
-        /*Threshold*/ None, AllowPartialParam, RuntimeParam, UpperBoundParam,
-        /*AllowPeeling*/ None);
+    Optional<bool> AllowPartialParam, RuntimeParam, UpperBoundParam,
+        AllowPeeling;
+    // Check if the profile summary indicates that the profiled application
+    // has a huge working set size, in which case we disable peeling to avoid
+    // bloating it further.
+    if (PSI && PSI->hasHugeWorkingSetSize())
+      AllowPeeling = false;
+    bool CurChanged =
+        tryToUnrollLoop(&L, DT, &LI, SE, TTI, AC, ORE,
+                        /*PreserveLCSSA*/ true, OptLevel, /*Count*/ None,
+                        /*Threshold*/ None, AllowPartialParam, RuntimeParam,
+                        UpperBoundParam, AllowPeeling);
     Changed |= CurChanged;
 
     // The parent must not be damaged by unrolling!
 #ifndef NDEBUG
     if (CurChanged && ParentL)
       ParentL->verifyLoop();
 #endif
   }
 
   if (!Changed)
     return PreservedAnalyses::all();
 
   return getLoopPassPreservedAnalyses();
 }
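The LoopUnrollPass change relies on two details. First, a function pass in the new pass manager only gets const access to the module analysis manager through the proxy, so it can read ProfileSummaryAnalysis only if that result is already cached; otherwise PSI is null. That is why the updated peel-loop-pgo.ll RUN lines add require<profile-summary>, and why the pipeline tests now expect OuterAnalysisManagerProxy after LoopUnrollPass. Second, AllowPeeling is a tri-state Optional<bool>: left as None it defers to the unroller's defaults, and only an explicit false turns peeling off. A sketch of the null check and the tri-state override with std::optional; `UnrollPrefs`, `SummaryInfo`, `peelingOverride` and `applyOverride` are hypothetical, and it assumes the unroller applies an engaged override on top of its computed preferences.

```cpp
#include <cassert>
#include <optional>

// Hypothetical subset of the unroller's preferences; the real pass tracks
// many more knobs. `true` stands in for whatever default the target and
// command-line options would otherwise produce.
struct UnrollPrefs {
  bool AllowPeeling = true;
};

// Hypothetical stand-in for ProfileSummaryInfo.
struct SummaryInfo {
  bool Huge = false;
  bool hasHugeWorkingSetSize() const { return Huge; }
};

// Caller side: "no opinion" is an empty optional; peeling is only forced
// off when a cached summary exists and reports a huge working set.
std::optional<bool> peelingOverride(const SummaryInfo *PSI) {
  std::optional<bool> AllowPeeling;
  if (PSI && PSI->hasHugeWorkingSetSize())
    AllowPeeling = false;
  return AllowPeeling;
}

// Callee side (assumed): an engaged optional overrides the default, an
// empty one leaves it untouched.
UnrollPrefs applyOverride(UnrollPrefs Prefs, std::optional<bool> AllowPeeling) {
  if (AllowPeeling)
    Prefs.AllowPeeling = *AllowPeeling;
  return Prefs;
}
```

Keeping the parameter as Optional<bool> (it was already passed as None before this patch) lets the pass say "leave the default alone" separately from an explicit enable or disable, which a plain bool could not express.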

llvm/trunk/test/Other/new-pm-defaults.ll

@@ ... @@
 ; CHECK-O-NEXT: Running analysis: BranchProbabilityAnalysis
 ; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass
 ; CHECK-O-NEXT: Running analysis: LoopAccessAnalysis
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: LoopUnrollPass
+; CHECK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
 ; CHECK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
 ; CHECK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
 ; CHECK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
 ; CHECK-O-NEXT: Running pass: LoopSinkPass
 ; CHECK-O-NEXT: Running pass: InstSimplifierPass
 ; CHECK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-O-NEXT: Finished llvm::Function pass manager run.

llvm/trunk/test/Other/new-pm-thinlto-defaults.ll

@@ ... @@
 ; CHECK-POSTLINK-O-NEXT: Running analysis: BranchProbabilityAnalysis
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopLoadEliminationPass
 ; CHECK-POSTLINK-O-NEXT: Running analysis: LoopAccessAnalysis
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: SLPVectorizerPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopUnrollPass
+; CHECK-POSTLINK-O-NEXT: Running analysis: OuterAnalysisManagerProxy
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstCombinePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}OptimizationRemarkEmitterAnalysis
 ; CHECK-POSTLINK-O-NEXT: Running pass: FunctionToLoopPassAdaptor<{{.*}}LICMPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: AlignmentFromAssumptionsPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopSinkPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: InstSimplifierPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: SimplifyCFGPass
 ; CHECK-POSTLINK-O-NEXT: Finished llvm::Function pass manager run.

llvm/trunk/test/Transforms/LoopUnroll/peel-loop-pgo.ll

 ; RUN: opt < %s -S -debug-only=loop-unroll -loop-unroll 2>&1 | FileCheck %s
-; RUN: opt < %s -S -debug-only=loop-unroll -passes='require<opt-remark-emit>,unroll' 2>&1 | FileCheck %s
+; RUN: opt < %s -S -debug-only=loop-unroll -passes='require<profile-summary>,function(require<opt-remark-emit>,unroll)' 2>&1 | FileCheck %s
+; Confirm that peeling is disabled if the number of counts required to reach
+; the hot percentile is above the threshold.
+; RUN: opt < %s -S -profile-summary-huge-working-set-size-threshold=9 -debug-only=loop-unroll -passes='require<profile-summary>,function(require<opt-remark-emit>,unroll)' 2>&1 | FileCheck %s --check-prefix=NOPEEL
 ; REQUIRES: asserts
 
 ; Make sure we use the profile information correctly to peel-off 3 iterations
 ; from the loop, and update the branch weights for the peeled loop properly.
 
 ; CHECK: Loop Unroll: F[basic]
 ; CHECK: PEELING loop %for.body with iteration count 3!
 ; CHECK: Loop Unroll: F[optsize]
 ; CHECK-NOT: PEELING
 
 ; Confirm that no peeling occurs when we are performing full unrolling.
-; RUN: opt < %s -S -debug-only=loop-unroll -passes='require<opt-remark-emit>,loop(unroll-full)' 2>&1 | FileCheck %s --check-prefix=FULLUNROLL
-; FULLUNROLL-NOT: PEELING
+; RUN: opt < %s -S -debug-only=loop-unroll -passes='require<opt-remark-emit>,loop(unroll-full)' 2>&1 | FileCheck %s --check-prefix=NOPEEL
+; NOPEEL-NOT: PEELING
 
 ; CHECK-LABEL: @basic
-; CHECK: br i1 %{{.*}}, label %[[NEXT0:.*]], label %for.cond.for.end_crit_edge, !prof !1
+; CHECK: br i1 %{{.*}}, label %[[NEXT0:.*]], label %for.cond.for.end_crit_edge, !prof !15
 ; CHECK: [[NEXT0]]:
-; CHECK: br i1 %{{.*}}, label %[[NEXT1:.*]], label %for.cond.for.end_crit_edge, !prof !2
+; CHECK: br i1 %{{.*}}, label %[[NEXT1:.*]], label %for.cond.for.end_crit_edge, !prof !16
 ; CHECK: [[NEXT1]]:
-; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !3
+; CHECK: br i1 %{{.*}}, label %[[NEXT2:.*]], label %for.cond.for.end_crit_edge, !prof !17
 ; CHECK: [[NEXT2]]:
-; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !4
+; CHECK: br i1 %{{.*}}, label %for.body, label %{{.*}}, !prof !18
 
-define void @basic(i32* %p, i32 %k) #0 !prof !0 {
+define void @basic(i32* %p, i32 %k) #0 !prof !15 {
 entry:
   %cmp3 = icmp slt i32 0, %k
   br i1 %cmp3, label %for.body.lr.ph, label %for.end
 
 for.body.lr.ph: ; preds = %entry
   br label %for.body
 
 for.body: ; preds = %for.body.lr.ph, %for.body
   %i.05 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
   %p.addr.04 = phi i32* [ %p, %for.body.lr.ph ], [ %incdec.ptr, %for.body ]
   %incdec.ptr = getelementptr inbounds i32, i32* %p.addr.04, i32 1
   store i32 %i.05, i32* %p.addr.04, align 4
   %inc = add nsw i32 %i.05, 1
   %cmp = icmp slt i32 %inc, %k
-  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge, !prof !1
+  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge, !prof !16
 
 for.cond.for.end_crit_edge: ; preds = %for.body
   br label %for.end
 
 for.end: ; preds = %for.cond.for.end_crit_edge, %entry
   ret void
 }
 
 ; We don't want to peel loops when optimizing for size.
 ; CHECK-LABEL: @optsize
 ; CHECK: for.body.lr.ph:
 ; CHECK-NEXT: br label %for.body
 ; CHECK: for.body:
 ; CHECK-NOT: br
 ; CHECK: br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge
-define void @optsize(i32* %p, i32 %k) #1 !prof !0 {
+define void @optsize(i32* %p, i32 %k) #1 !prof !15 {
 entry:
   %cmp3 = icmp slt i32 0, %k
   br i1 %cmp3, label %for.body.lr.ph, label %for.end
 
 for.body.lr.ph: ; preds = %entry
   br label %for.body
 
 for.body: ; preds = %for.body.lr.ph, %for.body
   %i.05 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
   %p.addr.04 = phi i32* [ %p, %for.body.lr.ph ], [ %incdec.ptr, %for.body ]
   %incdec.ptr = getelementptr inbounds i32, i32* %p.addr.04, i32 1
   store i32 %i.05, i32* %p.addr.04, align 4
   %inc = add nsw i32 %i.05, 1
   %cmp = icmp slt i32 %inc, %k
-  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge, !prof !1
+  br i1 %cmp, label %for.body, label %for.cond.for.end_crit_edge, !prof !16
 
 for.cond.for.end_crit_edge: ; preds = %for.body
   br label %for.end
 
 for.end: ; preds = %for.cond.for.end_crit_edge, %entry
   ret void
 }
 
 attributes #0 = { nounwind }
 attributes #1 = { nounwind optsize }
 
-!0 = !{!"function_entry_count", i64 1}
-!1 = !{!"branch_weights", i32 3001, i32 1001}
+!llvm.module.flags = !{!1}
+!1 = !{i32 1, !"ProfileSummary", !2}
+!2 = !{!3, !4, !5, !6, !7, !8, !9, !10}
+!3 = !{!"ProfileFormat", !"InstrProf"}
+!4 = !{!"TotalCount", i64 10}
+!5 = !{!"MaxCount", i64 3}
+!6 = !{!"MaxInternalCount", i64 1}
+!7 = !{!"MaxFunctionCount", i64 3}
+!8 = !{!"NumCounts", i64 2}
+!9 = !{!"NumFunctions", i64 2}
+!10 = !{!"DetailedSummary", !11}
+!11 = !{!12, !13, !14}
+!12 = !{i32 10000, i64 3, i32 2}
+!13 = !{i32 999000, i64 1, i32 10}
+!14 = !{i32 999999, i64 1, i32 10}
+!15 = !{!"function_entry_count", i64 1}
+!16 = !{!"branch_weights", i32 3001, i32 1001}
 
-;CHECK: !1 = !{!"branch_weights", i32 900, i32 101}
-;CHECK: !2 = !{!"branch_weights", i32 540, i32 360}
-;CHECK: !3 = !{!"branch_weights", i32 162, i32 378}
-;CHECK: !4 = !{!"branch_weights", i32 1399, i32 162}
+;CHECK: !15 = !{!"branch_weights", i32 900, i32 101}
+;CHECK: !16 = !{!"branch_weights", i32 540, i32 360}
+;CHECK: !17 = !{!"branch_weights", i32 162, i32 378}
+;CHECK: !18 = !{!"branch_weights", i32 1399, i32 162}
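The new NOPEEL run can be checked by hand against the test's DetailedSummary (!12 to !14), whose triples read {cutoff in parts per million, minimum count, number of counts}. Assuming the hot cutoff lies between 10001 and 999000 (for example 990000, i.e. 99%; check the -profile-summary-cutoff-hot default in your tree), the lower_bound lookup lands on !13, so 10 counts are needed to reach the hot percentile. The sketch hard-codes those triples; `Entry`, `numCountsForHotCutoff` and `peelingDisabled` are hypothetical helpers, not LLVM APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// The DetailedSummary triples from peel-loop-pgo.ll (!12, !13, !14):
// {Cutoff (per million), MinCount, NumCounts}.
struct Entry {
  uint32_t Cutoff;
  uint64_t MinCount;
  uint32_t NumCounts;
};
constexpr Entry TestSummary[] = {
    {10000, 3, 2}, {999000, 1, 10}, {999999, 1, 10}};

// NumCounts of the first entry whose cutoff reaches HotCutoff.
uint32_t numCountsForHotCutoff(uint32_t HotCutoff) {
  const Entry *It = std::lower_bound(
      std::begin(TestSummary), std::end(TestSummary), HotCutoff,
      [](const Entry &E, uint32_t P) { return E.Cutoff < P; });
  assert(It != std::end(TestSummary) && "cutoff above the largest entry");
  return It->NumCounts;
}

// Mirrors the patch's comparison: huge working set means no peeling.
bool peelingDisabled(uint32_t HotCutoff, uint32_t Threshold) {
  return numCountsForHotCutoff(HotCutoff) > Threshold;
}
```

With the default threshold of 15000, 10 counts is far below it and the CHECK runs still expect PEELING; with -profile-summary-huge-working-set-size-threshold=9, 10 > 9, so peeling is disabled and NOPEEL-NOT: PEELING holds.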