This is an archive of the discontinued LLVM Phabricator instance.

Differential D89541

[IRCE] Do not transform if loop has small number of iterations
ClosedPublic

Authored by skatkov on Oct 16 2020, 4:23 AM.

Details

Reviewers

ebrevnov
dantrushin
asbirlea
mkazantsev

Commits

rG38799975ceb2: [IRCE] Do not transform if loop has small number of iterations

Summary

IRCE has some overhead for runtime checks, and when the number of iterations is small
this overhead can outweigh the benefit of the optimization.

This CL uses the BlockFrequencyInfo of the pre-header and header to estimate the
number of loop iterations. If the estimate is less than irce-min-estimated-iters, we do not transform the loop.

A more complex cost model would probably be better, but for simplicity this seems to be enough.

BFI is used only under the new pass manager, where it is requested on demand so that it is not computed unless needed.
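
For illustration, the check described in the summary can be sketched in standalone C++ as below. This is a simplified sketch, not the code from the patch: `tooFewIterations` and its parameters are hypothetical names, and plain integers stand in for the frequencies that BlockFrequencyInfo would report for the header and pre-header.

```cpp
#include <cstdint>

// Hypothetical stand-in for the profitability check; not the LLVM code.
// HeaderFreq and PreheaderFreq play the role of the BFI block frequencies.
// Returns true when the loop should be skipped as unprofitable.
bool tooFewIterations(uint64_t HeaderFreq, uint64_t PreheaderFreq,
                      unsigned MinIters) {
  // Without usable frequencies there is no estimate, so the check does not
  // block the transformation.
  if (HeaderFreq == 0 || PreheaderFreq == 0)
    return false;
  // Integer division truncates: a ratio of 2.9 counts as 2 iterations.
  uint64_t EstimatedIters = HeaderFreq / PreheaderFreq;
  // With MinIters == 0 this comparison is never true, which disables the check.
  return EstimatedIters < MinIters;
}
```

With a header frequency of 29 and a pre-header frequency of 10, the estimate truncates to 2, so a threshold of 3 rejects the loop while a threshold of 0 never does.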

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

skatkov created this revision. Oct 16 2020, 4:23 AM

Herald added a project: Restricted Project. Oct 16 2020, 4:23 AM

Herald added a subscriber: hiraditya.

skatkov requested review of this revision. Oct 16 2020, 4:23 AM

mkazantsev added inline comments. Oct 18 2020, 9:15 PM

llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
120	I think the name should hint that this is profile-based, e.g. "MinRuntimeLoopIterations" or so. We also might want to have another boolean flag to turn this check off, because some users of IRCE might not have a profile.
244	Why not `Optional<BlockFrequencyInfo>`?
1799	If you stop updating this thing after pre-opt preparations, does it make sense to limit its scope? There is a temptation to use `CFGChanged` variable in the end to not drop some analysis if it did not happen.
1911	profitability
1912	"the estimated number of iterations basing on frequency info"?
llvm/test/Transforms/IRCE/low-iterations.ll
2	Pass the frequency parameter value directly rather than rely on the default value.

skatkov added inline comments. Oct 18 2020, 9:36 PM

llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
120	The value 0 switches off this check.
244	I'd like to avoid computing the BFI result if I will not use it, so this is a lazy accessor to BFI.
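
The lazy-accessor idea from this reply can be sketched with standard library types. This is only an illustration under assumptions: the patch uses `llvm::Optional<llvm::function_ref<...>>`, while the sketch swaps in `std::optional<std::function<...>>`, and `FrequencyInfo` and `Consumer` are hypothetical names.

```cpp
#include <functional>
#include <optional>
#include <utility>

// Hypothetical stand-in for an expensive analysis result such as BFI.
struct FrequencyInfo {
  int Value;
};

// Hypothetical consumer that may or may not need the analysis.
class Consumer {
  using Getter = std::function<FrequencyInfo &()>;
  // Empty optional: no analysis is available at all.
  // Engaged optional: calling the getter produces the result on demand.
  std::optional<Getter> GetInfo;

public:
  explicit Consumer(std::optional<Getter> GetInfo = std::nullopt)
      : GetInfo(std::move(GetInfo)) {}

  // Touches the analysis only when NeedIt is true and a getter was supplied;
  // returns -1 otherwise.
  int query(bool NeedIt) {
    if (!NeedIt || !GetInfo)
      return -1;
    return (*GetInfo)().Value;
  }
};
```

Passing a getter instead of a result means the cost of producing the analysis is paid only on the path that actually calls it.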

mkazantsev added inline comments. Oct 18 2020, 10:06 PM

llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
120	Okay.

I think it makes sense to introduce the proposed heuristic, since there is currently no easy way to estimate the cost of the code generated by SCEVExpander.

FYI, there is ongoing work (https://reviews.llvm.org/D75980) that enables the vectorizer to evaluate the cost of the code SCEVExpander "would" generate. This looks like an additional use case for it. It may also be useful for LoopPredication, since it does a similar thing to IRCE. Given that, we might need to make this functionality publicly available. @fhahn what do you think?

Thanks
Evgeniy

llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
1905	There is an existing utility function inside the vectorizer (getSmallBestKnownTC) which estimates loop's trip count. Can this be used (if made public) instead of BFI?

ebrevnov added a subscriber: fhahn. Oct 18 2020, 10:32 PM

skatkov added inline comments. Oct 19 2020, 2:43 AM

llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
1905	The utility you mentioned uses SE and BPI on latch only to estimate the trip count. IRCE already has an estimation based on BPI of the latch. It is not enough if the main exit from the loop is not a latch, like in the test I've added to this CL. Maybe for the vectorizer it is OK to be based on the latch, but not for IRCE.

Handled Max's comments.

LGTM with a nit.

llvm/test/Transforms/IRCE/low-iterations.ll
6	`CHECK-NOT-NO`?

This revision is now accepted and ready to land. Oct 19 2020, 4:23 AM

ebrevnov added inline comments. Oct 19 2020, 6:56 AM

llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
1905	> The utility you mentioned uses SE
That's right, currently it does use SE. Since it aims at returning the best known TC, I don't see any problem with making SE optional.
> and BPI on latch only to estimate the trip count.
In fact, it doesn't use BPI. It checks for profile info directly. I take this as another reason to have one source of estimation for a loop's TC. If BPI is available, it would be preferable to rely on it instead of profile info. If BFI is available, there is no need to use BPI, since BFI is completely based on BPI. If we know the exact trip count from SE, there is no need to use BFI.
> IRCE already has an estimation based on BPI of the latch. It is not enough if the main exit from the loop is not a latch, like in the test I've added to this CL.
If the above suggestion looks too complex, I would still want us to merge the logic at least inside IRCE. If BFI is available, I don't see why we should even try to do an estimation based on BPI for the latch inside parseLoopStructure.
> Maybe for the vectorizer it is OK to be based on the latch, but not for IRCE.
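
The ordering of sources suggested in this comment (exact count from SE first, then BFI, then BPI on the latch) could be sketched as a simple fallback chain. This illustrates the suggestion only and is not code from this patch or its follow-up; `TripCountSources` and `bestKnownTripCount` are hypothetical names.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical inputs: each source may or may not be able to provide a count.
struct TripCountSources {
  std::optional<uint64_t> ExactFromSE;   // exact trip count, if SE proves one
  std::optional<uint64_t> FromBFI;       // header / pre-header frequency ratio
  std::optional<uint64_t> FromLatchBPI;  // estimate from latch exit probability
};

// Returns the estimate from the most precise source that has a value,
// or an empty optional when no source can provide one.
std::optional<uint64_t> bestKnownTripCount(const TripCountSources &S) {
  if (S.ExactFromSE)
    return S.ExactFromSE;
  if (S.FromBFI)
    return S.FromBFI;
  return S.FromLatchBPI;
}
```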

skatkov added inline comments. Oct 19 2020, 8:02 PM

llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp
1905	I'll do a follow-up patch for this.

Closed by commit rG38799975ceb2: [IRCE] Do not transform if loop has small number of iterations (authored by skatkov). Oct 19 2020, 8:42 PM

This revision was automatically updated to reflect the committed changes.

skatkov added a commit: rG38799975ceb2: [IRCE] Do not transform if loop has small number of iterations.

skatkov mentioned this in D89773: [IRCE] consolidate profitability check. Oct 21 2020, 7:53 PM

skatkov mentioned this in rG75d0e0cd5f4c: [IRCE] consolidate profitability check. Oct 21 2020, 9:45 PM

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp	51 lines
llvm/test/Transforms/IRCE/low-iterations.ll	43 lines

Diff 299251

llvm/lib/Transforms/Scalar/InductiveRangeCheckElimination.cpp

[... 46 lines not shown ...]
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/PriorityWorklist.h"		#include "llvm/ADT/PriorityWorklist.h"
#include "llvm/ADT/SmallPtrSet.h"		#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/ADT/Twine.h"		#include "llvm/ADT/Twine.h"
		#include "llvm/Analysis/BlockFrequencyInfo.h"
#include "llvm/Analysis/BranchProbabilityInfo.h"		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/Analysis/LoopAnalysisManager.h"		#include "llvm/Analysis/LoopAnalysisManager.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/LoopPass.h"		#include "llvm/Analysis/LoopPass.h"
#include "llvm/Analysis/PostDominators.h"		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/ScalarEvolution.h"		#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
[... 47 lines not shown ...]	static cl::opt<bool> PrintRangeChecks("irce-print-range-checks", cl::Hidden,
cl::init(false));		cl::init(false));

static cl::opt<int> MaxExitProbReciprocal("irce-max-exit-prob-reciprocal",		static cl::opt<int> MaxExitProbReciprocal("irce-max-exit-prob-reciprocal",
cl::Hidden, cl::init(10));		cl::Hidden, cl::init(10));

static cl::opt<bool> SkipProfitabilityChecks("irce-skip-profitability-checks",		static cl::opt<bool> SkipProfitabilityChecks("irce-skip-profitability-checks",
cl::Hidden, cl::init(false));		cl::Hidden, cl::init(false));

		static cl::opt<unsigned> MinRuntimeIterations("min-runtime-iterations",
		cl::Hidden, cl::init(3));

static cl::opt<bool> AllowUnsignedLatchCondition("irce-allow-unsigned-latch",		static cl::opt<bool> AllowUnsignedLatchCondition("irce-allow-unsigned-latch",
cl::Hidden, cl::init(true));		cl::Hidden, cl::init(true));

static cl::opt<bool> AllowNarrowLatchCondition(		static cl::opt<bool> AllowNarrowLatchCondition(
"irce-allow-narrow-latch", cl::Hidden, cl::init(true),		"irce-allow-narrow-latch", cl::Hidden, cl::init(true),
cl::desc("If set to true, IRCE may eliminate wide range checks in loops "		cl::desc("If set to true, IRCE may eliminate wide range checks in loops "
"with narrow latch condition."));		"with narrow latch condition."));

[... 104 lines not shown ...]
};		};

class InductiveRangeCheckElimination {		class InductiveRangeCheckElimination {
ScalarEvolution &SE;		ScalarEvolution &SE;
BranchProbabilityInfo *BPI;		BranchProbabilityInfo *BPI;
DominatorTree &DT;		DominatorTree &DT;
LoopInfo &LI;		LoopInfo &LI;

		using GetBFIFunc =
		llvm::Optional<llvm::function_ref<llvm::BlockFrequencyInfo &()> >;
		GetBFIFunc GetBFI;

public:		public:
InductiveRangeCheckElimination(ScalarEvolution &SE,		InductiveRangeCheckElimination(ScalarEvolution &SE,
BranchProbabilityInfo *BPI, DominatorTree &DT,		BranchProbabilityInfo *BPI, DominatorTree &DT,
LoopInfo &LI)		LoopInfo &LI, GetBFIFunc GetBFI = None)
: SE(SE), BPI(BPI), DT(DT), LI(LI) {}		: SE(SE), BPI(BPI), DT(DT), LI(LI), GetBFI(GetBFI) {}

bool run(Loop *L, function_ref<void(Loop *, bool)> LPMAddNewLoop);		bool run(Loop *L, function_ref<void(Loop *, bool)> LPMAddNewLoop);
};		};

class IRCELegacyPass : public FunctionPass {		class IRCELegacyPass : public FunctionPass {
public:		public:
static char ID;		static char ID;

[... 1,516 lines not shown ...]
}		}

PreservedAnalyses IRCEPass::run(Function &F, FunctionAnalysisManager &AM) {		PreservedAnalyses IRCEPass::run(Function &F, FunctionAnalysisManager &AM) {
auto &SE = AM.getResult<ScalarEvolutionAnalysis>(F);		auto &SE = AM.getResult<ScalarEvolutionAnalysis>(F);
auto &DT = AM.getResult<DominatorTreeAnalysis>(F);		auto &DT = AM.getResult<DominatorTreeAnalysis>(F);
auto &BPI = AM.getResult<BranchProbabilityAnalysis>(F);		auto &BPI = AM.getResult<BranchProbabilityAnalysis>(F);
LoopInfo &LI = AM.getResult<LoopAnalysis>(F);		LoopInfo &LI = AM.getResult<LoopAnalysis>(F);

InductiveRangeCheckElimination IRCE(SE, &BPI, DT, LI);		// Get BFI analysis result on demand. Please note that modification of
		// CFG invalidates this analysis and we should handle it.
		auto getBFI = [&F, &AM ]()->BlockFrequencyInfo & {
		return AM.getResult<BlockFrequencyAnalysis>(F);
		};
		InductiveRangeCheckElimination IRCE(SE, &BPI, DT, LI, { getBFI });

bool Changed = false;		bool Changed = false;
		{
		bool CFGChanged = false;
for (const auto &L : LI) {		for (const auto &L : LI) {
Changed |= simplifyLoop(L, &DT, &LI, &SE, nullptr, nullptr,		CFGChanged |= simplifyLoop(L, &DT, &LI, &SE, nullptr, nullptr,
/*PreserveLCSSA=*/false);		/*PreserveLCSSA=*/false);
Changed |= formLCSSARecursively(*L, DT, &LI, &SE);		Changed |= formLCSSARecursively(*L, DT, &LI, &SE);
}		}
		Changed |= CFGChanged;

		if (CFGChanged && !SkipProfitabilityChecks)
		AM.invalidate<BlockFrequencyAnalysis>(F);
		}

SmallPriorityWorklist<Loop *, 4> Worklist;		SmallPriorityWorklist<Loop *, 4> Worklist;
appendLoopsToWorklist(LI, Worklist);		appendLoopsToWorklist(LI, Worklist);
auto LPMAddNewLoop = [&Worklist](Loop *NL, bool IsSubloop) {		auto LPMAddNewLoop = [&Worklist](Loop *NL, bool IsSubloop) {
if (!IsSubloop)		if (!IsSubloop)
appendLoopsToWorklist(*NL, Worklist);		appendLoopsToWorklist(*NL, Worklist);
};		};

while (!Worklist.empty()) {		while (!Worklist.empty()) {
Loop *L = Worklist.pop_back_val();		Loop *L = Worklist.pop_back_val();
Changed |= IRCE.run(L, LPMAddNewLoop);		if (IRCE.run(L, LPMAddNewLoop)) {
		Changed = true;
		if (!SkipProfitabilityChecks)
		AM.invalidate<BlockFrequencyAnalysis>(F);
		}
}		}

if (!Changed)		if (!Changed)
return PreservedAnalyses::all();		return PreservedAnalyses::all();
return getLoopPassPreservedAnalyses();		return getLoopPassPreservedAnalyses();
}		}

bool IRCELegacyPass::runOnFunction(Function &F) {		bool IRCELegacyPass::runOnFunction(Function &F) {
[... 70 lines not shown ...]	bool InductiveRangeCheckElimination::run(
Optional<LoopStructure> MaybeLoopStructure =		Optional<LoopStructure> MaybeLoopStructure =
LoopStructure::parseLoopStructure(SE, BPI, *L, FailureReason);		LoopStructure::parseLoopStructure(SE, BPI, *L, FailureReason);
if (!MaybeLoopStructure.hasValue()) {		if (!MaybeLoopStructure.hasValue()) {
LLVM_DEBUG(dbgs() << "irce: could not parse loop structure: "		LLVM_DEBUG(dbgs() << "irce: could not parse loop structure: "
<< FailureReason << "\n";);		<< FailureReason << "\n";);
return false;		return false;
}		}
LoopStructure LS = MaybeLoopStructure.getValue();		LoopStructure LS = MaybeLoopStructure.getValue();
		// Profitability check.
		if (!SkipProfitabilityChecks && GetBFI.hasValue()) {
		BlockFrequencyInfo &BFI = (*GetBFI)();
		uint64_t hFreq = BFI.getBlockFreq(LS.Header).getFrequency();
		uint64_t phFreq = BFI.getBlockFreq(Preheader).getFrequency();
		if (phFreq != 0 && hFreq != 0 && (hFreq / phFreq < MinRuntimeIterations)) {
		LLVM_DEBUG(dbgs() << "irce: could not prove profitability: "
		<< "the estimated number of iterations basing on "
		"frequency info is " << (hFreq / phFreq) << "\n";);
		return false;
		}
		}
const SCEVAddRecExpr *IndVar =		const SCEVAddRecExpr *IndVar =
cast<SCEVAddRecExpr>(SE.getMinusSCEV(SE.getSCEV(LS.IndVarBase), SE.getSCEV(LS.IndVarStep)));		cast<SCEVAddRecExpr>(SE.getMinusSCEV(SE.getSCEV(LS.IndVarBase), SE.getSCEV(LS.IndVarStep)));

Optional<InductiveRangeCheck::Range> SafeIterRange;		Optional<InductiveRangeCheck::Range> SafeIterRange;
Instruction *ExprInsertPt = Preheader->getTerminator();		Instruction *ExprInsertPt = Preheader->getTerminator();

SmallVector<InductiveRangeCheck, 4> RangeChecksToEliminate;		SmallVector<InductiveRangeCheck, 4> RangeChecksToEliminate;
// Basing on the type of latch predicate, we interpret the IV iteration range		// Basing on the type of latch predicate, we interpret the IV iteration range
[... 59 lines not shown ...]

llvm/test/Transforms/IRCE/low-iterations.ll

This file was added.

				; RUN: opt -verify-loop-info -irce-print-changed-loops -passes=irce -min-runtime-iterations=3 < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-NO
				; RUN: opt -verify-loop-info -irce-print-changed-loops -passes=irce -min-runtime-iterations=0 < %s 2>&1 | FileCheck %s --check-prefixes=CHECK-YES

				; CHECK-YES: constrained Loop
				; CHECK-NO-NOT: constrained Loop

				define i32 @multiple_access_no_preloop(
				i32* %arr_a, i32* %a_len_ptr, i32* %arr_b, i32* %b_len_ptr, i32 %n) {

				entry:
				%len.a = load i32, i32* %a_len_ptr, !range !0
				%first.itr.check = icmp sgt i32 %n, 0
				br i1 %first.itr.check, label %loop, label %exit, !prof !1

				loop:
				%idx = phi i32 [ 0, %entry ] , [ %idx.next, %backedge ]
				%idx.next = add i32 %idx, 1
				%abc.a = icmp slt i32 %idx, %len.a
				br i1 %abc.a, label %in.bounds.a, label %exit, !prof !2

				in.bounds.a:
				%addr.a = getelementptr i32, i32* %arr_a, i32 %idx
				%val = load i32, i32* %addr.a
				%cond = icmp ne i32 %val, 0
				; Most probable exit from a loop.
				br i1 %cond, label %found, label %backedge, !prof !3

				backedge:
				%next = icmp slt i32 %idx.next, %n
				br i1 %next, label %loop, label %exit, !prof !4

				found:
				ret i32 %val

				exit:
				ret i32 0
				}

				!0 = !{i32 0, i32 2147483647}
				!1 = !{!"branch_weights", i32 1024, i32 1}
				!2 = !{!"branch_weights", i32 512, i32 1}
				!3 = !{!"branch_weights", i32 1, i32 2}
				!4 = !{!"branch_weights", i32 512, i32 1}
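
As a rough cross-check of why this test expects no transformation at a threshold of 3, the header-to-pre-header frequency ratio can be approximated by hand from the branch weights !2, !3 and !4. The sketch below is an idealized calculation, not what BlockFrequencyInfo actually computes: it assumes the pre-header runs once per loop entry and models the number of header visits as 1 / (1 - p), where p is the probability of reaching the back edge from the header. All function names are hypothetical.

```cpp
#include <cstdint>

// Probability of taking the first successor, given the two branch weights.
double takenProbability(uint32_t TakenWeight, uint32_t NotTakenWeight) {
  return static_cast<double>(TakenWeight) /
         (static_cast<double>(TakenWeight) + NotTakenWeight);
}

// Idealized header/pre-header frequency ratio for a loop whose back edge is
// taken with probability BackedgeProb on each iteration.
double expectedHeaderVisits(double BackedgeProb) {
  return 1.0 / (1.0 - BackedgeProb);
}

// Back-edge probability for the loop in low-iterations.ll.
double testLoopBackedgeProbability() {
  double StayInBounds = takenProbability(512, 1);    // !2: loop -> in.bounds.a
  double NotFound = 1.0 - takenProbability(1, 2);    // !3: in.bounds.a -> backedge
  double LatchContinues = takenProbability(512, 1);  // !4: backedge -> loop
  return StayInBounds * NotFound * LatchContinues;
}
```

Under these assumptions p is about 0.664 and the ratio is about 2.98, which truncates to 2 and falls below the threshold of 3. Looking at the latch branch alone (!4) would instead give 1 / (1 - 512/513) = 513 iterations, which matches the point made in the review that a latch-only estimate is not enough when the most likely exit is elsewhere.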