This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnswitch] Add block frequency analysis to recognize hot/cold regions
Closed, Public

Authored by chenli on Jul 29 2015, 12:02 PM.


Details

Reviewers

broune
dnovillo
reames
silvas

Commits

rG9f27fc0599c6: [LoopUnswitch] Add block frequency analysis to recognize hot/cold regions
rL248777: [LoopUnswitch] Add block frequency analysis to recognize hot/cold regions

Summary

This patch adds block frequency analysis to the LoopUnswitch pass to recognize hot and cold regions. In cold regions the pass performs only trivial unswitches, since those do not increase code size; in hot regions everything works as before. This helps minimize code growth in cold regions while being more aggressive in hot regions. By default, cold regions are blocks whose frequency is below 20% of the function entry frequency; this threshold can be adjusted with the -loop-unswitch-cold-block-frequency flag. The entire feature is controlled by the -loop-unswitch-with-block-frequency flag and is off by default.
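The coldness rule above reduces to a comparison against a scaled entry frequency. The following is a minimal standalone sketch, not the pass's code: plain `uint64_t` arithmetic stands in for LLVM's `BlockFrequency` and `BranchProbability`, and `UnswitchKind`, `coldEntryFreq`, and `allowedUnswitchKind` are hypothetical names (only the `ColdEntryFreq` concept and the two flags come from the patch).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the decision described in the summary.
enum class UnswitchKind { TrivialOnly, Any };

// ColdEntryFreq = EntryFreq * ColdPercent / 100. ColdPercent is 20 by
// default in this patch (-loop-unswitch-cold-block-frequency).
uint64_t coldEntryFreq(uint64_t EntryFreq, unsigned ColdPercent) {
  assert(ColdPercent <= 100 && "threshold is a percentage");
  return EntryFreq * ColdPercent / 100;
}

// With the feature disabled, behavior is unchanged. With it enabled, a region
// whose frequency is below ColdEntryFreq gets only trivial unswitches, which
// do not increase code size.
UnswitchKind allowedUnswitchKind(bool WithBlockFrequency, uint64_t RegionFreq,
                                 uint64_t EntryFreq, unsigned ColdPercent) {
  if (WithBlockFrequency && RegionFreq < coldEntryFreq(EntryFreq, ColdPercent))
    return UnswitchKind::TrivialOnly;
  return UnswitchKind::Any;
}
```

For example, with an entry frequency of 1000 and the default threshold of 20, `coldEntryFreq(1000, 20)` is 200, so a region with frequency 150 is limited to trivial unswitches while a region with frequency 250 is not. The patch itself scales with `BranchProbability(ColdFrequency, 100)`, whose rounding may differ slightly from this integer division.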


Event Timeline

• chenli updated this revision to Diff 30937. Jul 29 2015, 12:02 PM

• chenli retitled this revision to [LoopUnswitch] Add block frequency analysis to recognize hot/cold regions.

• chenli updated this object.

• chenli added reviewers: reames, broune.

• chenli added a subscriber: llvm-commits.

• chenli added inline comments. Jul 29 2015, 12:20 PM

lib/Transforms/Scalar/LoopUnswitch.cpp
496	I chose to use the loop preheader's frequency for the coldness comparison. This ignores the size of the loop because it does not count backedges or iteration counts. My reasoning is that if the loop was entered only once in the past 1000 executions, we should be confident that it will never be entered again. Therefore, no matter how many times the loop iterates, that shouldn't affect our decision. But I can see the argument the other way around, and I am open to changing it if anyone prefers that.
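To make the trade-off concrete, here is a hypothetical model (not BFI's actual computation) in which a loop header's frequency is approximated as its preheader frequency times an average trip count. Under that assumption, a loop that is entered rarely but iterates many times is cold by the preheader metric and hot by the header metric, which is the case the reviewers debate below. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: AvgTripCount is the number of header executions per
// loop entry, so HeaderFreq is approximated as PreheaderFreq * AvgTripCount.
// Real BFI derives frequencies from branch probabilities instead.
uint64_t headerFreq(uint64_t PreheaderFreq, uint64_t AvgTripCount) {
  return PreheaderFreq * AvgTripCount;
}

// True if Freq is below Percent% of EntryFreq. Cross-multiplying avoids the
// rounding of an integer division.
bool belowPercentOfEntry(uint64_t Freq, uint64_t EntryFreq, unsigned Percent) {
  assert(Percent <= 100 && "threshold is a percentage");
  return Freq * 100 < EntryFreq * Percent;
}
```

With an entry frequency of 1000, a preheader frequency of 10, and an average trip count of 1000, the preheader sits at 1% of entry (cold under the default 20% threshold), while the header runs at 10 times the entry frequency (hot).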

reames requested changes to this revision. Jul 31 2015, 4:25 PM

reames edited edge metadata.

reames added inline comments.

lib/Transforms/Scalar/LoopUnswitch.cpp
76	What are your plans for testing this and getting it enabled by default?
82	Have you explored this parameter at all? My guess would be that disabling loop unswitching should only be done for really cold loops. I have no evidence, but a cool loop in a really hot function seems like something we should unswitch.
433	Minor comment clarification: use "function entry baseline frequency" and "Loops with headers below this frequency".
496	I feel pretty strongly that making decisions based on the number of times the loop executes rather than the number of times the header executes is likely to produce negative results. Consider:
	while (true) {
	  if (loop invariant) { }
	  if (loop invariant2) { }
	  if (loop invariant3) { }
	  do_work();
	}
	Your change would completely disable unswitching in this loop, which seems non-ideal.
498	Given there's already handling for OptForSize within UnswitchIfProfitable, I'd like to see the handling combined. A cleanup change that moves the OptForSize check here (I think that's NFC, right?), followed by using it in this patch, would help.
test/Transforms/LoopUnswitch/cold-loop.ll
21	This is perhaps a bad example. In this particular case, we could split the loop into two subloops without code growth. Maybe tweak your example so it's more obvious why we want to avoid unswitching this?

This revision now requires changes to proceed. Jul 31 2015, 4:25 PM

reames added reviewers: silvas, dnovillo. Jul 31 2015, 4:25 PM

davidxl added a comment.

The main problem with the patch is that the cold loop detection can have many false positives. With profile feedback, cold loops can be reliably detected but not with the method described in this patch. Without profile feedback, the cold loop filtering should probably not be turned on by default unless it is optimized for size. When the filtering is not on, we should not pay the compile time penalty of computing BFI unconditionally either. Cong has recently made it possible to compute BFI without relying on running the wrapper pass.

lib/Transforms/Scalar/LoopUnswitch.cpp
496	The hotness of the loop should be measured by the frequency of the hottest BB in the loop relative to the function entry. With PGO, you will need to check the execution count of the hottest BB in the loop (relative to some threshold set by the program summary).
test/Transforms/LoopUnswitch/cold-loop.ll
21	Unswitching duplicates the loop, so size overhead is the loop setup code (in this case probably just a branch associated with backedge).
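David's compile-time concern can be addressed by computing frequencies lazily, so nothing is computed while the flag is off and at most once per function when it is on. The sketch below is generic C++ under that assumption; `FreqInfo`, `FunctionFreqCache`, and `loopIsCold` are hypothetical stand-ins, not LLVM's BFI interface or the standalone-BFI mechanism mentioned above.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for per-function block frequencies.
struct FreqInfo {
  uint64_t EntryFreq;
  std::vector<uint64_t> BlockFreqs;
};

class FunctionFreqCache {
public:
  explicit FunctionFreqCache(std::function<FreqInfo()> Compute)
      : Compute(std::move(Compute)) {}

  // Runs the expensive computation on first use only; later calls reuse it.
  const FreqInfo &get() {
    if (!Cached) {
      Cached = Compute();
      ++NumComputations;
    }
    return *Cached;
  }

  unsigned numComputations() const { return NumComputations; }

private:
  std::function<FreqInfo()> Compute;
  std::optional<FreqInfo> Cached;
  unsigned NumComputations = 0;
};

// When the heuristic is disabled, the frequency analysis is never touched,
// so it adds no compile time.
bool loopIsCold(bool WithBlockFrequency, FunctionFreqCache &Cache,
                unsigned HeaderBlock, unsigned ColdPercent) {
  if (!WithBlockFrequency)
    return false;
  const FreqInfo &FI = Cache.get();
  assert(HeaderBlock < FI.BlockFreqs.size() && "unknown block");
  return FI.BlockFreqs[HeaderBlock] * 100 < FI.EntryFreq * ColdPercent;
}
```

In this model, calling `loopIsCold` with the flag off leaves `numComputations()` at 0, and two calls with the flag on compute the frequencies exactly once.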

chenli added a comment.

In D11605#216478, @davidxl wrote:

> The main problem with the patch is that the cold loop detection can have many false positives. With profile feedback, cold loops can be reliably detected but not with the method described in this patch.

Do you mean using other profile data together with BFI, or using BFI differently?

> Without profile feedback, the cold loop filtering should probably not be turned on by default unless it is optimized for size.

Yes, you are correct. Initially I thought this could shift a cold loop's unswitch budget to hot loops and thus let hot loops unswitch more aggressively. But it turns out that separate loops each get the same initial unswitch budget and do not affect each other, so this only optimizes code size.

> When the filtering is not on, we should not pay the compile time penalty of computing BFI unconditionally either. Cong has recently made it possible to compute BFI without relying on running the wrapper pass.

Do you mean this patch http://reviews.llvm.org/D11196 ?

lib/Transforms/Scalar/LoopUnswitch.cpp
82	I was thinking about that too. Maybe factoring function hotness into its loops is a good idea, i.e., lower this threshold for hot functions and raise it for cold functions.
496	I think the hottest BB should be the header, assuming its inner loops will be processed by their own unswitch passes.
498	I will have a look at this.
test/Transforms/LoopUnswitch/cold-loop.ll
21	Yes, this is not a good case. I will add some side-effect code after the first condition (so the first one is still trivial and not affected by coldness check, but unswitching the second one will grow code size).
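The hottest-block measure can be sketched as a maximum over the loop's block frequencies, compared against a percentage of the entry frequency. This is a hypothetical standalone model with illustrative names. If inner-loop blocks are excluded, each iteration runs the header once and every other block at most once, so the maximum is the header's own frequency, which is the simplification chenli proposes above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: loop hotness is the frequency of the hottest block in the loop.
uint64_t hottestBlockFreq(const std::vector<uint64_t> &LoopBlockFreqs) {
  assert(!LoopBlockFreqs.empty() && "a loop always contains its header");
  return *std::max_element(LoopBlockFreqs.begin(), LoopBlockFreqs.end());
}

// Cold if the hottest block is below ColdPercent% of the entry frequency.
bool loopIsColdByHottestBlock(const std::vector<uint64_t> &LoopBlockFreqs,
                              uint64_t EntryFreq, unsigned ColdPercent) {
  return hottestBlockFreq(LoopBlockFreqs) * 100 < EntryFreq * ColdPercent;
}
```

For example, with block frequencies {100, 40, 60} (header first) and an entry frequency of 1000, the hottest block is the header at 100, which is 10% of entry and therefore cold under a 20% threshold.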

chenli added a comment.

In D11605#216478, @davidxl wrote:

> The main problem with the patch is that the cold loop detection can have many false positives. With profile feedback, cold loops can be reliably detected but not with the method described in this patch.

By correct cold loop detection, do you specifically mean checking the execution count of the hottest BB in the loop?

Updated the patch to fix the following issues:

• Use standalone BFI without relying on the wrapper pass.
• Use the hottest block (the loop header in this case) to determine the loop's hotness.
• Update the non-trivial condition test case so that unswitching it causes code growth.

I also plan to have a follow-up patch to introduce function_entry_count to hot/cold region detection.

ping ...

reames added a comment.

Minor comments below. Once addressed, I plan to give an LGTM unless David has further comments in the meantime.

lib/Transforms/Scalar/LoopUnswitch.cpp
75	This comment is a bit deceptive. I'd just drop it.
82	Right now, we're only limiting code size increase. Please adjust the string.
86	I think the default here should be much lower, but we can tune this in separate changes. My guess here would be something like 1/100 rather than 20/100. I also think we should have an absolute cutoff, but again, we can tune separately.
174	Add a comment which says what these are used for and that they're only used with the PGO option enabled.
test/Transforms/LoopUnswitch/cold-loop.ll
40	Please add a couple of instructions here per previous comment on test case.

davidxl added inline comments. Sep 10 2015, 11:32 AM

lib/Transforms/Scalar/LoopUnswitch.cpp
86	This is not frequency. Maybe call it 'ColdnessThreshold' or something.
87	It is the 'Coldness threshold in percent. The loop header frequency (relative to the entry frequency) is compared with the threshold to determine if non-trivial unswitching should be enabled.'
446	Note that this also triggers the check when PGO is not enabled (e.g., with static prediction). To check whether PGO is on, do:
	HasProfileData = F.getEntryCount().hasValue();
	if (HasProfileData) { ..

• chenli added inline comments. Sep 10 2015, 11:48 AM

lib/Transforms/Scalar/LoopUnswitch.cpp
446	Do you mean that without PGO enabled, this should never be triggered?

davidxl added a comment.

Right. I wouldn't trust static profile data to tell you what is cold or hot.

In fact, I think the right way to determine coldness is to use the actual profile count of the loop header to decide whether the loop is cold. The actual execution count of a loop header can be computed by multiplying its relative frequency by F.getEntryCount(). A common threshold could be 0 or 1, meaning the loop rarely executes.

David
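David's suggestion can be written out as arithmetic: with profile data, the estimated execution count of the loop header is EntryCount * HeaderFreq / EntryFreq, and the loop is treated as cold when that count is at or below a small threshold such as 0 or 1. Without profile data, the check is skipped, so static estimates never mark a loop cold. This standalone sketch models the result of `F.getEntryCount()` as a `std::optional<uint64_t>`; the function names and the "at or below" comparison are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Estimated header execution count = EntryCount * (HeaderFreq / EntryFreq).
// Multiplying before dividing keeps precision (overflow is ignored here).
uint64_t estimatedHeaderCount(uint64_t EntryCount, uint64_t HeaderFreq,
                              uint64_t EntryFreq) {
  assert(EntryFreq != 0 && "entry frequency must be nonzero");
  return EntryCount * HeaderFreq / EntryFreq;
}

// Returns true only when real profile data says the loop (almost) never runs.
// With no entry count (no PGO), the loop is never classified as cold.
bool isColdLoopWithProfile(std::optional<uint64_t> EntryCount,
                           uint64_t HeaderFreq, uint64_t EntryFreq,
                           uint64_t CountThreshold) {
  if (!EntryCount)
    return false;
  return estimatedHeaderCount(*EntryCount, HeaderFreq, EntryFreq) <=
         CountThreshold;
}
```

For example, a function entered 500 times whose loop header has 1/1000 of the entry frequency yields an estimated count of 0, so the loop would be treated as cold; a header at twice the entry frequency yields 1000 and would not be.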

Updated the patch to address Philip's and David's comments.

Looks fine, but please wait for Philip or another reviewer to give you an LGTM.

LGTM

reames accepted this revision. Sep 28 2015, 5:41 PM

reames edited edge metadata.

This revision is now accepted and ready to land. Sep 28 2015, 5:41 PM

• chenli closed this revision. Sep 28 2015, 10:05 PM

Revision Contents

Path	Size
lib/Transforms/Scalar/LoopUnswitch.cpp	33 lines
test/Transforms/LoopUnswitch/cold-loop.ll	41 lines

Diff 30937

lib/Transforms/Scalar/LoopUnswitch.cpp

[31 unchanged lines not shown]
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/AssumptionCache.h"
 #include "llvm/Analysis/CodeMetrics.h"
 #include "llvm/Analysis/InstructionSimplify.h"
 #include "llvm/Analysis/LoopInfo.h"
 #include "llvm/Analysis/LoopPass.h"
 #include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/Analysis/BlockFrequencyInfo.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/MDBuilder.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Support/BranchProbability.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Cloning.h"
 #include "llvm/Transforms/Utils/Local.h"
 #include <algorithm>
 #include <map>
 #include <set>
 using namespace llvm;

 #define DEBUG_TYPE "loop-unswitch"

 STATISTIC(NumBranches, "Number of branches unswitched");
 STATISTIC(NumSwitches, "Number of switches unswitched");
 STATISTIC(NumSelects , "Number of selects unswitched");
 STATISTIC(NumTrivial , "Number of unswitches that are trivial");
 STATISTIC(NumSimplify, "Number of simplifications of unswitched code");
 STATISTIC(TotalInsts, "Total number of instructions analyzed");

 // The specific value of 100 here was chosen based only on intuition and a
 // few specific examples.
 static cl::opt<unsigned>
 Threshold("loop-unswitch-threshold", cl::desc("Max loop size to unswitch"),
           cl::init(100), cl::Hidden);

+static cl::opt<bool>
+LoopUnswitchWithBlockFrequency("loop-unswitch-with-block-frequency", cl::init(false), cl::Hidden,
+    cl::desc("Enable the use of the block frequency analysis to access PGO "
+             "heuristics to minimize code growth in cold regions and be more "
+             "aggressive in hot regions."));
+
+// The default frequency 20 was chosen based on LoopVectorize's choice of cold frequency.
+static cl::opt<unsigned>
+ColdFrequency("loop-unswitch-cold-block-frequency", cl::init(20), cl::Hidden,
+    cl::desc("Block frequency to be considered as cold. Non-trivial unswitches "
+             "are not applied to cold blocks."));
+
 namespace {

   class LUAnalysisCache {

     typedef DenseMap<const SwitchInst*, SmallPtrSet<const Value *, 8> >
       UnswitchedValsMap;

     typedef UnswitchedValsMap::iterator UnswitchedValsIt;

[66 unchanged lines not shown; context: class LoopUnswitch : public LoopPass {]
     LPPassManager *LPM;
     AssumptionCache *AC;

     // LoopProcessWorklist - Used to check if second loop needs processing
     // after RewriteLoopBodyWithConditionConstant rewrites first loop.
     std::vector<Loop*> LoopProcessWorklist;

     LUAnalysisCache BranchesInfo;
+    BlockFrequencyInfo *BFI;
+
+    BlockFrequency ColdEntryFreq;

     bool OptimizeForSize;
     bool redoLoop;

     Loop *currentLoop;
     DominatorTree *DT;
     BasicBlock *loopHeader;
     BasicBlock *loopPreheader;

     // LoopBlocks contains all of the basic blocks of the loop, including the
[24 unchanged lines not shown; context: void getAnalysisUsage(AnalysisUsage &AU) const override {]
       AU.addPreservedID(LoopSimplifyID);
       AU.addRequired<LoopInfoWrapperPass>();
       AU.addPreserved<LoopInfoWrapperPass>();
       AU.addRequiredID(LCSSAID);
       AU.addPreservedID(LCSSAID);
       AU.addPreserved<DominatorTreeWrapperPass>();
       AU.addPreserved<ScalarEvolution>();
       AU.addRequired<TargetTransformInfoWrapperPass>();
+      AU.addRequired<BlockFrequencyInfoWrapperPass>();
     }

   private:

     void releaseMemory() override {
       BranchesInfo.forgetLoop(currentLoop);
     }

[148 unchanged lines not shown]
 char LoopUnswitch::ID = 0;
 INITIALIZE_PASS_BEGIN(LoopUnswitch, "loop-unswitch", "Unswitch loops",
                       false, false)
 INITIALIZE_PASS_DEPENDENCY(TargetTransformInfoWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(AssumptionCacheTracker)
 INITIALIZE_PASS_DEPENDENCY(LoopSimplify)
 INITIALIZE_PASS_DEPENDENCY(LoopInfoWrapperPass)
 INITIALIZE_PASS_DEPENDENCY(LCSSA)
+INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfoWrapperPass)
 INITIALIZE_PASS_END(LoopUnswitch, "loop-unswitch", "Unswitch loops",
                     false, false)

 Pass *llvm::createLoopUnswitchPass(bool Os) {
   return new LoopUnswitch(Os);
 }

 /// FindLIVLoopCondition - Cond is a condition that occurs in L. If it is
[31 unchanged lines not shown; context: static Value *FindLIVLoopCondition(Value *Cond, Loop *L, bool &Changed) {]

   return nullptr;
 }

 bool LoopUnswitch::runOnLoop(Loop *L, LPPassManager &LPM_Ref) {
   if (skipOptnoneFunction(L))
     return false;

+  BFI = &getAnalysis<BlockFrequencyInfoWrapperPass>().getBFI();
+
+  // Use BranchProbability to compute a minimum frequency based on
+  // entry baseline frequency. Blocks below this frequency are considered
+  // as cold.
+  const BranchProbability ColdProb(ColdFrequency, 100);
+  ColdEntryFreq = BlockFrequency(BFI->getEntryFreq()) * ColdProb;
+
   AC = &getAnalysis<AssumptionCacheTracker>().getAssumptionCache(
       *L->getHeader()->getParent());
   LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
   LPM = &LPM_Ref;
   DominatorTreeWrapperPass *DTWP =
       getAnalysisIfAvailable<DominatorTreeWrapperPass>();
   DT = DTWP ? &DTWP->getDomTree() : nullptr;
   currentLoop = L;
   Function *F = currentLoop->getHeader()->getParent();
   bool Changed = false;
   do {
     assert(currentLoop->isLCSSAForm(*DT));
     redoLoop = false;
     Changed |= processCurrentLoop();
   } while(redoLoop);

   if (Changed) {
     // FIXME: Reconstruct dom info, because it is not preserved properly.
     if (DT)
[31 unchanged lines not shown; context: if (!BranchesInfo.countLoop(]
           AC))
     return false;

   // Try trivial unswitch first before loop over other basic blocks in the loop.
   if (TryTrivialLoopUnswitch(Changed)) {
     return true;
   }

+  // Compute the weighted frequency of this loop being executed and see if it
+  // is less than ColdFrequency% of the function entry baseline frequency.
+  BlockFrequency LoopEntryFreq = BFI->getBlockFreq(loopPreheader);
+  if (LoopUnswitchWithBlockFrequency && LoopEntryFreq < ColdEntryFreq)
+    return false;
+
   // Loop over all of the basic blocks in the loop. If we find an interior
   // block that is branching on a loop-invariant condition, we can unswitch this
   // loop.
   for (Loop::block_iterator I = currentLoop->block_begin(),
          E = currentLoop->block_end(); I != E; ++I) {
     TerminatorInst *TI = (*I)->getTerminator();
     if (BranchInst *BI = dyn_cast<BranchInst>(TI)) {
       // If this isn't branching on an invariant condition, we can't unswitch
[809 unchanged lines not shown]

test/Transforms/LoopUnswitch/cold-loop.ll

This file was added.

; RUN: opt < %s -loop-unswitch -loop-unswitch-with-block-frequency -S 2>&1 | FileCheck %s

; This test contains a cold loop where the first condition is trivial
; and the second condition is non-trivial. LoopUnswitch pass should be
; able to unswitch the trivial one but not the non-trivial one.

define i32 @test(i1 %cond1, i1 %cond2, i1 %cond3) {
  br i1 %cond1, label %loop_preheader, label %loop_exit, !prof !0

; CHECK: loop_preheader:
; CHECK: br i1 %cond2, label %loop_preheader.loop_preheader.split_crit_edge, label %loop_preheader.loop_exit.loopexit.split_crit_edge
loop_preheader:
  br label %loop_begin

loop_begin:
  br i1 %cond2, label %continue, label %loop_exit ; trivial condition

; CHECK: continue:
; CHECK: br i1 %cond3, label %do_something, label %do_something_else
continue:
  br i1 %cond3, label %do_something, label %do_something_else ; non-trivial condition

do_something:
  call void @some_func1() noreturn nounwind
  br label %joint

do_something_else:
  call void @some_func2() noreturn nounwind
  br label %joint

joint:
  br label %loop_begin

loop_exit:
  ret i32 0
}

declare void @some_func1() noreturn
declare void @some_func2() noreturn

!0 = !{!"branch_weights", i32 1, i32 10}
\ No newline at end of file