This is an archive of the discontinued LLVM Phabricator instance.

[BranchProbabilityInfo] Handle irreducible loops.
Closed, Public

Authored by gberry on Oct 27 2017, 1:06 PM.

Details

Reviewers

dexonsmith
davidxl
hjyamauchi
fhahn

Commits

rGeed6531ea2a3: [BranchProbabilityInfo] Handle irreducible loops.
rL317094: [BranchProbabilityInfo] Handle irreducible loops.

Summary

Compute the strongly connected components of the CFG and fall back to
use these for blocks that are in loops that are not detected by
LoopInfo when computing loop back-edge and exit branch probabilities.
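
To make the fallback concrete, here is a minimal, self-contained sketch of the idea, not the patch itself. It uses a plain adjacency list instead of LLVM's CFG and a hand-written Tarjan pass instead of scc_iterator; computeSccNums, classifyEdge, EdgeKind, and makeTest9Cfg are hypothetical names introduced for illustration. The graph mirrors the @test9 function that this revision adds to loop.ll.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a CFG: node i has successor list Succs[i].
using Graph = std::vector<std::vector<int>>;

// Tarjan's algorithm, O(N + E). Returns an SCC number per node; nodes in
// single-node SCCs keep -1, mirroring the patch, which skips those.
std::vector<int> computeSccNums(const Graph &Succs) {
  const int N = static_cast<int>(Succs.size());
  std::vector<int> Index(N, -1), Low(N, 0), SccNum(N, -1), Stack;
  std::vector<bool> OnStack(N, false);
  int NextIndex = 0, NextScc = 0;

  std::function<void(int)> Visit = [&](int V) {
    Index[V] = Low[V] = NextIndex++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : Succs[V]) {
      if (Index[W] < 0) {
        Visit(W);
        Low[V] = std::min(Low[V], Low[W]);
      } else if (OnStack[W]) {
        Low[V] = std::min(Low[V], Index[W]);
      }
    }
    if (Low[V] != Index[V])
      return;
    // V is the root of an SCC: pop its members off the stack.
    std::vector<int> Members;
    int W;
    do {
      W = Stack.back();
      Stack.pop_back();
      OnStack[W] = false;
      Members.push_back(W);
    } while (W != V);
    if (Members.size() > 1)
      for (int M : Members)
        SccNum[M] = NextScc;
    ++NextScc;
  };

  for (int V = 0; V < N; ++V)
    if (Index[V] < 0)
      Visit(V);
  return SccNum;
}

enum class EdgeKind { Exiting, Back, In };

// Classify edge From -> To for a block From that lies in a multi-block SCC.
// A block counts as a header if any predecessor is outside its SCC.
EdgeKind classifyEdge(const Graph &Succs, const std::vector<int> &SccNum,
                      int From, int To) {
  assert(SccNum[From] >= 0 && "From must be in a multi-block SCC");
  if (SccNum[To] != SccNum[From])
    return EdgeKind::Exiting;
  for (int P = 0; P < static_cast<int>(Succs.size()); ++P)
    for (int S : Succs[P])
      if (S == To && SccNum[P] != SccNum[To])
        return EdgeKind::Back; // To is entered from outside the SCC.
  return EdgeKind::In;
}

// CFG of @test9: 0=entry, 1=if.end, 2=for.cond, 3=midloop, 4=for.inc, 5=end.
Graph makeTest9Cfg() {
  return {{1, 3}, {2}, {3, 5}, {4, 5}, {2}, {}};
}
```

On this graph, for.cond, midloop, and for.inc form one SCC. The edges for.cond -> midloop and for.inc -> for.cond classify as back-edges because both targets have a predecessor outside the SCC, while midloop -> for.inc is an in-edge. Unlike the patch, this sketch rescans predecessors on every query instead of caching header status per SCC.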

Diff Detail

Build Status

Buildable 11580
Build 11580: arc lint + arc unit

Event Timeline

gberry created this revision. Oct 27 2017, 1:06 PM

Harbormaster completed remote builds in B11580: Diff 120682. Oct 27 2017, 1:06 PM

davidxl added a reviewer: hjyamauchi. Oct 27 2017, 1:13 PM

haicheng added a subscriber: haicheng. Oct 27 2017, 1:33 PM

fhahn added a subscriber: fhahn. Oct 27 2017, 2:12 PM

fhahn added inline comments. Oct 27 2017, 3:12 PM

lib/Analysis/BranchProbabilityInfo.cpp
458	Would it be possible to use insert here instead, to avoid doing another lookup at line 463?

gberry marked an inline comment as done. Oct 27 2017, 4:45 PM

gberry added inline comments.

lib/Analysis/BranchProbabilityInfo.cpp
458	Yep, fixed. Thanks.
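
For reference, the single-lookup pattern suggested in this thread looks roughly like the sketch below. It uses std::unordered_map rather than DenseMap so that it compiles without LLVM; both containers return an (iterator, inserted) pair from insert. HeaderCache and lookup are hypothetical names, and Compute stands in for the predecessor scan in the patch.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical cache keyed by an integer block id. NumComputes counts how
// often the (potentially expensive) Compute callback actually runs.
struct HeaderCache {
  std::unordered_map<int, bool> Map;
  int NumComputes = 0;

  // Compute must not modify Map, otherwise the iterator below could be
  // invalidated by a rehash.
  template <typename Fn> bool lookup(int Block, Fn Compute) {
    // insert() leaves an existing entry untouched; either way it returns an
    // iterator to the entry plus a flag saying whether it was just added.
    auto Res = Map.insert({Block, false});
    if (Res.second) {
      ++NumComputes;
      Res.first->second = Compute(Block);
    }
    return Res.first->second;
  }
};
```

Compared with find() followed by operator[] on a miss, this hashes the key once per call, at the cost of briefly storing a placeholder value before the real one is computed.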

This looks sensible to me. Do you have any numbers about the impact of this change? Not sure about the compile time impact of computing the SCC here (computing SCC should not be too expensive, O(N+E)), but at least BlockFrequencyInfoImpl uses SCC for a similar thing but it is only computed once it discovers irreducible control flow. It might be worth having this as a separate analysis to do the construction only once in the future.

fhahn added a reviewer: fhahn. Oct 30 2017, 4:04 AM

davidxl added inline comments. Oct 30 2017, 10:45 AM

lib/Analysis/BranchProbabilityInfo.cpp
450	Nit: It is probably better to change this lambda into a static function and move the body out to make the caller function body look cleaner.

In D39385#910489, @fhahn wrote:

This looks sensible to me. Do you have any numbers about the impact of this change? Not sure about the compile time impact of computing the SCC here (computing SCC should not be too expensive, O(N+E)), but at least BlockFrequencyInfoImpl uses SCC for a similar thing but it is only computed once it discovers irreducible control flow. It might be worth having this as a separate analysis to do the construction only once in the future.

I observed the following performance improvements on AArch64 (Falkor):

spec2017/xz:train             1.00
spec2006/bzip2:lessnoise      7.24

As for compile time, the times for BPI do increase, but the increases are in the noise of the compile time as a whole. I looked briefly at adding irreducible loop detection to LoopInfo, so I could check that here before computing SCCs (see FIXME comment around line 820), but it didn't seem simple enough to justify the effort given the level of compile time increase.

lib/Analysis/BranchProbabilityInfo.cpp
450	Sure, will do.

Address Florian and David's comments.

davidxl added inline comments. Oct 31 2017, 9:51 AM

lib/Analysis/BranchProbabilityInfo.cpp
824	Do you see any visible compile time impact of this?

gberry added inline comments. Oct 31 2017, 11:19 AM

lib/Analysis/BranchProbabilityInfo.cpp
824	I did not (see my previous comment). For the record, I looked at compile times compiling CTMark for aarch64.

lgtm

This revision is now accepted and ready to land. Oct 31 2017, 11:24 AM

In D39385#910911, @gberry wrote:
In D39385#910489, @fhahn wrote:

This looks sensible to me. Do you have any numbers about the impact of this change? Not sure about the compile time impact of computing the SCC here (computing SCC should not be too expensive, O(N+E)), but at least BlockFrequencyInfoImpl uses SCC for a similar thing but it is only computed once it discovers irreducible control flow. It might be worth having this as a separate analysis to do the construction only once in the future.

I observed the following performance improvements on AArch64 (Falkor):
spec2017/xz:train             1.00
spec2006/bzip2:lessnoise      7.24

Great thanks!

As for compile time, the times for BPI do increase, but the increases are in the noise of the compile time as a whole. I looked briefly at adding irreducible loop detection to LoopInfo, so I could check that here before computing SCCs (see FIXME comment around line 820), but it didn't seem simple enough to justify the effort given the level of compile time increase.

Yeah, that's what I thought too. LGTM

Closed by commit rL317094: [BranchProbabilityInfo] Handle irreducible loops. (authored by gberry). Nov 1 2017, 8:17 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/Analysis/BranchProbabilityInfo.h	11 lines
lib/Analysis/BranchProbabilityInfo.cpp	86 lines
test/Analysis/BranchProbabilityInfo/loop.ll	37 lines

Diff 120682

include/llvm/Analysis/BranchProbabilityInfo.h

[154 lines not shown]	private:
};		};

DenseSet<BasicBlockCallbackVH, DenseMapInfo<Value*>> Handles;		DenseSet<BasicBlockCallbackVH, DenseMapInfo<Value*>> Handles;

// Since we allow duplicate edges from one basic block to another, we use		// Since we allow duplicate edges from one basic block to another, we use
// a pair (PredBlock and an index in the successors) to specify an edge.		// a pair (PredBlock and an index in the successors) to specify an edge.
using Edge = std::pair<const BasicBlock *, unsigned>;		using Edge = std::pair<const BasicBlock *, unsigned>;

		// Use to track SCCs for handling irreducible loops.
		using SccMap = DenseMap<const BasicBlock *, int>;
		using SccHeaderMaps = std::vector<DenseMap<const BasicBlock *, bool>>;
		struct SccInfo {
		SccMap SccNums;
		SccHeaderMaps SccHeaders;
		};

// Default weight value. Used when we don't have information about the edge.		// Default weight value. Used when we don't have information about the edge.
// TODO: DEFAULT_WEIGHT makes sense during static predication, when none of		// TODO: DEFAULT_WEIGHT makes sense during static predication, when none of
// the successors have a weight yet. But it doesn't make sense when providing		// the successors have a weight yet. But it doesn't make sense when providing
// weight to an edge that may have siblings with non-zero weights. This can		// weight to an edge that may have siblings with non-zero weights. This can
// be handled various ways, but it's probably fine for an edge with unknown		// be handled various ways, but it's probably fine for an edge with unknown
// weight to just "inherit" the non-zero weight of an adjacent successor.		// weight to just "inherit" the non-zero weight of an adjacent successor.
static const uint32_t DEFAULT_WEIGHT = 16;		static const uint32_t DEFAULT_WEIGHT = 16;

[9 lines not shown]	private:
SmallPtrSet<const BasicBlock *, 16> PostDominatedByColdCall;		SmallPtrSet<const BasicBlock *, 16> PostDominatedByColdCall;

void updatePostDominatedByUnreachable(const BasicBlock *BB);		void updatePostDominatedByUnreachable(const BasicBlock *BB);
void updatePostDominatedByColdCall(const BasicBlock *BB);		void updatePostDominatedByColdCall(const BasicBlock *BB);
bool calcUnreachableHeuristics(const BasicBlock *BB);		bool calcUnreachableHeuristics(const BasicBlock *BB);
bool calcMetadataWeights(const BasicBlock *BB);		bool calcMetadataWeights(const BasicBlock *BB);
bool calcColdCallHeuristics(const BasicBlock *BB);		bool calcColdCallHeuristics(const BasicBlock *BB);
bool calcPointerHeuristics(const BasicBlock *BB);		bool calcPointerHeuristics(const BasicBlock *BB);
bool calcLoopBranchHeuristics(const BasicBlock *BB, const LoopInfo &LI);		bool calcLoopBranchHeuristics(const BasicBlock *BB, const LoopInfo &LI,
		SccInfo &SccI);
bool calcZeroHeuristics(const BasicBlock *BB, const TargetLibraryInfo *TLI);		bool calcZeroHeuristics(const BasicBlock *BB, const TargetLibraryInfo *TLI);
bool calcFloatingPointHeuristics(const BasicBlock *BB);		bool calcFloatingPointHeuristics(const BasicBlock *BB);
bool calcInvokeHeuristics(const BasicBlock *BB);		bool calcInvokeHeuristics(const BasicBlock *BB);
};		};

/// \brief Analysis pass which computes \c BranchProbabilityInfo.		/// \brief Analysis pass which computes \c BranchProbabilityInfo.
class BranchProbabilityAnalysis		class BranchProbabilityAnalysis
: public AnalysisInfoMixin<BranchProbabilityAnalysis> {		: public AnalysisInfoMixin<BranchProbabilityAnalysis> {
[47 lines not shown]

lib/Analysis/BranchProbabilityInfo.cpp

//===- BranchProbabilityInfo.cpp - Branch Probability Analysis ------------===//		//===- BranchProbabilityInfo.cpp - Branch Probability Analysis ------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Loops should be simplified before this analysis.		// Loops should be simplified before this analysis.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Analysis/BranchProbabilityInfo.h"		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
		#include "llvm/ADT/SCCIterator.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
[398 lines not shown]	bool BranchProbabilityInfo::calcPointerHeuristics(const BasicBlock *BB) {
setEdgeProbability(BB, TakenIdx, TakenProb);		setEdgeProbability(BB, TakenIdx, TakenProb);
setEdgeProbability(BB, NonTakenIdx, TakenProb.getCompl());		setEdgeProbability(BB, NonTakenIdx, TakenProb.getCompl());
return true;		return true;
}		}

// Calculate Edge Weights using "Loop Branch Heuristics". Predict backedges		// Calculate Edge Weights using "Loop Branch Heuristics". Predict backedges
// as taken, exiting edges as not-taken.		// as taken, exiting edges as not-taken.
bool BranchProbabilityInfo::calcLoopBranchHeuristics(const BasicBlock *BB,		bool BranchProbabilityInfo::calcLoopBranchHeuristics(const BasicBlock *BB,
const LoopInfo &LI) {		const LoopInfo &LI,
		SccInfo &SccI) {

		auto getSCCNum = [&SccI](const BasicBlock *BB) {
		auto SccIt = SccI.SccNums.find(BB);
		if (SccIt == SccI.SccNums.end())
		return -1;
		return SccIt->second;
		};

		int SccNum;
Loop *L = LI.getLoopFor(BB);		Loop *L = LI.getLoopFor(BB);
if (!L)		if (!L) {
		SccNum = getSCCNum(BB);
		if (SccNum < 0)
return false;		return false;
		}

		// Consider any block that is an entry point to the SCC as a header.
		auto isSCCHeader = [&](const BasicBlock *BB) {
		assert(getSCCNum(BB) == SccNum);

		// Lazily compute the set of headers for a given SCC and cache the results
		// in the SccHeaderMap.
		if (SccI.SccHeaders.size() <= static_cast<unsigned>(SccNum))
		SccI.SccHeaders.resize(SccNum + 1);
		auto &SccHeaderMap = SccI.SccHeaders[SccNum];
		auto SccHeaderMapIt = SccHeaderMap.find(BB);
		if (SccHeaderMapIt == SccHeaderMap.end()) {
		bool IsHeader = llvm::any_of(
		make_range(pred_begin(BB), pred_end(BB)),
		[&](const BasicBlock *Pred) { return getSCCNum(Pred) != SccNum; });
		SccHeaderMap[BB] = IsHeader;
		return IsHeader;
		} else
		return SccHeaderMapIt->second;
		};

SmallVector<unsigned, 8> BackEdges;		SmallVector<unsigned, 8> BackEdges;
SmallVector<unsigned, 8> ExitingEdges;		SmallVector<unsigned, 8> ExitingEdges;
SmallVector<unsigned, 8> InEdges; // Edges from header to the loop.		SmallVector<unsigned, 8> InEdges; // Edges from header to the loop.

for (succ_const_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) {		for (succ_const_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) {
		// Use LoopInfo if we have it, otherwise fall-back to SCC info to catch
		// irreducible loops.
		if (L) {
if (!L->contains(*I))		if (!L->contains(*I))
ExitingEdges.push_back(I.getSuccessorIndex());		ExitingEdges.push_back(I.getSuccessorIndex());
else if (L->getHeader() == *I)		else if (L->getHeader() == *I)
BackEdges.push_back(I.getSuccessorIndex());		BackEdges.push_back(I.getSuccessorIndex());
else		else
InEdges.push_back(I.getSuccessorIndex());		InEdges.push_back(I.getSuccessorIndex());
		} else {
		if (getSCCNum(*I) != SccNum)
		ExitingEdges.push_back(I.getSuccessorIndex());
		else if (isSCCHeader(*I))
		BackEdges.push_back(I.getSuccessorIndex());
		else
		InEdges.push_back(I.getSuccessorIndex());
		}
}		}

if (BackEdges.empty() && ExitingEdges.empty())		if (BackEdges.empty() && ExitingEdges.empty())
return false;		return false;

// Collect the sum of probabilities of back-edges/in-edges/exiting-edges, and		// Collect the sum of probabilities of back-edges/in-edges/exiting-edges, and
// normalize them so that they sum up to one.		// normalize them so that they sum up to one.
BranchProbability Probs[] = {BranchProbability::getZero(),		BranchProbability Probs[] = {BranchProbability::getZero(),
[312 lines not shown]
void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI,		void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI,
const TargetLibraryInfo *TLI) {		const TargetLibraryInfo *TLI) {
DEBUG(dbgs() << "---- Branch Probability Info : " << F.getName()		DEBUG(dbgs() << "---- Branch Probability Info : " << F.getName()
<< " ----\n\n");		<< " ----\n\n");
LastF = &F; // Store the last function we ran on for printing.		LastF = &F; // Store the last function we ran on for printing.
assert(PostDominatedByUnreachable.empty());		assert(PostDominatedByUnreachable.empty());
assert(PostDominatedByColdCall.empty());		assert(PostDominatedByColdCall.empty());

		// Record SCC numbers of blocks in the CFG to identify irreducible loops.
		// FIXME: We could only calculate this if the CFG is known to be irreducible
		// (perhaps cache this info in LoopInfo if we can easily calculate it there?).
		int SccNum = 0;
		SccInfo SccI;
		for (scc_iterator<const Function *> It = scc_begin(&F); !It.isAtEnd();
		++It, ++SccNum) {
		// Ignore single-block SCCs since they either aren't loops or LoopInfo will
		// catch them.
		const std::vector<const BasicBlock *> &Scc = *It;
		if (Scc.size() == 1)
		continue;

		DEBUG(dbgs() << "BPI: SCC " << SccNum << ":");
		for (auto *BB : Scc) {
		DEBUG(dbgs() << " " << BB->getName());
		SccI.SccNums[BB] = SccNum;
		}
		DEBUG(dbgs() << "\n");
		}

// Walk the basic blocks in post-order so that we can build up state about		// Walk the basic blocks in post-order so that we can build up state about
// the successors of a block iteratively.		// the successors of a block iteratively.
for (auto BB : post_order(&F.getEntryBlock())) {		for (auto BB : post_order(&F.getEntryBlock())) {
DEBUG(dbgs() << "Computing probabilities for " << BB->getName() << "\n");		DEBUG(dbgs() << "Computing probabilities for " << BB->getName() << "\n");
updatePostDominatedByUnreachable(BB);		updatePostDominatedByUnreachable(BB);
updatePostDominatedByColdCall(BB);		updatePostDominatedByColdCall(BB);
// If there is no at least two successors, no sense to set probability.		// If there is no at least two successors, no sense to set probability.
if (BB->getTerminator()->getNumSuccessors() < 2)		if (BB->getTerminator()->getNumSuccessors() < 2)
continue;		continue;
if (calcMetadataWeights(BB))		if (calcMetadataWeights(BB))
continue;		continue;
if (calcUnreachableHeuristics(BB))		if (calcUnreachableHeuristics(BB))
continue;		continue;
if (calcColdCallHeuristics(BB))		if (calcColdCallHeuristics(BB))
continue;		continue;
if (calcLoopBranchHeuristics(BB, LI))		if (calcLoopBranchHeuristics(BB, LI, SccI))
continue;		continue;
if (calcPointerHeuristics(BB))		if (calcPointerHeuristics(BB))
continue;		continue;
if (calcZeroHeuristics(BB, TLI))		if (calcZeroHeuristics(BB, TLI))
continue;		continue;
if (calcFloatingPointHeuristics(BB))		if (calcFloatingPointHeuristics(BB))
continue;		continue;
calcInvokeHeuristics(BB);		calcInvokeHeuristics(BB);
[49 lines not shown]

test/Analysis/BranchProbabilityInfo/loop.ll

; Test the static branch probability heuristics for no-return functions.		; Test the static branch probability heuristics for no-return functions.
; RUN: opt < %s -analyze -branch-prob | FileCheck %s		; RUN: opt < %s -analyze -branch-prob | FileCheck %s
; RUN: opt < %s -passes='print<branch-prob>' --disable-output 2>&1 | FileCheck %s		; RUN: opt < %s -passes='print<branch-prob>' --disable-output 2>&1 | FileCheck %s

declare void @g1()		declare void @g1()
declare void @g2()		declare void @g2()
declare void @g3()		declare void @g3()
declare void @g4()		declare void @g4()
		declare i32 @g5()

define void @test1(i32 %a, i32 %b) {		define void @test1(i32 %a, i32 %b) {
entry:		entry:
br label %do.body		br label %do.body
; CHECK: edge entry -> do.body probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]		; CHECK: edge entry -> do.body probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

do.body:		do.body:
%i.0 = phi i32 [ 0, %entry ], [ %inc3, %do.end ]		%i.0 = phi i32 [ 0, %entry ], [ %inc3, %do.end ]
[342 lines not shown]	for.end:
br i1 %exitcond20, label %for.end15, label %for.body		br i1 %exitcond20, label %for.end15, label %for.body
; CHECK: edge for.end -> for.end15 probability is 0x04000000 / 0x80000000 = 3.12%		; CHECK: edge for.end -> for.end15 probability is 0x04000000 / 0x80000000 = 3.12%
; CHECK: edge for.end -> for.body probability is 0x7c000000 / 0x80000000 = 96.88% [HOT edge]		; CHECK: edge for.end -> for.body probability is 0x7c000000 / 0x80000000 = 96.88% [HOT edge]

for.end15:		for.end15:
call void @g4()		call void @g4()
ret void		ret void
}		}

		; Test that an irreducible loop gets heavily weighted back-edges.
		define void @test9(i32 %i, i32 %x, i32 %c) {
		entry:
		%tobool = icmp eq i32 %c, 0
		br i1 %tobool, label %if.end, label %midloop
		; CHECK: edge entry -> if.end probability is 0x30000000 / 0x80000000 = 37.50%
		; CHECK: edge entry -> midloop probability is 0x50000000 / 0x80000000 = 62.50%

		if.end:
		br label %for.cond
		; CHECK: edge if.end -> for.cond probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

		for.cond:
		%i.addr.0 = phi i32 [ %inc, %for.inc ], [ 0, %if.end ]
		%cmp = icmp slt i32 %i.addr.0, %x
		br i1 %cmp, label %midloop, label %end
		; CHECK: edge for.cond -> midloop probability is 0x7c000000 / 0x80000000 = 96.88% [HOT edge]
		; CHECK: edge for.cond -> end probability is 0x04000000 / 0x80000000 = 3.12%

		midloop:
		%i.addr.1 = phi i32 [ %i, %entry ], [ %i.addr.0, %for.cond ]
		%call1 = call i32 @g5()
		%tobool2 = icmp eq i32 %call1, 0
		br i1 %tobool2, label %for.inc, label %end
		; CHECK: edge midloop -> for.inc probability is 0x7c000000 / 0x80000000 = 96.88% [HOT edge]
		; CHECK: edge midloop -> end probability is 0x04000000 / 0x80000000 = 3.12%

		for.inc:
		%inc = add nsw i32 %i.addr.1, 1
		br label %for.cond
		; CHECK: edge for.inc -> for.cond probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

		end:
		ret void
		}
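
The 96.88% / 3.12% pairs in the @test9 CHECK lines can be reproduced with a little arithmetic. The sketch below assumes a taken weight of 124 and a non-taken weight of 4; those values are inferred from the 0x7c000000 and 0x04000000 numerators above rather than shown in this diff, and scaledNumerator is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Assumed weights, chosen to match the numerators in the CHECK lines.
constexpr uint32_t TakenWeight = 124;
constexpr uint32_t NonTakenWeight = 4;
// Fixed denominator used in the printed probabilities (0x80000000).
constexpr uint32_t Denominator = 1u << 31;

// Scale Weight / Total to a numerator over Denominator, rounding to nearest.
constexpr uint32_t scaledNumerator(uint32_t Weight, uint32_t Total) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(Weight) * Denominator + Total / 2) / Total);
}
```

With a total weight of 124 + 4 = 128, the taken edge gets 124/128 = 96.875% and the exiting edge 4/128 = 3.125%, which print as 96.88% and 3.12%. The same split applies to for.cond (one back-edge, one exiting edge) and to midloop (one in-edge, one exiting edge).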
