This is an archive of the discontinued LLVM Phabricator instance.

[BranchProbabilityInfo] Handle irreducible loops.
ClosedPublic

Authored by gberry on Oct 27 2017, 1:06 PM.

Details

Reviewers

dexonsmith
davidxl
hjyamauchi
fhahn

Commits

rGeed6531ea2a3: [BranchProbabilityInfo] Handle irreducible loops.
rL317094: [BranchProbabilityInfo] Handle irreducible loops.

Summary

Compute the strongly connected components of the CFG and fall back to
use these for blocks that are in loops that are not detected by
LoopInfo when computing loop back-edge and exit branch probabilities.
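As a rough illustration of the fallback described above, the following standalone sketch classifies CFG edges using SCC membership alone. It does not use LLVM's `scc_iterator`; `Graph`, `SccFinder`, `isSccHeader`, `classify`, and `test9Cfg` are hypothetical names invented for this example, and the SCC computation is a plain recursive Tarjan. The graph mirrors the `test9` function added to loop.ll, with blocks numbered by hand.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a CFG: block i has the successor list G[i].
using Graph = std::vector<std::vector<int>>;

enum class EdgeKind { Exiting, Back, In };

// Recursive Tarjan SCC. SccOf[i] is the SCC number of block i, or -1 when
// block i is alone in its SCC (single-block SCCs are skipped, as in the patch).
struct SccFinder {
  const Graph &G;
  std::vector<int> Index, Low, SccOf, Stack;
  std::vector<bool> OnStack;
  int NextIndex = 0, NextScc = 0;

  explicit SccFinder(const Graph &Cfg)
      : G(Cfg), Index(Cfg.size(), -1), Low(Cfg.size(), 0),
        SccOf(Cfg.size(), -1), OnStack(Cfg.size(), false) {
    for (int V = 0; V < static_cast<int>(G.size()); ++V)
      if (Index[V] < 0)
        visit(V);
  }

  void visit(int V) {
    Index[V] = Low[V] = NextIndex++;
    Stack.push_back(V);
    OnStack[V] = true;
    for (int W : G[V]) {
      if (Index[W] < 0) {
        visit(W);
        Low[V] = std::min(Low[V], Low[W]);
      } else if (OnStack[W]) {
        Low[V] = std::min(Low[V], Index[W]);
      }
    }
    if (Low[V] != Index[V])
      return;
    // V is the root of an SCC: pop its members off the stack.
    std::vector<int> Members;
    int W;
    do {
      W = Stack.back();
      Stack.pop_back();
      OnStack[W] = false;
      Members.push_back(W);
    } while (W != V);
    if (Members.size() > 1)
      for (int M : Members)
        SccOf[M] = NextScc;
    ++NextScc;
  }
};

// A block is treated as a header if any predecessor lies outside its SCC.
bool isSccHeader(const Graph &G, const std::vector<int> &SccOf, int BB) {
  for (int P = 0; P < static_cast<int>(G.size()); ++P)
    for (int S : G[P])
      if (S == BB && SccOf[P] != SccOf[BB])
        return true;
  return false;
}

// Classify the edge From -> To, where From sits in a multi-block SCC.
EdgeKind classify(const Graph &G, const std::vector<int> &SccOf, int From,
                  int To) {
  assert(SccOf[From] >= 0 && "From must be in a multi-block SCC");
  if (SccOf[To] != SccOf[From])
    return EdgeKind::Exiting;
  if (isSccHeader(G, SccOf, To))
    return EdgeKind::Back;
  return EdgeKind::In;
}

// 0=entry, 1=if.end, 2=for.cond, 3=midloop, 4=for.inc, 5=end
Graph test9Cfg() { return {{1, 3}, {2}, {3, 5}, {4, 5}, {2}, {}}; }
```

With this numbering, `for.cond`, `midloop`, and `for.inc` land in one SCC. Both `for.cond` and `midloop` have a predecessor outside it, so edges into either of them from inside the SCC count as back-edges, while `midloop -> for.inc` is an in-edge.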

Diff Detail

Build Status

Buildable 11655
Build 11655: arc lint + arc unit

Event Timeline

gberry created this revision. Oct 27 2017, 1:06 PM

Harbormaster completed remote builds in B11580: Diff 120682. Oct 27 2017, 1:06 PM

davidxl added a reviewer: hjyamauchi. Oct 27 2017, 1:13 PM

haicheng added a subscriber: haicheng. Oct 27 2017, 1:33 PM

fhahn added a subscriber: fhahn. Oct 27 2017, 2:12 PM

fhahn added inline comments. Oct 27 2017, 3:12 PM

lib/Analysis/BranchProbabilityInfo.cpp
482	Would it be possible to use insert here instead, to avoid doing another lookup at line 463?

gberry marked an inline comment as done. Oct 27 2017, 4:45 PM

gberry added inline comments.

lib/Analysis/BranchProbabilityInfo.cpp
482	Yep, fixed. Thanks.
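
The single-lookup pattern discussed in this inline thread can be sketched as follows. This is only an illustration: `std::unordered_map` stands in for `DenseMap`, and `HeaderCache`, `computeIsHeader`, `isHeaderCached`, and `ComputeCalls` are hypothetical names that do not appear in the patch.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>

// Hypothetical cache keyed by a block id.
using HeaderCache = std::unordered_map<int, bool>;

// Counts how often the expensive predicate runs (for illustration only).
static int ComputeCalls = 0;

// Hypothetical expensive check; here, even ids count as headers.
static bool computeIsHeader(int BlockId) {
  ++ComputeCalls;
  return BlockId % 2 == 0;
}

bool isHeaderCached(HeaderCache &Cache, int BlockId) {
  // insert() returns {iterator, inserted}. The placeholder value is stored
  // only when the key was absent, so one hash lookup covers hit and miss.
  auto Result = Cache.insert(std::make_pair(BlockId, false));
  if (Result.second)
    Result.first->second = computeIsHeader(BlockId);
  return Result.first->second;
}
```

Calling `isHeaderCached` twice with the same id runs `computeIsHeader` once; the second call returns the stored value through the iterator that `insert` hands back.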

This looks sensible to me. Do you have any numbers about the impact of this change? Not sure about the compile time impact of computing the SCC here (computing SCC should not be too expensive, O(N+E)), but at least BlockFrequencyInfoImpl uses SCC for a similar thing but it is only computed once it discovers irreducible control flow. It might be worth having this as a separate analysis to do the construction only once in the future.

fhahn added a reviewer: fhahn. Oct 30 2017, 4:04 AM

davidxl added inline comments. Oct 30 2017, 10:45 AM

lib/Analysis/BranchProbabilityInfo.cpp
474	Nit: It is probably better to change this lambda into a static function and move the body out to make the caller function body look cleaner.

In D39385#910489, @fhahn wrote:

This looks sensible to me. Do you have any numbers about the impact of this change? Not sure about the compile time impact of computing the SCC here (computing SCC should not be too expensive, O(N+E)), but at least BlockFrequencyInfoImpl uses SCC for a similar thing but it is only computed once it discovers irreducible control flow. It might be worth having this as a separate analysis to do the construction only once in the future.

I observed the following performance improvements on AArch64 (Falkor):

spec2017/xz:train             1.00
spec2006/bzip2:lessnoise      7.24

As for compile time, the times for BPI do increase, but the increases are in the noise of the compile time as a whole. I looked briefly at adding irreducible loop detection to LoopInfo, so I could check that here before computing SCCs (see FIXME comment around line 820), but it didn't seem simple enough to justify the effort given the level of compile time increase.

lib/Analysis/BranchProbabilityInfo.cpp
474	Sure, will do.

Address Florian and David's comments.

davidxl added inline comments. Oct 31 2017, 9:51 AM

lib/Analysis/BranchProbabilityInfo.cpp
828	Do you see any visible compile time impact of this?

gberry added inline comments. Oct 31 2017, 11:19 AM

lib/Analysis/BranchProbabilityInfo.cpp
828	I did not (see my previous comment). For the record, I looked at compile times compiling CTMark for aarch64.

lgtm

This revision is now accepted and ready to land. Oct 31 2017, 11:24 AM

In D39385#910911, @gberry wrote:
In D39385#910489, @fhahn wrote:

This looks sensible to me. Do you have any numbers about the impact of this change? Not sure about the compile time impact of computing the SCC here (computing SCC should not be too expensive, O(N+E)), but at least BlockFrequencyInfoImpl uses SCC for a similar thing but it is only computed once it discovers irreducible control flow. It might be worth having this as a separate analysis to do the construction only once in the future.

I observed the following performance improvements on AArch64 (Falkor):
spec2017/xz:train             1.00
spec2006/bzip2:lessnoise      7.24

Great thanks!

As for compile time, the times for BPI do increase, but the increases are in the noise of the compile time as a whole. I looked briefly at adding irreducible loop detection to LoopInfo, so I could check that here before computing SCCs (see FIXME comment around line 820), but it didn't seem simple enough to justify the effort given the level of compile time increase.

Yeah, that's what I thought too. LGTM

Closed by commit rL317094: [BranchProbabilityInfo] Handle irreducible loops. (authored by gberry). Nov 1 2017, 8:17 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

include/

llvm/

Analysis/

BranchProbabilityInfo.h

12 lines

lib/

Analysis/

BranchProbabilityInfo.cpp

90 lines

test/

Analysis/

BranchProbabilityInfo/

loop.ll

37 lines

Diff 120984

include/llvm/Analysis/BranchProbabilityInfo.h

Show First 20 Lines • Show All 131 Lines • ▼ Show 20 Lines	public:
}		}

void calculate(const Function &F, const LoopInfo &LI,		void calculate(const Function &F, const LoopInfo &LI,
const TargetLibraryInfo *TLI = nullptr);		const TargetLibraryInfo *TLI = nullptr);

/// Forget analysis results for the given basic block.		/// Forget analysis results for the given basic block.
void eraseBlock(const BasicBlock *BB);		void eraseBlock(const BasicBlock *BB);

		// Use to track SCCs for handling irreducible loops.
		using SccMap = DenseMap<const BasicBlock *, int>;
		using SccHeaderMap = DenseMap<const BasicBlock *, bool>;
		using SccHeaderMaps = std::vector<SccHeaderMap>;
		struct SccInfo {
		SccMap SccNums;
		SccHeaderMaps SccHeaders;
		};

private:		private:
// We need to store CallbackVH's in order to correctly handle basic block		// We need to store CallbackVH's in order to correctly handle basic block
// removal.		// removal.
class BasicBlockCallbackVH final : public CallbackVH {		class BasicBlockCallbackVH final : public CallbackVH {
BranchProbabilityInfo *BPI;		BranchProbabilityInfo *BPI;

void deleted() override {		void deleted() override {
assert(BPI != nullptr);		assert(BPI != nullptr);
Show All 32 Lines	private:
SmallPtrSet<const BasicBlock *, 16> PostDominatedByColdCall;		SmallPtrSet<const BasicBlock *, 16> PostDominatedByColdCall;

void updatePostDominatedByUnreachable(const BasicBlock *BB);		void updatePostDominatedByUnreachable(const BasicBlock *BB);
void updatePostDominatedByColdCall(const BasicBlock *BB);		void updatePostDominatedByColdCall(const BasicBlock *BB);
bool calcUnreachableHeuristics(const BasicBlock *BB);		bool calcUnreachableHeuristics(const BasicBlock *BB);
bool calcMetadataWeights(const BasicBlock *BB);		bool calcMetadataWeights(const BasicBlock *BB);
bool calcColdCallHeuristics(const BasicBlock *BB);		bool calcColdCallHeuristics(const BasicBlock *BB);
bool calcPointerHeuristics(const BasicBlock *BB);		bool calcPointerHeuristics(const BasicBlock *BB);
bool calcLoopBranchHeuristics(const BasicBlock *BB, const LoopInfo &LI);		bool calcLoopBranchHeuristics(const BasicBlock *BB, const LoopInfo &LI,
		SccInfo &SccI);
bool calcZeroHeuristics(const BasicBlock *BB, const TargetLibraryInfo *TLI);		bool calcZeroHeuristics(const BasicBlock *BB, const TargetLibraryInfo *TLI);
bool calcFloatingPointHeuristics(const BasicBlock *BB);		bool calcFloatingPointHeuristics(const BasicBlock *BB);
bool calcInvokeHeuristics(const BasicBlock *BB);		bool calcInvokeHeuristics(const BasicBlock *BB);
};		};

/// \brief Analysis pass which computes \c BranchProbabilityInfo.		/// \brief Analysis pass which computes \c BranchProbabilityInfo.
class BranchProbabilityAnalysis		class BranchProbabilityAnalysis
: public AnalysisInfoMixin<BranchProbabilityAnalysis> {		: public AnalysisInfoMixin<BranchProbabilityAnalysis> {
▲ Show 20 Lines • Show All 47 Lines • Show Last 20 Lines

lib/Analysis/BranchProbabilityInfo.cpp

//===- BranchProbabilityInfo.cpp - Branch Probability Analysis ------------===//		//===- BranchProbabilityInfo.cpp - Branch Probability Analysis ------------===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Loops should be simplified before this analysis.		// Loops should be simplified before this analysis.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Analysis/BranchProbabilityInfo.h"		#include "llvm/Analysis/BranchProbabilityInfo.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
		#include "llvm/ADT/SCCIterator.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
▲ Show 20 Lines • Show All 395 Lines • ▼ Show 20 Lines	bool BranchProbabilityInfo::calcPointerHeuristics(const BasicBlock *BB) {

BranchProbability TakenProb(PH_TAKEN_WEIGHT,		BranchProbability TakenProb(PH_TAKEN_WEIGHT,
PH_TAKEN_WEIGHT + PH_NONTAKEN_WEIGHT);		PH_TAKEN_WEIGHT + PH_NONTAKEN_WEIGHT);
setEdgeProbability(BB, TakenIdx, TakenProb);		setEdgeProbability(BB, TakenIdx, TakenProb);
setEdgeProbability(BB, NonTakenIdx, TakenProb.getCompl());		setEdgeProbability(BB, NonTakenIdx, TakenProb.getCompl());
return true;		return true;
}		}

		static int getSCCNum(const BasicBlock *BB,
		const BranchProbabilityInfo::SccInfo &SccI) {
		auto SccIt = SccI.SccNums.find(BB);
		if (SccIt == SccI.SccNums.end())
		return -1;
		return SccIt->second;
		}

		// Consider any block that is an entry point to the SCC as a header.
		static bool isSCCHeader(const BasicBlock *BB, int SccNum,
		BranchProbabilityInfo::SccInfo &SccI) {
		assert(getSCCNum(BB, SccI) == SccNum);

		// Lazily compute the set of headers for a given SCC and cache the results
		// in the SccHeaderMap.
		if (SccI.SccHeaders.size() <= static_cast<unsigned>(SccNum))
		SccI.SccHeaders.resize(SccNum + 1);
		auto &HeaderMap = SccI.SccHeaders[SccNum];
		bool Inserted;
		BranchProbabilityInfo::SccHeaderMap::iterator HeaderMapIt;
		std::tie(HeaderMapIt, Inserted) = HeaderMap.insert(std::make_pair(BB, false));
		if (Inserted) {
		bool IsHeader = llvm::any_of(make_range(pred_begin(BB), pred_end(BB)),
		[&](const BasicBlock *Pred) {
		return getSCCNum(Pred, SccI) != SccNum;
		});
		HeaderMapIt->second = IsHeader;
		return IsHeader;
		} else
		return HeaderMapIt->second;
		}

// Calculate Edge Weights using "Loop Branch Heuristics". Predict backedges		// Calculate Edge Weights using "Loop Branch Heuristics". Predict backedges
// as taken, exiting edges as not-taken.		// as taken, exiting edges as not-taken.
bool BranchProbabilityInfo::calcLoopBranchHeuristics(const BasicBlock *BB,		bool BranchProbabilityInfo::calcLoopBranchHeuristics(const BasicBlock *BB,
const LoopInfo &LI) {		const LoopInfo &LI,
		SccInfo &SccI) {
		int SccNum;
Loop *L = LI.getLoopFor(BB);		Loop *L = LI.getLoopFor(BB);
if (!L)		if (!L) {
		SccNum = getSCCNum(BB, SccI);
		if (SccNum < 0)
return false;		return false;
		}

SmallVector<unsigned, 8> BackEdges;		SmallVector<unsigned, 8> BackEdges;
SmallVector<unsigned, 8> ExitingEdges;		SmallVector<unsigned, 8> ExitingEdges;
SmallVector<unsigned, 8> InEdges; // Edges from header to the loop.		SmallVector<unsigned, 8> InEdges; // Edges from header to the loop.

for (succ_const_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) {		for (succ_const_iterator I = succ_begin(BB), E = succ_end(BB); I != E; ++I) {
		// Use LoopInfo if we have it, otherwise fall-back to SCC info to catch
		// irreducible loops.
		if (L) {
if (!L->contains(*I))		if (!L->contains(*I))
ExitingEdges.push_back(I.getSuccessorIndex());		ExitingEdges.push_back(I.getSuccessorIndex());
else if (L->getHeader() == *I)		else if (L->getHeader() == *I)
BackEdges.push_back(I.getSuccessorIndex());		BackEdges.push_back(I.getSuccessorIndex());
else		else
InEdges.push_back(I.getSuccessorIndex());		InEdges.push_back(I.getSuccessorIndex());
		} else {
		if (getSCCNum(*I, SccI) != SccNum)
		ExitingEdges.push_back(I.getSuccessorIndex());
		else if (isSCCHeader(*I, SccNum, SccI))
		BackEdges.push_back(I.getSuccessorIndex());
		else
		InEdges.push_back(I.getSuccessorIndex());
		}
}		}

if (BackEdges.empty() && ExitingEdges.empty())		if (BackEdges.empty() && ExitingEdges.empty())
return false;		return false;

// Collect the sum of probabilities of back-edges/in-edges/exiting-edges, and		// Collect the sum of probabilities of back-edges/in-edges/exiting-edges, and
// normalize them so that they sum up to one.		// normalize them so that they sum up to one.
BranchProbability Probs[] = {BranchProbability::getZero(),		BranchProbability Probs[] = {BranchProbability::getZero(),
▲ Show 20 Lines • Show All 312 Lines • ▼ Show 20 Lines
void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI,		void BranchProbabilityInfo::calculate(const Function &F, const LoopInfo &LI,
const TargetLibraryInfo *TLI) {		const TargetLibraryInfo *TLI) {
DEBUG(dbgs() << "---- Branch Probability Info : " << F.getName()		DEBUG(dbgs() << "---- Branch Probability Info : " << F.getName()
<< " ----\n\n");		<< " ----\n\n");
LastF = &F; // Store the last function we ran on for printing.		LastF = &F; // Store the last function we ran on for printing.
assert(PostDominatedByUnreachable.empty());		assert(PostDominatedByUnreachable.empty());
assert(PostDominatedByColdCall.empty());		assert(PostDominatedByColdCall.empty());

		// Record SCC numbers of blocks in the CFG to identify irreducible loops.
		// FIXME: We could only calculate this if the CFG is known to be irreducible
		// (perhaps cache this info in LoopInfo if we can easily calculate it there?).
		int SccNum = 0;
		SccInfo SccI;
		for (scc_iterator<const Function *> It = scc_begin(&F); !It.isAtEnd();
		++It, ++SccNum) {
		// Ignore single-block SCCs since they either aren't loops or LoopInfo will
		// catch them.
		const std::vector<const BasicBlock *> &Scc = *It;
		if (Scc.size() == 1)
		continue;

		DEBUG(dbgs() << "BPI: SCC " << SccNum << ":");
		for (auto *BB : Scc) {
		DEBUG(dbgs() << " " << BB->getName());
		SccI.SccNums[BB] = SccNum;
		}
		DEBUG(dbgs() << "\n");
		}

// Walk the basic blocks in post-order so that we can build up state about		// Walk the basic blocks in post-order so that we can build up state about
// the successors of a block iteratively.		// the successors of a block iteratively.
for (auto BB : post_order(&F.getEntryBlock())) {		for (auto BB : post_order(&F.getEntryBlock())) {
DEBUG(dbgs() << "Computing probabilities for " << BB->getName() << "\n");		DEBUG(dbgs() << "Computing probabilities for " << BB->getName() << "\n");
updatePostDominatedByUnreachable(BB);		updatePostDominatedByUnreachable(BB);
updatePostDominatedByColdCall(BB);		updatePostDominatedByColdCall(BB);
// If there is no at least two successors, no sense to set probability.		// If there is no at least two successors, no sense to set probability.
if (BB->getTerminator()->getNumSuccessors() < 2)		if (BB->getTerminator()->getNumSuccessors() < 2)
continue;		continue;
if (calcMetadataWeights(BB))		if (calcMetadataWeights(BB))
continue;		continue;
if (calcUnreachableHeuristics(BB))		if (calcUnreachableHeuristics(BB))
continue;		continue;
if (calcColdCallHeuristics(BB))		if (calcColdCallHeuristics(BB))
continue;		continue;
if (calcLoopBranchHeuristics(BB, LI))		if (calcLoopBranchHeuristics(BB, LI, SccI))
continue;		continue;
if (calcPointerHeuristics(BB))		if (calcPointerHeuristics(BB))
continue;		continue;
if (calcZeroHeuristics(BB, TLI))		if (calcZeroHeuristics(BB, TLI))
continue;		continue;
if (calcFloatingPointHeuristics(BB))		if (calcFloatingPointHeuristics(BB))
continue;		continue;
calcInvokeHeuristics(BB);		calcInvokeHeuristics(BB);
▲ Show 20 Lines • Show All 49 Lines • Show Last 20 Lines

test/Analysis/BranchProbabilityInfo/loop.ll

; Test the static branch probability heuristics for no-return functions.		; Test the static branch probability heuristics for no-return functions.
; RUN: opt < %s -analyze -branch-prob | FileCheck %s		; RUN: opt < %s -analyze -branch-prob | FileCheck %s
; RUN: opt < %s -passes='print<branch-prob>' --disable-output 2>&1 | FileCheck %s		; RUN: opt < %s -passes='print<branch-prob>' --disable-output 2>&1 | FileCheck %s

declare void @g1()		declare void @g1()
declare void @g2()		declare void @g2()
declare void @g3()		declare void @g3()
declare void @g4()		declare void @g4()
		declare i32 @g5()

define void @test1(i32 %a, i32 %b) {		define void @test1(i32 %a, i32 %b) {
entry:		entry:
br label %do.body		br label %do.body
; CHECK: edge entry -> do.body probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]		; CHECK: edge entry -> do.body probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

do.body:		do.body:
%i.0 = phi i32 [ 0, %entry ], [ %inc3, %do.end ]		%i.0 = phi i32 [ 0, %entry ], [ %inc3, %do.end ]
▲ Show 20 Lines • Show All 342 Lines • ▼ Show 20 Lines	for.end:
br i1 %exitcond20, label %for.end15, label %for.body		br i1 %exitcond20, label %for.end15, label %for.body
; CHECK: edge for.end -> for.end15 probability is 0x04000000 / 0x80000000 = 3.12%		; CHECK: edge for.end -> for.end15 probability is 0x04000000 / 0x80000000 = 3.12%
; CHECK: edge for.end -> for.body probability is 0x7c000000 / 0x80000000 = 96.88% [HOT edge]		; CHECK: edge for.end -> for.body probability is 0x7c000000 / 0x80000000 = 96.88% [HOT edge]

for.end15:		for.end15:
call void @g4()		call void @g4()
ret void		ret void
}		}

		; Test that an irreducible loop gets heavily weighted back-edges.
		define void @test9(i32 %i, i32 %x, i32 %c) {
		entry:
		%tobool = icmp eq i32 %c, 0
		br i1 %tobool, label %if.end, label %midloop
		; CHECK: edge entry -> if.end probability is 0x30000000 / 0x80000000 = 37.50%
		; CHECK: edge entry -> midloop probability is 0x50000000 / 0x80000000 = 62.50%

		if.end:
		br label %for.cond
		; CHECK: edge if.end -> for.cond probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

		for.cond:
		%i.addr.0 = phi i32 [ %inc, %for.inc ], [ 0, %if.end ]
		%cmp = icmp slt i32 %i.addr.0, %x
		br i1 %cmp, label %midloop, label %end
		; CHECK: edge for.cond -> midloop probability is 0x7c000000 / 0x80000000 = 96.88% [HOT edge]
		; CHECK: edge for.cond -> end probability is 0x04000000 / 0x80000000 = 3.12%

		midloop:
		%i.addr.1 = phi i32 [ %i, %entry ], [ %i.addr.0, %for.cond ]
		%call1 = call i32 @g5()
		%tobool2 = icmp eq i32 %call1, 0
		br i1 %tobool2, label %for.inc, label %end
		; CHECK: edge midloop -> for.inc probability is 0x7c000000 / 0x80000000 = 96.88% [HOT edge]
		; CHECK: edge midloop -> end probability is 0x04000000 / 0x80000000 = 3.12%

		for.inc:
		%inc = add nsw i32 %i.addr.1, 1
		br label %for.cond
		; CHECK: edge for.inc -> for.cond probability is 0x80000000 / 0x80000000 = 100.00% [HOT edge]

		end:
		ret void
		}
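
The 96.88% and 3.12% figures in the `test9` CHECK lines can be reproduced with a little fixed-point arithmetic. The sketch below assumes the loop-branch heuristic weights are 124 (taken) and 4 (not taken); those constants are not shown in this diff, and `scaledNumerator` is a hypothetical helper, not an LLVM API. It covers the case seen in `for.cond` and `midloop`, where one taken-class edge competes with one exiting edge, so the total weight is 128.

```cpp
#include <cstdint>

// Fixed-point denominator seen in the CHECK lines (0x80000000 == 2^31).
constexpr uint32_t Denominator = 1u << 31;

// Assumed loop-branch weights; the values are not part of this diff.
constexpr uint32_t TakenWeight = 124;
constexpr uint32_t NonTakenWeight = 4;

// Scale Weight / Total to a numerator over 2^31, rounding to nearest.
constexpr uint32_t scaledNumerator(uint32_t Weight, uint32_t Total) {
  return static_cast<uint32_t>(
      (static_cast<uint64_t>(Weight) * Denominator + Total / 2) / Total);
}

// 124 / 128 = 96.875%, printed as 96.88% in the CHECK lines.
static_assert(scaledNumerator(TakenWeight, TakenWeight + NonTakenWeight) ==
                  0x7c000000u,
              "back-edge or in-edge numerator");
// 4 / 128 = 3.125%, printed as 3.12% in the CHECK lines.
static_assert(scaledNumerator(NonTakenWeight, TakenWeight + NonTakenWeight) ==
                  0x04000000u,
              "exiting-edge numerator");
```

Because the checks are `static_assert`s, the arithmetic is verified at compile time and no runtime harness is needed.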
