This is an archive of the discontinued LLVM Phabricator instance.

[BPI] Adjust the probability for floating point unordered comparison
Closed, Public

Authored by Carrot on Jul 25 2019, 3:45 PM.


Details

Reviewers

ebrevnov
bkramer
efriedma
fhahn
hfinkel
scanon
xbolva00

Commits

rGb329e0728b3e: [BPI] Adjust the probability for floating point unordered comparison
rL371541: [BPI] Adjust the probability for floating point unordered comparison

Summary

Since NaN is very rare in normal programs, the probability of a floating point unordered comparison being true should be extremely small. The current probability is 3/8, which is far too large; this patch changes it to a tiny number.

Diff Detail

Event Timeline

Carrot created this revision. Jul 25 2019, 3:45 PM

Herald added a project: Restricted Project. Jul 25 2019, 3:45 PM

Herald added a subscriber: llvm-commits.

I've never looked at this piece of code before, so I would like someone more familiar with this part to review it.

xbolva00 added reviewers: efriedma, fhahn, hfinkel. Jul 26 2019, 3:46 AM

scanon requested changes to this revision. Jul 26 2019, 6:33 AM

scanon added a subscriber: scanon.

scanon added inline comments.

lib/Analysis/BranchProbabilityInfo.cpp
122	How was this weight selected? Why 2**20 - 1 in particular? If it's just to match IH_TAKEN_WEIGHT, let's use that value instead of repeating it here. The comment could probably use a little work, too. I broadly agree with the effect of this change, but I don't think that unordered is that rare; rather, NaN is often an indication that something has gone wrong, and it makes sense to move that off the hot path.

This revision now requires changes to proceed. Jul 26 2019, 6:33 AM

Comment change.

lib/Analysis/BranchProbabilityInfo.cpp
122	It has no semantic relation to IH_TAKEN_WEIGHT. I looked for an appropriate weight to represent an extremely likely probability, and since IH_TAKEN_WEIGHT has a similar effect on control flow, I used the same number. Do you still think it's better to use IH_TAKEN_WEIGHT here, or is a different probability more appropriate?

What about

if (isnan(x))?

LLVM IR has a call to __isnan.

isnan() is lowered to fcmp uno. In this case the taken probability may be higher, but it is still much smaller than that of the ordered result. Also, explicit uses of isnan() are already very rare; most uno comparisons are generated from various math library functions.
On the other hand, I guess most of the taken uno comparisons in a correct program come from explicit isnan() calls.

ping

It's not that NaN is rare in normal programs, or that it indicates a bug in the code. It's that testing for NaN is usually an indication that you're testing for an exceptional case, and it makes sense to move those off the hot path (i.e. NaN is actually pretty common, but the likelihood of handling it on the normal-value path through code is small).

test/CodeGen/SystemZ/call-05.ll
456	I know nothing about the Z architecture; is this codegen change actually an improvement? What architectural considerations are at work here?

Carrot updated this revision to Diff 214392. Aug 9 2019, 9:39 AM

Carrot marked an inline comment as done.

Carrot added inline comments.

test/CodeGen/SystemZ/call-05.ll
456	In this case, machine code if-conversion only considers a branch whose target has a non-trivial probability. The original probability, 3/8, was large enough to enable this if-conversion. After the probability is reduced, if-conversion considers it too small to be worth optimizing.

What will this do to software that does frequently use NaNs and doesn't treat them as exceptional?

Static prediction is intended to help common use cases. For corner cases where NaNs are used frequently, the user will need to resort to source annotations (such as __builtin_expect).

ping

I think this change makes sense for plain static mode.

some benchmark numbers?

I think this patch was motivated by the performance of some (micro)benchmarks attached to @Carrot's previous patch, and this patch fixes that performance issue.

But I agree, more benchmark results would be useful.

In D65303#1661329, @davidxl wrote:

some benchmark numbers?

It recovers the 35% regression in Evgeniy's micro benchmark.

SPEC 2006 is running.

The SPEC 2006 int result is 39.2 vs. 39.3, so no change.

This revision was not accepted when it landed; it landed in state Needs Review. Sep 10 2019, 10:25 AM

Closed by commit rL371541: [BPI] Adjust the probability for floating point unordered comparison (authored by Carrot).

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Analysis/BranchProbabilityInfo.cpp	16 lines
test/Analysis/BranchProbabilityInfo/fcmp.ll	41 lines
test/CodeGen/SystemZ/call-05.ll	3 lines

Diff 214392

lib/Analysis/BranchProbabilityInfo.cpp

@@ ... @@
 static const uint32_t PH_NONTAKEN_WEIGHT = 12;
 
 static const uint32_t ZH_TAKEN_WEIGHT = 20;
 static const uint32_t ZH_NONTAKEN_WEIGHT = 12;
 
 static const uint32_t FPH_TAKEN_WEIGHT = 20;
 static const uint32_t FPH_NONTAKEN_WEIGHT = 12;
 
+/// This is the probability for an ordered floating point comparison.
+static const uint32_t FPH_ORD_WEIGHT = 1024 * 1024 - 1;
+/// This is the probability for an unordered floating point comparison, it means
+/// one or two of the operands are NaN. Usually it is used to test for an
+/// exceptional case, so the result is unlikely.
+static const uint32_t FPH_UNO_WEIGHT = 1;
+
 /// Invoke-terminating normal branch taken weight
 ///
 /// This is the weight for branching to the normal destination of an invoke
 /// instruction. We expect this to happen most of the time. Set the weight to an
 /// absurdly high value so that nested loops subsume it.
 static const uint32_t IH_TAKEN_WEIGHT = 1024 * 1024 - 1;
 
 /// Invoke-terminating normal branch not-taken weight.
@@ ... @@ bool BranchProbabilityInfo::calcFloatingPointHeuristics(const BasicBlock *BB) {
   if (!BI || !BI->isConditional())
     return false;
 
   Value *Cond = BI->getCondition();
   FCmpInst *FCmp = dyn_cast<FCmpInst>(Cond);
   if (!FCmp)
     return false;
 
+  uint32_t TakenWeight = FPH_TAKEN_WEIGHT;
+  uint32_t NontakenWeight = FPH_NONTAKEN_WEIGHT;
   bool isProb;
   if (FCmp->isEquality()) {
     // f1 == f2 -> Unlikely
     // f1 != f2 -> Likely
     isProb = !FCmp->isTrueWhenEqual();
   } else if (FCmp->getPredicate() == FCmpInst::FCMP_ORD) {
     // !isnan -> Likely
     isProb = true;
+    TakenWeight = FPH_ORD_WEIGHT;
+    NontakenWeight = FPH_UNO_WEIGHT;
   } else if (FCmp->getPredicate() == FCmpInst::FCMP_UNO) {
     // isnan -> Unlikely
     isProb = false;
+    TakenWeight = FPH_ORD_WEIGHT;
+    NontakenWeight = FPH_UNO_WEIGHT;
   } else {
     return false;
   }
 
   unsigned TakenIdx = 0, NonTakenIdx = 1;
 
   if (!isProb)
     std::swap(TakenIdx, NonTakenIdx);
 
-  BranchProbability TakenProb(FPH_TAKEN_WEIGHT,
-                              FPH_TAKEN_WEIGHT + FPH_NONTAKEN_WEIGHT);
+  BranchProbability TakenProb(TakenWeight, TakenWeight + NontakenWeight);
   setEdgeProbability(BB, TakenIdx, TakenProb);
   setEdgeProbability(BB, NonTakenIdx, TakenProb.getCompl());
   return true;
 }
 
 bool BranchProbabilityInfo::calcInvokeHeuristics(const BasicBlock *BB) {
   const InvokeInst *II = dyn_cast<InvokeInst>(BB->getTerminator());
   if (!II)

test/Analysis/BranchProbabilityInfo/fcmp.ll

; RUN: opt < %s -analyze -branch-prob | FileCheck %s

; This function tests the floating point unordered comparison. The probability
; of NaN should be extremely small.
; CHECK: Printing analysis 'Branch Probability Analysis' for function 'uno'
; CHECK: edge -> a probability is 0x00000800 / 0x80000000 = 0.00%
; CHECK: edge -> b probability is 0x7ffff800 / 0x80000000 = 100.00% [HOT edge]

define void @uno(float %val1, float %val2) {
  %cond = fcmp uno float %val1, %val2
  br i1 %cond, label %a, label %b

a:
  call void @fa()
  ret void

b:
  call void @fb()
  ret void
}

; This function tests the floating point ordered comparison.
; CHECK: Printing analysis 'Branch Probability Analysis' for function 'ord'
; CHECK: edge -> a probability is 0x7ffff800 / 0x80000000 = 100.00% [HOT edge]
; CHECK: edge -> b probability is 0x00000800 / 0x80000000 = 0.00%

define void @ord(float %val1, float %val2) {
  %cond = fcmp ord float %val1, %val2
  br i1 %cond, label %a, label %b

a:
  call void @fa()
  ret void

b:
  call void @fb()
  ret void
}

declare void @fa()
declare void @fb()

test/CodeGen/SystemZ/call-05.ll

@@ ... @@ b:
   store i32 1, i32 *@var;
   ret void
 }
 
 ; Check a conditional sibling call - float uno compare.
 define void @f25(float %val1, float %val2) {
 ; CHECK-LABEL: f25:
 ; CHECK: cebr %f0, %f2
-; CHECK: bor %r1
+; CHECK: jo
 ; CHECK: br %r14
+; CHECK: br %r1
   %fun_a = load volatile void() , void()* @fun_a;
   %cond = fcmp uno float %val1, %val2;
   br i1 %cond, label %a, label %b;
 
 a:
   tail call void %fun_a()
   ret void
 
 b:
   store i32 1, i32 *@var;
   ret void
 }