This is an archive of the discontinued LLVM Phabricator instance.

Avoid inlining call sites in unreachable-terminated block
ClosedPublic

Authored by junbuml on Jan 26 2016, 5:43 PM.

Details

Reviewers

dblaikie
majnemer
davidxl
manmanren
eraman
hfinkel
mcrosier

Commits

rG53907161cc20: Avoid inlining call sites in unreachable-terminated block
rL259403: Avoid inlining call sites in unreachable-terminated block

Summary

If the normal destination of the invoke or the parent block of the call site is unreachable-terminated, there is little point in inlining the call site unless there is literally zero cost. Unlike my previous change (D15289), this change specifically handles call sites followed by unreachable in the same basic block (for a call) or in the normal destination (for an invoke). This could be a reasonable first step toward conservatively inlining call sites that lead to an unreachable-terminated block while BFI / BPI is not yet available in the inliner.
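
The rule in the summary can be sketched with toy stand-ins instead of LLVM's own classes. `Block`, `Site`, and `adjustThreshold` are hypothetical names introduced here for illustration, and the threshold value passed in is arbitrary; only the decision logic follows the description above.

```cpp
#include <cassert>

// Toy stand-ins for IR concepts; every name here is hypothetical.
enum class Term { Ret, Br, Unreachable };

struct Block {
  Term terminator;  // how the block ends
};

struct Site {
  bool isInvoke;            // true for an invoke, false for a plain call
  const Block *parent;      // block that contains the call site
  const Block *normalDest;  // normal destination (meaningful for invokes only)
};

// Pick the block the summary talks about (normal destination for an invoke,
// parent block for a call) and drop the threshold to zero when that block
// is unreachable-terminated. Otherwise the threshold is returned unchanged.
int adjustThreshold(const Site &s, int threshold) {
  const Block *b = s.isInvoke ? s.normalDest : s.parent;
  assert(b != nullptr && "the relevant block must be set");
  return b->terminator == Term::Unreachable ? 0 : threshold;
}
```

With a zero threshold, only a callee whose computed cost is itself negligible would still be inlined, which is the "literally zero cost" case mentioned in the summary.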

Diff Detail

Repository: rL LLVM

Event Timeline

junbuml updated this revision to Diff 46084. Jan 26 2016, 5:43 PM

junbuml retitled this revision from to Avoid inlining CallSites in unreachable-terminated block.

junbuml updated this object.

junbuml added reviewers: mcrosier, manmanren, majnemer, hfinkel, davidxl, eraman.

junbuml added a subscriber: llvm-commits.

Herald added a subscriber: mcrosier. · Jan 26 2016, 5:43 PM

The patch is straightforward and looks reasonable to me.

lib/Analysis/InlineCost.cpp
1217 ↗	(On Diff #46084)	If the normal destination of the invoke or the parent block of the call site is unreachable-terminated, ...

Addressed David's comment.

LGTM, but perhaps @davidxl or @eraman should provide the official approval.

The only concern I have is the suppression of inlining of hot calls in the following hypothetical example:

main() {
  hot_call_1();
  ...
  hot_call_N();
  exit(0);
}

Even though each of the hot_call_X calls is executed only once, inlining may be beneficial (due to constant propagation, for example). Short of checking whether the caller is main (as you did in your previous patch), is there a way to distinguish this case? I suppose this pattern is too rare in real code to worry about.

Thanks Easwaran for the comment.
I can see that the case you mention could happen with this change. I tried to come up with a workaround that does not rely on BPI, but nothing showed desirable results in my test environment. To cover this case, I would rather wait until BPI is hooked into the inliner and then revisit this by checking the actual coldness of the block. In the future, we might suppress inlining only for a call site in a block that leads to an unreachable and is dominated by a block whose incoming edges from all predecessors have zero branch probability.
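
A rough sketch of that future refinement, under loud assumptions: `Edge`, `BlockInfo`, and both functions are hypothetical and are not LLVM's BranchProbabilityInfo API. The dominance condition is also simplified to the block itself (every block dominates itself), so this only illustrates the coldness test, not a full dominator walk.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one predecessor edge and its branch probability.
struct Edge {
  double probability;
};

// Hypothetical per-block summary used only for this sketch.
struct BlockInfo {
  bool endsInUnreachable;      // block is unreachable-terminated
  std::vector<Edge> incoming;  // edges from predecessors
};

// A block counts as cold only if it has predecessors and every incoming
// edge has zero probability. An entry block (no predecessors) is never cold.
bool allIncomingEdgesCold(const BlockInfo &b) {
  if (b.incoming.empty())
    return false;
  for (const Edge &e : b.incoming)
    if (e.probability > 0.0)
      return false;
  return true;
}

// Suppress inlining only when the block both ends in unreachable and is cold.
bool shouldSuppressInlining(const BlockInfo &b) {
  return b.endsInUnreachable && allIncomingEdgesCold(b);
}
```

Under this sketch the body of main() in the example above, which is an entry block ending in exit(0), would not be treated as cold, so its hot calls would remain inlining candidates.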

I mentioned this in another reply. This scenario is rare, so this patch
is still good in general. If we want to handle it in the future, this
heuristic needs to be moved up, where it can be followed by other
threshold-adjusting heuristics, but that should not be required for this
patch.

David

Thanks David for the comment.
Based on @davidxl's comment, I added a FIXME describing the corner case mentioned by @dblaikie and @eraman. I will be happy to hear any opinions or suggestions on this change.

In D16616#339783, @davidxl wrote:

I mentioned this in another reply. This scenario is rare, so this patch
is still good in general. If we want to handle it in the future, this
heuristic needs to be moved up, where it can be followed by other
threshold-adjusting heuristics, but that should not be required for this
patch.

David

Sounds reasonable. I also don't expect this kind of code to appear in real life, but one can imagine microbenchmarks or toy programs having this pattern.

LGTM

This revision is now accepted and ready to land. Feb 1 2016, 11:54 AM

Closed by commit rL259403: Avoid inlining call sites in unreachable-terminated block (authored by junbuml). · Feb 1 2016, 12:59 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Analysis/InlineCost.cpp	23 lines
llvm/trunk/test/Transforms/Inline/inline_unreachable.ll	130 lines
llvm/trunk/test/Transforms/JumpThreading/pr26096.ll	11 lines

Diff 46571

llvm/trunk/lib/Analysis/InlineCost.cpp

 bool CallAnalyzer::analyzeCall(CallSite CS) {
 [...]
   // If there is only one call of the function, and it has internal linkage,
   // the cost of inlining it drops dramatically.
   bool OnlyOneCallAndLocalLinkage = F.hasLocalLinkage() && F.hasOneUse() &&
                                     &F == CS.getCalledFunction();
   if (OnlyOneCallAndLocalLinkage)
     Cost += InlineConstants::LastCallToStaticBonus;

-  // If the instruction after the call, or if the normal destination of the
-  // invoke is an unreachable instruction, the function is noreturn. As such,
-  // there is little point in inlining this unless there is literally zero
-  // cost.
+  // If the normal destination of the invoke or the parent block of the call
+  // site is unreachable-terminated, there is little point in inlining this
+  // unless there is literally zero cost.
+  // FIXME: Note that it is possible that an unreachable-terminated block has a
+  // hot entry. For example, in below scenario inlining hot_call_X() may be
+  // beneficial :
+  // main() {
+  //   hot_call_1();
+  //   ...
+  //   hot_call_N()
+  //   exit(0);
+  // }
+  // For now, we are not handling this corner case here as it is rare in real
+  // code. In future, we should elaborate this based on BPI and BFI in more
+  // general threshold adjusting heuristics in updateThreshold().
   Instruction *Instr = CS.getInstruction();
   if (InvokeInst *II = dyn_cast<InvokeInst>(Instr)) {
-    if (isa<UnreachableInst>(II->getNormalDest()->begin()))
+    if (isa<UnreachableInst>(II->getNormalDest()->getTerminator()))
       Threshold = 0;
-  } else if (isa<UnreachableInst>(++BasicBlock::iterator(Instr)))
+  } else if (isa<UnreachableInst>(Instr->getParent()->getTerminator()))
     Threshold = 0;

   // If this function uses the coldcc calling convention, prefer not to inline
   // it.
   if (F.getCallingConv() == CallingConv::Cold)
     Cost += InlineConstants::ColdccPenalty;

   // Check if we're done. This can happen due to bonuses and penalties.
 [...]
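
To make the behavioral difference for plain calls concrete, here is a toy comparison of the two checks. `Op`, `ToyBlock`, `oldCheck`, and `newCheck` are hypothetical names; a block is modeled as a flat list of opcodes whose last element is the terminator, which is an assumption of this sketch rather than how LLVM represents blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical opcode set, just enough for the example.
enum class Op { Call, Store, Unreachable, Ret };

// A toy basic block: instructions in order, terminator last.
using ToyBlock = std::vector<Op>;

// Old behavior for calls: fire only when the instruction immediately
// following the call is unreachable.
bool oldCheck(const ToyBlock &bb, std::size_t callIdx) {
  assert(callIdx + 1 < bb.size() && "a call is never the last instruction");
  return bb[callIdx + 1] == Op::Unreachable;
}

// New behavior for calls: fire whenever the parent block's terminator is
// unreachable, regardless of what sits between the call and the terminator.
bool newCheck(const ToyBlock &bb) {
  assert(!bb.empty() && "a block always has a terminator");
  return bb.back() == Op::Unreachable;
}
```

For a block shaped like `{Call, Store, Unreachable}`, which matches the `if.then` blocks in the test that follows, only the new check fires; for `{Call, Unreachable}` both do.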

llvm/trunk/test/Transforms/Inline/inline_unreachable.ll

				; RUN: opt < %s -inline -S | FileCheck %s

				@a = global i32 4
				@_ZTIi = external global i8*

				; CHECK-LABEL: callSimpleFunction
				; CHECK: call i32 @simpleFunction
				define i32 @callSimpleFunction(i32 %idx, i32 %limit) {
				entry:
				%cmp = icmp sge i32 %idx, %limit
				br i1 %cmp, label %if.then, label %if.end

				if.then:
				%s = call i32 @simpleFunction(i32 %idx)
				store i32 %s, i32* @a
				unreachable

				if.end:
				ret i32 %idx
				}

				; CHECK-LABEL: callSmallFunction
				; CHECK-NOT: call i32 @smallFunction
				define i32 @callSmallFunction(i32 %idx, i32 %limit) {
				entry:
				%cmp = icmp sge i32 %idx, %limit
				br i1 %cmp, label %if.then, label %if.end

				if.then:
				%s = call i32 @smallFunction(i32 %idx)
				store i32 %s, i32* @a
				unreachable

				if.end:
				ret i32 %idx
				}

				; CHECK-LABEL: throwSimpleException
				; CHECK: invoke i32 @simpleFunction
				define i32 @throwSimpleException(i32 %idx, i32 %limit) #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
				entry:
				%cmp = icmp sge i32 %idx, %limit
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				%exception = call i8* @__cxa_allocate_exception(i64 1) #0
				invoke i32 @simpleFunction(i32 %idx)
				to label %invoke.cont unwind label %lpad

				invoke.cont: ; preds = %if.then
				call void @__cxa_throw(i8* %exception, i8* bitcast (i8** @_ZTIi to i8*), i8* null) #1
				unreachable

				lpad: ; preds = %if.then
				%ll = landingpad { i8*, i32 }
				cleanup
				ret i32 %idx

				if.end: ; preds = %entry
				ret i32 %idx
				}

				; CHECK-LABEL: throwSmallException
				; CHECK-NOT: invoke i32 @smallFunction
				define i32 @throwSmallException(i32 %idx, i32 %limit) #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
				entry:
				%cmp = icmp sge i32 %idx, %limit
				br i1 %cmp, label %if.then, label %if.end

				if.then: ; preds = %entry
				%exception = call i8* @__cxa_allocate_exception(i64 1) #0
				invoke i32 @smallFunction(i32 %idx)
				to label %invoke.cont unwind label %lpad

				invoke.cont: ; preds = %if.then
				call void @__cxa_throw(i8* %exception, i8* bitcast (i8** @_ZTIi to i8*), i8* null) #1
				unreachable

				lpad: ; preds = %if.then
				%ll = landingpad { i8*, i32 }
				cleanup
				ret i32 %idx

				if.end: ; preds = %entry
				ret i32 %idx
				}

				define i32 @simpleFunction(i32 %a) #0 {
				entry:
				%a1 = load volatile i32, i32* @a
				%x1 = add i32 %a1, %a1
				%a2 = load volatile i32, i32* @a
				%x2 = add i32 %x1, %a2
				%a3 = load volatile i32, i32* @a
				%x3 = add i32 %x2, %a3
				%a4 = load volatile i32, i32* @a
				%x4 = add i32 %x3, %a4
				%a5 = load volatile i32, i32* @a
				%x5 = add i32 %x4, %a5
				%a6 = load volatile i32, i32* @a
				%x6 = add i32 %x5, %a6
				%a7 = load volatile i32, i32* @a
				%x7 = add i32 %x6, %a6
				%a8 = load volatile i32, i32* @a
				%x8 = add i32 %x7, %a8
				%a9 = load volatile i32, i32* @a
				%x9 = add i32 %x8, %a9
				%a10 = load volatile i32, i32* @a
				%x10 = add i32 %x9, %a10
				%a11 = load volatile i32, i32* @a
				%x11 = add i32 %x10, %a11
				%a12 = load volatile i32, i32* @a
				%x12 = add i32 %x11, %a12
				%add = add i32 %x12, %a
				ret i32 %add
				}

				define i32 @smallFunction(i32 %a) {
				entry:
				%r = load volatile i32, i32* @a
				ret i32 %r
				}

				attributes #0 = { nounwind }
				attributes #1 = { noreturn }

				declare i8* @__cxa_allocate_exception(i64)
				declare i32 @__gxx_personality_v0(...)
				declare void @__cxa_throw(i8*, i8*, i8*)

llvm/trunk/test/Transforms/JumpThreading/pr26096.ll

 ; RUN: opt -prune-eh -inline -jump-threading -S < %s | FileCheck %s

 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnu"

 @d = external global i32*, align 8

 define void @fn3(i1 %B) {
 entry:
   br i1 %B, label %if.end, label %if.then

 if.then: ; preds = %entry
-  call void @fn2()
+  call void @fn2(i1 %B)
   ret void

 if.end: ; preds = %entry
-  call void @fn2()
+  call void @fn2(i1 %B)
   ret void
 }

-define internal void @fn2() unnamed_addr {
+define internal void @fn2(i1 %B) unnamed_addr {
 entry:
   call void @fn1()
   call void @fn1()
   call void @fn1()
+  br i1 %B, label %if.end, label %if.then
+if.then:
+  unreachable
+
+if.end:
   unreachable
 }

 ; CHECK-LABEL: define internal void @fn2(
 ; CHECK: %[[LOAD:.*]] = load i32*, i32** @d, align 8
 ; CHECK: %tobool1.i = icmp eq i32* %[[LOAD]], null

 define internal void @fn1() unnamed_addr {
 [... 35 lines not shown]