This is an archive of the discontinued LLVM Phabricator instance.

[InlineCost] Fix bug 42084: remember negative result when computing full inline cost
Needs Revision · Public

Authored by yrouban on Jun 24 2019, 2:53 AM.

Details

Reviewers

fedor.sergeev
chandlerc
haicheng
eraman
xbolva00

Summary

This is a minimal fix for the bug https://bugs.llvm.org/show_bug.cgi?id=42084 extracted from the patch D63058.
The rest of D63058 could be treated as a feature.

Event Timeline

yrouban created this revision. · Jun 24 2019, 2:53 AM

Herald added a project: Restricted Project. · Jun 24 2019, 2:53 AM

Herald added a subscriber: hiraditya.

yrouban set the repository for this revision to rG LLVM Github Monorepo. · Jun 24 2019, 2:54 AM

Please see my comment on PR42084: https://bugs.llvm.org/show_bug.cgi?id=42084#c3.
The core of the problem is the two different Cost-vs-Threshold checks used in analyzeBlock (Cost >= Threshold) and analyzeCall (Cost < max(1, Threshold)).
I believe a proper fix for this bug would be to use a unified Cost-vs-Threshold check everywhere.
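
To make the mismatch between the two checks concrete, here is a small standalone C++ sketch. The function names are hypothetical and model only the two comparisons quoted above; this is not the actual CallAnalyzer code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-ins for the two comparisons; not real LLVM functions.

// Comparison used in analyzeBlock: the walk is treated as non-viable
// once this condition holds.
constexpr bool blockCheckRejects(int Cost, int Threshold) {
  return Cost >= Threshold;
}

// Comparison used in analyzeCall: inlining is accepted when this holds.
constexpr bool callCheckAccepts(int Cost, int Threshold) {
  return Cost < std::max(1, Threshold);
}

// True when the block-level check rejects a call site that the
// call-level check would accept.
constexpr bool checksDisagree(int Cost, int Threshold) {
  return blockCheckRejects(Cost, Threshold) &&
         callCheckAccepts(Cost, Threshold);
}

// The (cost=0, threshold=0) case is one where the two checks disagree.
static_assert(checksDisagree(0, 0), "checks disagree at cost=0, threshold=0");
// With a positive threshold the two checks are consistent in this model.
static_assert(!checksDisagree(5, 10), "checks agree at cost=5, threshold=10");
static_assert(!checksDisagree(10, 10), "checks agree at cost=10, threshold=10");
```

In this model the checks can only disagree when Threshold <= Cost <= 0, which is the situation that a (cost=0, threshold=0) call site falls into.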

This revision now requires changes to proceed. · Jun 24 2019, 9:25 AM

In D63706#1555837, @fedor.sergeev wrote:

> Please see my comment on PR42084: https://bugs.llvm.org/show_bug.cgi?id=42084#c3.
> The core of the problem is the two different Cost-vs-Threshold checks used in analyzeBlock (Cost >= Threshold) and analyzeCall (Cost < max(1, Threshold)).
> I believe a proper fix for this bug would be to use a unified Cost-vs-Threshold check everywhere.

I agree that those two checks seem to be inconsistent with each other, but I insist on my fix. It makes analyzeBlock() return the same result (negative or positive) regardless of the ComputeFullInlineCost flag.
The root cause is that analyzeBlock() returns different results (negative or positive) for different values of ComputeFullInlineCost. Fixing the checks you mentioned could merely hide this difference.

In D63706#1556918, @yrouban wrote:

> I agree that those two checks seem to be inconsistent with each other, but I insist on my fix. It makes analyzeBlock() return the same result (negative or positive) regardless of the ComputeFullInlineCost flag.
> The root cause is that analyzeBlock() returns different results (negative or positive) for different values of ComputeFullInlineCost. Fixing the checks you mentioned could merely hide this difference.

The root cause for what?

As currently implemented, ComputeFullInlineCost implies traversing *all* the instructions in order to:

- find "interesting" constructs that prohibit inlining, since the presence of these constructs is of interest to inline-cost users in itself
- compute the full cost (e.g. as though Threshold were infinite)

When ComputeFullInlineCost is false, we are allowed to take shortcuts and skip as many calculations as possible once we can prove that the inlining result is negative.

In line with this definition, analyzeBlock's negative return value is only used to terminate the walk through the blocks,
so returning false or true only changes the number of blocks traversed, which is a perfectly good and expected effect of ComputeFullInlineCost.

With your suggested change, in ComputeFullInlineCost mode we:

- continue traversing a single block in analyzeBlock as before
- start detecting the Cost-vs-Threshold violation
- stop the traversal of blocks in analyzeCall on a Cost-vs-Threshold violation detected by analyzeBlock

That leads to early termination of the walk through blocks, which I believe is not intended.
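
The effect on the block walk can be illustrated with a toy model. Everything below is a hypothetical sketch: blocks are reduced to a single cost value, the names are invented for illustration, and the per-instruction early exit inside a block is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model; none of these names exist in LLVM.
enum class BlockCheck { Original, Proposed };

// Models the per-block result: false means "inlining no longer viable".
inline bool analyzeBlockModel(int BlockCost, int &Cost, int Threshold,
                              bool ComputeFullInlineCost, BlockCheck Mode) {
  Cost += BlockCost;
  if (Cost < Threshold)
    return true;
  // Threshold reached. The original check reports failure only when
  // shortcuts are allowed; the proposed one reports it in both modes.
  if (Mode == BlockCheck::Original)
    return ComputeFullInlineCost;
  return false;
}

// Models the walk over blocks: it stops at the first negative block result.
inline std::size_t countVisitedBlocks(const std::vector<int> &BlockCosts,
                                      int Threshold,
                                      bool ComputeFullInlineCost,
                                      BlockCheck Mode) {
  int Cost = 0;
  std::size_t Visited = 0;
  for (int BlockCost : BlockCosts) {
    ++Visited;
    if (!analyzeBlockModel(BlockCost, Cost, Threshold, ComputeFullInlineCost,
                           Mode))
      break;
  }
  return Visited;
}
```

With four blocks of cost 5 and a threshold of 10, this model visits all four blocks in full-cost mode with the original check, but only two blocks with the proposed check, which is the early termination described above.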

An example of using inline-cost for "inline body investigation" of inlining validity is in SampleProfileLoader::inlineCallInstruction.
A comment from it:

// Checks if there is anything in the reachable portion of the callee at
// this callsite that makes this inlining potentially illegal.

Interestingly, the initial version of D37779, which introduced this usage, set a "kinda infinite" Threshold in order to achieve the same effect.

It does seem that we need a more disciplined interface for the "investigation" mode, but in the absence of that, ComputeFullInlineCost mode needs to adhere to the initial idea of traversing *all* the reachable instructions.

xbolva00 resigned from this revision. · Jul 29 2019, 10:21 AM

Revision Contents

Path                                                    Size
llvm/lib/Analysis/InlineCost.cpp                        11 lines
llvm/test/Transforms/Inline/inline_negative_result.ll   20 lines

Diff 206175

llvm/lib/Analysis/InlineCost.cpp

[... 1,574 lines not shown ...]
 /// This method walks the analyzer over every instruction in the given basic
 /// block and accounts for their cost during inlining at this callsite. It
 /// aborts early if the threshold has been exceeded or an impossible to inline
 /// construct has been detected. It returns false if inlining is no longer
 /// viable, and true if inlining remains viable.
 InlineResult
 CallAnalyzer::analyzeBlock(BasicBlock *BB,
                            SmallPtrSetImpl<const Value *> &EphValues) {
+  bool Result = true;
+
   for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) {
     // FIXME: Currently, the number of instructions in a function regardless of
     // our ability to simplify them during inline to constants or dead code,
     // are actually used by the vector bonus heuristic. As long as that's true,
     // we have to special case debug intrinsics here to prevent differences in
     // inlining due to debug symbols. Eventually, the number of unsimplified
     // instructions shouldn't factor into the cost computation, but until then,
     // hack around it here.
[... 58 lines not shown; context: if (IsCallerRecursive && ...]
                  << NV("Callee", &F) << " is " << NV("InlineResult", IR.message)
                  << ". Cost is not fully computed";
         });
       return IR;
     }

     // Check if we've passed the maximum possible threshold so we don't spin in
     // huge basic blocks that will never inline.
-    if (Cost >= Threshold && !ComputeFullInlineCost)
-      return false;
+    if (Cost >= Threshold) {
+      Result = false;
+      if (!ComputeFullInlineCost)
+        break;
+    }
   }

-  return true;
+  return Result;
 }

 /// Compute the base pointer and cumulative constant offsets for V.
 ///
 /// This strips all constant offsets off of V, leaving it the base pointer, and
 /// accumulates the total constant offset applied in the returned constant. It
 /// returns 0 if V is not a pointer, and returns the constant '0' if there are
 /// no constant offsets applied.
[... 574 lines not shown ...]
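
As a rough illustration of what this change does to a single block, the following standalone sketch models the loop before and after the patch. It is an assumption-laden simplification: the instruction costs are plain integers, Cost is a local variable rather than analyzer state, and BlockOutcome and both function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type: viability flag plus the cost accumulated so far.
struct BlockOutcome {
  bool Viable;
  int Cost;
};

// Models the loop before the patch: the threshold is only checked
// when ComputeFullInlineCost is false.
inline BlockOutcome analyzeBlockBefore(const std::vector<int> &InstCosts,
                                       int Threshold,
                                       bool ComputeFullInlineCost) {
  int Cost = 0;
  for (int InstCost : InstCosts) {
    Cost += InstCost;
    if (Cost >= Threshold && !ComputeFullInlineCost)
      return {false, Cost};
  }
  return {true, Cost};
}

// Models the loop after the patch: the negative result is remembered in
// both modes, but the walk only stops early when shortcuts are allowed.
inline BlockOutcome analyzeBlockAfter(const std::vector<int> &InstCosts,
                                      int Threshold,
                                      bool ComputeFullInlineCost) {
  bool Result = true;
  int Cost = 0;
  for (int InstCost : InstCosts) {
    Cost += InstCost;
    if (Cost >= Threshold) {
      Result = false;
      if (!ComputeFullInlineCost)
        break;
    }
  }
  return {Result, Cost};
}
```

For instruction costs {4, 4, 4} and a threshold of 6, the "before" model reports a viable block in full-cost mode and a non-viable one otherwise, while the "after" model reports a non-viable block in both modes and still accumulates the full cost of 12 in full-cost mode.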

llvm/test/Transforms/Inline/inline_negative_result.ll

-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt < %s -inline -S | FileCheck %s
+; RUN: opt < %s -inline -S -inline-remark-attribute | FileCheck %s
+; RUN: opt < %s -inline -S -inline-remark-attribute --pass-remarks-missed=inline --pass-remarks-analysis=inline --pass-remarks=inline | FileCheck %s
+; RUN: opt < %s -inline -S -inline-remark-attribute -inline-cost-full=true | FileCheck %s
+; RUN: opt < %s -inline -S -inline-remark-attribute -inline-cost-full=false | FileCheck %s

 ; PR42084
+; The test checks that inline remarks do not change inline decisions.

 define internal fastcc void @func4() {
 ; CHECK-LABEL: @func4(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[FOR_COND:%.*]]
 ; CHECK: for.cond:
-; CHECK-NEXT: tail call void (...) @g()
+; CHECK-NEXT: tail call void (...) @g() [[INLINE_REMARK0:#[0-9]+]]
 ; CHECK-NEXT: br label [[FOR_COND]]
 ;
 entry:
   br label %for.cond

 for.cond:
   tail call void (...) @g()
   br label %for.cond
 }

 define internal fastcc void @func3() {
 ; CHECK-LABEL: @func3(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: tail call fastcc void @func4()
+; CHECK-NEXT: tail call fastcc void @func4() [[INLINE_REMARK1:#[0-9]+]]
 ; CHECK-NEXT: unreachable
 ;
 entry:
   tail call fastcc void @func4()
   unreachable
 }

 define internal fastcc void @func2() {
 ; CHECK-LABEL: @func2(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: tail call fastcc void @func3()
+; CHECK-NEXT: tail call fastcc void @func3() [[INLINE_REMARK1:#[0-9]+]]
 ; CHECK-NEXT: unreachable
 ;
 entry:
   tail call fastcc void @func3()
   unreachable
 }

 define internal fastcc void @func1() {
 ; CHECK-LABEL: @func1(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: tail call fastcc void @func2()
+; CHECK-NEXT: tail call fastcc void @func2() [[INLINE_REMARK1:#[0-9]+]]
 ; CHECK-NEXT: unreachable
 ;
 entry:
   tail call fastcc void @func2()
   unreachable
 }

 define i32 @main() {
 ; CHECK-LABEL: @main(
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: tail call fastcc void @func1()
+; CHECK-NEXT: tail call fastcc void @func1() [[INLINE_REMARK1:#[0-9]+]]
 ; CHECK-NEXT: unreachable
 ;
 entry:
   tail call fastcc void @func1()
   unreachable
 }

 declare void @g(...)

+; CHECK: attributes [[INLINE_REMARK0]] = { "inline-remark"="unavailable definition" }
+; CHECK: attributes [[INLINE_REMARK1]] = { "inline-remark"="(cost=0, threshold=0)" }