This is an archive of the discontinued LLVM Phabricator instance.

[Inliner][NewPM] Inline functions outside of current SCC first
Needs Review · Public

Authored by haicheng on Nov 15 2017, 1:05 PM.

Details

Reviewers

chandlerc
eraman
davidxl
Prazek
davide
sanjoy

Summary

When we enabled the new PM, we noticed several big regressions. One reason is that the new PM inlines call sites in a different order. The legacy PM has a step that the new PM lacks: it moves the call sites that call functions in the current SCC to the end of the iteration list. I don't know whether the new PM omits this step intentionally, but I can see two benefits of doing it.

This step inlines functions outside of the current SCC first, which discourages functions inside the same SCC from inlining each other and bloating the code size.

Inlining a callee inside the current SCC likely makes the caller recursive. LLVM does not inline any recursive functions.
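
To make the second point concrete, here is a small hand-written sketch (not taken from the patch). `a` and `b` are hypothetical mutually recursive functions that form one SCC, and `aWithBInlined` is what `a` looks like after manually inlining `b` into it: the call to `b` disappears, but `a` now calls itself.

```cpp
#include <cassert>

// Hypothetical mutually recursive pair: a -> b -> a forms one SCC.
int b(int N);
int a(int N) { return N <= 0 ? 0 : 1 + b(N - 1); }
int b(int N) { return N <= 0 ? 0 : 2 + a(N - 1); }

// 'a' after inlining 'b' by hand. The call to 'b' is gone, but the body that
// came from 'b' calls back into the caller, so the result is self-recursive.
int aWithBInlined(int N) {
  if (N <= 0)
    return 0;
  int M = N - 1;                                     // argument passed to b
  int FromB = M <= 0 ? 0 : 2 + aWithBInlined(M - 1); // inlined body of b
  return 1 + FromB;
}

// Checks that the hand-inlined version computes the same values as 'a'.
void checkSameResults(int Limit) {
  for (int N = 0; N <= Limit; ++N)
    assert(a(N) == aWithBInlined(N));
}
```

Calling `checkSameResults(20)` confirms that the two versions agree; the only difference is that the second one is directly recursive.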

One drawback I can think of is that call sites from the same caller are stored in two places instead of one, so we may have to switch function proxies more often.
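
The reordering step described above can be sketched in isolation. This is a toy model rather than the LLVM code: `ToyCall`, `moveInSCCCallsLast`, and the string-based function names are made up for illustration, and SCC membership is passed in as a plain set.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a call site: just the caller and callee names.
struct ToyCall {
  std::string Caller;
  std::string Callee;
};

// Moves every call whose callee is a member of the current SCC to the end of
// the list, so that calls to functions outside the SCC are considered first.
// Returns the index of the first in-SCC call. This is an unstable partition:
// the relative order inside each group is not preserved.
std::size_t moveInSCCCallsLast(std::vector<ToyCall> &Calls,
                               const std::set<std::string> &SCCFunctions) {
  std::size_t FirstCallInSCC = Calls.size();
  std::size_t I = 0;
  while (I < FirstCallInSCC) {
    if (SCCFunctions.count(Calls[I].Callee)) {
      // Shrink the unexamined range and pull its last element into slot I;
      // slot I is examined again on the next iteration.
      --FirstCallInSCC;
      if (I != FirstCallInSCC)
        std::swap(Calls[I], Calls[FirstCallInSCC]);
    } else {
      ++I;
    }
  }
  assert(FirstCallInSCC <= Calls.size());
  return FirstCallInSCC;
}
```

With the calls `a->b`, `a->out`, `a->c`, `a->ext` and the SCC `{a, b, c}`, the calls to `out` and `ext` end up in the first two slots and the returned index is 2.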

This patch just copies the code from the legacy PM to the new PM. Here are the SPEC performance and code size changes:

benchmark         code size (%) (- is smaller)   performance (%) (+ is faster)
spec2000/ammp     -1.32                          +0.02
spec2000/vortex   -1.06                          -0.21
spec2006/gobmk    +0.8                           +1.22
spec2006/povray   -0.11                          +31.90
spec2017/leela    -1.67                          -0.78
spec2017/povray   -0.24                          +27.12

Diff Detail

Repository: rL LLVM

Event Timeline

haicheng created this revision. · Nov 15 2017, 1:05 PM

Herald added a subscriber: mcrosier. · Nov 15 2017, 1:05 PM

ashutosh.nema added a subscriber: ashutosh.nema. · Nov 15 2017, 9:33 PM

Kindly Ping

Please let me know if this is the right approach.

Using a heuristic like this to decide inline order is like tossing a coin. It is very likely that this can hurt some cases where inlining of inner edges is important but gets blocked.

Due to current practical limitations in the inliner, such as the inability to inline self-recursive functions, this patch can help to work around that limitation a little, so the patch looks fine for now, though we should not depend strongly on this inline behavior in the future.

In D40097#932995, @davidxl wrote:

Using a heuristic like this to decide inline order is like tossing a coin. It is very likely that this can hurt some cases where inlining of inner edges is important but gets blocked.

Due to current practical limitations in the inliner, such as the inability to inline self-recursive functions, this patch can help to work around that limitation a little, so the patch looks fine for now, though we should not depend strongly on this inline behavior in the future.

Thank you for your reply.

I agree that this heuristic is not the best, but it is consistent with what the legacy PM does, so we wouldn't have unpleasant surprises (e.g. spec2006/2017 povray listed in the summary) when we switch to the new PM. When we have a better one, we can certainly replace this heuristic.

Or I can modify the heuristic this way: move to the end of the candidate list only those call sites whose inlining would make their callers recursive.

Kindly Ping.

Kindly Ping (#2).

Kindly Ping (#3)

Now I only move the call sites whose callees also call the callers to the end of the inline list, to delay the creation of recursive functions. Is this more acceptable?
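
A minimal sketch of this narrower heuristic, under stated assumptions: the call graph is a toy `std::map` from function name to direct callees (not `LazyCallGraph`), the names `ToyCall`, `calleeCallsCaller`, and `delayMutualCalls` are hypothetical, and `std::stable_partition` stands in for the swap loop the patch uses, so unlike the patch it keeps the relative order within each group.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy call graph: function name -> names of the functions it calls directly.
using ToyCallGraph = std::map<std::string, std::set<std::string>>;

// Hypothetical stand-in for a call site.
struct ToyCall {
  std::string Caller;
  std::string Callee;
};

// True if the callee has a direct edge back to the caller, i.e. inlining this
// call would copy a call to the caller into the caller itself.
bool calleeCallsCaller(const ToyCallGraph &G, const ToyCall &C) {
  auto It = G.find(C.Callee);
  return It != G.end() && It->second.count(C.Caller) != 0;
}

// Moves the calls that could make their caller recursive to the end of the
// list and returns the index of the first delayed call.
std::size_t delayMutualCalls(std::vector<ToyCall> &Calls,
                             const ToyCallGraph &G) {
  auto IsSafe = [&G](const ToyCall &C) { return !calleeCallsCaller(G, C); };
  auto FirstDelayed = std::stable_partition(Calls.begin(), Calls.end(), IsSafe);
  assert(std::is_partitioned(Calls.begin(), Calls.end(), IsSafe));
  return static_cast<std::size_t>(FirstDelayed - Calls.begin());
}
```

In this toy model, with `a` calling `b` and `out`, and `b` calling `a`, the call `a->out` stays in front and `a->b` is delayed. For a three-function cycle `f->g->h->f`, the call `f->g` is not delayed, because `g` has no direct edge back to `f` even though all three functions are in the same SCC.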

Unfortunately, I don't know enough about the inliner to accept it, but props for finding and fixing this.
I am sorry to see that the review is taking so long; I hope someone who knows the inliner will review it soon.

Thank you, Piotr.

This patch just makes the NewPM consistent with the legacy PM, to prevent the big performance drop we observed when turning on the NewPM.

sanjoy resigned from this revision. · Jan 29 2022, 5:33 PM

Herald added a subscriber: ormris. · Jan 29 2022, 5:33 PM

Revision Contents

Path                                                      Size
lib/Transforms/IPO/Inliner.cpp                            13 lines
test/Transforms/Inline/cgscc-incremental-invalidate.ll    52 lines
test/Transforms/Inline/cgscc-invalidate.ll                32 lines
test/Transforms/Inline/cgscc-order.ll                     36 lines
test/Transforms/Inline/internal-scc-members.ll            2 lines

Diff 128252

lib/Transforms/IPO/Inliner.cpp

[840 lines hidden; context: for (Instruction &I : instructions(N.getFunction()))]
       if (auto CS = CallSite(&I))
         if (Function *Callee = CS.getCalledFunction())
           if (!Callee->isDeclaration())
             Calls.push_back({CS, -1});
   }
   if (Calls.empty())
     return PreservedAnalyses::all();

+  // Now that we have all of the call sites, move the ones whose callee also
+  // calls its caller to the end of the list to delay the creation of recursive
+  // functions.
+  unsigned FirstCallCanBecomeRecursive = Calls.size();
+  for (unsigned i = 0; i < FirstCallCanBecomeRecursive; ++i)
+    if (Function *Callee = Calls[i].first.getCalledFunction()) {
+      Function *Caller = Calls[i].first.getCaller();
+      if ((CG.lookup(Callee))->lookup(CG.lookup(Caller)))
+        std::swap(Calls[i--], Calls[--FirstCallCanBecomeRecursive]);
+    }
+
-  // Capture updatable variables for the current SCC and RefSCC.
+  // Capture updatable variables for the current SC and RefSCC.
   auto *C = &InitialC;
   auto *RC = &C->getOuterRefSCC();

   // When inlining a callee produces new call sites, we want to keep track of
   // the fact that they were inlined from the callee. This allows us to avoid
   // infinite inlining in some obscure cases. To represent this, we use an
   // index into the InlineHistory vector.
   SmallVector<std::pair<Function *, int>, 16> InlineHistory;
[259 lines hidden]

test/Transforms/Inline/cgscc-incremental-invalidate.ll

 ; Test for a subtle bug when computing analyses during inlining and mutating
 ; the SCC structure. Without care, this can fail to invalidate analyses.
 ;
 ; RUN: opt < %s -passes='cgscc(inline,function(verify<domtree>))' -debug-pass-manager -S 2>&1 | FileCheck %s

 ; First we check that the passes run in the way we expect. Otherwise this test
 ; may stop testing anything.
 ;
 ; CHECK-LABEL: Starting llvm::Module pass manager run.
-; CHECK: Running pass: InlinerPass on (test1_f, test1_g, test1_h)
-; CHECK: Running analysis: FunctionAnalysisManagerCGSCCProxy on (test1_f, test1_g, test1_h)
+; CHECK: Running pass: InlinerPass on (test1_h, test1_g, test1_f)
+; CHECK: Running analysis: FunctionAnalysisManagerCGSCCProxy on (test1_h, test1_g, test1_f)
 ; CHECK: Running analysis: DominatorTreeAnalysis on test1_f
 ; CHECK: Running analysis: DominatorTreeAnalysis on test1_g
 ; CHECK: Invalidating all non-preserved analyses for: (test1_f)
 ; CHECK: Invalidating all non-preserved analyses for: test1_f
 ; CHECK: Invalidating analysis: DominatorTreeAnalysis on test1_f
 ; CHECK: Invalidating analysis: LoopAnalysis on test1_f
 ; CHECK: Invalidating analysis: BranchProbabilityAnalysis on test1_f
 ; CHECK: Invalidating analysis: BlockFrequencyAnalysis on test1_f
-; CHECK: Invalidating all non-preserved analyses for: (test1_g, test1_h)
-; CHECK: Invalidating all non-preserved analyses for: test1_g
-; CHECK: Invalidating analysis: DominatorTreeAnalysis on test1_g
-; CHECK: Invalidating analysis: LoopAnalysis on test1_g
-; CHECK: Invalidating analysis: BranchProbabilityAnalysis on test1_g
-; CHECK: Invalidating analysis: BlockFrequencyAnalysis on test1_g
+; CHECK: Invalidating all non-preserved analyses for: (test1_h, test1_g)
 ; CHECK: Invalidating all non-preserved analyses for: test1_h
 ; CHECK: Invalidating analysis: DominatorTreeAnalysis on test1_h
 ; CHECK: Invalidating analysis: LoopAnalysis on test1_h
 ; CHECK: Invalidating analysis: BranchProbabilityAnalysis on test1_h
 ; CHECK: Invalidating analysis: BlockFrequencyAnalysis on test1_h
+; CHECK: Invalidating all non-preserved analyses for: test1_g
+; CHECK: Invalidating analysis: DominatorTreeAnalysis on test1_g
+; CHECK: Invalidating analysis: LoopAnalysis on test1_g
+; CHECK: Invalidating analysis: BranchProbabilityAnalysis on test1_g
+; CHECK: Invalidating analysis: BlockFrequencyAnalysis on test1_g
 ; CHECK-NOT: Invalidating analysis:
 ; CHECK: Starting llvm::Function pass manager run.
-; CHECK-NEXT: Running pass: DominatorTreeVerifierPass on test1_g
-; CHECK-NEXT: Running analysis: DominatorTreeAnalysis on test1_g
-; CHECK-NEXT: Finished llvm::Function pass manager run.
-; CHECK-NEXT: Starting llvm::Function pass manager run.
 ; CHECK-NEXT: Running pass: DominatorTreeVerifierPass on test1_h
 ; CHECK-NEXT: Running analysis: DominatorTreeAnalysis on test1_h
 ; CHECK-NEXT: Finished llvm::Function pass manager run.
+; CHECK-NEXT: Starting llvm::Function pass manager run.
+; CHECK-NEXT: Running pass: DominatorTreeVerifierPass on test1_g
+; CHECK-NEXT: Running analysis: DominatorTreeAnalysis on test1_g
+; CHECK-NEXT: Finished llvm::Function pass manager run.
 ; CHECK-NOT: Invalidating analysis:
 ; CHECK: Running pass: DominatorTreeVerifierPass on test1_f
 ; CHECK-NEXT: Running analysis: DominatorTreeAnalysis on test1_f

 ; An external function used to control branches.
 declare i1 @flag()
 ; CHECK-LABEL: declare i1 @flag()

[16 lines hidden]
 return:
   ret void
 }

 ; The 'test1_' prefixed functions work to carefully test that incrementally
 ; reducing an SCC in the inliner cannot accidentially leave stale function
 ; analysis results due to failing to invalidate them for all the functions.

-; The inliner visits this last function. It can't actually break any cycles
-; here, but because we visit this function we compute fresh analyses for it.
-; These analyses are then invalidated when we inline callee disrupting the
-; CFG, and it is important that they be freed.
-define void @test1_h() {
-; CHECK-LABEL: define void @test1_h()
+; We visit this function first in the inliner, and while we inline callee
+; perturbing the CFG, we don't inline anything else and the SCC structure
+; remains in tact.
+define void @test1_f() {
+; CHECK-LABEL: define void @test1_f()
 entry:
-  call void @test1_g()
+  ; We force this edge to survive inlining.
+  call void @test1_g() noinline
 ; CHECK: call void @test1_g()

   ; Pull interesting CFG into this function.
   call void @callee()
 ; CHECK-NOT: call void @callee()

   ret void
 ; CHECK: ret void
[17 lines hidden; context: ; CHECK: call void @test1_h()]
   ; Pull interesting CFG into this function.
   call void @callee()
 ; CHECK-NOT: call void @callee()

   ret void
 ; CHECK: ret void
 }

-; We visit this function first in the inliner, and while we inline callee
-; perturbing the CFG, we don't inline anything else and the SCC structure
-; remains in tact.
-define void @test1_f() {
-; CHECK-LABEL: define void @test1_f()
+; The inliner visits this last function. It can't actually break any cycles
+; here, but because we visit this function we compute fresh analyses for it.
+; These analyses are then invalidated when we inline callee disrupting the
+; CFG, and it is important that they be freed.
+define void @test1_h() {
+; CHECK-LABEL: define void @test1_h()
 entry:
-  ; We force this edge to survive inlining.
-  call void @test1_g() noinline
+  call void @test1_g()
 ; CHECK: call void @test1_g()

   ; Pull interesting CFG into this function.
   call void @callee()
 ; CHECK-NOT: call void @callee()

   ret void
 ; CHECK: ret void
[78 lines hidden]

test/Transforms/Inline/cgscc-invalidate.ll

[59 lines hidden; context: ; CHECK: call void @callee]
   ret void
 ; CHECK: ret void
 }


 ; The 'test3_' prefixed functions test the scenario of not inlining preserving
 ; dominators after splitting an SCC into two smaller SCCs.

-; This function ends up split into a separate SCC, which can cause its analyses
-; to become stale if the splitting doesn't properly invalidate things. Also, as
-; a consequence of being split out, test3_f is too large to inline by the time
-; we get here.
-define void @test3_g() {
-; CHECK-LABEL: define void @test3_g()
+; The function test3_f gets visited first and we end up inlining everything we
+; can into this routine. That splits test3_g into a separate SCC that is enqued
+; for later processing.
+define void @test3_f() {
+; CHECK-LABEL: define void @test3_f()
 entry:
-  ; Create the second edge in the SCC cycle.
-  call void @test3_f()
+  ; Create the first edge in the SCC cycle.
+  call void @test3_g()
+; CHECK-NOT: @test3_g()
 ; CHECK: call void @test3_f()

   ; Pull interesting CFG into this function.
   call void @callee()
 ; CHECK-NOT: call void @callee()

   ret void
 ; CHECK: ret void
 }

-; The second function gets visited first and we end up inlining everything we
-; can into this routine. That splits test3_g into a separate SCC that is enqued
-; for later processing.
-define void @test3_f() {
-; CHECK-LABEL: define void @test3_f()
+; This function ends up split into a separate SCC, which can cause its analyses
+; to become stale if the splitting doesn't properly invalidate things. Also, as
+; a consequence of being split out, test3_f is too large to inline by the time
+; we get here.
+define void @test3_g() {
+; CHECK-LABEL: define void @test3_g()
 entry:
-  ; Create the first edge in the SCC cycle.
-  call void @test3_g()
-; CHECK-NOT: @test3_g()
+  ; Create the second edge in the SCC cycle.
+  call void @test3_f()
 ; CHECK: call void @test3_f()

   ; Pull interesting CFG into this function.
   call void @callee()
 ; CHECK-NOT: call void @callee()

   ret void
 ; CHECK: ret void
 }

test/Transforms/Inline/cgscc-order.ll

This file was added.

+; RUN: opt < %s -passes='cgscc(inline)' -S -inline-threshold=30 | FileCheck %s
+
+@glbl = external global i32
+
+define void @out() {
+  store i32 0, i32* @glbl
+  store i32 1, i32* @glbl
+  store i32 2, i32* @glbl
+  store i32 3, i32* @glbl
+  store i32 4, i32* @glbl
+  store i32 5, i32* @glbl
+  store i32 6, i32* @glbl
+  store i32 7, i32* @glbl
+  store i32 8, i32* @glbl
+  store i32 9, i32* @glbl
+  ret void
+}
+
+define void @scc_a() {
+entry:
+  call void @scc_b()
+  call void @out()
+  ret void
+}
+
+define void @scc_b() {
+; CHECK-LABEL: define void @scc_b(
+; Make sure out is inlined into scc_a first so that scc_a is too big to be
+; inined into scc_b
+entry:
+; CHECK: call
+; CHECK-NEXT: ret
+  call void @scc_a()
+  ret void
+}

test/Transforms/Inline/internal-scc-members.ll

[12 lines hidden]
 entry:
   call void @test1_scc1()
   ret void
 }

 ; CHECK-NOT: @test1_scc1
 define internal void @test1_scc1() {
 entry:
-  call void @test1_scc0()
+  call void @test1_scc0() noinline
   ret void
 }

 ; CHECK-LABEL: define void @test1()
 ; CHECK: call void @test1_scc0()
 define void @test1() {
 entry:
   call void @test1_scc0() noinline
   ret void
 }