This is an archive of the discontinued LLVM Phabricator instance.

[Pipelines] Enable EarlyCSE after CoroCleanup to avoid runtime performance losses (5/5)
Abandoned · Public

Authored by ChuanqiXu on Apr 25 2022, 12:43 AM.

Details

Reviewers

aeubanks
rjmccall
jyknight
efriedma

Summary

After the previous patch landed, the compiler no longer optimizes readnone functions when coroutines are enabled. This is not good. This patch tries to fix the problem by enabling the EarlyCSE pass after CoroSplit. I think this is a price we can't avoid.

Diff Detail

Event Timeline

ChuanqiXu created this revision. Apr 25 2022, 12:43 AM

Herald added a project: Restricted Project. Apr 25 2022, 12:43 AM

Herald added subscribers: ormris, wenlei, steven_wu, hiraditya.

ChuanqiXu requested review of this revision. Apr 25 2022, 12:43 AM

Herald added a project: Restricted Project. Apr 25 2022, 12:43 AM

Herald added a subscriber: llvm-commits.

ChuanqiXu added reviewers: aeubanks, rjmccall, jyknight. Apr 25 2022, 1:18 AM

ChuanqiXu added a parent revision: D124363: [Coroutines] Don't optimize readnone function before we split coroutine (4/5). Apr 25 2022, 1:33 AM

Harbormaster completed remote builds in B161112: Diff 424834. Apr 25 2022, 3:20 AM

would it make sense to put all of the coroutine lower passes right at the beginning of the pipeline? e.g. around LowerExpectIntrinsicPass? is there a reason CoroSplit is interleaved in the CGSCC pass manager?

I'd like to understand the constraints better before going forward with a solution that currently seems unprincipled to me

In D124364#3472330, @aeubanks wrote:

would it make sense to put all of the coroutine lower passes right at the beginning of the pipeline? e.g. around LowerExpectIntrinsicPass? is there a reason CoroSplit is interleaved in the CGSCC pass manager?

I'd like to understand the constraints better before going forward with a solution that currently seems unprincipled to me

there's also been some talk about building infrastructure to conditionally run passes if some other pass has done some transformation, but ideally we'd have a simpler solution here

In D124364#3472330, @aeubanks wrote:

would it make sense to put all of the coroutine lower passes right at the beginning of the pipeline? e.g. around LowerExpectIntrinsicPass? is there a reason CoroSplit is interleaved in the CGSCC pass manager?

I'd like to understand the constraints better before going forward with a solution that currently seems unprincipled to me

We can't move the CoroSplit pass earlier. We would lose many optimization opportunities that way. Here is the background:

(1) The CoroSplit pass splits a coroutine into multiple functions.
(2) The LLVM compiler is much better at optimizing within a function than at interprocedural optimization (IPO). In fact, the most effective IPO in LLVM today is inlining, which enables function-level optimization after one function has been inlined into another.
(3) After CoroSplit, it is rare that we can recover the original function by inlining.
(4) As a result, CoroSplit breaks function-level optimization opportunities, and IPO can't recover them.
(5) So we try to run as many optimization passes before CoroSplit as possible.

So we can't run the CoroSplit pass at the very beginning.
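
A rough way to picture points (4) and (5) is the toy C++ sketch below. It is not LLVM code: `ToyFunction`, `localCSE`, `splitAt` and `remainingAfterCSE` are hypothetical names made up for illustration, a "function" is assumed to be just a list of side-effect-free expressions, and the only optimization modeled is a function-local duplicate removal. Under those assumptions, splitting the body first hides a duplicate from the function-local pass.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// A toy "function": a sequence of side-effect-free expressions, each
// identified by its text. This is a stand-in, not LLVM IR.
using ToyFunction = std::vector<std::string>;

// Function-local redundancy elimination: keeps the first occurrence of each
// expression inside one function and drops later duplicates.
ToyFunction localCSE(const ToyFunction &F) {
  std::set<std::string> Seen;
  ToyFunction Out;
  for (const std::string &Expr : F)
    if (Seen.insert(Expr).second) // true only for the first occurrence
      Out.push_back(Expr);
  return Out;
}

// Splits one function into two pieces at index SplitAt (which must not
// exceed F.size()), loosely mimicking how a coroutine body ends up spread
// over several functions after splitting.
std::vector<ToyFunction> splitAt(const ToyFunction &F, std::size_t SplitAt) {
  auto Mid = F.begin() + static_cast<std::ptrdiff_t>(SplitAt);
  return {ToyFunction(F.begin(), Mid), ToyFunction(Mid, F.end())};
}

// Total number of expressions left after running localCSE on every piece
// independently, with no knowledge shared between pieces.
std::size_t remainingAfterCSE(const std::vector<ToyFunction> &Pieces) {
  std::size_t N = 0;
  for (const ToyFunction &P : Pieces)
    N += localCSE(P).size();
  return N;
}
```

With the body `{"a+b", "c*d", "a+b"}`, optimizing the unsplit body leaves 2 expressions, while splitting after the second expression and then optimizing each piece leaves 3.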

is there a reason CoroSplit is interleaved in the CGSCC pass manager?

From the above, it is better to run CoroSplit after inlining, so that calls inside a coroutine are inlined first and expose more optimization opportunities. So the question becomes "Why is the CoroSplit pass inside the CGSCC pipeline instead of after it?". The reason is that we can't inline an unlowered coroutine into another unlowered coroutine, otherwise CoroSplit couldn't handle it (due to the design of coroutines). Because of this, unlowered coroutines are not allowed to be inlined. But it is harmless to inline a lowered coroutine, and inlining a lowered coroutine into an unlowered coroutine is necessary to enable an optimization pass called CoroElide. So here are the reasons we put CoroSplit in the CGSCC passes:
(1) We need to run inlining before CoroSplit.
(2) We need to run inlining after CoroSplit too.
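
The ordering constraint can be sketched with a small toy model in C++. This is only an illustration under stated assumptions, not how LLVM implements it: `ToyFn`, `canInline` and `runBottomUp` are hypothetical names, the call graph is visited callee-first as a stand-in for the CGSCC walk, and "splitting" is reduced to flipping a flag.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy function record; none of this is LLVM API.
struct ToyFn {
  bool IsCoroutine;                 // is this function a coroutine?
  bool Lowered;                     // has it already been split (lowered)?
  std::vector<std::string> Callees; // names of directly called functions
};

// Rule taken from the discussion: an unlowered coroutine must not be
// inlined, while ordinary functions and lowered coroutines may be.
bool canInline(const ToyFn &Callee) {
  return !Callee.IsCoroutine || Callee.Lowered;
}

// Visits functions callee-first. For each function it first tries to inline
// its callees, then (if SplitDuringWalk is set) splits the function when it
// is a coroutine. Returns a log of the decisions that were taken.
std::vector<std::string> runBottomUp(std::map<std::string, ToyFn> &Fns,
                                     const std::vector<std::string> &PostOrder,
                                     bool SplitDuringWalk) {
  std::vector<std::string> Log;
  for (const std::string &Name : PostOrder) {
    ToyFn &F = Fns.at(Name);
    for (const std::string &CalleeName : F.Callees)
      if (canInline(Fns.at(CalleeName)))
        Log.push_back("inline " + CalleeName + " into " + Name);
    if (SplitDuringWalk && F.IsCoroutine) {
      F.Lowered = true;
      Log.push_back("split " + Name);
    }
  }
  return Log;
}
```

For a coroutine `outer` that calls a coroutine `inner`, splitting during the walk yields "split inner", "inline inner into outer", "split outer". If splitting is deferred until after the walk, `inner` is still unlowered when `outer` is visited and nothing is inlined.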

Hope I've stated things clearly enough :)

ChuanqiXu added a reviewer: efriedma. Apr 25 2022, 8:32 PM

Planning changes, since the previous patch changed its plan too.

Now we prefer https://reviews.llvm.org/D125293

Revision Contents

Path                                                             Size
llvm/lib/Passes/PassBuilderPipelines.cpp                         6 lines
llvm/test/Other/new-pm-defaults.ll                               2 lines
llvm/test/Other/new-pm-thinlto-defaults.ll                       1 line
llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll          1 line
llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll    1 line
llvm/test/Transforms/Coroutines/coro-readnone-04.ll              85 lines

Diff 424834

llvm/lib/Passes/PassBuilderPipelines.cpp

PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,

   for (auto &C : OptimizerEarlyEPCallbacks)
     C(MPM, Level);

   FunctionPassManager OptimizePM;
   OptimizePM.addPass(Float2IntPass());
   OptimizePM.addPass(LowerConstantIntrinsicsPass());

-  if (EnableMatrix) {
+  if (EnableMatrix)
     OptimizePM.addPass(LowerMatrixIntrinsicsPass());
-    OptimizePM.addPass(EarlyCSEPass());
-  }
+
+  OptimizePM.addPass(EarlyCSEPass());

   // FIXME: We need to run some loop optimizations to re-rotate loops after
   // simplifycfg and others undo their rotation.

   // Optimize the loop execution. These passes operate on entire loop nests
   // rather than on each loop in an inside-out manner, and so they are actually
   // function passes.

llvm/test/Other/new-pm-defaults.ll

 ; CHECK-DEFAULT-NEXT: Running pass: EliminateAvailableExternallyPass
 ; CHECK-LTO-NOT: Running pass: EliminateAvailableExternallyPass
 ; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
 ; CHECK-O-NEXT: Running pass: RecomputeGlobalsAAPass
 ; CHECK-EP-OPTIMIZER-EARLY: Running pass: NoOpModulePass
 ; CHECK-O-NEXT: Running pass: Float2IntPass
 ; CHECK-O-NEXT: Running pass: LowerConstantIntrinsicsPass on foo
 ; CHECK-MATRIX: Running pass: LowerMatrixIntrinsicsPass on f
-; CHECK-MATRIX-NEXT: Running pass: EarlyCSEPass on f
+; CHECK-O-NEXT: Running pass: EarlyCSEPass
 ; CHECK-EP-VECTORIZER-START-NEXT: Running pass: NoOpFunctionPass
 ; CHECK-EXT: Running pass: {{.*}}::Bye on foo
 ; CHECK-NOEXT: {{^}}
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopDistributePass

llvm/test/Other/new-pm-thinlto-defaults.ll

 ; CHECK-POSTLINK-O-NEXT: Running pass: GlobalOptPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: GlobalDCEPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: EliminateAvailableExternallyPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: RecomputeGlobalsAAPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: Float2IntPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LowerConstantIntrinsicsPass
 ; CHECK-EXT: Running pass: {{.*}}::Bye
+; CHECK-POSTLINK-O-NEXT: Running pass: EarlyCSEPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-POSTLINK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-POSTLINK-O-NEXT: Running pass: LoopVectorizePass
 ; CHECK-POSTLINK-O-NEXT: Running analysis: BlockFrequencyAnalysis

llvm/test/Other/new-pm-thinlto-postlink-pgo-defaults.ll

 ; CHECK-O-NEXT: Running pass: GlobalOptPass
 ; CHECK-O-NEXT: Running pass: GlobalDCEPass
 ; CHECK-O-NEXT: Running pass: EliminateAvailableExternallyPass
 ; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
 ; CHECK-O-NEXT: Running pass: RecomputeGlobalsAAPass
 ; CHECK-O-NEXT: Running pass: Float2IntPass
 ; CHECK-O-NEXT: Running pass: LowerConstantIntrinsicsPass
 ; CHECK-EXT: Running pass: {{.*}}::Bye
+; CHECK-O-NEXT: Running pass: EarlyCSEPass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass on foo
 ; CHECK-O-NEXT: Running pass: LCSSAPass on foo
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-O-NEXT: Running pass: LoopVectorizePass
 ; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass

llvm/test/Other/new-pm-thinlto-postlink-samplepgo-defaults.ll

 ; CHECK-O-NEXT: Running pass: GlobalOptPass
 ; CHECK-O-NEXT: Running pass: GlobalDCEPass
 ; CHECK-O-NEXT: Running pass: EliminateAvailableExternallyPass
 ; CHECK-O-NEXT: Running pass: ReversePostOrderFunctionAttrsPass
 ; CHECK-O-NEXT: Running pass: RecomputeGlobalsAAPass
 ; CHECK-O-NEXT: Running pass: Float2IntPass
 ; CHECK-O-NEXT: Running pass: LowerConstantIntrinsicsPass
 ; CHECK-EXT: Running pass: {{.*}}::Bye
+; CHECK-O-NEXT: Running pass: EarlyCSEPass
 ; CHECK-O-NEXT: Running pass: LoopSimplifyPass
 ; CHECK-O-NEXT: Running pass: LCSSAPass
 ; CHECK-O-NEXT: Running pass: LoopRotatePass
 ; CHECK-O-NEXT: Running pass: LoopDeletionPass
 ; CHECK-O-NEXT: Running pass: LoopDistributePass
 ; CHECK-O-NEXT: Running pass: InjectTLIMappings
 ; CHECK-O-NEXT: Running pass: LoopVectorizePass
 ; CHECK-O-NEXT: Running pass: LoopLoadEliminationPass

llvm/test/Transforms/Coroutines/coro-readnone-04.ll

This file was added.

				; Tests that the readnone function which don't cross suspend points could be optimized correctly.
				; RUN: opt < %s -S -passes='default<O3>' -opaque-pointers | FileCheck %s

				define ptr @f() "coroutine.presplit" {
				entry:
				%id = call token @llvm.coro.id(i32 0, i8* null, i8* null, i8* null)
				%size = call i32 @llvm.coro.size.i32()
				%alloc = call i8* @malloc(i32 %size)
				%hdl = call i8* @llvm.coro.begin(token %id, i8* %alloc)
				%sus_result = call i8 @llvm.coro.suspend(token none, i1 false)
				switch i8 %sus_result, label %suspend [i8 0, label %resume
				i8 1, label %cleanup]
				resume:
				%j = call i32 @readnone_func() readnone
				%i = call i32 @readnone_func() readnone
				%cmp = icmp eq i32 %i, %j
				br i1 %cmp, label %same, label %diff

				same:
				call void @print_same()
				br label %cleanup

				diff:
				call void @print_diff()
				br label %cleanup

				cleanup:
				%mem = call i8* @llvm.coro.free(token %id, i8* %hdl)
				call void @free(i8* %mem)
				br label %suspend

				suspend:
				call i1 @llvm.coro.end(i8* %hdl, i1 0)
				ret i8* %hdl
				}

				define void @g() {
				entry:
				%j = call i32 @readnone_func() #0
				%i = call i32 @readnone_func() #0
				%cmp = icmp eq i32 %i, %j
				br i1 %cmp, label %same, label %diff

				same:
				call void @print_same()
				br label %cleanup

				diff:
				call void @print_diff()
				br label %cleanup

				cleanup:
				ret void
				}


				; CHECK-LABEL: void @g
				; CHECK-NEXT: entry
				; CHECK-NEXT: call i32 @readnone_func()
				; CHECK-NEXT: call void @print_same()
				; CHECK-NEXT: ret void

				; CHECK-LABEL: void @f.resume
				; CHECK-NEXT: resume:
				; CHECK-NEXT: call i32 @readnone_func()
				; CHECK-NEXT: call void @print_same(
				; CHECK-NEXT: call void @free(
				; CHECK-NEXT: ret void


				declare i32 @readnone_func() readnone

				declare void @print_same()
				declare void @print_diff()
				declare i8* @llvm.coro.free(token, i8*)
				declare i32 @llvm.coro.size.i32()
				declare i8 @llvm.coro.suspend(token, i1)

				declare token @llvm.coro.id(i32, i8*, i8*, i8*)
				declare i1 @llvm.coro.alloc(token)
				declare i8* @llvm.coro.begin(token, i8*)
				declare i1 @llvm.coro.end(i8*, i1)

				declare noalias i8* @malloc(i32)
				declare void @free(i8*)
