This is an archive of the discontinued LLVM Phabricator instance.

[HotColdSplit] Move splitting after instrumented PGO use
ClosedPublic

Authored by tejohnson on Feb 5 2019, 8:23 PM.

Details

Summary

Follow-up to D57082, which moved splitting earlier in the pipeline in order
to perform it before inlining. However, it was moved too early, before the IR
is annotated with instrumented PGO data, which caused the splitting to
incorrectly determine which functions are cold.

Move it to just after PGO annotation (still before inlining), in both
pass managers.
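
For reference, the intended placement looks roughly like the sketch below. This is a hedged illustration, not the literal diff; the surrounding helper and flag names (addPGOInstrPasses, EnableHotColdSplit) are assumptions based on the LLVM tree of that era.

  // New pass manager -- in PassBuilder's module simplification pipeline:
  addPGOInstrPasses(MPM, ...);              // IR is now annotated with counts
  if (EnableHotColdSplit)
    MPM.addPass(HotColdSplittingPass());    // split before the inliner runs

  // Legacy pass manager -- in PassManagerBuilder::populateModulePassManager:
  addPGOInstrPasses(MPM);
  if (EnableHotColdSplit)
    MPM.add(createHotColdSplittingPass());  // likewise, after PGO annotation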

Diff Detail

Repository
rL LLVM

Event Timeline

tejohnson created this revision. Feb 5 2019, 8:23 PM
Herald added a project: Restricted Project. Feb 5 2019, 8:23 PM
Herald added a subscriber: mehdi_amini.
vsk accepted this revision. Feb 5 2019, 8:26 PM

Thanks, LGTM.

test/Other/pass-pipelines.ll
Line 13 (On Diff #185481)

Nit: as this isn't a PGO-specific test, it might aid readability to use 'PGOUSE' (or similar) as the check prefix.
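
For illustration only, the suggested prefix could look something like the lines below; the exact opt flags and printed pass names are assumptions, not the committed test.

  ; RUN: opt -disable-output -debug-pass=Structure -O2 \
  ; RUN:     -pgo-kind=pgo-instr-use-pipeline -profile-file=%t.profdata %s 2>&1 \
  ; RUN:     | FileCheck %s --check-prefix=CHECK-PGOUSE
  ;
  ; CHECK-PGOUSE: Read PGO instrumentation profile
  ; CHECK-PGOUSE: Hot Cold Splitting
  ; CHECK-PGOUSE: Function Integration/Inlining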

This revision is now accepted and ready to land. Feb 5 2019, 8:26 PM
tejohnson marked an inline comment as done. Feb 5 2019, 8:28 PM
tejohnson updated this revision to Diff 185482. Feb 5 2019, 8:29 PM

Implement suggestion

This revision was automatically updated to reflect the committed changes.
vsk added a comment. Feb 11 2019, 11:12 AM

@tejohnson have you had a chance to evaluate performance with IR-PGO + splitting enabled?

Our internal CI shows performance regressions on SPEC/CINT2000 with FE-PGO + splitting enabled. Allowing inlining of split functions reduces the perf regression, and moving splitting after inlining eliminates it. It seems important to inline and optimize certain basic blocks which FE PGO data marks cold. We may need to address this by changing the FE-PGO instrumentation, or by ignoring ProfileSummaryInfo when it's based on a FE profile.

What's interesting about this is that I didn't see a perf regression on SPEC/CINT2000 with IR-PGO + splitting. I'd be curious to know if your testing bears this out.
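
(As a minimal sketch of the "ignore ProfileSummaryInfo when it's based on a FE profile" idea above: the helper name shouldUseProfileForSplitting is hypothetical, and whether isIRPGOFlagSet is the right predicate for telling FE from IR instrumentation profiles is an assumption.)

  #include "llvm/Analysis/ProfileSummaryInfo.h"
  #include "llvm/IR/Module.h"
  #include "llvm/ProfileData/InstrProf.h"

  using namespace llvm;

  // Hypothetical guard, not part of this patch: only let PSI-based coldness
  // drive splitting when the profile came from IR-level instrumentation.
  static bool shouldUseProfileForSplitting(const Module &M,
                                           ProfileSummaryInfo *PSI) {
    // isIRPGOFlagSet() checks the IR-PGO bit on the profile version variable.
    return PSI && PSI->hasProfileSummary() && isIRPGOFlagSet(&M);
  }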

tejohnson added a comment.

In D57805#1393416, @vsk wrote:

@tejohnson have you had a chance to evaluate performance with IR-PGO + splitting enabled?

I have one data point, more below.

Our internal CI shows performance regressions on SPEC/CINT2000 with FE-PGO + splitting enabled. Allowing inlining of split functions reduces the perf regression,

This is controlled by the MinSize attribute, right?

and moving splitting after inlining eliminates it.

It seems important to inline and optimize certain basic blocks which FE PGO data marks cold. We may need to address this by changing the FE-PGO instrumentation, or by ignoring ProfileSummaryInfo when it's based on a FE profile.

What's interesting about this is that I didn't see a perf regression on SPEC/CINT2000 with IR-PGO + splitting. I'd be curious to know if your testing bears this out.

I tried this for one important internal app that we build with IR-PGO and ThinLTO (late last week, with this patch and r353434). Unfortunately it is degrading by around 1%. I verified that if I remove the part that marks existing cold functions (i.e. those that don't get split) with the MinSize attribute, the degradation drops to around 0.5%. If I prevent marking the new split cold functions with MinSize, it possibly gets a bit better (a degradation of only around 0.4%). When I tried splitting a while back, when it was in its original position after inlining (in the ThinLTO backends), I got roughly neutral performance. I was hoping for some improvement based on the experiments we had done back with gcc's function splitting (-freorder-blocks-and-partition).

I did do some profiling to compare function profiles with and without splitting enabled. I see only one case where we are spending any time in a cold split function (i.e. where the profile presumably wasn't accurate), but I don't think this is causing most of the difference. It looks like the inlining decisions are very different (expected), but these might be causing a degradation for some reason. I will try moving the splitting to after the ThinLTO backend (post-thinlink) inlining and see what effect that has. In theory we should be getting more accurate importing/inlining; if not, it would be good to understand where this is going wrong!

vsk added a comment. Feb 11 2019, 12:36 PM
In D57805#1393416, @vsk wrote:

@tejohnson have you had a chance to evaluate performance with IR-PGO + splitting enabled?

I have one data point, more below.

Our internal CI shows performance regressions on SPEC/CINT2000 with FE-PGO + splitting enabled. Allowing inlining of split functions reduces the perf regression,

This is controlled by the MinSize attribute, right?

Partially, yes; I think MinSize affects inlining thresholds. It's also controlled by the 'noinline' attribute. To tweak this, you can disable CI->setIsNoInline() in extractColdFunction.
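
A minimal sketch of the two knobs being discussed, assuming the splitting code of that era; the helper name markSplitRegionCold is hypothetical.

  #include "llvm/IR/Attributes.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // Hypothetical helper mirroring how split regions get marked.
  static void markSplitRegionCold(Function &OutlinedFn, CallInst &CallToSplit) {
    // Coldness/size hints on the outlined function: MinSize shrinks its
    // codegen and biases inline cost at its call sites.
    OutlinedFn.addFnAttr(Attribute::Cold);
    OutlinedFn.addFnAttr(Attribute::MinSize);
    // Hard block on re-inlining the split region: the call back into the
    // outlined function is marked 'noinline' (the CI->setIsNoInline() above).
    CallToSplit.setIsNoInline();
  }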

and moving splitting after inlining eliminates it.

It seems important to inline and optimize certain basic blocks which FE PGO data marks cold. We may need to address this by changing the FE-PGO instrumentation, or by ignoring ProfileSummaryInfo when it's based on a FE profile.

What's interesting about this is that I didn't see a perf regression on SPEC/CINT2000 with IR-PGO + splitting. I'd be curious to know if your testing bears this out.

I tried this for one important internal app that we build with IR-PGO and ThinLTO (late last week, with this patch and r353434). Unfortunately it is degrading by around 1%. I verified that if I remove the part that marks existing cold functions (i.e. those that don't get split) with the MinSize attribute, the degradation drops to around 0.5%. If I prevent marking the new split cold functions with MinSize, it possibly gets a bit better (a degradation of only around 0.4%).

I see, it sounds like the perf regression with PGO may not be specific to FE-PGO.

When I tried splitting a while back, when it was in its original position after inlining (in the ThinLTO backends), I got roughly neutral performance. I was hoping for some improvement based on the experiments we had done back with gcc's function splitting (-freorder-blocks-and-partition).

I did do some profiling to compare function profiles with and without splitting enabled. I see only one case where we are spending any time in a cold split function (i.e. where the profile presumably wasn't accurate), but I don't think this is causing most of the difference. It looks like the inlining decisions are very different (expected), but these might be causing a degradation for some reason. I will try moving the splitting to after the ThinLTO backend (post-thinlink) inlining and see what effect that has. In theory we should be getting more accurate importing/inlining; if not, it would be good to understand where this is going wrong!

Thanks, this would be a really useful experiment.

vsk added a comment. Feb 14 2019, 3:37 PM

... I will try moving the splitting to after the ThinLTO backend (post-thinlink) inlining and see what effect that has. In theory we should be getting more accurate importing/inlining; if not, it would be good to understand where this is going wrong!

I have not yet tried the experiment you've described here. However, we've noticed that scheduling splitting early causes a regression for certain SPEC benchmarks even without PGO data applied. As the splitting heuristics are very conservative without PGO, this suggests that splitting before inlining may inadvertently hide important context from the optimizer.

tejohnson added a comment.

In D57805#1398718, @vsk wrote:

... I will try moving the splitting to after the ThinLTO backend (post-thinlink) inlining and see what effect that has. In theory we should be getting more accurate importing/inlining; if not, it would be good to understand where this is going wrong!

I have not yet tried the experiment you've described here. However, we've noticed that scheduling splitting early causes a regression for certain SPEC benchmarks even without PGO data applied. As the splitting heuristics are very conservative without PGO, this suggests that splitting before inlining may inadvertently hide important context from the optimizer.

I gave this a try with our internal benchmark that was slowing down with splitting before inlining. Moving the splitting to after inlining (the post-link inlining in the ThinLTO backends) made the degradation go away, although unfortunately there was no speedup either. I haven't had a chance to dig into the performance results beyond that (and I am heading out of town for the better part of the next couple of weeks), but I will shortly review your new patch that moves the pass later in the pipeline.