This is an archive of the discontinued LLVM Phabricator instance.

Differential D69732

[WIP][LTO] Apply SamplePGO pipeline tunes for ThinLTO pre-link to full LTO
Needs Review · Public

Authored by tejohnson on Nov 1 2019, 1:04 PM.

Details

Reviewers

wristow
ormris

Summary

Several modifications are made to the optimizations performed by the
ThinLTO pre-link compile when building with SamplePGO, in order to get
better matching of the sample profile when it is re-applied in the
backend. These same tunes should be applied to full LTO pre-links as
well, since presumably the same matching issues could occur there.

There are a few issues with this patch as it stands, relating to the
fact that not all of these optimizations are attempted again in the full
LTO backend, owing to the larger compile time with the monolithic LTO.
Specifically, this includes some loop optimizations:

- In the old PM, LoopUnrollAndJam is not done in the full LTO backend.
- In the new PM, none of the loop unrolling is done in the full LTO
  backend. The comments indicate that this is in part due to issues with
  the new PM loop pass manager (presumably sorted out by now, but I
  haven't followed this). Here is the comment:

// FIXME: at this point, we run a bunch of loop passes:
// indVarSimplify, loopDeletion, loopInterchange, loopUnroll,
// loopVectorize. Enable them once the remaining issue with LPM
// are sorted out.

So what still needs to happen is to either:

- Continue to diverge the ThinLTO and full LTO pre-link pipelines for
  these optimizations (which means this patch needs to be adjusted).

OR

- Fix the full LTO post-link pipelines to ensure these optimizations
  all occur there.
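
As an illustration of the kind of tune being extended here, the following standalone sketch models the condition that guards full loop unrolling in the new PM (compare the PassBuilder.cpp hunk in this diff). The enum, struct and function names are simplified stand-ins rather than the LLVM declarations, and a null pointer stands in for an unset `PGOOpt`.

```cpp
// Hypothetical stand-ins for the LLVM types touched by this patch; the real
// definitions live in the LLVM headers and are not reproduced here.
enum class LTOPhase { None, PreLink, PostLink };
enum class PGOAction { NoAction, IRInstr, IRUse, SampleUse };

struct PGOOptions {
  PGOAction Action;
};

// Models the guard around LoopFullUnrollPass: full unrolling is skipped only
// for a pre-link compile that uses a sample profile, and otherwise follows
// the loop-unrolling tuning option. PGOOpt == nullptr means "no PGO".
bool shouldAddFullUnroll(LTOPhase Phase, const PGOOptions *PGOOpt,
                         bool LoopUnrollingEnabled) {
  bool SamplePGOPreLink = Phase == LTOPhase::PreLink && PGOOpt != nullptr &&
                          PGOOpt->Action == PGOAction::SampleUse;
  return !SamplePGOPreLink && LoopUnrollingEnabled;
}
```

With this patch the `PreLink` value would cover full LTO pre-link compiles as well as ThinLTO ones, so `shouldAddFullUnroll(LTOPhase::PreLink, &Sample, true)` is false for both when `Sample.Action` is `SampleUse`.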

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 40418
Build 40525: arc lint + arc unit

Event Timeline

tejohnson created this revision. Nov 1 2019, 1:04 PM

Herald added projects: Restricted Project, Restricted Project. Nov 1 2019, 1:04 PM

Herald added subscribers: llvm-commits, cfe-commits, dexonsmith and 4 others.

Harbormaster completed remote builds in B40418: Diff 227515. Nov 1 2019, 1:04 PM

This probably needs to be taken over by someone who cares about full LTO performance (@wristow or @ormris ?). This patch was some cleanup of the full LTO sample PGO pipeline, but has a number of issues I enumerate in the summary.

> The comments indicate that this is in part due to issues with
> the new PM loop pass manager

Wondering how different it is for these loop passes to be enabled for MonoLTO vs ThinLTO? If it's due to problems with the newPM, I guess ThinLTO would have the same problems? Asking because we have almost the same change as internal patch trying to get better LTO time profile precision for MonoLTO, and with that there's small win for oldPM+MonoLTO. But we'd love to converge on new PM for both MonoLTO and ThinLTO.

In D69732#1730884, @wenlei wrote:

>> The comments indicate that this is in part due to issues with
>> the new PM loop pass manager
>
> Wondering how different it is for these loop passes to be enabled for MonoLTO vs ThinLTO? If it's due to problems with the newPM, I guess ThinLTO would have the same problems?

The ThinLTO backends don't use this code but rather PassBuilder::buildModuleOptimizationPipeline, which includes all of these loop optimizations (and other optimizations, since the ThinLTO backends can absorb the extra compile time cost). That's what makes me think this comment is stale and someone just forgot to add the loop optimization passes to the full LTO post-link pipeline. I haven't looked at the history of all these changes. My guess is that since the full LTO pre-link pipeline does all these loop optimizations (also through buildModuleOptimizationPipeline), no one noticed that they weren't also being done in the post-link full LTO pipeline.

> Asking because we have almost the same change as internal patch trying to get better LTO time profile precision for MonoLTO, and with that there's small win for oldPM+MonoLTO.

That's good to know, it's what I would expect but good to have the confirmation.

> But we'd love to converge on new PM for both MonoLTO and ThinLTO.

wenlei added inline comments. Nov 4 2019, 1:47 PM

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
614	Does this also need to be `PrepareForThinLTO || PrepareForLTO` for the old PM?

> This probably needs to be taken over by someone who cares about full LTO performance

We at PlayStation are definitely interested in full LTO performance, so we're looking into this. We certainly agree with the rationale that if suppressing some optimizations is useful to allow better SamplePGO matching, then we'd expect that would apply equally to both ThinLTO and full LTO.

I guess much of this comes down to a balancing act between:

1. The amount of the runtime benefit with Sample PGO if these loop optimizations are deferred to the full LTO back-end (like they are for ThinLTO).
2. The cost in compile-time resources in the full LTO back-end to do these loop optimizations at that later stage.

From the discussion here, the Sample PGO runtime win (point 1) seems more or less to be a given. If we find the compile-time cost in the full LTO back-end (point 2) is not significant, then the decision should be easy. So after seeing this patch, we're doing some experiments to at least try to get a handle on this. (I'm a bit concerned we won't be able to draw any hard conclusions from the results of our experiments, but at least we'll be able to make a better informed assessment.)

FTR, for PlayStation, we're using the old PM. But we'll do some experiments for both the old and new PM, to get a sense of the answers to the (old PM) LoopUnrollAndJam point, and the (new PM) FIXME comment.

wristow added inline comments. Nov 4 2019, 6:31 PM

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
614	I agree this is another instance where a balancing act question applies. In this case, assuming the comment about the concern of code bloat is accurate, it's not so much about compile-time resources in the full LTO back-end, but rather about minimizing the ThinLTO bitcode write/read time. So if, as this WIP evolves, it ultimately is a win for SamplePGO to suppress some loop optimizations (unrolling/vectorization) here, then that will probably also be a small win in full LTO compile time. That said, in addition to these loop-related optimizations, there are other transformations here that are done in the full LTO pipeline (but not in the ThinLTO pipeline). So I suspect if some change to check for `PrepareForThinLTO || PrepareForLTO` (rather than only `PrepareForThinLTO`) makes sense here from a Sample PGO perspective, then the change will be more complicated than simply adding the small set of passes here followed by the early return (that is, I think there are probably things after the `return` on line 621 that still ought to be enabled for full LTO -- essentially continuing to do them in the pre-link stage for full LTO, to try to avoid needing to do too much work in the full LTO backend stage, since it's more of a problem for the full backend to absorb that compile time cost).

In D69732#1733447, @wristow wrote:

>> This probably needs to be taken over by someone who cares about full LTO performance
>
> We at PlayStation are definitely interested in full LTO performance, so we're looking into this. We certainly agree with the rationale that if suppressing some optimizations is useful to allow better SamplePGO matching, then we'd expect that would apply equally to both ThinLTO and full LTO.
>
> I guess much of this comes down to a balancing act between:
>
> 1. The amount of the runtime benefit with Sample PGO if these loop optimizations are deferred to the full LTO back-end (like they are for ThinLTO).
> 2. The cost in compile-time resources in the full LTO back-end to do these loop optimizations at that later stage.
>
> From the discussion here, the Sample PGO runtime win (point 1) seems more or less to be a given. If we find the compile-time cost in the full LTO back-end (point 2) is not significant, then the decision should be easy. So after seeing this patch, we're doing some experiments to at least try to get a handle on this. (I'm a bit concerned we won't be able to draw any hard conclusions from the results of our experiments, but at least we'll be able to make a better informed assessment.)
>
> FTR, for PlayStation, we're using the old PM. But we'll do some experiments for both the old and new PM, to get a sense of the answers to the (old PM) LoopUnrollAndJam point, and the (new PM) FIXME comment.

This is a good summary. I look forward to your results.

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
614	This early return was not for Sample PGO btw. It was added much earlier with the thought that a) these types of optimizations might affect function importing heuristics because they could bloat the code; b) we can push more optimizations to the post-link in ThinLTO because it is parallel; and c) there isn't otherwise a benefit to doing these optimizations in the pre vs post link, i.e. they aren't cleanup/simplification passes. The equation is of course different for full LTO which has a monolithic serial post link backend. But I believe this early return is the one @ormris is looking to remove on the ThinLTO pass to "merge" the two pipelines, which needs a good amount of evaluation on the ThinLTO performance side.
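
To make the `PrepareForThinLTO || PrepareForLTO` question from these inline comments concrete, here is a toy model of an early return in a pre-link pipeline builder. It is not the PassManagerBuilder.cpp code: the struct, the function, the `EarlyReturnForFullLTO` switch and the pass-name strings are placeholders invented for illustration, and only the two flag names come from the discussion.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of the two flags discussed in the inline comments.
struct PreLinkFlags {
  bool PrepareForThinLTO;
  bool PrepareForLTO;
};

// Toy pipeline builder. The pass names are placeholders and do not reflect
// the real pass list in PassManagerBuilder.cpp.
std::vector<std::string> buildToyPreLinkPipeline(const PreLinkFlags &Flags,
                                                 bool EarlyReturnForFullLTO) {
  std::vector<std::string> Passes = {"simplify"};

  // Existing behaviour: only a ThinLTO pre-link compile stops here. The
  // switch models the proposal to stop for a full LTO pre-link as well.
  bool StopEarly = Flags.PrepareForThinLTO ||
                   (EarlyReturnForFullLTO && Flags.PrepareForLTO);
  if (StopEarly)
    return Passes; // Later loop optimizations are left to the post-link step.

  Passes.push_back("loop-vectorize");
  Passes.push_back("loop-unroll");
  return Passes;
}
```

In this model a full LTO pre-link compile keeps all three placeholder passes unless the switch is set. As noted in the inline comment above, the real change is likely to be more involved, because some of what follows the actual `return` probably still ought to run in the pre-link stage for full LTO.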

evgeny777 added a subscriber: evgeny777. Nov 5 2019, 8:38 AM

I've done testing with the following global parameters.

- The base for the branch is llvmorg-10-init-8655-g94a4a2c97f8
- Used llvm, clang, lld, and llvm-ar from this branch.
- The sqlite kvtest program was the test payload.

This test compared an unmodified compiler from the base of the branch with a modified compiler with this patch applied and the loop optimisation passes mentioned above moved to the backend. The results were as follows. All numbers in seconds.

Run	Modified LTO	Modified SPGO+LTO	Unmodified SPGO+LTO
1	42.00	41.73	42.08
2	42.30	39.49	42.45
3	41.21	42.46	42.49
AVG:	41.84	41.23	42.34

TL;DR the average run using a compiler built with the modified SPGO pipeline is about a second faster. Definitely a positive initial result.
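
The averages in the table can be re-derived with a few lines of C++. This is only a sketch for checking the arithmetic: the function and variable names are made up here, and the run times are copied from the table.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a set of run times, in seconds.
double meanSeconds(const std::vector<double> &Runs) {
  assert(!Runs.empty() && "need at least one run");
  return std::accumulate(Runs.begin(), Runs.end(), 0.0) / Runs.size();
}

// Run times copied from the table in this comment.
const std::vector<double> ModifiedLTO = {42.00, 42.30, 41.21};
const std::vector<double> ModifiedSPGOLTO = {41.73, 39.49, 42.46};
const std::vector<double> UnmodifiedSPGOLTO = {42.08, 42.45, 42.49};

// Seconds gained by the modified pipeline in the SPGO+LTO configuration.
double spgoGainSeconds() {
  return meanSeconds(UnmodifiedSPGOLTO) - meanSeconds(ModifiedSPGOLTO);
}
```

The three means come out at roughly 41.84, 41.23 and 42.34 seconds, so the SPGO+LTO gain is about 1.1 seconds. With three runs per configuration, most of that gap comes from run 2 (39.49 vs 42.45), so the figure is best read as indicative.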

In D69732#1730732, @tejohnson wrote:

> This probably needs to be taken over by someone who cares about full LTO performance (@wristow or @ormris ?). This patch was some cleanup of the full LTO sample PGO pipeline, but has a number of issues I enumerate in the summary.

Given the performance improvements here, I'd like to develop this patch further.

Ping @tejohnson

> Given the performance improvements here, I'd like to develop this patch further.

In D69732#1784290, @ormris wrote:

> Ping @tejohnson

@ormris, I think that since @tejohnson originally suggested that someone with more interest in full LTO performance pick this up (and she specifically suggested you or me), then you can feel free to take this over.

In D69732#1784511, @wristow wrote:

>> Given the performance improvements here, I'd like to develop this patch further.
>
> In D69732#1784290, @ormris wrote:
>
>> Ping @tejohnson
>
> @ormris, I think that since @tejohnson originally suggested that someone with more interest in full LTO performance pick this up (and she specifically suggested you or me), then you can feel free to take this over.

Yep, sorry, I didn't realize you were waiting for me to confirm! That sounds great.

In D69732#1771950, @ormris wrote:

> I've done testing with the following global parameters.
>
> - The base for the branch is llvmorg-10-init-8655-g94a4a2c97f8
> - Used llvm, clang, lld, and llvm-ar from this branch.
> - The sqlite kvtest program was the test payload.
>
> This test compared an unmodified compiler from the base of the branch with a modified compiler with this patch applied and the loop optimisation passes mentioned above moved to the backend. The results were as follows. All numbers in seconds.
>
> Run	Modified LTO	Modified SPGO+LTO	Unmodified SPGO+LTO
> 1	42.00	41.73	42.08
> 2	42.30	39.49	42.45
> 3	41.21	42.46	42.49
> AVG:	41.84	41.23	42.34
>
> TL;DR the average run using a compiler built with the modified SPGO pipeline is about a second faster. Definitely a positive initial result.
>
> In D69732#1730732, @tejohnson wrote:
>
>> This probably needs to be taken over by someone who cares about full LTO performance (@wristow or @ormris ?). This patch was some cleanup of the full LTO sample PGO pipeline, but has a number of issues I enumerate in the summary.
>
> Given the performance improvements here, I'd like to develop this patch further.

@ormris I'd like to follow up on this. We had a similar change internally which led up to some gains when combined with SPGO, and we'd like to help move forward with this patch here. Would you mind sharing the plan or progress on your side? Thanks!

wenlei mentioned this in D94613: [NFC] Rename ThinLTOPhase to PhaseInAllLTO and move it from PassBuilder.h to Pass.h. Jan 13 2021, 10:12 AM

@hoyFB wrote:

> @ormris I'd like to follow up on this. We had a similar change internally which led up to some gains when combined with SPGO, and we'd like to help move forward with this patch here. Would you mind sharing the plan or progress on your side? Thanks!

Sorry for the late reply. Unfortunately, we weren't able to get very far beyond the experiments you see here. Feel free to take it from here.

Revision Contents

Path	Size
clang/lib/CodeGen/BackendUtil.cpp	2 lines
clang/test/CodeGen/pgo-sample-thinlto-summary.c	33 lines
llvm/include/llvm/Passes/PassBuilder.h	29 lines
llvm/lib/Passes/PassBuilder.cpp	67 lines
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp	14 lines

Diff 227515

clang/lib/CodeGen/BackendUtil.cpp

[...] (context: if (CodeGenOpts.OptimizationLevel <= 1) {)
     PMBuilder.Inliner = createAlwaysInlinerLegacyPass(InsertLifetimeIntrinsics);
   } else {
     // We do not want to inline hot callsites for SamplePGO module-summary build
     // because profile annotation will happen again in ThinLTO backend, and we
     // want the IR of the hot path to match the profile.
     PMBuilder.Inliner = createFunctionInliningPass(
         CodeGenOpts.OptimizationLevel, CodeGenOpts.OptimizeSize,
         (!CodeGenOpts.SampleProfileFile.empty() &&
-         CodeGenOpts.PrepareForThinLTO));
+         (CodeGenOpts.PrepareForThinLTO || CodeGenOpts.PrepareForLTO)));
   }

   PMBuilder.OptLevel = CodeGenOpts.OptimizationLevel;
   PMBuilder.SizeLevel = CodeGenOpts.OptimizeSize;
   PMBuilder.SLPVectorize = CodeGenOpts.VectorizeSLP;
   PMBuilder.LoopVectorize = CodeGenOpts.VectorizeLoop;

   PMBuilder.DisableUnrollLoops = !CodeGenOpts.UnrollLoops;
[...]

clang/test/CodeGen/pgo-sample-thinlto-summary.c

+// Tests to ensure that *LTO pre-link compiles don't perform optimizations
+// that can lead to subpar SamplePGO matching in the LTO backends.
+
 // RUN: %clang_cc1 -O2 -fprofile-sample-use=%S/Inputs/pgo-sample-thinlto-summary.prof %s -emit-llvm -o - 2>&1 | FileCheck %s -check-prefix=SAMPLEPGO
-// RUN: %clang_cc1 -O2 -fprofile-sample-use=%S/Inputs/pgo-sample-thinlto-summary.prof %s -emit-llvm -flto=thin -o - 2>&1 | FileCheck %s -check-prefix=THINLTO
+// RUN: %clang_cc1 -O2 -fprofile-sample-use=%S/Inputs/pgo-sample-thinlto-summary.prof %s -emit-llvm -flto=thin -o - 2>&1 | FileCheck %s -check-prefix=LTO
+// RUN: %clang_cc1 -O2 -fprofile-sample-use=%S/Inputs/pgo-sample-thinlto-summary.prof %s -emit-llvm -flto -o - 2>&1 | FileCheck %s -check-prefix=LTO
 // RUN: %clang_cc1 -O2 -fexperimental-new-pass-manager -fprofile-sample-use=%S/Inputs/pgo-sample-thinlto-summary.prof %s -emit-llvm -o - 2>&1 | FileCheck %s -check-prefix=SAMPLEPGO
-// RUN: %clang_cc1 -O2 -fexperimental-new-pass-manager -fprofile-sample-use=%S/Inputs/pgo-sample-thinlto-summary.prof %s -emit-llvm -flto=thin -o - 2>&1 | FileCheck %s -check-prefix=THINLTO
-// Checks if hot call is inlined by normal compile, but not inlined by
-// thinlto compile.
+// RUN: %clang_cc1 -O2 -fexperimental-new-pass-manager -fprofile-sample-use=%S/Inputs/pgo-sample-thinlto-summary.prof %s -emit-llvm -flto=thin -o - 2>&1 | FileCheck %s -check-prefix=LTO
+// RUN: %clang_cc1 -O2 -fexperimental-new-pass-manager -fprofile-sample-use=%S/Inputs/pgo-sample-thinlto-summary.prof %s -emit-llvm -flto -o - 2>&1 | FileCheck %s -check-prefix=LTO

 int baz(int);
 int g;

 void foo(int n) {
   for (int i = 0; i < n; i++)
     g += baz(i);
 }

+// Checks if hot call is inlined by normal compile, but not inlined by
+// (thin)lto pre-link compile.
 // SAMPLEPGO-LABEL: define {{(dso_local )?}}void @bar
-// THINLTO-LABEL: define {{(dso_local )?}}void @bar
+// LTO-LABEL: define {{(dso_local )?}}void @bar
 // SAMPLEPGO-NOT: call{{.*}}foo
-// THINLTO: call{{.*}}foo
+// LTO: call{{.*}}foo
 void bar(int n) {
   for (int i = 0; i < n; i++)
     foo(i);
 }

-// Checks if loop unroll is invoked by normal compile, but not thinlto compile.
+// Checks if loop unroll is invoked by normal compile, but not (thin)lto
+// pre-link compile.
 // SAMPLEPGO-LABEL: define {{(dso_local )?}}void @unroll
-// THINLTO-LABEL: define {{(dso_local )?}}void @unroll
+// LTO-LABEL: define {{(dso_local )?}}void @unroll
 // SAMPLEPGO: call{{.*}}baz
 // SAMPLEPGO: call{{.*}}baz
-// THINLTO: call{{.*}}baz
-// THINLTO-NOT: call{{.*}}baz
+// LTO: call{{.*}}baz
+// LTO-NOT: call{{.*}}baz
 void unroll() {
   for (int i = 0; i < 2; i++)
     baz(i);
 }

-// Checks that icp is not invoked for ThinLTO, but invoked for normal samplepgo.
+// Checks that icp is not invoked for (Thin)LTO pre-link compile, but invoked
+// for normal samplepgo.
 // SAMPLEPGO-LABEL: define {{(dso_local )?}}void @icp
-// THINLTO-LABEL: define {{(dso_local )?}}void @icp
+// LTO-LABEL: define {{(dso_local )?}}void @icp
 // SAMPLEPGO: if.true.direct_targ
 // FIXME: the following condition needs to be reversed once
 // LTOPreLinkDefaultPipeline is customized.
-// THINLTO-NOT: if.true.direct_targ
+// LTO-NOT: if.true.direct_targ
 void icp(void (*p)()) {
   p();
 }

llvm/include/llvm/Passes/PassBuilder.h

[...] (context: public:)
   /// name is the name of a pass, the InnerPipeline is empty, since passes
   /// cannot contain inner pipelines. See parsePassPipeline() for a more
   /// detailed description of the textual pipeline format.
   struct PipelineElement {
     StringRef Name;
     std::vector<PipelineElement> InnerPipeline;
   };

-  /// ThinLTO phase.
+  /// LTO phase.
   ///
-  /// This enumerates the LLVM ThinLTO optimization phases.
-  enum class ThinLTOPhase {
-    /// No ThinLTO behavior needed.
+  /// This enumerates the LLVM (Thin)LTO optimization phases.
+  enum class LTOPhase {
+    /// No LTO behavior needed.
     None,
-    /// ThinLTO prelink (summary) phase.
+    /// LTO prelink phase.
     PreLink,
-    /// ThinLTO postlink (backend compile) phase.
+    /// LTO postlink (backend compile) phase.
     PostLink
   };

   /// LLVM-provided high-level optimization levels.
   ///
   /// This enumerates the LLVM-provided high-level optimization levels. Each
   /// level has a specific goal and rationale.
   enum OptimizationLevel {
[...] (context: public:)
   /// repeatedly over the IR and is not expected to destroy important
   /// information about the semantics of the IR.
   ///
   /// Note that \p Level cannot be `O0` here. The pipelines produced are
   /// only intended for use when attempting to optimize code. If frontends
   /// require some transformations for semantic reasons, they should explicitly
   /// build them.
   ///
-  /// \p Phase indicates the current ThinLTO phase.
+  /// \p Phase indicates the current LTO phase.
   FunctionPassManager
-  buildFunctionSimplificationPipeline(OptimizationLevel Level,
-                                      ThinLTOPhase Phase,
+  buildFunctionSimplificationPipeline(OptimizationLevel Level, LTOPhase Phase,
                                       bool DebugLogging = false);

   /// Construct the core LLVM module canonicalization and simplification
   /// pipeline.
   ///
   /// This pipeline focuses on canonicalizing and simplifying the entire module
   /// of IR. Much like the function simplification pipeline above, it is
   /// suitable to run repeatedly over the IR and is not expected to destroy
   /// important information. It does, however, perform inlining and other
   /// heuristic based simplifications that are not strictly reversible.
   ///
   /// Note that \p Level cannot be `O0` here. The pipelines produced are
   /// only intended for use when attempting to optimize code. If frontends
   /// require some transformations for semantic reasons, they should explicitly
   /// build them.
   ///
-  /// \p Phase indicates the current ThinLTO phase.
+  /// \p Phase indicates the current LTO phase.
   ModulePassManager
-  buildModuleSimplificationPipeline(OptimizationLevel Level,
-                                    ThinLTOPhase Phase,
+  buildModuleSimplificationPipeline(OptimizationLevel Level, LTOPhase Phase,
                                     bool DebugLogging = false);

   /// Construct the core LLVM module optimization pipeline.
   ///
   /// This pipeline focuses on optimizing the execution speed of the IR. It
   /// uses cost modeling and thresholds to balance code growth against runtime
   /// improvements. It includes vectorization and other information destroying
   /// transformations. It also cannot generally be run repeatedly on a module
[...] (context: public:)
   /// optimization and code generation without any link-time optimization. It
   /// typically correspond to frontend "-O[123]" options for optimization
   /// levels \c O1, \c O2 and \c O3 resp.
   ///
   /// Note that \p Level cannot be `O0` here. The pipelines produced are
   /// only intended for use when attempting to optimize code. If frontends
   /// require some transformations for semantic reasons, they should explicitly
   /// build them.
-  ModulePassManager buildPerModuleDefaultPipeline(OptimizationLevel Level,
-                                                  bool DebugLogging = false,
-                                                  bool LTOPreLink = false);
+  ModulePassManager
+  buildPerModuleDefaultPipeline(OptimizationLevel Level,
+                                bool DebugLogging = false,
+                                LTOPhase Phase = LTOPhase::None);

   /// Build a pre-link, ThinLTO-targeting default optimization pipeline to
   /// a pass manager.
   ///
   /// This adds the pre-link optimizations tuned to prepare a module for
   /// a ThinLTO run. It works to minimize the IR which needs to be analyzed
   /// without making irreversible decisions which could be made better during
   /// the LTO run.
[...]

llvm/lib/Passes/PassBuilder.cpp

Show First 20 Lines • Show All 377 Lines • ▼ Show 20 Lines
#define LOOP_ANALYSIS(NAME, CREATE_PASS) \		#define LOOP_ANALYSIS(NAME, CREATE_PASS) \
LAM.registerPass([&] { return CREATE_PASS; });		LAM.registerPass([&] { return CREATE_PASS; });
#include "PassRegistry.def"		#include "PassRegistry.def"

for (auto &C : LoopAnalysisRegistrationCallbacks)		for (auto &C : LoopAnalysisRegistrationCallbacks)
C(LAM);		C(LAM);
}		}

FunctionPassManager		FunctionPassManager PassBuilder::buildFunctionSimplificationPipeline(
PassBuilder::buildFunctionSimplificationPipeline(OptimizationLevel Level,		OptimizationLevel Level, LTOPhase Phase, bool DebugLogging) {
ThinLTOPhase Phase,
bool DebugLogging) {
assert(Level != O0 && "Must request optimizations!");		assert(Level != O0 && "Must request optimizations!");
FunctionPassManager FPM(DebugLogging);		FunctionPassManager FPM(DebugLogging);

// Form SSA out of local memory accesses after breaking apart aggregates into		// Form SSA out of local memory accesses after breaking apart aggregates into
// scalars.		// scalars.
FPM.addPass(SROA());		FPM.addPass(SROA());

// Catch trivial redundancies		// Catch trivial redundancies
▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	FunctionPassManager PassBuilder::buildFunctionSimplificationPipeline(
LPM1.addPass(SimpleLoopUnswitchPass());		LPM1.addPass(SimpleLoopUnswitchPass());
LPM2.addPass(IndVarSimplifyPass());		LPM2.addPass(IndVarSimplifyPass());
LPM2.addPass(LoopIdiomRecognizePass());		LPM2.addPass(LoopIdiomRecognizePass());

for (auto &C : LateLoopOptimizationsEPCallbacks)		for (auto &C : LateLoopOptimizationsEPCallbacks)
C(LPM2, Level);		C(LPM2, Level);

LPM2.addPass(LoopDeletionPass());		LPM2.addPass(LoopDeletionPass());
// Do not enable unrolling in PreLinkThinLTO phase during sample PGO		// Do not enable unrolling in PreLink LTO phase during sample PGO
// because it changes IR to makes profile annotation in back compile		// because it changes IR to makes profile annotation in back compile
// inaccurate.		// inaccurate.
if ((Phase != ThinLTOPhase::PreLink \|\| !PGOOpt \|\|		if ((Phase != LTOPhase::PreLink \|\| !PGOOpt \|\|
PGOOpt->Action != PGOOptions::SampleUse) &&		PGOOpt->Action != PGOOptions::SampleUse) &&
PTO.LoopUnrolling)		PTO.LoopUnrolling)
LPM2.addPass(LoopFullUnrollPass(Level, /OnlyWhenForced=/false,		LPM2.addPass(LoopFullUnrollPass(Level, /OnlyWhenForced=/false,
PTO.ForgetAllSCEVInLoopUnroll));		PTO.ForgetAllSCEVInLoopUnroll));

for (auto &C : LoopOptimizerEndEPCallbacks)		for (auto &C : LoopOptimizerEndEPCallbacks)
C(LPM2, Level);		C(LPM2, Level);

▲ Show 20 Lines • Show All 163 Lines • ▼ Show 20 Lines
static InlineParams		static InlineParams
getInlineParamsFromOptLevel(PassBuilder::OptimizationLevel Level) {		getInlineParamsFromOptLevel(PassBuilder::OptimizationLevel Level) {
auto O3 = PassBuilder::O3;		auto O3 = PassBuilder::O3;
unsigned OptLevel = Level > O3 ? 2 : Level;		unsigned OptLevel = Level > O3 ? 2 : Level;
unsigned SizeLevel = Level > O3 ? Level - O3 : 0;		unsigned SizeLevel = Level > O3 ? Level - O3 : 0;
return getInlineParams(OptLevel, SizeLevel);		return getInlineParams(OptLevel, SizeLevel);
}		}

ModulePassManager		ModulePassManager PassBuilder::buildModuleSimplificationPipeline(
PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,		OptimizationLevel Level, LTOPhase Phase, bool DebugLogging) {
ThinLTOPhase Phase,
bool DebugLogging) {
ModulePassManager MPM(DebugLogging);		ModulePassManager MPM(DebugLogging);

bool HasSampleProfile = PGOOpt && (PGOOpt->Action == PGOOptions::SampleUse);		bool HasSampleProfile = PGOOpt && (PGOOpt->Action == PGOOptions::SampleUse);

// In ThinLTO mode, when flattened profile is used, all the available		// In ThinLTO mode, when flattened profile is used, all the available
// profile information will be annotated in PreLink phase so there is		// profile information will be annotated in PreLink phase so there is
// no need to load the profile again in PostLink.		// no need to load the profile again in PostLink.
bool LoadSampleProfile =		bool LoadSampleProfile = HasSampleProfile && !(FlattenedProfileUsed &&
HasSampleProfile &&		Phase == LTOPhase::PostLink);
!(FlattenedProfileUsed && Phase == ThinLTOPhase::PostLink);

// During the ThinLTO backend phase we perform early indirect call promotion		// During the ThinLTO backend phase we perform early indirect call promotion
// here, before globalopt. Otherwise imported available_externally functions		// here, before globalopt. Otherwise imported available_externally functions
// look unreferenced and are removed. If we are going to load the sample		// look unreferenced and are removed. If we are going to load the sample
// profile then defer until later.		// profile then defer until later.
// TODO: See if we can move later and consolidate with the location where		// TODO: See if we can move later and consolidate with the location where
// we perform ICP when we are loading a sample profile.		// we perform ICP when we are loading a sample profile.
// TODO: We pass HasSampleProfile (whether there was a sample profile file		// TODO: We pass HasSampleProfile (whether there was a sample profile file
// passed to the compile) to the SamplePGO flag of ICP. This is used to		// passed to the compile) to the SamplePGO flag of ICP. This is used to
// determine whether the new direct calls are annotated with prof metadata.		// determine whether the new direct calls are annotated with prof metadata.
// Ideally this should be determined from whether the IR is annotated with		// Ideally this should be determined from whether the IR is annotated with
// sample profile, and not whether a sample profile was provided on the		// sample profile, and not whether a sample profile was provided on the
// command line. E.g. for flattened profiles where we will not be reloading		// command line. E.g. for flattened profiles where we will not be reloading
// the sample profile in the ThinLTO backend, we ideally shouldn't have to		// the sample profile in the ThinLTO backend, we ideally shouldn't have to
// provide the sample profile file.		// provide the sample profile file.
if (Phase == ThinLTOPhase::PostLink && !LoadSampleProfile)		if (Phase == LTOPhase::PostLink && !LoadSampleProfile)
MPM.addPass(PGOIndirectCallPromotion(true /* InLTO */, HasSampleProfile));		MPM.addPass(PGOIndirectCallPromotion(true /* InLTO */, HasSampleProfile));

// Do basic inference of function attributes from known properties of system		// Do basic inference of function attributes from known properties of system
// libraries and other oracles.		// libraries and other oracles.
MPM.addPass(InferFunctionAttrsPass());		MPM.addPass(InferFunctionAttrsPass());

// Create an early function pass manager to cleanup the output of the		// Create an early function pass manager to cleanup the output of the
// frontend.		// frontend.
Show All 15 Lines	if (LoadSampleProfile)
EarlyFPM.addPass(InstCombinePass());		EarlyFPM.addPass(InstCombinePass());
MPM.addPass(createModuleToFunctionPassAdaptor(std::move(EarlyFPM)));		MPM.addPass(createModuleToFunctionPassAdaptor(std::move(EarlyFPM)));

if (LoadSampleProfile) {		if (LoadSampleProfile) {
// Annotate sample profile right after early FPM to ensure freshness of		// Annotate sample profile right after early FPM to ensure freshness of
// the debug info.		// the debug info.
MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,		MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,
PGOOpt->ProfileRemappingFile,		PGOOpt->ProfileRemappingFile,
Phase == ThinLTOPhase::PreLink));		Phase == LTOPhase::PreLink));
// Cache ProfileSummaryAnalysis once to avoid the potential need to insert		// Cache ProfileSummaryAnalysis once to avoid the potential need to insert
// RequireAnalysisPass for PSI before subsequent non-module passes.		// RequireAnalysisPass for PSI before subsequent non-module passes.
MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());		MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());
// Do not invoke ICP in the ThinLTOPrelink phase as it makes it hard		// Do not invoke ICP in the LTOPrelink phase as it makes it hard
// for the profile annotation to be accurate in the ThinLTO backend.		// for the profile annotation to be accurate in the LTO backend.
if (Phase != ThinLTOPhase::PreLink)		if (Phase != LTOPhase::PreLink)
// We perform early indirect call promotion here, before globalopt.		// We perform early indirect call promotion here, before globalopt.
// This is important for the ThinLTO backend phase because otherwise		// This is important for the ThinLTO backend phase because otherwise
// imported available_externally functions look unreferenced and are		// imported available_externally functions look unreferenced and are
// removed.		// removed.
MPM.addPass(PGOIndirectCallPromotion(Phase == ThinLTOPhase::PostLink,		MPM.addPass(PGOIndirectCallPromotion(Phase == LTOPhase::PostLink,
true /* SamplePGO */));		true /* SamplePGO */));
}		}

// Interprocedural constant propagation now that basic cleanup has occurred		// Interprocedural constant propagation now that basic cleanup has occurred
// and prior to optimizing globals.		// and prior to optimizing globals.
// FIXME: This position in the pipeline hasn't been carefully considered in		// FIXME: This position in the pipeline hasn't been carefully considered in
// years, it should be re-analyzed.		// years, it should be re-analyzed.
MPM.addPass(IPSCCPPass());		MPM.addPass(IPSCCPPass());
Show All 21 Lines	ModulePassManager PassBuilder::buildModuleSimplificationPipeline(
FunctionPassManager GlobalCleanupPM(DebugLogging);		FunctionPassManager GlobalCleanupPM(DebugLogging);
GlobalCleanupPM.addPass(InstCombinePass());		GlobalCleanupPM.addPass(InstCombinePass());
invokePeepholeEPCallbacks(GlobalCleanupPM, Level);		invokePeepholeEPCallbacks(GlobalCleanupPM, Level);

GlobalCleanupPM.addPass(SimplifyCFGPass());		GlobalCleanupPM.addPass(SimplifyCFGPass());
MPM.addPass(createModuleToFunctionPassAdaptor(std::move(GlobalCleanupPM)));		MPM.addPass(createModuleToFunctionPassAdaptor(std::move(GlobalCleanupPM)));

// Add all the requested passes for instrumentation PGO, if requested.		// Add all the requested passes for instrumentation PGO, if requested.
if (PGOOpt && Phase != ThinLTOPhase::PostLink &&		if (PGOOpt && Phase != LTOPhase::PostLink &&
(PGOOpt->Action == PGOOptions::IRInstr ||		(PGOOpt->Action == PGOOptions::IRInstr ||
PGOOpt->Action == PGOOptions::IRUse)) {		PGOOpt->Action == PGOOptions::IRUse)) {
addPGOInstrPasses(MPM, DebugLogging, Level,		addPGOInstrPasses(MPM, DebugLogging, Level,
/* RunProfileGen */ PGOOpt->Action == PGOOptions::IRInstr,		/* RunProfileGen */ PGOOpt->Action == PGOOptions::IRInstr,
/* IsCS */ false, PGOOpt->ProfileFile,		/* IsCS */ false, PGOOpt->ProfileFile,
PGOOpt->ProfileRemappingFile);		PGOOpt->ProfileRemappingFile);
MPM.addPass(PGOIndirectCallPromotion(false, false));		MPM.addPass(PGOIndirectCallPromotion(false, false));
}		}
if (PGOOpt && Phase != ThinLTOPhase::PostLink &&		if (PGOOpt && Phase != LTOPhase::PostLink &&
PGOOpt->CSAction == PGOOptions::CSIRInstr)		PGOOpt->CSAction == PGOOptions::CSIRInstr)
MPM.addPass(PGOInstrumentationGenCreateVar(PGOOpt->CSProfileGenFile));		MPM.addPass(PGOInstrumentationGenCreateVar(PGOOpt->CSProfileGenFile));

// Synthesize function entry counts for non-PGO compilation.		// Synthesize function entry counts for non-PGO compilation.
if (EnableSyntheticCounts && !PGOOpt)		if (EnableSyntheticCounts && !PGOOpt)
MPM.addPass(SyntheticCountsPropagation());		MPM.addPass(SyntheticCountsPropagation());

// Require the GlobalsAA analysis for the module so we can query it within		// Require the GlobalsAA analysis for the module so we can query it within
Show All 16 Lines	ModulePassManager PassBuilder::buildModuleSimplificationPipeline(
// invoke or a call.		// invoke or a call.

// Run the inliner first. The theory is that we are walking bottom-up and so		// Run the inliner first. The theory is that we are walking bottom-up and so
// the callees have already been fully optimized, and we want to inline them		// the callees have already been fully optimized, and we want to inline them
// into the callers so that our optimizations can reflect that.		// into the callers so that our optimizations can reflect that.
// For PreLinkThinLTO pass, we disable hot-caller heuristic for sample PGO		// For PreLinkThinLTO pass, we disable hot-caller heuristic for sample PGO
// because it makes profile annotation in the backend inaccurate.		// because it makes profile annotation in the backend inaccurate.
InlineParams IP = getInlineParamsFromOptLevel(Level);		InlineParams IP = getInlineParamsFromOptLevel(Level);
if (Phase == ThinLTOPhase::PreLink && PGOOpt &&		if (Phase == LTOPhase::PreLink && PGOOpt &&
PGOOpt->Action == PGOOptions::SampleUse)		PGOOpt->Action == PGOOptions::SampleUse)
IP.HotCallSiteThreshold = 0;		IP.HotCallSiteThreshold = 0;
MainCGPipeline.addPass(InlinerPass(IP));		MainCGPipeline.addPass(InlinerPass(IP));

// Now deduce any function attributes based in the current code.		// Now deduce any function attributes based in the current code.
MainCGPipeline.addPass(PostOrderFunctionAttrsPass());		MainCGPipeline.addPass(PostOrderFunctionAttrsPass());

// When at O3 add argument promotion to the pass pipeline.		// When at O3 add argument promotion to the pass pipeline.
▲ Show 20 Lines • Show All 129 Lines • ▼ Show 20 Lines	OptimizePM.addPass(SimplifyCFGPass(SimplifyCFGOptions().
sinkCommonInsts(true)));		sinkCommonInsts(true)));

// Optimize parallel scalar instruction chains into SIMD instructions.		// Optimize parallel scalar instruction chains into SIMD instructions.
if (PTO.SLPVectorization)		if (PTO.SLPVectorization)
OptimizePM.addPass(SLPVectorizerPass());		OptimizePM.addPass(SLPVectorizerPass());

OptimizePM.addPass(InstCombinePass());		OptimizePM.addPass(InstCombinePass());

		// Do not enable unrolling in PreLink LTO phase during sample PGO
		// because it changes the IR, which makes profile annotation in the
		// backend compile inaccurate.
		bool DoLoopUnrolling =
		(!LTOPreLink || !PGOOpt || PGOOpt->Action != PGOOptions::SampleUse) &&
		PTO.LoopUnrolling;

// Unroll small loops to hide loop backedge latency and saturate any parallel		// Unroll small loops to hide loop backedge latency and saturate any parallel
// execution resources of an out-of-order processor. We also then need to		// execution resources of an out-of-order processor. We also then need to
// clean up redundancies and loop invariant code.		// clean up redundancies and loop invariant code.
// FIXME: It would be really good to use a loop-integrated instruction		// FIXME: It would be really good to use a loop-integrated instruction
// combiner for cleanup here so that the unrolling and LICM can be pipelined		// combiner for cleanup here so that the unrolling and LICM can be pipelined
// across the loop nests.		// across the loop nests.
// We do UnrollAndJam in a separate LPM to ensure it happens before unroll		// We do UnrollAndJam in a separate LPM to ensure it happens before unroll
if (EnableUnrollAndJam && PTO.LoopUnrolling) {		if (EnableUnrollAndJam && DoLoopUnrolling) {
OptimizePM.addPass(		OptimizePM.addPass(
createFunctionToLoopPassAdaptor(LoopUnrollAndJamPass(Level)));		createFunctionToLoopPassAdaptor(LoopUnrollAndJamPass(Level)));
}		}
OptimizePM.addPass(LoopUnrollPass(		OptimizePM.addPass(LoopUnrollPass(
LoopUnrollOptions(Level, /*OnlyWhenForced=*/!PTO.LoopUnrolling,		LoopUnrollOptions(Level, /*OnlyWhenForced=*/!DoLoopUnrolling,
PTO.ForgetAllSCEVInLoopUnroll)));		PTO.ForgetAllSCEVInLoopUnroll)));
OptimizePM.addPass(WarnMissedTransformationsPass());		OptimizePM.addPass(WarnMissedTransformationsPass());
OptimizePM.addPass(InstCombinePass());		OptimizePM.addPass(InstCombinePass());
OptimizePM.addPass(RequireAnalysisPass<OptimizationRemarkEmitterAnalysis, Function>());		OptimizePM.addPass(RequireAnalysisPass<OptimizationRemarkEmitterAnalysis, Function>());
OptimizePM.addPass(createFunctionToLoopPassAdaptor(		OptimizePM.addPass(createFunctionToLoopPassAdaptor(
LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap),		LICMPass(PTO.LicmMssaOptCap, PTO.LicmMssaNoAccForPromotionCap),
EnableMSSALoopDependency, DebugLogging));		EnableMSSALoopDependency, DebugLogging));

▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines	ModulePassManager PassBuilder::buildModuleOptimizationPipeline(
MPM.addPass(GlobalDCEPass());		MPM.addPass(GlobalDCEPass());
MPM.addPass(ConstantMergePass());		MPM.addPass(ConstantMergePass());

return MPM;		return MPM;
}		}

ModulePassManager		ModulePassManager
PassBuilder::buildPerModuleDefaultPipeline(OptimizationLevel Level,		PassBuilder::buildPerModuleDefaultPipeline(OptimizationLevel Level,
bool DebugLogging, bool LTOPreLink) {		bool DebugLogging, LTOPhase Phase) {
assert(Level != O0 && "Must request optimizations for the default pipeline!");		assert(Level != O0 && "Must request optimizations for the default pipeline!");

ModulePassManager MPM(DebugLogging);		ModulePassManager MPM(DebugLogging);

// Force any function attributes we want the rest of the pipeline to observe.		// Force any function attributes we want the rest of the pipeline to observe.
MPM.addPass(ForceFunctionAttrsPass());		MPM.addPass(ForceFunctionAttrsPass());

// Apply module pipeline start EP callback.		// Apply module pipeline start EP callback.
for (auto &C : PipelineStartEPCallbacks)		for (auto &C : PipelineStartEPCallbacks)
C(MPM);		C(MPM);

if (PGOOpt && PGOOpt->SamplePGOSupport)		if (PGOOpt && PGOOpt->SamplePGOSupport)
MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));		MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));

// Add the core simplification pipeline.		// Add the core simplification pipeline.
MPM.addPass(buildModuleSimplificationPipeline(Level, ThinLTOPhase::None,		MPM.addPass(buildModuleSimplificationPipeline(Level, Phase, DebugLogging));
DebugLogging));

// Now add the optimization pipeline.		// Now add the optimization pipeline.
MPM.addPass(buildModuleOptimizationPipeline(Level, DebugLogging, LTOPreLink));		MPM.addPass(buildModuleOptimizationPipeline(Level, DebugLogging,
		Phase == LTOPhase::PreLink));

return MPM;		return MPM;
}		}

ModulePassManager		ModulePassManager
PassBuilder::buildThinLTOPreLinkDefaultPipeline(OptimizationLevel Level,		PassBuilder::buildThinLTOPreLinkDefaultPipeline(OptimizationLevel Level,
bool DebugLogging) {		bool DebugLogging) {
assert(Level != O0 && "Must request optimizations for the default pipeline!");		assert(Level != O0 && "Must request optimizations for the default pipeline!");

ModulePassManager MPM(DebugLogging);		ModulePassManager MPM(DebugLogging);

// Force any function attributes we want the rest of the pipeline to observe.		// Force any function attributes we want the rest of the pipeline to observe.
MPM.addPass(ForceFunctionAttrsPass());		MPM.addPass(ForceFunctionAttrsPass());

if (PGOOpt && PGOOpt->SamplePGOSupport)		if (PGOOpt && PGOOpt->SamplePGOSupport)
MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));		MPM.addPass(createModuleToFunctionPassAdaptor(AddDiscriminatorsPass()));

// Apply module pipeline start EP callback.		// Apply module pipeline start EP callback.
for (auto &C : PipelineStartEPCallbacks)		for (auto &C : PipelineStartEPCallbacks)
C(MPM);		C(MPM);

// If we are planning to perform ThinLTO later, we don't bloat the code with		// If we are planning to perform ThinLTO later, we don't bloat the code with
// unrolling/vectorization/... now. Just simplify the module as much as we		// unrolling/vectorization/... now. Just simplify the module as much as we
// can.		// can.
MPM.addPass(buildModuleSimplificationPipeline(Level, ThinLTOPhase::PreLink,		MPM.addPass(buildModuleSimplificationPipeline(Level, LTOPhase::PreLink,
DebugLogging));		DebugLogging));

// Run partial inlining pass to partially inline functions that have		// Run partial inlining pass to partially inline functions that have
// large bodies.		// large bodies.
// FIXME: It isn't clear whether this is really the right place to run this		// FIXME: It isn't clear whether this is really the right place to run this
// in ThinLTO. Because there is another canonicalization and simplification		// in ThinLTO. Because there is another canonicalization and simplification
// phase that will run after the thin link, running this here ends up with		// phase that will run after the thin link, running this here ends up with
// less information than will be available later and it may grow functions in		// less information than will be available later and it may grow functions in
Show All 34 Lines	ModulePassManager PassBuilder::buildThinLTODefaultPipeline(

if (Level == O0)		if (Level == O0)
return MPM;		return MPM;

// Force any function attributes we want the rest of the pipeline to observe.		// Force any function attributes we want the rest of the pipeline to observe.
MPM.addPass(ForceFunctionAttrsPass());		MPM.addPass(ForceFunctionAttrsPass());

// Add the core simplification pipeline.		// Add the core simplification pipeline.
MPM.addPass(buildModuleSimplificationPipeline(Level, ThinLTOPhase::PostLink,		MPM.addPass(buildModuleSimplificationPipeline(Level, LTOPhase::PostLink,
DebugLogging));		DebugLogging));

// Now add the optimization pipeline.		// Now add the optimization pipeline.
MPM.addPass(buildModuleOptimizationPipeline(Level, DebugLogging));		MPM.addPass(buildModuleOptimizationPipeline(Level, DebugLogging));

return MPM;		return MPM;
}		}

ModulePassManager		ModulePassManager
PassBuilder::buildLTOPreLinkDefaultPipeline(OptimizationLevel Level,		PassBuilder::buildLTOPreLinkDefaultPipeline(OptimizationLevel Level,
bool DebugLogging) {		bool DebugLogging) {
assert(Level != O0 && "Must request optimizations for the default pipeline!");		assert(Level != O0 && "Must request optimizations for the default pipeline!");
// FIXME: We should use a customized pre-link pipeline!		// FIXME: We should use a customized pre-link pipeline!
return buildPerModuleDefaultPipeline(Level, DebugLogging,		return buildPerModuleDefaultPipeline(Level, DebugLogging, LTOPhase::PreLink);
/* LTOPreLink */true);
}		}

ModulePassManager		ModulePassManager
PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level, bool DebugLogging,		PassBuilder::buildLTODefaultPipeline(OptimizationLevel Level, bool DebugLogging,
ModuleSummaryIndex *ExportSummary) {		ModuleSummaryIndex *ExportSummary) {
ModulePassManager MPM(DebugLogging);		ModulePassManager MPM(DebugLogging);

if (Level == O0) {		if (Level == O0) {
// The WPD and LowerTypeTest passes need to run at -O0 to lower type		// The WPD and LowerTypeTest passes need to run at -O0 to lower type
// metadata and intrinsics.		// metadata and intrinsics.
MPM.addPass(WholeProgramDevirtPass(ExportSummary, nullptr));		MPM.addPass(WholeProgramDevirtPass(ExportSummary, nullptr));
MPM.addPass(LowerTypeTestsPass(ExportSummary, nullptr));		MPM.addPass(LowerTypeTestsPass(ExportSummary, nullptr));
return MPM;		return MPM;
}		}

if (PGOOpt && PGOOpt->Action == PGOOptions::SampleUse) {		if (PGOOpt && PGOOpt->Action == PGOOptions::SampleUse) {
// Load sample profile before running the LTO optimization pipeline.		// Load sample profile before running the LTO optimization pipeline.
MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,		MPM.addPass(SampleProfileLoaderPass(PGOOpt->ProfileFile,
PGOOpt->ProfileRemappingFile,		PGOOpt->ProfileRemappingFile,
false /* ThinLTOPhase::PreLink */));		false /* LTOPhase::PreLink */));
// Cache ProfileSummaryAnalysis once to avoid the potential need to insert		// Cache ProfileSummaryAnalysis once to avoid the potential need to insert
// RequireAnalysisPass for PSI before subsequent non-module passes.		// RequireAnalysisPass for PSI before subsequent non-module passes.
MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());		MPM.addPass(RequireAnalysisPass<ProfileSummaryAnalysis, Module>());
}		}

// Remove unused virtual tables to improve the quality of code generated by		// Remove unused virtual tables to improve the quality of code generated by
// whole-program devirtualization and bitset lowering.		// whole-program devirtualization and bitset lowering.
MPM.addPass(GlobalDCEPass());		MPM.addPass(GlobalDCEPass());
▲ Show 20 Lines • Show All 1,222 Lines • Show Last 20 Lines
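
Editorial sketch (not part of the patch): the `DoLoopUnrolling` gate added to `buildModuleOptimizationPipeline` can be read as a small truth table. The standalone model below assumes hypothetical stand-in types (`PipelineConfig`, `PGOAction`) in place of the real `PGOOpt` and `PTO` members; only the boolean expression itself is taken from the diff above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM state the predicate reads. These are
// illustrative types, not the real PGOOptions / PipelineTuningOptions.
enum class PGOAction { None, IRInstr, IRUse, SampleUse };

struct PipelineConfig {
  bool LTOPreLink;    // models the LTOPreLink argument (pre-link compile)
  bool HasPGOOpt;     // models `PGOOpt` being set
  PGOAction Action;   // models `PGOOpt->Action`
  bool LoopUnrolling; // models `PTO.LoopUnrolling`
};

// Same shape as the DoLoopUnrolling expression in the diff: unrolling stays
// enabled unless this is a pre-link compile that uses a sample profile.
constexpr bool doLoopUnrolling(const PipelineConfig &C) {
  return (!C.LTOPreLink || !C.HasPGOOpt || C.Action != PGOAction::SampleUse) &&
         C.LoopUnrolling;
}

// The diff passes the negation to LoopUnrollOptions as OnlyWhenForced.
constexpr bool onlyWhenForced(const PipelineConfig &C) {
  return !doLoopUnrolling(C);
}

// Walks the interesting rows of the truth table.
inline void checkTruthTable() {
  // Pre-link with SampleUse: unrolling is suppressed.
  assert(!doLoopUnrolling({true, true, PGOAction::SampleUse, true}));
  assert(onlyWhenForced({true, true, PGOAction::SampleUse, true}));
  // Same profile, but not a pre-link compile: unrolling is allowed.
  assert(doLoopUnrolling({false, true, PGOAction::SampleUse, true}));
  // Pre-link with instrumentation-based PGO: unrolling is allowed.
  assert(doLoopUnrolling({true, true, PGOAction::IRUse, true}));
  // Unrolling disabled in the tuning options always wins.
  assert(!doLoopUnrolling({false, false, PGOAction::None, false}));
}
```

Calling `checkTruthTable()` from a test driver exercises the four cases. Note that in the model, as in the diff, the pass is still added when the predicate is false; it is only restricted to forced unrolling.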

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp

Show First 20 Lines • Show All 492 Lines • ▼ Show 20 Lines	void PassManagerBuilder::populateModulePassManager(
// the ThinLTO backend when PerformThinLTO=true, when we promote imported		// the ThinLTO backend when PerformThinLTO=true, when we promote imported
// inter-module indirect calls. For that we perform indirect call promotion		// inter-module indirect calls. For that we perform indirect call promotion
// earlier in the pass pipeline, here before globalopt. Otherwise imported		// earlier in the pass pipeline, here before globalopt. Otherwise imported
// available_externally functions look unreferenced and are removed.		// available_externally functions look unreferenced and are removed.
if (PerformThinLTO)		if (PerformThinLTO)
MPM.add(createPGOIndirectCallPromotionLegacyPass(/*InLTO = */ true,		MPM.add(createPGOIndirectCallPromotionLegacyPass(/*InLTO = */ true,
!PGOSampleUse.empty()));		!PGOSampleUse.empty()));

// For SamplePGO in ThinLTO compile phase, we do not want to unroll loops		// For SamplePGO in the *LTO compile phase, we do not want to unroll loops
// as it will change the CFG too much to make the 2nd profile annotation		// as it will change the CFG too much to make the 2nd profile annotation
// in backend more difficult.		// in backend more difficult.
bool PrepareForThinLTOUsingPGOSampleProfile =		bool PrepareForLTOUsingPGOSampleProfile =
PrepareForThinLTO && !PGOSampleUse.empty();		(PrepareForThinLTO || PrepareForLTO) && !PGOSampleUse.empty();
if (PrepareForThinLTOUsingPGOSampleProfile)		if (PrepareForLTOUsingPGOSampleProfile)
DisableUnrollLoops = true;		DisableUnrollLoops = true;

// Infer attributes about declarations if possible.		// Infer attributes about declarations if possible.
MPM.add(createInferFunctionAttrsLegacyPass());		MPM.add(createInferFunctionAttrsLegacyPass());

addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);		addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);

if (OptLevel > 2)		if (OptLevel > 2)
Show All 10 Lines	void PassManagerBuilder::populateModulePassManager(
MPM.add(createPromoteMemoryToRegisterPass());		MPM.add(createPromoteMemoryToRegisterPass());

MPM.add(createDeadArgEliminationPass()); // Dead argument elimination		MPM.add(createDeadArgEliminationPass()); // Dead argument elimination

addInstructionCombiningPass(MPM); // Clean up after IPCP & DAE		addInstructionCombiningPass(MPM); // Clean up after IPCP & DAE
addExtensionsToPM(EP_Peephole, MPM);		addExtensionsToPM(EP_Peephole, MPM);
MPM.add(createCFGSimplificationPass()); // Clean up after IPCP & DAE		MPM.add(createCFGSimplificationPass()); // Clean up after IPCP & DAE

// For SamplePGO in ThinLTO compile phase, we do not want to do indirect		// For SamplePGO in the *LTO compile phase, we do not want to do indirect
// call promotion as it will change the CFG too much to make the 2nd		// call promotion as it will change the CFG too much to make the 2nd
// profile annotation in backend more difficult.		// profile annotation in backend more difficult.
// PGO instrumentation is added during the compile phase for ThinLTO, do		// PGO instrumentation is added during the compile phase for *LTO, do
// not run it a second time		// not run it a second time
if (DefaultOrPreLinkPipeline && !PrepareForThinLTOUsingPGOSampleProfile)		if (DefaultOrPreLinkPipeline && !PrepareForLTOUsingPGOSampleProfile)
addPGOInstrPasses(MPM);		addPGOInstrPasses(MPM);

// Create profile COMDAT variables. Lld linker wants to see all variables		// Create profile COMDAT variables. Lld linker wants to see all variables
// before the LTO/ThinLTO link since it needs to resolve symbols/comdats.		// before the LTO/ThinLTO link since it needs to resolve symbols/comdats.
if (!PerformThinLTO && EnablePGOCSInstrGen)		if (!PerformThinLTO && EnablePGOCSInstrGen)
MPM.add(createPGOInstrumentationGenCreateVarLegacyPass(PGOInstrGen));		MPM.add(createPGOInstrumentationGenCreateVarLegacyPass(PGOInstrGen));

// We add a module alias analysis pass here. In part due to bugs in the		// We add a module alias analysis pass here. In part due to bugs in the
▲ Show 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	void PassManagerBuilder::populateModulePassManager(
if (RunInliner) {		if (RunInliner) {
MPM.add(createGlobalOptimizerPass());		MPM.add(createGlobalOptimizerPass());
MPM.add(createGlobalDCEPass());		MPM.add(createGlobalDCEPass());
}		}

// If we are planning to perform ThinLTO later, let's not bloat the code with		// If we are planning to perform ThinLTO later, let's not bloat the code with
// unrolling/vectorization/... now. We'll first run the inliner + CGSCC passes		// unrolling/vectorization/... now. We'll first run the inliner + CGSCC passes
// during ThinLTO and perform the rest of the optimizations afterward.		// during ThinLTO and perform the rest of the optimizations afterward.
if (PrepareForThinLTO) {		if (PrepareForThinLTO) {
wenlei: Does this also need to be `PrepareForThinLTO || PrepareForLTO` for the old PM?
wristow: I agree this is another instance where a balancing-act question applies. In this case, assuming the comment about the concern of code bloat is accurate, it's not so much about compile-time resources in the full LTO back-end, but rather about minimizing the ThinLTO bitcode write/read time. So if, as this WIP evolves, it ultimately turns out to be a win for SamplePGO to suppress some loop optimizations (unrolling/vectorization) here, then that will probably also be a small win in full LTO compile time. That said, in addition to these loop-related optimizations, there are other transformations here that are done in the full LTO pipeline (but not in the ThinLTO pipeline). So I suspect that if a change to check for `PrepareForThinLTO || PrepareForLTO` (rather than only `PrepareForThinLTO`) makes sense here from a Sample PGO perspective, then the change will be more complicated than simply adding the small set of passes here followed by the early return. That is, I think there are probably things after the `return` on line 621 that still ought to be enabled for full LTO: essentially continuing to do them in the pre-link stage for full LTO, to avoid needing to do too much work in the full LTO backend stage, since it's more of a problem for the full backend to absorb that compile-time cost.
tejohnson (author): This early return was not for Sample PGO, btw. It was added much earlier with the thought that a) these types of optimizations might affect function importing heuristics because they could bloat the code; b) we can push more optimizations to the post-link in ThinLTO because it is parallel; and c) there isn't otherwise a benefit to doing these optimizations in the pre-link vs. the post-link, i.e. they aren't cleanup/simplification passes. The equation is of course different for full LTO, which has a monolithic serial post-link backend. But I believe this early return is the one @ormris is looking to remove on the ThinLTO path to "merge" the two pipelines, which needs a good amount of evaluation on the ThinLTO performance side.
// Ensure we perform any last passes, but do so before renaming anonymous		// Ensure we perform any last passes, but do so before renaming anonymous
// globals in case the passes add any.		// globals in case the passes add any.
addExtensionsToPM(EP_OptimizerLast, MPM);		addExtensionsToPM(EP_OptimizerLast, MPM);
MPM.add(createCanonicalizeAliasesPass());		MPM.add(createCanonicalizeAliasesPass());
// Rename anon globals to be able to export them in the summary.		// Rename anon globals to be able to export them in the summary.
MPM.add(createNameAnonGlobalPass());		MPM.add(createNameAnonGlobalPass());
return;		return;
}		}
▲ Show 20 Lines • Show All 514 Lines • Show Last 20 Lines
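
Editorial sketch (not part of the patch): the inline discussion above turns on the fact that the old-PM patch widens the Sample PGO flag to cover full LTO, while the early return remains keyed on `PrepareForThinLTO` alone. The model below is a minimal illustration under that reading. `LegacyBuilderFlags` and the helper function names are hypothetical; only the field names and the two boolean expressions come from the diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical, pared-down model of the PassManagerBuilder fields involved.
struct LegacyBuilderFlags {
  bool PrepareForThinLTO = false;
  bool PrepareForLTO = false;
  std::string PGOSampleUse; // empty means no sample profile was given
};

// Left-hand side of the diff: only ThinLTO pre-link compiles qualify.
inline bool samplePGOPreLinkBefore(const LegacyBuilderFlags &F) {
  return F.PrepareForThinLTO && !F.PGOSampleUse.empty();
}

// Right-hand side of the diff: full LTO pre-link compiles qualify as well.
inline bool samplePGOPreLinkAfter(const LegacyBuilderFlags &F) {
  return (F.PrepareForThinLTO || F.PrepareForLTO) && !F.PGOSampleUse.empty();
}

// The early return discussed in the review is unchanged by the patch and
// still depends on PrepareForThinLTO only.
inline bool takesThinLTOEarlyReturn(const LegacyBuilderFlags &F) {
  return F.PrepareForThinLTO;
}
```

For a full LTO pre-link compile with a sample profile, the widened flag is true (so `DisableUnrollLoops` is set and `addPGOInstrPasses` is skipped), yet the early return is not taken, which is the asymmetry wenlei's question points at.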