
LTO: align the Monolithic LTO optimization pipeline on the ThinLTO and O2/O3 one
Needs Revision · Public

Authored by mehdi_amini on Jan 31 2017, 11:19 PM.

Details

Summary

18 months ago, we tried to do the same thing, but the increased
link time was judged unacceptable because of LTO's lack of
scaling. Now that ThinLTO is there for any project that cares
about scaling and build-time performance in general, we can be
more aggressive with Monolithic LTO for projects that are either
small enough or willing to pay the extra compile time.

Aligning the pipeline with O2/O3 (and ThinLTO) makes pipeline
maintenance easier: making the pipeline evolve for ThinLTO will
benefit Monolithic LTO and vice versa. Also, any misoptimization
in one use case is likely to affect the other, and fixing it will
benefit both. Studying the difference in performance between LTO
and ThinLTO while doing this helped us last year to improve some
analyses, which in turn helped non-LTO builds.

Note: I haven't had time to benchmark this recently, so I am putting
it up for review so that everyone can run their own private benchmarks
and we can evaluate any identified deficiencies.
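For anyone reproducing this, the two pipelines under discussion are selected from the compiler driver. A minimal sketch of the invocations (file names are illustrative, and the choice of lld is an assumption; any LTO-capable linker works):

```shell
# Monolithic (regular) LTO: all bitcode is merged and optimized as one module.
clang -O2 -flto a.c b.c -fuse-ld=lld -o app

# ThinLTO: per-module summaries let the link-time optimization scale
# and run in parallel.
clang -O2 -flto=thin a.c b.c -fuse-ld=lld -o app
```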

Event Timeline

mehdi_amini created this revision. Jan 31 2017, 11:19 PM
eastig added a subscriber: eastig. Feb 1 2017, 1:56 AM
eastig added a comment. Feb 1 2017, 2:18 AM

Hi Mehdi,

I'll check if the patch has any impact on our benchmarks.

Thanks,
Evgeny Astigeevich
Senior Compiler Engineer
Compilation Tools
ARM

pcc added a comment. Feb 1 2017, 11:33 AM

For a start, this pipeline doesn't look right as it needs to include at least the CFI passes with the correct summary action.

But a higher level point is that it seems premature to do this until there is a proven migration path from regular to thin LTO for users of the CFI and whole-program devirtualization features, such as Chromium. This at least means that the features must be supported. For the migration path to be proven, at least one major user of the feature (i.e. Chromium) must have moved its official builds from regular to thin LTO.

We hope to have completed the migration by the end of the quarter, but that is, as always, optimistic.

In D29376#663665, @pcc wrote:

For a start, this pipeline doesn't look right as it needs to include at least the CFI passes with the correct summary action.

Yes, I noticed that while working on this, but I promised the others that I'd get a patch out so that we can test the performance and build-time impact.

But a higher level point is that it seems premature to do this until there is a proven migration path from regular to thin LTO for users of the CFI and whole-program devirtualization features, such as Chromium.

I disagree with holding back the whole LTO optimization strategy because of a single large project that uses the CFI features. If the extra build time is a real problem for Chromium and CFI, then a better strategy is to lower the optimization level during LTO to O1.

(O1 may have to be tweaked with an extra globalopt run when "PerformLTO" is set, though.)

pcc added a comment. Feb 1 2017, 12:05 PM

We already use regular LTO at opt level 1 -- this basically provides "dead stripping with support for CFI and devirtualization". That is essentially all we need from the LTO pipeline at the moment. Getting more out of LTO (cross-module inlining, etc.) would be nice to have, and is part of the motivation for switching to thin LTO.

Based on my reading of the code, the thin LTO opt level 1 pipeline does a lot more than the current regular LTO opt level 1 pipeline. Switching the pipeline like this would likely make link times unacceptably slow until we switch to thin LTO.

In D29376#663757, @pcc wrote:

We already use regular LTO at opt level 1 -- this basically provides "dead stripping with support for CFI and devirtualization". That is essentially all we need from the LTO pipeline at the moment.

So that's more like an O0 :)

Based on my reading of the code, the thin LTO opt level 1 pipeline does a lot more than the current regular LTO opt level 1 pipeline. Switching the pipeline like this would likely make link times unacceptably slow until we switch to thin LTO.

Unacceptably slow for Chromium is not the majority of use cases.
If we end up with Chromium being the only blocker to any progress for the other 99.9% of applications, you're likely going to have to supply your own flag "-mllvm -do-lto-optimizations-but-not-too-much".

pcc added a comment. Feb 1 2017, 12:54 PM
In D29376#663757, @pcc wrote:

We already use regular LTO at opt level 1 -- this basically provides "dead stripping with support for CFI and devirtualization". That is essentially all we need from the LTO pipeline at the moment.

So that's more like an O0 :)

No, O0 does not include globaldce and functionattrs.

Based on my reading of the code, the thin LTO opt level 1 pipeline does a lot more than the current regular LTO opt level 1 pipeline. Switching the pipeline like this would likely make link times unacceptably slow until we switch to thin LTO.

Unacceptably slow for Chromium is not the majority of use cases.
If we end up with Chromium being the only blocker to any progress for the other 99.9% of applications, you're likely going to have to supply your own flag "-mllvm -do-lto-optimizations-but-not-too-much".

It is not necessarily just Chromium, it is any user who just wants CFI/devirt without it being tied to the rest of the LTO pipeline. It is already possible to pick and choose those features at compile time, and I reckon that property should be carried through to the linker. LTO opt level 1 already allows users to do that; it roughly means "act like a regular linker with --gc-sections". I think we should keep that meaning at least until those users can move to thin LTO. I don't have a problem with changing the other opt levels, though (and in fact, I'd encourage it, as it would allow us to remove the existing regular LTO pipeline).
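The per-link opt level described here can already be requested from the linker command line. A hedged sketch of the flag spellings (these vary by linker; lld spells it --lto-O<N>, the gold plugin -plugin-opt=O<N>; file names are illustrative):

```shell
# Regular LTO at opt level 1 via lld: roughly "a regular linker with
# --gc-sections", plus CFI/devirtualization support.
clang -O2 -flto a.c b.c -fuse-ld=lld -Wl,--lto-O1 -o app

# The same request through the gold plugin.
clang -O2 -flto a.c b.c -fuse-ld=gold -Wl,-plugin-opt=O1 -o app
```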

In D29376#663788, @pcc wrote:
In D29376#663757, @pcc wrote:

We already use regular LTO at opt level 1 -- this basically provides "dead stripping with support for CFI and devirtualization". That is essentially all we need from the LTO pipeline at the moment.

So that's more like an O0 :)

No, O0 does not include globaldce and functionattrs.

I know, that's why I didn't write "that is exactly O0".

Based on my reading of the code, the thin LTO opt level 1 pipeline does a lot more than the current regular LTO opt level 1 pipeline. Switching the pipeline like this would likely make link times unacceptably slow until we switch to thin LTO.

Unacceptably slow for Chromium is not the majority of use cases.
If we end up with Chromium being the only blocker to any progress for the other 99.9% of applications, you're likely going to have to supply your own flag "-mllvm -do-lto-optimizations-but-not-too-much".

It is not necessarily just Chromium, it is any user who just wants CFI/devirt without it being tied to the rest of the LTO pipeline.

That's very legitimate, and "not being tied to the rest of the LTO pipeline" is exactly why I oppose blocking the LTO pipeline on such a use case. If such a use case needs to be "untied", a separate flag/flow can be provided for it.

LTO opt level 1 already allows users to do that; it roughly means "act like a regular linker with --gc-sections".

I'd be fine with having LTO O1 hardcoded to a special path that would be CFI+DCE, but then what is O0?

pcc added a comment. Feb 1 2017, 1:16 PM

It is not necessarily just Chromium, it is any user who just wants CFI/devirt without it being tied to the rest of the LTO pipeline.

That's very legitimate, and "not being tied to the rest of the LTO pipeline" is exactly why I oppose blocking the LTO pipeline on such a use case. If such a use case needs to be "untied", a separate flag/flow can be provided for it.

I agree that in the long term we should think about this more carefully. In fact right now in ThinLTO we have no way of untying these features from the rest of the pipeline. I think the long term plan should be:

  • figure out the right way to expose the "act like a regular linker with --gc-sections" feature
  • expose it to regular and thin LTO
  • remove the hardcoded regular LTO path you mention below

LTO opt level 1 already allows users to do that; it roughly means "act like a regular linker with --gc-sections".

I'd be fine with having LTO O1 hardcoded to a special path that would be CFI+DCE,

Works for me, thanks.

but then what is O0?

I think O0 should have basically the same meaning as in the compiler: the minimal pass pipeline required for correctness.

mehdi_amini updated this revision to Diff 86756. Feb 1 2017, 5:47 PM

Fixed a few discrepancies, added back the CFI/WPDevirt passes, and tried to hook this up in a way that makes sense.

At least the validation is passing now.

eastig added a comment. Feb 8 2017, 5:32 AM

Hi Mehdi,

I've got the benchmark results for Cortex-M4:

  • One benchmark failed because something was wrong with the generated executable.
  • The number of performance gains is 21. The maximum score boost is ~4x.
  • The number of performance losses is 15. The maximum score degradation is ~47x.

As you can see, the patch affects performance significantly and needs a detailed performance analysis.

We have other M-profile boards; I'll run the benchmarks on them as well.

Awesome! Thanks for running it, this is not surprising at all :)

Hi Mehdi,

I've got the benchmark results for Cortex-M7; they are better than for Cortex-M4:

  • The same benchmark failed because something was wrong with the generated executable.
  • The number of performance gains is 22. Five of them have a score boost > 4x; 17 show no change or a slight change. The maximum score boost is ~20x.
  • The number of performance losses is 23. Five of them have a score degradation between 1.2x and 2.5x; 18 show no change or a slight change. The maximum score degradation is ~2.5x.
davide edited edge metadata. Feb 10 2017, 9:21 AM

Hi Mehdi,

I've got the benchmark results for Cortex-M7; they are better than for Cortex-M4:

  • The same benchmark failed because something was wrong with the generated executable.

Can you please open a bug at llvm.org with a repro?

  • The number of performance gains is 22. Five of them have a score boost > 4x; 17 show no change or a slight change. The maximum score boost is ~20x.
  • The number of performance losses is 23. Five of them have a score degradation between 1.2x and 2.5x; 18 show no change or a slight change. The maximum score degradation is ~2.5x.

Hi Mehdi,

I've got the benchmark results for Cortex-M7; they are better than for Cortex-M4:

  • The same benchmark failed because something was wrong with the generated executable.
  • The number of performance gains is 22. Five of them have a score boost > 4x; 17 show no change or a slight change. The maximum score boost is ~20x.
  • The number of performance losses is 23. Five of them have a score degradation between 1.2x and 2.5x; 18 show no change or a slight change. The maximum score degradation is ~2.5x.

Thanks for the update. Good to know.

This all shows how much performance headroom we have ahead :)

I'm not sure when I'll have time to dig into the perf regressions I noticed on the llvm-testsuite. It'll take some time to get there!

(hopefully we'll improve the non-LTO performance at the same time...)

I took some time to run this on some games internally. The change is mostly performance-neutral: some titles are 1% faster, some 1% slower, but nothing glamorous.
(Please note these are "real-world" programs, not synthetic benchmarks.)
What worries me about this change is the increase in compile time. I noticed a 30%-50% increase in compile time (for some of these, the LTO time was already 10 minutes, so 50% more is not acceptable).
The time is spent in the usual suspects: GVN, InstCombine, inlining, etc. I'd love to hear others' thoughts, but I'm inclined to hold off on this until we have a better story for our compile time.
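To see where that link time goes, one option is LLVM's per-pass timers. A sketch for the lld path (the flag plumbing is an assumption and differs per linker; file names are illustrative):

```shell
# Report time spent in each LLVM pass during the LTO link step.
clang -O2 -flto a.c b.c -fuse-ld=lld -Wl,-mllvm,-time-passes -o app

# Per-TU compile-phase breakdown from the frontend.
clang -O2 -ftime-report -c a.c
```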

Can you please open a bug at llvm.org with a repro?

I am on holiday next week. I'll investigate the issue when I am back.

davide requested changes to this revision. Apr 25 2017, 1:28 PM

Can you please open a bug at llvm.org with a repro?

I am on holiday next week. I'll investigate the issue when I am back.

@eastig Did you get a chance to take a look?

This revision now requires changes to proceed. Apr 25 2017, 1:28 PM

Can you please open a bug at llvm.org with a repro?

I am on holiday next week. I'll investigate the issue when I am back.

@eastig Did you get a chance to take a look?

Hi Davide,

Thank you for the reminder.
Shame on me; I lost track of it.
I'll check with trunk whether the issue still exists.

Can you please open a bug at llvm.org with a repro?

I am on holiday next week. I'll investigate the issue when I am back.

@eastig Did you get a chance to take a look?

Hi Davide,

I have good news: there is no need for a bug.

I have looked at the benchmark failure. In fact, it is not a crash: the benchmark completed its run. The benchmark performs some checks of its own results, and those checks found the results suspicious, because the execution time was almost zero. It seems the benchmark is very sensitive to LTO optimizations.
I compared the binary files: the test code, a whole test loop, was completely removed. As the benchmark was run on a bare-metal board, the driver running the benchmark reported that the board returned an unexpected error.

Thanks,
Evgeny