This is an archive of the discontinued LLVM Phabricator instance.

Differential D17115

Define the ThinLTO Pipeline
Closed · Public

Authored by mehdi_amini on Feb 10 2016, 4:59 PM.

Details

Reviewers

tejohnson

Commits

rG1db10ac6ce4d: Define the ThinLTO Pipeline (experimental)
rG484470d605ec: Define the ThinLTO Pipeline

Summary

Unlike full LTO, ThinLTO can afford to shift compile time
from the frontend to the linker, because both phases run in parallel.
This pipeline is based on the proposal in D13443 for full LTO. We
didn't move forward with that proposal because it made the link take
far too long.

This patch refactors the "function simplification" passes that are part
of the inliner loop into a helper function (this part is NFC and can be
committed separately to simplify the diff). The ThinLTO pipeline
integrates into the regular O2/O3 flow:

The compile phase runs the inliner with a somewhat lighter function simplification. (TODO: tune the inliner thresholds here.) This is intended to simplify the IR and get rid of obvious things like linkonce_odr functions that will be inlined.
The link phase runs the pipeline from the start, extended with some specific passes that leverage the augmented knowledge we have during LTO. In particular, after the inliner is done, a sequence of globalDCE/globalOpt is performed, followed by another run of the "function simplification" passes.
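
The two phases described above can be pictured with a small toy model. This is only a sketch: it does not use the LLVM API, the type ToyBuilder and the pass-name strings are hypothetical, and the assertion that the two flags are mutually exclusive is an assumption of the model rather than something the patch states. It records which groups of passes each phase would schedule, following the order described in the summary.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a pass manager: it only records pass names in order.
using Pipeline = std::vector<std::string>;

// Hypothetical model of the two flags discussed in this review.
struct ToyBuilder {
  bool PrepareForThinLTO = false; // compile phase
  bool PerformThinLTO = false;    // link phase

  // Models the "function simplification" helper.
  void addFunctionSimplification(Pipeline &P) const {
    P.push_back("early-simplification");
    if (PrepareForThinLTO) {
      // Lighter variant: clean up and stop before the loop passes.
      P.push_back("adce");
      P.push_back("instcombine");
      return;
    }
    P.push_back("loop-passes");
  }

  Pipeline build() const {
    // Assumption of this toy model: a build is either compile or link.
    assert(!(PrepareForThinLTO && PerformThinLTO) && "phases are exclusive");
    Pipeline P;
    P.push_back("inliner");
    addFunctionSimplification(P);
    if (PrepareForThinLTO)
      return P; // Defer unrolling/vectorization to the link phase.
    if (PerformThinLTO) {
      // Extra cleanup once the inliner is done, then simplify again.
      P.push_back("globaldce");
      P.push_back("globalopt");
      P.push_back("globaldce");
      addFunctionSimplification(P);
    }
    P.push_back("late-optimizations");
    return P;
  }
};
```

In this model the compile phase stops right after the lighter simplification, while the link phase runs the full sequence plus the globalDCE/globalOpt round and a second simplification.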

The measurements on the public test suite as well as on our internal
suite show an overall net improvement. The binary size for the clang
executable is reduced by 5%. We're still tuning it with the bringup
of ThinLTO but this should provide a good starting point.

Diff Detail

Event Timeline

mehdi_amini updated this revision to Diff 47566. Feb 10 2016, 4:59 PM

mehdi_amini retitled this revision from to Define the ThinLTO Pipeline.

mehdi_amini updated this object.

mehdi_amini added a reviewer: tejohnson.

mehdi_amini added subscribers: dexonsmith, llvm-commits.

Herald added a subscriber: mehdi_amini. Feb 10 2016, 4:59 PM

mgrang added a subscriber: mgrang. Feb 10 2016, 5:05 PM

mgrang added inline comments.

lib/Transforms/IPO/PassManagerBuilder.cpp
362	Period required at the end of comment.
429	Period required at the end of comment.
432	Period required at the end of comment.

Upload a new diff rebased on top of the NFC changes to reduce the diff.

mehdi_amini added inline comments. Feb 10 2016, 5:13 PM

lib/Transforms/IPO/PassManagerBuilder.cpp
354–356	I can fix these comments as part of the NFC part of the changes (I updated the diff to rebase on the existing code after refactoring). (I plan to commit separately and the comments are like that in trunk right now)

davidxl added a subscriber: davidxl. Feb 10 2016, 10:40 PM

davidxl added a subscriber: xur. Feb 10 2016, 11:05 PM

davidxl added inline comments. Feb 10 2016, 11:09 PM

lib/Transforms/IPO/PassManagerBuilder.cpp
366–369	This needs to be run in PrepareForThinLTO. During PerformThinLTO, only cross module indirect call promotion transformation needs to be done here.

tejohnson added inline comments. Feb 11 2016, 8:55 AM

lib/Transforms/IPO/PassManagerBuilder.cpp
366–369	Right, this should be guarded by (!PerformThinLTO). I think the IC promotion pass is not yet committed. But eventually we will add another round of it just after FunctionImport.
393	typo: s/performs/perform/
710	Don't we also need to do some of the LTO passes here? E.g. FunctionImporting is added in addLTOOptimizationPasses.

mehdi_amini added inline comments. Feb 11 2016, 9:04 AM

lib/Transforms/IPO/PassManagerBuilder.cpp
366–369	Good point, I didn't pay much attention to PGO, I'll update.
710	Yes, we can add it the same way it is done in the LTO pipeline, i.e. guarded by the presence of the FunctionIndex. Right now I'd rather have the logic in the linker plugin and separate the import from the optimization. This is also required for the incremental scheme (see the beginning of ProcessThinLTOModule() in D17066).
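
The two guards discussed in this thread (PGO instrumentation skipped while PerformThinLTO is set, and function importing added only when a FunctionIndex is present) can be sketched with a toy model. Nothing here is LLVM API: ToyIndex, ToyThinLTOBuilder, and the pass-name strings are hypothetical stand-ins, the verifier passes are left out, and the assert is an invariant of the toy only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the function index; it carries no data here.
struct ToyIndex {};

// Toy model of the guards discussed above. Passes are recorded by name.
struct ToyThinLTOBuilder {
  const ToyIndex *FunctionIndex = nullptr;
  bool PerformThinLTO = false;

  void populateModule(std::vector<std::string> &PM) const {
    // Instrumentation belongs to the compile phase; skip it at link time.
    if (!PerformThinLTO)
      PM.push_back("pgo-instr");
    PM.push_back("module-optimizations");
  }

  void populateThinLTO(std::vector<std::string> &PM) {
    // Toy invariant: the flag is only set for the duration of this call.
    assert(!PerformThinLTO && "flag should not already be set");
    PerformThinLTO = true;
    // Importing is optional: a null index means the caller handles it.
    if (FunctionIndex)
      PM.push_back("function-import");
    populateModule(PM);
    PerformThinLTO = false;
  }
};
```

With a null FunctionIndex the model schedules only the module optimizations, which corresponds to keeping the import logic in the linker plugin, separate from the optimization pipeline.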

davidxl mentioned this in D17108: [PGO] Add another interface for annotateValueSite. Feb 11 2016, 9:48 PM

Should take into account all the comments

tejohnson added inline comments. Feb 12 2016, 2:14 PM

lib/Transforms/IPO/PassManagerBuilder.cpp
710	Ok, for now I suppose you can just set the FunctionIndex to null before invoking this. I haven't done a full comparison between the LTO pipeline and the new ThinLTO pipeline here, is there anything else done in the LTO pipeline that is worth adding to this, or is it all covered by the module passes? Looks like it from my quick scan but I didn't compare extensively... Also, after this goes in clang should be changed to invoke this instead of populateLTOPassManager in EmitAssemblyHelper::CreatePasses when we have a function index. And I can change my gold threads patch for ThinLTO (D15390) to do the same for ThinLTO compiles.

mehdi_amini added inline comments. Feb 14 2016, 9:12 PM

lib/Transforms/IPO/PassManagerBuilder.cpp
710	I tend to be data driven, so we're looking at benchmarks and tracking issues right now, but this is a good starting point.

mehdi_amini mentioned this in D17272: Teach clang to use the ThinLTO pipeline. Feb 15 2016, 11:14 AM

LGTM

This revision is now accepted and ready to land. Feb 16 2016, 9:09 AM

r261029

Diffusion mentioned this in rL261045: Teach clang to use the ThinLTO pipeline. Feb 16 2016, 4:46 PM

tejohnson mentioned this in rL269386: [ThinLTO] Use correct pipeline for ThinLTO in gold-plugin.. May 12 2016, 6:31 PM

Revision Contents

Path	Size
include/llvm/Transforms/IPO/PassManagerBuilder.h	3 lines
lib/Transforms/IPO/PassManagerBuilder.cpp	46 lines

Diff 47817

include/llvm/Transforms/IPO/PassManagerBuilder.h

[lines elided]	public:
bool LoopVectorize;		bool LoopVectorize;
bool RerollLoops;		bool RerollLoops;
bool LoadCombine;		bool LoadCombine;
bool DisableGVNLoadPRE;		bool DisableGVNLoadPRE;
bool VerifyInput;		bool VerifyInput;
bool VerifyOutput;		bool VerifyOutput;
bool MergeFunctions;		bool MergeFunctions;
bool PrepareForLTO;		bool PrepareForLTO;
		bool PrepareForThinLTO;
		bool PerformThinLTO;

/// Profile data file name that the instrumentation will be written to.		/// Profile data file name that the instrumentation will be written to.
std::string PGOInstrGen;		std::string PGOInstrGen;
/// Path of the profile data file.		/// Path of the profile data file.
std::string PGOInstrUse;		std::string PGOInstrUse;

private:		private:
/// ExtensionList - This is list of all of the extensions that are registered.		/// ExtensionList - This is list of all of the extensions that are registered.
[lines elided]	public:
/// populateFunctionPassManager - This fills in the function pass manager,		/// populateFunctionPassManager - This fills in the function pass manager,
/// which is expected to be run on each function immediately as it is		/// which is expected to be run on each function immediately as it is
/// generated. The idea is to reduce the size of the IR in memory.		/// generated. The idea is to reduce the size of the IR in memory.
void populateFunctionPassManager(legacy::FunctionPassManager &FPM);		void populateFunctionPassManager(legacy::FunctionPassManager &FPM);

/// populateModulePassManager - This sets up the primary pass manager.		/// populateModulePassManager - This sets up the primary pass manager.
void populateModulePassManager(legacy::PassManagerBase &MPM);		void populateModulePassManager(legacy::PassManagerBase &MPM);
void populateLTOPassManager(legacy::PassManagerBase &PM);		void populateLTOPassManager(legacy::PassManagerBase &PM);
		void populateThinLTOPassManager(legacy::PassManagerBase &PM);
};		};

/// Registers a function for adding a standard set of passes. This should be		/// Registers a function for adding a standard set of passes. This should be
/// used by optimizer plugins to allow all front ends to transparently use		/// used by optimizer plugins to allow all front ends to transparently use
/// them. Create a static instance of this class in your plugin, providing a		/// them. Create a static instance of this class in your plugin, providing a
/// private function that the PassManagerBuilder can use to add your passes.		/// private function that the PassManagerBuilder can use to add your passes.
struct RegisterStandardPasses {		struct RegisterStandardPasses {
RegisterStandardPasses(PassManagerBuilder::ExtensionPointTy Ty,		RegisterStandardPasses(PassManagerBuilder::ExtensionPointTy Ty,
PassManagerBuilder::ExtensionFn Fn) {		PassManagerBuilder::ExtensionFn Fn) {
PassManagerBuilder::addGlobalExtension(Ty, Fn);		PassManagerBuilder::addGlobalExtension(Ty, Fn);
}		}
};		};

} // end namespace llvm		} // end namespace llvm
#endif		#endif

lib/Transforms/IPO/PassManagerBuilder.cpp

[lines elided]	PassManagerBuilder::PassManagerBuilder() {
LoadCombine = RunLoadCombine;		LoadCombine = RunLoadCombine;
DisableGVNLoadPRE = false;		DisableGVNLoadPRE = false;
VerifyInput = false;		VerifyInput = false;
VerifyOutput = false;		VerifyOutput = false;
MergeFunctions = false;		MergeFunctions = false;
PrepareForLTO = false;		PrepareForLTO = false;
PGOInstrGen = RunPGOInstrGen;		PGOInstrGen = RunPGOInstrGen;
PGOInstrUse = RunPGOInstrUse;		PGOInstrUse = RunPGOInstrUse;
		PrepareForThinLTO = false;
		PerformThinLTO = false;
}		}

PassManagerBuilder::~PassManagerBuilder() {		PassManagerBuilder::~PassManagerBuilder() {
delete LibraryInfo;		delete LibraryInfo;
delete Inliner;		delete Inliner;
}		}

/// Set of global extensions, automatically added as part of the standard set.		/// Set of global extensions, automatically added as part of the standard set.
[lines elided]	void PassManagerBuilder::addFunctionSimplificationPasses(
MPM.add(createCorrelatedValuePropagationPass()); // Propagate conditionals		MPM.add(createCorrelatedValuePropagationPass()); // Propagate conditionals
MPM.add(createCFGSimplificationPass()); // Merge & remove BBs		MPM.add(createCFGSimplificationPass()); // Merge & remove BBs
MPM.add(createInstructionCombiningPass()); // Combine silly seq's		MPM.add(createInstructionCombiningPass()); // Combine silly seq's
addExtensionsToPM(EP_Peephole, MPM);		addExtensionsToPM(EP_Peephole, MPM);

MPM.add(createTailCallEliminationPass()); // Eliminate tail calls		MPM.add(createTailCallEliminationPass()); // Eliminate tail calls
MPM.add(createCFGSimplificationPass()); // Merge & remove BBs		MPM.add(createCFGSimplificationPass()); // Merge & remove BBs
MPM.add(createReassociatePass()); // Reassociate expressions		MPM.add(createReassociatePass()); // Reassociate expressions
		if (PrepareForThinLTO) {
		MPM.add(createAggressiveDCEPass()); // Delete dead instructions
		MPM.add(createInstructionCombiningPass()); // Combine silly seq's
		return;
		}
// Rotate Loop - disable header duplication at -Oz		// Rotate Loop - disable header duplication at -Oz
MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));		MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));
MPM.add(createLICMPass()); // Hoist loop invariants		MPM.add(createLICMPass()); // Hoist loop invariants
MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));		MPM.add(createLoopUnswitchPass(SizeLevel || OptLevel < 3));
MPM.add(createCFGSimplificationPass());		MPM.add(createCFGSimplificationPass());
MPM.add(createInstructionCombiningPass());		MPM.add(createInstructionCombiningPass());
MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars		MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars
MPM.add(createLoopIdiomPass()); // Recognize idioms like memset.		MPM.add(createLoopIdiomPass()); // Recognize idioms like memset.
[lines elided]	void PassManagerBuilder::populateModulePassManager(
addInitialAliasAnalysisPasses(MPM);		addInitialAliasAnalysisPasses(MPM);

if (!DisableUnitAtATime) {		if (!DisableUnitAtATime) {
// Infer attributes about declarations if possible.		// Infer attributes about declarations if possible.
MPM.add(createInferFunctionAttrsLegacyPass());		MPM.add(createInferFunctionAttrsLegacyPass());

addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);		addExtensionsToPM(EP_ModuleOptimizerEarly, MPM);

MPM.add(createIPSCCPPass()); // IP SCCP		MPM.add(createIPSCCPPass()); // IP SCCP
MPM.add(createGlobalOptimizerPass()); // Optimize out global vars		MPM.add(createGlobalOptimizerPass()); // Optimize out global vars
// Promote any localized global vars.		// Promote any localized global vars.
MPM.add(createPromoteMemoryToRegisterPass());		MPM.add(createPromoteMemoryToRegisterPass());

MPM.add(createDeadArgEliminationPass()); // Dead argument elimination		MPM.add(createDeadArgEliminationPass()); // Dead argument elimination

MPM.add(createInstructionCombiningPass()); // Clean up after IPCP & DAE		MPM.add(createInstructionCombiningPass()); // Clean up after IPCP & DAE
addExtensionsToPM(EP_Peephole, MPM);		addExtensionsToPM(EP_Peephole, MPM);
MPM.add(createCFGSimplificationPass()); // Clean up after IPCP & DAE		MPM.add(createCFGSimplificationPass()); // Clean up after IPCP & DAE
}		}

		if (!PerformThinLTO)
		/// PGO instrumentation is added during the compile phase for ThinLTO, do
		/// not run it a second time
addPGOInstrPasses(MPM);		addPGOInstrPasses(MPM);

if (EnableNonLTOGlobalsModRef)		if (EnableNonLTOGlobalsModRef)
// We add a module alias analysis pass here. In part due to bugs in the		// We add a module alias analysis pass here. In part due to bugs in the
// analysis infrastructure this "works" in that the analysis stays alive		// analysis infrastructure this "works" in that the analysis stays alive
// for the entire SCC pass run below.		// for the entire SCC pass run below.
MPM.add(createGlobalsAAWrapperPass());		MPM.add(createGlobalsAAWrapperPass());

// Start of CallGraph SCC passes.		// Start of CallGraph SCC passes.
if (!DisableUnitAtATime)		if (!DisableUnitAtATime)
MPM.add(createPruneEHPass()); // Remove dead EH info		MPM.add(createPruneEHPass()); // Remove dead EH info
if (Inliner) {		if (Inliner) {
MPM.add(Inliner);		MPM.add(Inliner);
Inliner = nullptr;		Inliner = nullptr;
}		}
if (!DisableUnitAtATime)		if (!DisableUnitAtATime)
MPM.add(createPostOrderFunctionAttrsPass());		MPM.add(createPostOrderFunctionAttrsPass());
if (OptLevel > 2)		if (OptLevel > 2)
MPM.add(createArgumentPromotionPass()); // Scalarize uninlined fn args		MPM.add(createArgumentPromotionPass()); // Scalarize uninlined fn args

addFunctionSimplificationPasses(MPM);		addFunctionSimplificationPasses(MPM);

		// If we are planning to perform ThinLTO later, let's not bloat the code with
		// unrolling/vectorization/... now. We'll first run the inliner + CGSCC passes
		// during ThinLTO and perform the rest of the optimizations afterward.
		if (PrepareForThinLTO)
		return;

// FIXME: This is a HACK! The inliner pass above implicitly creates a CGSCC		// FIXME: This is a HACK! The inliner pass above implicitly creates a CGSCC
// pass manager that we are specifically trying to avoid. To prevent this		// pass manager that we are specifically trying to avoid. To prevent this
// we must insert a no-op module pass to reset the pass manager.		// we must insert a no-op module pass to reset the pass manager.
MPM.add(createBarrierNoopPass());		MPM.add(createBarrierNoopPass());

// Scheduling LoopVersioningLICM when inlining is over, because after that		// Scheduling LoopVersioningLICM when inlining is over, because after that
// we may see more accurate aliasing. Reason to run this late is that too		// we may see more accurate aliasing. Reason to run this late is that too
// early versioning may prevent further inlining due to increase of code		// early versioning may prevent further inlining due to increase of code
// size. By placing it just after inlining other optimizations which runs		// size. By placing it just after inlining other optimizations which runs
// later might get benefit of no-alias assumption in clone loop.		// later might get benefit of no-alias assumption in clone loop.
if (UseLoopVersioningLICM) {		if (UseLoopVersioningLICM) {
MPM.add(createLoopVersioningLICMPass()); // Do LoopVersioningLICM		MPM.add(createLoopVersioningLICMPass()); // Do LoopVersioningLICM
MPM.add(createLICMPass()); // Hoist loop invariants		MPM.add(createLICMPass()); // Hoist loop invariants
}		}

if (!DisableUnitAtATime)		if (!DisableUnitAtATime)
MPM.add(createReversePostOrderFunctionAttrsPass());		MPM.add(createReversePostOrderFunctionAttrsPass());

if (!DisableUnitAtATime && OptLevel > 1 && !PrepareForLTO) {		if (!DisableUnitAtATime && OptLevel > 1 && !PrepareForLTO)
// Remove avail extern fns and globals definitions if we aren't		// Remove avail extern fns and globals definitions if we aren't
// compiling an object file for later LTO. For LTO we want to preserve		// compiling an object file for later LTO. For LTO we want to preserve
// these so they are eligible for inlining at link-time. Note if they		// these so they are eligible for inlining at link-time. Note if they
// are unreferenced they will be removed by GlobalDCE later, so		// are unreferenced they will be removed by GlobalDCE later, so
// this only impacts referenced available externally globals.		// this only impacts referenced available externally globals.
// Eventually they will be suppressed during codegen, but eliminating		// Eventually they will be suppressed during codegen, but eliminating
// here enables more opportunity for GlobalDCE as it may make		// here enables more opportunity for GlobalDCE as it may make
// globals referenced by available external functions dead		// globals referenced by available external functions dead
// and saves running remaining passes on the eliminated functions.		// and saves running remaining passes on the eliminated functions.
MPM.add(createEliminateAvailableExternallyPass());		MPM.add(createEliminateAvailableExternallyPass());

		if (PerformThinLTO) {
		// Remove dead fns and globals. Removing unreferenced functions could lead
		// to more opportunities for globalopt.
		MPM.add(createGlobalDCEPass());
		MPM.add(createGlobalOptimizerPass());
		// Remove dead fns and globals after globalopt.
		MPM.add(createGlobalDCEPass());
		addFunctionSimplificationPasses(MPM);
}		}

if (EnableNonLTOGlobalsModRef)		if (EnableNonLTOGlobalsModRef)
// We add a fresh GlobalsModRef run at this point. This is particularly		// We add a fresh GlobalsModRef run at this point. This is particularly
// useful as the above will have inlined, DCE'ed, and function-attr		// useful as the above will have inlined, DCE'ed, and function-attr
// propagated everything. We should at this point have a reasonably minimal		// propagated everything. We should at this point have a reasonably minimal
// and richly annotated call graph. By computing aliasing and mod/ref		// and richly annotated call graph. By computing aliasing and mod/ref
// information for all local globals here, the late loop passes and notably		// information for all local globals here, the late loop passes and notably
[lines elided]	void PassManagerBuilder::addLateLTOOptimizationPasses(
PM.add(createGlobalDCEPass());		PM.add(createGlobalDCEPass());

// FIXME: this is profitable (for compiler time) to do at -O0 too, but		// FIXME: this is profitable (for compiler time) to do at -O0 too, but
// currently it damages debug info.		// currently it damages debug info.
if (MergeFunctions)		if (MergeFunctions)
PM.add(createMergeFunctionsPass());		PM.add(createMergeFunctionsPass());
}		}

		void PassManagerBuilder::populateThinLTOPassManager(
		legacy::PassManagerBase &PM) {
		PerformThinLTO = true;

		if (VerifyInput)
		PM.add(createVerifierPass());

		if (FunctionIndex)
		PM.add(createFunctionImportPass(FunctionIndex));

		populateModulePassManager(PM);

		if (VerifyOutput)
		PM.add(createVerifierPass());
		PerformThinLTO = false;
		}

void PassManagerBuilder::populateLTOPassManager(legacy::PassManagerBase &PM) {		void PassManagerBuilder::populateLTOPassManager(legacy::PassManagerBase &PM) {
if (LibraryInfo)		if (LibraryInfo)
PM.add(new TargetLibraryInfoWrapperPass(*LibraryInfo));		PM.add(new TargetLibraryInfoWrapperPass(*LibraryInfo));

if (VerifyInput)		if (VerifyInput)
PM.add(createVerifierPass());		PM.add(createVerifierPass());

if (OptLevel != 0)		if (OptLevel != 0)
[lines elided]