This is an archive of the discontinued LLVM Phabricator instance.

Differential D27790

Pass sample pgo flags to thinlto.
ClosedPublic

Authored by danielcdh on Dec 14 2016, 6:00 PM.

Details

Reviewers

tejohnson
davidxl
mehdi_amini

Commits

rG279780059557: Pass sample pgo flags to thinlto.
rL289957: Pass sample pgo flags to thinlto.

Summary

ThinLTO needs to invoke the SampleProfileLoader pass at link time in order to annotate the profile correctly after module importing.

Diff Detail

Build Status

Buildable 2204
Build 2204: arc lint + arc unit

Event Timeline

danielcdh updated this revision to Diff 81509. Dec 14 2016, 6:00 PM

danielcdh retitled this revision to "Pass sample pgo flags to thinlto."

danielcdh updated this object.

danielcdh added reviewers: tejohnson, davidxl.

danielcdh added a subscriber: llvm-commits.

Herald added a subscriber: mehdi_amini. Dec 14 2016, 6:00 PM

Do we need a similar change for lld?

Do we need to add the sample profile to the hash in computeCacheKey?

In D27790#623176, @pcc wrote:

Do we need to add the sample profile to the hash in computeCacheKey?

I'm concerned about this indeed.

Why is this SampleProfileLoaderPass not performed during the compile phase?

In D27790#623195, @mehdi_amini wrote:

In D27790#623176, @pcc wrote:

Do we need to add the sample profile to the hash in computeCacheKey?

I was planning to have it in a separate patch as it needs to hash the profile content too. Let me know if you think it should be part of this patch.

I'm concerned about this indeed.

Why is this SampleProfileLoaderPass not performed during the compile phase?

It is invoked in the compile phase. But if the profile is collected from a ThinLTO binary, i.e. there are cross-module inlines in the profiled binary (and thus in the profile), the profile annotation needs to happen after all these inlines have happened, i.e. annotation needs to be invoked once again in the linking phase.

Thanks,
Dehao

Ouch, I see: with sample-based profiling you can't change the callgraph after the instrumentation point to have the correct information, correct? Is changing the CFG OK though? (If you have a pointer about sample-based profiling where I can find the answer to these subtle points).

Hashing the full content of the profiles would break the "fine" grain of incremental build we have now. So I'd be cautious about that, even if this is likely a marginal use-case.

Consider the sample PGO profile as a weighted forest of inline stacks, where each root corresponds to a symbol in the profiled binary. The job of profile preparation is to expand the IR so that it resembles the hot part of the forest recorded in the profile. More details about sample PGO can be found in http://dl.acm.org/citation.cfm?id=2854044

At the ThinLTO compile step, not all nodes are expandable, as they could come from other modules. That's why we need to have SampleProfileLoader invoked after all nodes are visible.

In Sample PGO, the change of profile is far less frequent than source code. But once the profile is changed, the incremental build will be broken: everything needs to be built from scratch. I guess the same applies to instrumentation based pgo?
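As an illustration of the "weighted forest of inline stacks" described above, here is a minimal sketch. It is not LLVM's data structure: `InlineNode`, `addChild`, and `countExpandable` are hypothetical names, and "visible" simply means that a callee's body is available in the module being compiled. The point it tries to show is that a nested node can only be replayed once every callee on the path to it has a visible body.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <set>
#include <string>

// Hypothetical node type (not LLVM's): one inlined call site in the profile's
// inline-stack forest, weighted by the samples attributed to it.
struct InlineNode {
  std::string Callee;
  uint64_t Samples = 0;
  std::map<std::string, std::unique_ptr<InlineNode>> Children;
};

// Adds the child of Parent that inlines Callee, or accumulates samples into
// it if it already exists.
InlineNode &addChild(InlineNode &Parent, const std::string &Callee,
                     uint64_t Samples) {
  std::unique_ptr<InlineNode> &Slot = Parent.Children[Callee];
  if (!Slot) {
    Slot.reset(new InlineNode());
    Slot->Callee = Callee;
  }
  Slot->Samples += Samples;
  return *Slot;
}

// Counts the nodes below Node that could be expanded when only the functions
// named in Visible have bodies available. A child whose callee is not visible
// blocks its whole subtree.
std::size_t countExpandable(const InlineNode &Node,
                            const std::set<std::string> &Visible) {
  std::size_t Count = 0;
  for (const auto &Entry : Node.Children) {
    const InlineNode &Child = *Entry.second;
    if (Visible.count(Child.Callee) == 0)
      continue; // body lives in another module: cannot be inlined yet
    Count += 1 + countExpandable(Child, Visible);
  }
  return Count;
}
```

With a root `main` that inlines `foo`, which in turn inlines `bar` from another module, only one node is expandable while `bar` is invisible and both are expandable once importing has made `bar` visible.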

In D27790#623209, @danielcdh wrote:

Consider the sample PGO profile as a weighted forest of inline stacks, where each root corresponds to a symbol in the profiled binary. The job of profile preparation is to expand the IR so that it resembles the hot part of the forest recorded in the profile. More details about sample PGO can be found in http://dl.acm.org/citation.cfm?id=2854044

At the ThinLTO compile step, not all nodes are expandable, as they could come from other modules. That's why we need to have SampleProfileLoader invoked after all nodes are visible.

I have to read the citation to understand :)

In Sample PGO, the change of profile is far less frequent than source code. But once the profile is changed, the incremental build will be broken: everything needs to be built from scratch. I guess the same applies to instrumentation based pgo?

I don't think so: instrumentation-based PGO is applied during the compile phase (there is certainly a key difference that makes this impossible with the sample-based one).

In D27790#623219, @mehdi_amini wrote:

I don't think so: instrumentation-based PGO is applied during the compile phase (there is certainly a key difference that makes this impossible with the sample-based one).

Yes, it's in compile phase, but if the profile changes, the profile summary is very likely to change, which makes it necessary to recompile all functions unless the function has no profile at all. Or am I missing something?

In D27790#623220, @danielcdh wrote:

Yes, it's in compile phase, but if the profile changes, the profile summary is very likely to change, which makes it necessary to recompile all functions unless the function has no profile at all. Or am I missing something?

Oh right, I missed your point, this seems valid to me!

The change looks ok to me, but I'm concerned about having the caching broken with ThinLTO + sample PGO. At least let's get that part reviewed, so these patches can go in back to back.

Add hashing of sample profile content.

Looks pretty easy to add the hashing, so I included it in this patch.
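To make the caching concern concrete, here is a minimal, standard-library-only sketch of folding a profile file's contents into a cache key. It is not the code from the patch: `Fnv1a` is a stand-in for the hasher that computeCacheKey uses, and `cacheKey` is a hypothetical helper. As in the patch, an empty path or an unreadable file contributes nothing to the key.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>

// Stand-in hasher (64-bit FNV-1a); the real cache key uses a different hash.
struct Fnv1a {
  uint64_t State = 14695981039346656037ULL;
  void update(const std::string &Bytes) {
    for (unsigned char C : Bytes) {
      State ^= C;
      State *= 1099511628211ULL;
    }
  }
};

// Hypothetical helper: hashes the module bytes, then the profile contents if
// a profile path is set and the file can be opened.
uint64_t cacheKey(const std::string &ModuleBytes,
                  const std::string &ProfilePath) {
  Fnv1a Hasher;
  Hasher.update(ModuleBytes);
  if (!ProfilePath.empty()) {
    std::ifstream In(ProfilePath, std::ios::binary);
    if (In) {
      std::string Contents((std::istreambuf_iterator<char>(In)),
                           std::istreambuf_iterator<char>());
      Hasher.update(Contents);
    }
  }
  return Hasher.State;
}
```

Because the contents rather than the path are hashed, editing the profile in place invalidates every cached object, which is the trade-off discussed earlier in the thread.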

LGTM.

This revision is now accepted and ready to land. Dec 15 2016, 4:15 PM

tejohnson added inline comments. Dec 15 2016, 4:24 PM

lib/LTO/LTO.cpp
Line 135: Will this be expensive, for large apps with large profiles? Since this is the same for each backend, can we compute once and pass down?

mehdi_amini added inline comments. Dec 15 2016, 4:29 PM

lib/LTO/LTO.cpp
Line 135: If we go this route (and we should), then I'd rather do it as a separate patch and do as I suggested originally when the config was added there: have a `hash` method on the config itself, where we would hash the whole config part once, early.

tejohnson accepted this revision. Dec 15 2016, 4:35 PM

tejohnson edited edge metadata.

tejohnson added inline comments.

lib/LTO/LTO.cpp
Line 135: Agreed, all the module-independent stuff should be hashed early and passed down. I suppose this is the best approach for now - maybe slow, but safe and enables this change to go in. We can pull it out as part of the config early hashing implementation in a separate patch.
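The "hash early and pass down" idea from these inline comments could look roughly like the sketch below. Everything here is hypothetical: `ConfigSketch`, `hashConfig`, and `moduleKey` are illustrative names, the struct holds only a few module-independent fields, and FNV-1a again stands in for the real hasher. The digest of the shared configuration, including the profile contents, is computed once, and each backend only mixes in its own module data.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the module-independent part of the LTO config.
struct ConfigSketch {
  std::string CPU;
  unsigned OptLevel = 2;
  std::string SampleProfileContents; // assumed to be read from disk once
};

// FNV-1a over one field, followed by a separator byte so that adjacent
// fields cannot run together.
static uint64_t mixField(uint64_t State, const std::string &Bytes) {
  for (unsigned char C : Bytes) {
    State ^= C;
    State *= 1099511628211ULL;
  }
  State ^= 0xff;
  State *= 1099511628211ULL;
  return State;
}

// Computed once per link: everything that is identical for every backend.
uint64_t hashConfig(const ConfigSketch &Conf) {
  uint64_t State = 14695981039346656037ULL;
  State = mixField(State, Conf.CPU);
  State = mixField(State, std::to_string(Conf.OptLevel));
  State = mixField(State, Conf.SampleProfileContents);
  return State;
}

// Computed per module: starts from the shared digest, so the profile is not
// rehashed for each backend.
uint64_t moduleKey(uint64_t ConfigDigest, const std::string &ModuleBytes) {
  return mixField(ConfigDigest, ModuleBytes);
}
```

A caller would compute `hashConfig(Conf)` once before launching the backends and hand the resulting digest to each `moduleKey` call, so the cost of hashing a large profile is paid a single time.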

danielcdh closed this revision. Dec 16 2016, 8:59 AM

Revision Contents

Path	Size
include/llvm/LTO/Config.h	3 lines
lib/LTO/LTO.cpp	6 lines
lib/LTO/LTOBackend.cpp	1 line
test/tools/gold/X86/Inputs/afdo.prof	2 lines
test/tools/gold/X86/thinlto_afdo.ll	25 lines
tools/gold/gold-plugin.cpp	7 lines

Diff 81680

include/llvm/LTO/Config.h

(59 lines not shown; context: struct Config {)
   /// Setting this field will replace target triples in input files with this
   /// triple.
   std::string OverrideTriple;

   /// Setting this field will replace unspecified target triples in input files
   /// with this triple.
   std::string DefaultTriple;

+  /// Sample PGO profile path.
+  std::string SampleProfile;
+
   bool ShouldDiscardValueNames = true;
   DiagnosticHandlerFunction DiagHandler;

   /// If this field is set, LTO will write input file paths and symbol
   /// resolutions here in llvm-lto2 command line flag format. This can be
   /// used for testing and for running the LTO pipeline outside of the linker
   /// with llvm-lto2.
   std::unique_ptr<raw_ostream> ResolutionFile;
(103 lines not shown)

lib/LTO/LTO.cpp

(123 lines not shown; context: #endif)
   // Include the hash for the linkage type to reflect internalization and weak
   // resolution.
   for (auto &GS : DefinedGlobals) {
     GlobalValue::LinkageTypes Linkage = GS.second->linkage();
     Hasher.update(
         ArrayRef<uint8_t>((const uint8_t *)&Linkage, sizeof(Linkage)));
   }

+  if (!Conf.SampleProfile.empty()) {
+    auto FileOrErr = MemoryBuffer::getFile(Conf.SampleProfile);
+    if (FileOrErr)
+      Hasher.update(FileOrErr.get()->getBuffer());
+  }
+
   Key = toHex(Hasher.result());
 }

 static void thinLTOResolveWeakForLinkerGUID(
     GlobalValueSummaryList &GVSummaryList, GlobalValue::GUID GUID,
     DenseSet<GlobalValueSummary *> &GlobalInvolvedWithAlias,
     function_ref<bool(GlobalValue::GUID, const GlobalValueSummary *)>
         isPrevailing,
(767 lines not shown)

lib/LTO/LTOBackend.cpp

(176 lines not shown; context: static void runOldPMPasses(Config &Conf, Module &Mod, TargetMachine *TM,)
   PMB.Inliner = createFunctionInliningPass();
   // Unconditionally verify input since it is not verified before this
   // point and has unknown origin.
   PMB.VerifyInput = true;
   PMB.VerifyOutput = !Conf.DisableVerify;
   PMB.LoopVectorize = true;
   PMB.SLPVectorize = true;
   PMB.OptLevel = Conf.OptLevel;
+  PMB.PGOSampleUse = Conf.SampleProfile;
   if (IsThinLTO)
     PMB.populateThinLTOPassManager(passes);
   else
     PMB.populateLTOPassManager(passes);
   passes.run(Mod);
 }

 bool opt(Config &Conf, TargetMachine *TM, unsigned Task, Module &Mod,
(181 lines not shown)

test/tools/gold/X86/Inputs/afdo.prof

This file was added.

				f:100:3
				1: 100
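For readers unfamiliar with the text sample profile format used by this input file, the sketch below parses the minimal subset that appears here. It assumes the layout `name:total_samples:total_head_samples` on the first line, followed by `offset: samples` body lines, which is my reading of LLVM's text profile format rather than something stated in this review. `FunctionProfile` and `parseProfile` are hypothetical names, and discriminators, call targets, and nested inlined callees are deliberately not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <map>
#include <sstream>
#include <string>

// Hypothetical in-memory form of one function's entry in a text profile.
struct FunctionProfile {
  std::string Name;
  uint64_t TotalSamples = 0;
  uint64_t HeadSamples = 0;
  std::map<uint32_t, uint64_t> BodySamples; // line offset -> sample count
};

// Parses a single-function profile in the minimal subset described above.
// Returns false on malformed input.
bool parseProfile(const std::string &Text, FunctionProfile &Out) {
  std::istringstream In(Text);
  std::string Line;
  if (!std::getline(In, Line))
    return false;
  std::size_t C1 = Line.find(':');
  if (C1 == std::string::npos)
    return false;
  std::size_t C2 = Line.find(':', C1 + 1);
  if (C2 == std::string::npos)
    return false;
  try {
    Out.Name = Line.substr(0, C1);
    Out.TotalSamples = std::stoull(Line.substr(C1 + 1, C2 - C1 - 1));
    Out.HeadSamples = std::stoull(Line.substr(C2 + 1));
    while (std::getline(In, Line)) {
      if (Line.find_first_not_of(" \t\r") == std::string::npos)
        continue; // skip blank lines
      std::size_t C = Line.find(':');
      if (C == std::string::npos)
        return false;
      uint32_t Offset = static_cast<uint32_t>(std::stoul(Line.substr(0, C)));
      Out.BodySamples[Offset] = std::stoull(Line.substr(C + 1));
    }
  } catch (const std::exception &) {
    return false; // a numeric field did not parse
  }
  return true;
}
```

Under that reading, the two lines of afdo.prof describe a function `f` with 100 total samples, 3 head samples, and 100 samples at line offset 1.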

test/tools/gold/X86/thinlto_afdo.ll

This file was added.

				; Generate summary sections
				; RUN: opt -module-summary %s -o %t1.o
				; RUN: opt -module-summary %p/Inputs/thinlto.ll -o %t2.o

				; RUN: rm -f %t1.o.4.opt.bc
				; RUN: %gold -plugin %llvmshlibdir/LLVMgold.so \
				; RUN: --plugin-opt=thinlto \
				; RUN: --plugin-opt=save-temps \
				; RUN: --plugin-opt=sample-profile=%p/Inputs/afdo.prof \
				; RUN: --plugin-opt=jobs=1 \
				; RUN: -shared %t1.o %t2.o -o %t3
				; RUN: opt -S %t1.o.4.opt.bc | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				; CHECK: ProfileSummary
				declare void @g(...)
				declare void @h(...)

				define void @f() {
				entry:
				call void (...) @g()
				call void (...) @h()
				ret void
				}

tools/gold/gold-plugin.cpp

(165 lines not shown; context: #endif)
   // bitcode file's path prefix matching oldprefix with newprefix.
   static std::string thinlto_prefix_replace;
   // Optional path to a directory for caching ThinLTO objects.
   static std::string cache_dir;
   // Additional options to pass into the code generator.
   // Note: This array will contain all plugin options which are not claimed
   // as plugin exclusive to pass to the code generator.
   static std::vector<const char *> extra;
+  // Sample profile file path
+  static std::string sample_profile;

   static void process_plugin_option(const char *opt_)
   {
     if (opt_ == nullptr)
       return;
     llvm::StringRef opt = opt_;

     if (opt.startswith("mcpu=")) {
(33 lines not shown; context: if (opt.startswith("mcpu=")) {)
       if (StringRef(opt_ + 5).getAsInteger(10, Parallelism))
         message(LDPL_FATAL, "Invalid parallelism level: %s", opt_ + 5);
     } else if (opt.startswith("lto-partitions=")) {
       if (opt.substr(strlen("lto-partitions="))
               .getAsInteger(10, ParallelCodeGenParallelismLevel))
         message(LDPL_FATAL, "Invalid codegen partition level: %s", opt_ + 5);
     } else if (opt == "disable-verify") {
       DisableVerify = true;
+    } else if (opt.startswith("sample-profile=")) {
+      sample_profile= opt.substr(strlen("sample-profile="));
     } else {
       // Save this option to pass to the code generator.
       // ParseCommandLineOptions() expects argv[0] to be program name. Lazily
       // add that.
       if (extra.empty())
         extra.push_back("LLVMgold");

       extra.push_back(opt_);
(492 lines not shown; context: case options::OT_BC_ONLY:)
     break;

   case options::OT_SAVE_TEMPS:
     check(Conf.addSaveTemps(output_name + ".",
                             /* UseInputModulePath */ true));
     break;
   }

+  if (!options::sample_profile.empty())
+    Conf.SampleProfile = options::sample_profile;
+
   return llvm::make_unique<LTO>(std::move(Conf), Backend,
                                 options::ParallelCodeGenParallelismLevel);
 }

 // Write empty files that may be expected by a distributed build
 // system when invoked with thinlto_index_only. This is invoked when
 // the linker has decided not to include the given module in the
 // final link. Frequently the distributed build system will want to
(146 lines not shown)
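The option handling added to the plugin follows a simple prefix-matching pattern, which can be sketched without any LLVM dependencies as below. `PluginOptions` and `processPluginOption` are hypothetical names, not the plugin's real interface, and `std::string` replaces `llvm::StringRef`. Unlike the real plugin, this sketch does not prepend a program name to the forwarded options.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical container for the parsed plugin options.
struct PluginOptions {
  std::string SampleProfile;      // value of sample-profile=<path>, if given
  std::vector<std::string> Extra; // unclaimed options, forwarded unchanged
};

// Claims "sample-profile=<path>" and stores the path; every other option is
// saved to be passed on to the code generator.
void processPluginOption(const std::string &Opt, PluginOptions &Out) {
  static const std::string Prefix = "sample-profile=";
  if (Opt.compare(0, Prefix.size(), Prefix) == 0)
    Out.SampleProfile = Opt.substr(Prefix.size());
  else
    Out.Extra.push_back(Opt);
}
```

The stored path would then be copied into the LTO configuration only when it is non-empty, mirroring the `Conf.SampleProfile` assignment in the diff above.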