This is an archive of the discontinued LLVM Phabricator instance.

Differential D83743

[InlineAdvisor] New inliner advisor to replay inlining from optimization remarks
ClosedPublic

Authored by wenlei on Jul 13 2020, 10:16 PM.

Details

Reviewers

davidxl
mtrofin
wmi
hoy

Commits

rG029946b11268: [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks

Summary

This change adds a new inline advisor that takes optimization remarks from previous inlining as input and provides the decisions as advice, so that current inlining can replay the inline decisions of a different compilation. The DWARF inline stack with line and discriminator is used as the anchor for call sites, including call context. The change can be useful for inliner tuning, as it provides a channel for external input to tweak inline decisions. Existing alternatives like the alwaysinline attribute are per-function, not per-callsite. A per-callsite inline intrinsic could be another solution (it does not exist yet), but it is intrusive to implement and also does not differentiate call context.

A switch -sample-profile-inline-replay=<inline_remarks_file> is added to hook up the new inline advisor with SampleProfileLoader's inline decisions for replay. Since SampleProfileLoader does top-down inlining, inline decisions can be specialized for each call context, hence we should be able to replay inlining accurately. With a bottom-up inliner like the CGSCC inliner, however, replay can be limited due to the lack of specialization for different call contexts. Apart from that limitation, the new inline advisor can still be used by the regular CGSCC inliner later if needed for tuning purposes.
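
As a rough illustration of the input side described in the summary, the sketch below collects call-site anchors from remarks text. It is plain standard C++, not the patch's LLVM code: `collectInlineSites` is a hypothetical name, and the line shape is taken from the example remark quoted in the patch's own source comment (`_Z3subii inlined into main [details] at callsite sum:1 @ main:3.1`).

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: keep the text that follows " at callsite " on each
// remark line. Lines without that marker (or with nothing after it) are
// skipped, which is an assumption about how unrelated remarks are handled.
std::set<std::string> collectInlineSites(const std::string &RemarksText) {
  static const std::string Marker = " at callsite ";
  std::set<std::string> Sites;
  std::istringstream In(RemarksText);
  std::string Line;
  while (std::getline(In, Line)) {
    std::string::size_type Pos = Line.find(Marker);
    if (Pos == std::string::npos)
      continue;
    std::string Site = Line.substr(Pos + Marker.size());
    if (!Site.empty())
      Sites.insert(Site);
  }
  return Sites;
}
```

A call site is then replayed as "inline" only if the anchor string built for it at compile time is found in the returned set.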

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
500 ms	linux > MemorySanitizer-X86_64.MemorySanitizer-X86_64::Unknown Unit Message ("")
210 ms	linux > MemorySanitizer-lld-X86_64.MemorySanitizer-lld-X86_64::Unknown Unit Message ("")
530 ms	linux > SanitizerCommon-asan-x86_64-Linux.Linux::Unknown Unit Message ("")
310 ms	linux > SanitizerCommon-lsan-x86_64-Linux.Linux::Unknown Unit Message ("")
460 ms	linux > SanitizerCommon-msan-x86_64-Linux.Linux::Unknown Unit Message ("")

9 tests failed in total; 5 are listed above.

Event Timeline

wenlei created this revision. Jul 13 2020, 10:16 PM

Herald added a project: Restricted Project. Jul 13 2020, 10:16 PM

Herald added subscribers: llvm-commits, hiraditya, aprantl.

Harbormaster failed remote builds in B64099: Diff 277666! Jul 13 2020, 10:17 PM

mtrofin added inline comments. Jul 13 2020, 11:02 PM

llvm/include/llvm/Analysis/InlineAdvisor.h
184 ↗	(On Diff #277666)	Nit: could this be factored in its own .h file (and .cpp)?
189 ↗	(On Diff #277666)	nit: lower case first letter, also verb form (e.g. areReplayRemarksLoaded)
193 ↗	(On Diff #277666)	Nit: initialize HasReplayRemarks here, then the state of the object is deterministic.
llvm/lib/Analysis/InlineAdvisor.cpp
162 ↗	(On Diff #277666)	this assumes DIL->getLine() >= DIL->getScope()->getSubprogram()->getLine(). Perhaps an assert before, or Offset could be int, and check it's non-negative?
llvm/lib/Transforms/IPO/SampleProfile.cpp
330	I'm curious, is there a case when FAM would be not passed? (i.e. why not just require it?)
912	why not define Advice within the if below?

wenlei marked 6 inline comments as done. Jul 14 2020, 7:59 AM

wenlei added inline comments.

llvm/lib/Analysis/InlineAdvisor.cpp
162 ↗	(On Diff #277666)	This is consistent with `addLocationToRemarks`, where we construct the location string for remarks, so the output can be consumed without change. Negative line offsets are possible; we've seen them in FDO profiles, and they're encoded with unsigned int there too. I don't know why that's the case, but it seems to be the convention, and functionality-wise it works as long as everything is consistent.
llvm/lib/Transforms/IPO/SampleProfile.cpp
330	There's another call site from old pass manager pipeline, where we don't have FAM.

Address feedbacks.

Herald added a subscriber: mgorny. Jul 14 2020, 8:00 AM

mtrofin added inline comments. Jul 14 2020, 8:03 AM

llvm/lib/Analysis/InlineAdvisor.cpp
162 ↗	(On Diff #277666)	I don't think that makes it right, though. I'd suggest at least having a test here, then, i.e. bailing out if the rhs is greater than the lhs. I'll look into the other API.

wenlei marked an inline comment as done. Jul 14 2020, 8:36 AM

wenlei added inline comments.

llvm/lib/Analysis/InlineAdvisor.cpp
162 ↗	(On Diff #277666)	I'm not sure we want to bail out on negative offsets. I added `addLocationToRemarks` a couple of weeks back. Unsigned int was used there so that we can match line offsets from remarks to line offsets from the FDO profile for negative offsets, which is important from a tooling perspective. Ideally we shouldn't have negative offsets, but in practice we see them. If we put an assertion there, it will fire. Due to macros, there could be cases where LHS and RHS come from different files, hence the subtraction can lead to a negative offset; and there could be other cases where there is no line number on LHS, so 0 is retrieved for LHS. We've been living with that for a while (see `FunctionSamples::getOffset`), and here's an example of an FDO profile where negative offsets are encoded using unsigned int (the last three lines). I'm not advocating for that, but I think asserting non-negative offsets, or fixing what breaks that assertion, is separate work. Being consistent with what we've been doing (e.g. `FunctionSamples::getOffset`) should be good. What do you think?

ZSTD_decompressSequences_bmi2:736988668:1677
 0: 1557
 6: 93
 5: ZSTD_decompressSequences_body:736954111
  8: 1557
  9: 1557
  10: 1557
  11: 1557
  53: 109
  55: 109
  58: 0
  64693: 1520
  64940: 1564
  65080: 1264073
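
To make the wrap-around concrete, here is a small sketch of the arithmetic being discussed. `lineOffset32` mirrors the unsigned subtraction used for the replay call-site string. `lineOffset16` rests on an assumption that is not stated in this thread: that the profile side keeps only the low 16 bits of the offset, which would explain why the last three offsets in the profile excerpt sit just below 65536. Both function names and the line numbers in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned subtraction: if the call line is above the function's start line,
// the result wraps modulo 2^32 instead of going negative.
uint32_t lineOffset32(uint32_t CallLine, uint32_t FuncStartLine) {
  return CallLine - FuncStartLine;
}

// Assumption: the profile encoding masks the same difference to 16 bits.
uint32_t lineOffset16(uint32_t CallLine, uint32_t FuncStartLine) {
  return (CallLine - FuncStartLine) & 0xffff;
}
```

With a hypothetical call on line 157 inside a function that starts on line 1000, the true offset is -843; `lineOffset16(157, 1000)` yields 64693 and `lineOffset32(157, 1000)` yields 4294966453. As long as the producer and the consumer of the string use the same encoding, the lookup still matches.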

lint

Harbormaster failed remote builds in B64157: Diff 277835! Jul 14 2020, 8:49 AM

Harbormaster failed remote builds in B64165: Diff 277845! Jul 14 2020, 9:15 AM

Can this be extended to the SCC inliner?

In D83743#2150703, @davidxl wrote:

Can this be extended to the SCC inliner?

Yes, we can use it in SCC inliner as well. We just need extra plumbing there. We can do that in a separate change if needed.

In D83743#2150725, @wenlei wrote:

In D83743#2150703, @davidxl wrote:

Can this be extended to the SCC inliner?

Yes, we can use it in SCC inliner as well. We just need extra plumbing there. We can do that in a separate change if needed.

Thanks for the work, using inline advisor to replay inline is exactly something we want too.

Currently every inline remark message has only one level: one caller, one callee and one callsite (maybe with multiple levels of inline stack associated with the callsite). One problem I am thinking of is that the inline advisor may only work with top-down inlining. In bottom-up inlining, if we decide to inline a specific callsite, it will actually be inlined at many places once its caller is also inlined into multiple places in the caller's callers. Since the SCC inliner uses bottom-up inlining, the current format of inline advice may not provide precise enough information to specify the exact location where inlining is expected to happen.

Have you ever considered using a call path to specify the inlining location?

mtrofin added inline comments. Jul 14 2020, 1:27 PM

llvm/lib/Analysis/InlineAdvisor.cpp
162 ↗	(On Diff #277666)	So I make sure I understand: (1) remarks already output unsigned ints; (2) negative offsets are possible, due to cases like the ones you mentioned. In this case, I agree that the choice of supporting negative offsets is a separate problem from this CL. Could you add a comment explaining the current motivation for unsigned int (i.e. to match what remarks do)? Also, could it be uint32_t, to ensure 32-bitness?

In D83743#2150725, @wenlei wrote:

In D83743#2150703, @davidxl wrote:

Can this be extended to the SCC inliner?

Yes, we can use it in SCC inliner as well. We just need extra plumbing there. We can do that in a separate change if needed.

I'd be curious what the scenario in the SCC case would be. IIUC, here, the value is that you can replay decisions made with a profile, and use a different profile for everything else in the compiler. Hmm... I suppose maybe a similar scenario could be articulated for the SCC case?

On this - could you please add in the patch description the motivating scenario (helps with understanding) - thanks!

In D83743#2151020, @wmi wrote:

In D83743#2150725, @wenlei wrote:

In D83743#2150703, @davidxl wrote:

Can this be extended to the SCC inliner?

Yes, we can use it in SCC inliner as well. We just need extra plumbing there. We can do that in a separate change if needed.

Thanks for the work, using inline advisor to replay inline is exactly something we want too.

Currently every inline remark message has only one level: one caller, one callee and one callsite (maybe with multiple levels of inline stack associated with the callsite). One problem I am thinking of is that the inline advisor may only work with top-down inlining. In bottom-up inlining, if we decide to inline a specific callsite, it will actually be inlined at many places once its caller is also inlined into multiple places in the caller's callers. Since the SCC inliner uses bottom-up inlining, the current format of inline advice may not provide precise enough information to specify the exact location where inlining is expected to happen.

Have you ever considered using a call path to specify the inlining location?

Glad to know that it will be useful for your case too. You're right that with a bottom-up CGSCC inliner, we won't be able to replay arbitrary inline decisions. However I think that's a limitation of the inliner, not how inline decisions are represented. E.g. even if we have a full inline tree (subset of call graph), we still won't be able to replay that with CGSCC inliner if there's context-sensitive inline (specialization) in it.

The use case we had was for tuning full context-sensitive AutoFDO's early inlining - we can replay baseline AutoFDO early inlining there, or the other way around. We could also replay CGSCC inline of one build in another. The way it's implemented would work for those scenarios. But replay FDO early inline in CGSCC inline or the opposite may not get us what we wanted for the reasons you mentioned.
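
The context-sensitivity point can be illustrated with a toy lookup. This is only a sketch: the names `A1`, `A2`, `B`, the offsets, and the `Recorded` table are invented, and the key strings merely follow the `callee:offset @ caller:offset` shape used by the replay advisor.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical replay table holding one context-specific decision recorded by
// a top-down inliner: the call at B:1 is inlined only where B itself was
// inlined into A1 at offset 2.
const std::set<std::string> Recorded = {"B:1 @ A1:2"};

// A call site is replayed as "inline" only if its full anchor was recorded.
bool replayed(const std::string &CallSiteKey) {
  return Recorded.count(CallSiteKey) > 0;
}
```

A top-down inliner reaches the call inside B after B has been inlined, so it can ask about `"B:1 @ A1:2"` and `"B:1 @ A2:3"` separately and get different answers. A bottom-up inliner visits the call while it still lives in B, where the anchor is just `"B:1"`, so a single answer there propagates to every caller B is later inlined into.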

In D83743#2151983, @mtrofin wrote:

In D83743#2150725, @wenlei wrote:

In D83743#2150703, @davidxl wrote:

Can this be extended to the SCC inliner?

Yes, we can use it in SCC inliner as well. We just need extra plumbing there. We can do that in a separate change if needed.

I'd be curious what the scenario in the SCC case would be. IIUC, here, the value is that you can replay decisions made with a profile, and use a different profile for everything else in the compiler. Hmm... I suppose maybe a similar scenario could be articulated for the SCC case?

On this - could you please add in the patch description the motivating scenario (helps with understanding) - thanks!

Our current use case is not in the SCC inliner. But yeah, I imagine it could be useful there too. In general, I feel a mechanism like this one, which allows external input for tweaking inlining decisions, can be useful, mostly for tuning and experimental purposes. We currently don't have a good way of doing that: existing attributes like alwaysinline are function level, not call site level. And a per-callsite inline intrinsic like https://reviews.llvm.org/D51200 is intrusive and hard to push through. I thought this was a relatively easy and clean way of getting that functionality for tuning. I will update the description to include the motivation.

wenlei edited the summary of this revision. Jul 14 2020, 5:59 PM

Address feedback, update comment.

Harbormaster failed remote builds in B64270: Diff 278037! Jul 14 2020, 6:36 PM

Thanks for the explanation.

We could also replay CGSCC inline of one build in another. The way it's implemented would work for those scenarios.

When you replay CGSCC inlining, shouldn't the inline decisions in the replay file be ordered? If they are unordered, I cannot see how it can exactly replay CGSCC inlining from another build.

Our current use case is not in the SCC inliner. But yeah, I imagine it could be useful there too. In general, I feel a mechanism like this one, which allows external input for tweaking inlining decisions, can be useful, mostly for tuning and experimental purposes. We currently don't have a good way of doing that: existing attributes like alwaysinline are function level, not call site level. And a per-callsite inline intrinsic like https://reviews.llvm.org/D51200 is intrusive and hard to push through. I thought this was a relatively easy and clean way of getting that functionality for tuning.

That is the scenario I am interested in: using it to tweak inlining decisions for tuning and experimental purposes. I understand it could need a lot more work than this patch, and we may not need to address that now. I just want to know how you think it should work. Could you explain it in more detail?

In D83743#2153532, @wmi wrote:

Thanks for the explanation.

We could also replay CGSCC inline of one build in another. The way it's implemented would work for those scenarios.

When you replay CGSCC inlining, shouldn't the inline decisions in the replay file be ordered? If they are unordered, I cannot see how it can exactly replay CGSCC inlining from another build.

I might be missing something, but here's how I think about this. For a given inline decision, say D on the A->B->C path, we would only evaluate it once and make that decision once. Then for replay, assuming we follow the same bottom-up order, for each call site we can retrieve that exact decision from the input remarks, and as long as we make the same decisions, the sequence of call sites that gets exposed for evaluation will also be the same. Because we evaluate the call site for each path only once, any decision recorded in remarks should be unique, so I'm not sure how the order of the inline remarks would interfere with replay.

Our current use case is not in the SCC inliner. But yeah, I imagine it could be useful there too. In general, I feel a mechanism like this one, which allows external input for tweaking inlining decisions, can be useful, mostly for tuning and experimental purposes. We currently don't have a good way of doing that: existing attributes like alwaysinline are function level, not call site level. And a per-callsite inline intrinsic like https://reviews.llvm.org/D51200 is intrusive and hard to push through. I thought this was a relatively easy and clean way of getting that functionality for tuning.

That is the scenario I am interested in: using it to tweak inlining decisions for tuning and experimental purposes. I understand it could need a lot more work than this patch, and we may not need to address that now. I just want to know how you think it should work. Could you explain it in more detail?

I am considering different modes for replay in the future. The first is strict mode, like what we have here: we simply replay everything faithfully. The second mode is positive replay, e.g. we only apply extra inlining using the input. A third mode can be negative replay, that is, we prohibit the inlining specified in the input without affecting other decisions. Orthogonally, we may sometimes also want to focus on a specific inline tree, so it may be useful to have an "unknown" decision from the advisor, so that we can fall back to other advisors for decisions outside of the inline tree specified in the input.
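
One way to picture the modes described in this comment is the sketch below. None of it exists in the patch, which implements only the strict behaviour; `ReplayMode`, `Advice` and `replayDecision` are hypothetical names, and treating "unknown" as the result for unlisted call sites in the positive and negative modes is one possible reading of the proposal.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical replay modes; only Strict matches what this patch does.
enum class ReplayMode { Strict, Positive, Negative };

// Unknown means "no opinion": the caller would fall back to another advisor.
enum class Advice { Inline, DontInline, Unknown };

Advice replayDecision(ReplayMode Mode, const std::set<std::string> &Sites,
                      const std::string &CallSiteKey) {
  const bool Listed = Sites.count(CallSiteKey) > 0;
  switch (Mode) {
  case ReplayMode::Strict: // inline exactly what the input lists
    return Listed ? Advice::Inline : Advice::DontInline;
  case ReplayMode::Positive: // the input can only add inlining
    return Listed ? Advice::Inline : Advice::Unknown;
  case ReplayMode::Negative: // the input can only prohibit inlining
    return Listed ? Advice::DontInline : Advice::Unknown;
  }
  return Advice::Unknown; // not reached for valid enum values
}
```

Under this reading, the "focus on a specific inline tree" idea amounts to returning `Advice::Unknown` for any call site outside that tree, whatever the mode.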

rebase

Harbormaster failed remote builds in B64547: Diff 278517! Jul 16 2020, 9:53 AM

wmi added inline comments. Jul 16 2020, 10:46 AM

llvm/lib/Analysis/ReplayInlineAdvisor.cpp
59–79	Can we extract this part to a function? I think it can be reused by other types of InlineAdvisor, for tuning purpose for example.
llvm/test/Transforms/SampleProfile/Inputs/inline-replay.txt
2	Why is it sum:1 instead of _Z3sumii:1 @ main:3.1?

wenlei marked 3 inline comments as done. Jul 16 2020, 11:43 AM

wenlei added inline comments.

llvm/lib/Analysis/ReplayInlineAdvisor.cpp
59–79	Sure, done.
llvm/test/Transforms/SampleProfile/Inputs/inline-replay.txt
2	Oh, this has to do with the contrived `!dbg` metadata where `linkageName` was missing. Changed both the remarks here and `!dbg` to include mangled names now.

address feedbacks.

wenlei marked 2 inline comments as done. Jul 16 2020, 11:45 AM

wenlei added inline comments.

llvm/lib/Analysis/InlineAdvisor.cpp
162 ↗	(On Diff #277666)	Comment added, and also changed to use `uint32_t`.

LGTM. Better wait another day to see if other reviewers have further comments.

This revision is now accepted and ready to land. Jul 16 2020, 12:07 PM

lgtm - a small nit, could you also add in the description the current scope (the top down, SampleProfile case) - for clarity. Thanks!

Harbormaster failed remote builds in B64571: Diff 278561! Jul 16 2020, 12:26 PM

In D83743#2156697, @mtrofin wrote:

lgtm - a small nit, could you also add in the description the current scope (the top down, SampleProfile case) - for clarity. Thanks!

Sure, updated. Thank you all for feedbacks and quick review!

Closed by commit rG029946b11268: [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks (authored by wenlei). Jul 17 2020, 1:31 PM

This revision was automatically updated to reflect the committed changes.

The inline-replay.ll test that this added appears to be broken (as was noted by the pre-merge check https://reviews.llvm.org/harbormaster/unit/view/120108/) and is failing in our build.

In D83743#2159588, @stellaraccident wrote:

The inline-replay.ll test that this added appears to be broken (as was noted by the pre-merge check https://reviews.llvm.org/harbormaster/unit/view/120108/) and is failing in our build.

Sorry for the breakage, this wasn't caught in my local build. Taking a look. Feel free to revert to get unblocked.

wenlei mentioned this in rG577e58bcc754: [InlineAdvisor] New inliner advisor to replay inlining from optimization remarks. Aug 15 2020, 8:17 PM

modimo mentioned this in D94334: [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from optimization remarks. Jan 8 2021, 1:39 PM

modimo mentioned this in rGce7f9cdb50a9: [InlineAdvisor] Allow replay of inline decisions for the CGSCC inliner from…. Jan 25 2021, 3:39 PM

wenlei mentioned this in D110658: [InlineAdvisor] Add -inline-replay-scope=<Function|Module> to control replay scope. Oct 1 2021, 12:04 AM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/ReplayInlineAdvisor.h	37 lines
llvm/lib/Analysis/CMakeLists.txt	1 line
llvm/lib/Analysis/ReplayInlineAdvisor.cpp	83 lines
llvm/lib/Transforms/IPO/SampleProfile.cpp	37 lines
llvm/test/Transforms/SampleProfile/Inputs/inline-replay.txt	2 lines
llvm/test/Transforms/SampleProfile/inline-replay.ll	122 lines

Diff 278517

llvm/include/llvm/Analysis/ReplayInlineAdvisor.h

This file was added.

Lint (pre-merge checks, clang-tidy), on the `#ifndef` line: warning: header guard does not follow preferred style [llvm-header-guard]

//===- ReplayInlineAdvisor.h - Replay Inline Advisor interface -*- C++ --*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
#ifndef LLVM_REPLAYINLINEADVISOR_H_
#define LLVM_REPLAYINLINEADVISOR_H_

#include "llvm/ADT/StringSet.h"
#include "llvm/Analysis/InlineAdvisor.h"
#include "llvm/IR/LLVMContext.h"

namespace llvm {
class BasicBlock;
class CallBase;
class Function;
class Module;
class OptimizationRemarkEmitter;

/// Replay inline advisor that uses optimization remarks from inlining of
/// previous build to guide current inlining. This is useful for inliner tuning.
class ReplayInlineAdvisor : public InlineAdvisor {
public:
  ReplayInlineAdvisor(FunctionAnalysisManager &FAM, LLVMContext &Context,
                      StringRef RemarksFile);
  std::unique_ptr<InlineAdvice> getAdvice(CallBase &CB) override;
  bool areReplayRemarksLoaded() const { return HasReplayRemarks; }

private:
  StringSet<> InlineSitesFromRemarks;
  bool HasReplayRemarks = false;
};
} // namespace llvm
#endif // LLVM_REPLAYINLINEADVISOR_H_
No newline at end of file

llvm/lib/Analysis/CMakeLists.txt

[111 lines not shown; context: add_llvm_component_library(LLVMAnalysis]
   PHITransAddr.cpp
   PhiValues.cpp
   PostDominators.cpp
   ProfileSummaryInfo.cpp
   PtrUseVisitor.cpp
   RegionInfo.cpp
   RegionPass.cpp
   RegionPrinter.cpp
+  ReplayInlineAdvisor.cpp
   ScalarEvolution.cpp
   ScalarEvolutionAliasAnalysis.cpp
   ScalarEvolutionDivision.cpp
   ScalarEvolutionNormalization.cpp
   StackLifetime.cpp
   StackSafetyAnalysis.cpp
   SyncDependenceAnalysis.cpp
   SyntheticCountsUtils.cpp
[22 lines not shown]

llvm/lib/Analysis/ReplayInlineAdvisor.cpp

This file was added.

//===- ReplayInlineAdvisor.cpp - Replay InlineAdvisor ---------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This file implements ReplayInlineAdvisor that replays inline decision based
// on previous inline remarks from optimization remark log.
//
//===----------------------------------------------------------------------===//
#include <sstream>

#include "llvm/Analysis/InlineAdvisor.h"
#include "llvm/Analysis/ReplayInlineAdvisor.h"
#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/LineIterator.h"

using namespace llvm;

#define DEBUG_TYPE "inline-replay"

ReplayInlineAdvisor::ReplayInlineAdvisor(FunctionAnalysisManager &FAM,
                                         LLVMContext &Context,
                                         StringRef RemarksFile)
    : InlineAdvisor(FAM), HasReplayRemarks(false) {
  auto BufferOrErr = MemoryBuffer::getFileOrSTDIN(RemarksFile);
  std::error_code EC = BufferOrErr.getError();
  if (EC) {
    Context.emitError("Could not open remarks file: " + EC.message());
    return;
  }

  line_iterator LineIt(*BufferOrErr.get(), /*SkipBlanks=*/true);
  for (; !LineIt.is_at_eof(); ++LineIt) {
    StringRef Line = *LineIt;
    auto Pair = Line.split(" at callsite ");
    if (Pair.second.empty())
      continue;
    InlineSitesFromRemarks.insert(Pair.second);
  }
  HasReplayRemarks = true;
}

std::unique_ptr<InlineAdvice> ReplayInlineAdvisor::getAdvice(CallBase &CB) {
  assert(HasReplayRemarks);

  Function &Caller = *CB.getCaller();
  auto &ORE = FAM.getResult<OptimizationRemarkEmitterAnalysis>(Caller);

  if (InlineSitesFromRemarks.empty())
    return std::make_unique<InlineAdvice>(this, CB, ORE, false);

  // Example for inline remarks to parse:
  // _Z3subii inlined into main [details] at callsite sum:1 @ main:3.1
  // We use the callsite string after `at callsite` to replay inlining.
  std::ostringstream CallSiteLoc;
  auto DLoc = CB.getDebugLoc();
  bool First = true;
  for (DILocation *DIL = DLoc.get(); DIL; DIL = DIL->getInlinedAt()) {
    if (!First)
      CallSiteLoc << " @ ";
    // Note that negative line offset is actually possible, but we use
    // unsigned int to match line offset representation in remarks so
    // it's directly consumable by relay advisor.
    uint32_t Offset =
        DIL->getLine() - DIL->getScope()->getSubprogram()->getLine();
    uint32_t Discriminator = DIL->getBaseDiscriminator();
    StringRef Name = DIL->getScope()->getSubprogram()->getLinkageName();
    if (Name.empty())
      Name = DIL->getScope()->getSubprogram()->getName();
    CallSiteLoc << Name.str() << ":" << llvm::utostr(Offset);
    if (Discriminator) {
      CallSiteLoc << "." << llvm::utostr(Discriminator);
    }
    First = false;
  }
Inline comment (wmi, done): Can we extract this part to a function? I think it can be reused by other types of InlineAdvisor, for tuning purpose for example.
Inline comment (wenlei, author): Sure, done.

  bool InlineRecommended = InlineSitesFromRemarks.count(CallSiteLoc.str()) > 0;
  return std::make_unique<InlineAdvice>(this, CB, ORE, InlineRecommended);
}
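
For readers without an LLVM checkout, the call-site string construction in `getAdvice` can be approximated in standard C++ as follows. This is a sketch, not the patch's code: `Frame` is a hypothetical stand-in for one `DILocation` in the inlined-at chain, and `formatCallSite` is an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one DILocation in the inlined-at chain.
struct Frame {
  std::string Name;       // linkage name, or the plain name if that is empty
  uint32_t Line;          // line of the call
  uint32_t FuncStartLine; // line where the enclosing function starts
  uint32_t Discriminator; // 0 means "no discriminator"
};

// Frames are ordered innermost first, matching the walk over getInlinedAt().
std::string formatCallSite(const std::vector<Frame> &Stack) {
  std::string Out;
  for (std::size_t I = 0; I < Stack.size(); ++I) {
    const Frame &F = Stack[I];
    if (I != 0)
      Out += " @ ";
    // Unsigned on purpose: a "negative" offset wraps, as in the patch.
    uint32_t Offset = F.Line - F.FuncStartLine;
    Out += F.Name + ":" + std::to_string(Offset);
    if (F.Discriminator != 0)
      Out += "." + std::to_string(F.Discriminator);
  }
  return Out;
}
```

With made-up line numbers, a call on line 6 of `sum` (which starts on line 5), inlined at line 13 of `main` (which starts on line 10) with discriminator 1, produces `sum:1 @ main:3.1`, the same shape as the example remark in the source comment.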

llvm/lib/Transforms/IPO/SampleProfile.cpp

Show All 37 Lines
#include "llvm/Analysis/CallGraph.h"		#include "llvm/Analysis/CallGraph.h"
#include "llvm/Analysis/CallGraphSCCPass.h"		#include "llvm/Analysis/CallGraphSCCPass.h"
#include "llvm/Analysis/InlineAdvisor.h"		#include "llvm/Analysis/InlineAdvisor.h"
#include "llvm/Analysis/InlineCost.h"		#include "llvm/Analysis/InlineCost.h"
#include "llvm/Analysis/LoopInfo.h"		#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/PostDominators.h"		#include "llvm/Analysis/PostDominators.h"
#include "llvm/Analysis/ProfileSummaryInfo.h"		#include "llvm/Analysis/ProfileSummaryInfo.h"
		#include "llvm/Analysis/ReplayInlineAdvisor.h"
#include "llvm/Analysis/TargetLibraryInfo.h"		#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/DebugInfoMetadata.h"		#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
▲ Show 20 Lines • Show All 111 Lines • ▼ Show 20 Lines	static cl::opt<bool> ProfileSizeInline(
"sample-profile-inline-size", cl::Hidden, cl::init(false),		"sample-profile-inline-size", cl::Hidden, cl::init(false),
cl::desc("Inline cold call sites in profile loader if it's beneficial "		cl::desc("Inline cold call sites in profile loader if it's beneficial "
"for code size."));		"for code size."));

static cl::opt<int> SampleColdCallSiteThreshold(		static cl::opt<int> SampleColdCallSiteThreshold(
"sample-profile-cold-inline-threshold", cl::Hidden, cl::init(45),		"sample-profile-cold-inline-threshold", cl::Hidden, cl::init(45),
cl::desc("Threshold for inlining cold callsites"));		cl::desc("Threshold for inlining cold callsites"));

		static cl::opt<std::string> ProfileInlineReplayFile(
		"sample-profile-inline-replay", cl::init(""), cl::value_desc("filename"),
		cl::desc(
		"Optimization remarks file containing inline remarks to be replayed "
		"by inlining from sample profile loader."),
		cl::Hidden);

namespace {		namespace {

using BlockWeightMap = DenseMap<const BasicBlock *, uint64_t>;		using BlockWeightMap = DenseMap<const BasicBlock *, uint64_t>;
using EquivalenceClassMap = DenseMap<const BasicBlock , const BasicBlock >;		using EquivalenceClassMap = DenseMap<const BasicBlock , const BasicBlock >;
using Edge = std::pair<const BasicBlock , const BasicBlock >;		using Edge = std::pair<const BasicBlock , const BasicBlock >;
using EdgeWeightMap = DenseMap<Edge, uint64_t>;		using EdgeWeightMap = DenseMap<Edge, uint64_t>;
using BlockEdgeMap =		using BlockEdgeMap =
DenseMap<const BasicBlock , SmallVector<const BasicBlock , 8>>;		DenseMap<const BasicBlock , SmallVector<const BasicBlock , 8>>;
▲ Show 20 Lines • Show All 133 Lines • ▼ Show 20 Lines	SampleProfileLoader(
std::function<TargetTransformInfo &(Function &)> GetTargetTransformInfo,		std::function<TargetTransformInfo &(Function &)> GetTargetTransformInfo,
std::function<const TargetLibraryInfo &(Function &)> GetTLI)		std::function<const TargetLibraryInfo &(Function &)> GetTLI)
: GetAC(std::move(GetAssumptionCache)),		: GetAC(std::move(GetAssumptionCache)),
GetTTI(std::move(GetTargetTransformInfo)), GetTLI(std::move(GetTLI)),		GetTTI(std::move(GetTargetTransformInfo)), GetTLI(std::move(GetTLI)),
CoverageTracker(*this), Filename(std::string(Name)),		CoverageTracker(*this), Filename(std::string(Name)),
RemappingFilename(std::string(RemapName)),		RemappingFilename(std::string(RemapName)),
IsThinLTOPreLink(IsThinLTOPreLink) {}		IsThinLTOPreLink(IsThinLTOPreLink) {}

bool doInitialization(Module &M);		bool doInitialization(Module &M, FunctionAnalysisManager *FAM = nullptr);
		mtrofinUnsubmitted Not Done Reply Inline Actions I'm curious, is there a case when FAM would be not passed? (i.e. why not just require it?) mtrofin: I'm curious, is there a case when FAM would be not passed? (i.e. why not just require it?)
		wenleiAuthorUnsubmitted Done Reply Inline Actions There's another call site from old pass manager pipeline, where we don't have FAM. wenlei: There's another call site from old pass manager pipeline, where we don't have FAM.
bool runOnModule(Module &M, ModuleAnalysisManager *AM,		bool runOnModule(Module &M, ModuleAnalysisManager *AM,
ProfileSummaryInfo _PSI, CallGraph CG);		ProfileSummaryInfo _PSI, CallGraph CG);

void dump() { Reader->dump(); }		void dump() { Reader->dump(); }

protected:		protected:
friend class SampleCoverageTracker;		friend class SampleCoverageTracker;

▲ Show 20 Lines • Show All 137 Lines • ▼ Show 20 Lines	protected:
StringSet<> NamesInProfile;		StringSet<> NamesInProfile;

// For symbol in profile symbol list, whether to regard their profiles		// For symbol in profile symbol list, whether to regard their profiles
// to be accurate. It is mainly decided by existance of profile symbol		// to be accurate. It is mainly decided by existance of profile symbol
// list and -profile-accurate-for-symsinlist flag, but it can be		// list and -profile-accurate-for-symsinlist flag, but it can be
// overriden by -profile-sample-accurate or profile-sample-accurate		// overriden by -profile-sample-accurate or profile-sample-accurate
// attribute.		// attribute.
bool ProfAccForSymsInList;		bool ProfAccForSymsInList;

		// External inline advisor used to replay inline decision from remarks.
		std::unique_ptr<ReplayInlineAdvisor> ExternalInlineAdvisor;
};		};

class SampleProfileLoaderLegacyPass : public ModulePass {		class SampleProfileLoaderLegacyPass : public ModulePass {
public:		public:
// Class identification, replacement for typeinfo		// Class identification, replacement for typeinfo
static char ID;		static char ID;

SampleProfileLoaderLegacyPass(StringRef Name = SampleProfileFile,		SampleProfileLoaderLegacyPass(StringRef Name = SampleProfileFile,
▲ Show 20 Lines • Show All 409 Lines • ▼ Show 20 Lines	SampleProfileLoader::findFunctionSamples(const Instruction &Inst) const {

auto it = DILocation2SampleMap.try_emplace(DIL,nullptr);		auto it = DILocation2SampleMap.try_emplace(DIL,nullptr);
if (it.second)		if (it.second)
it.first->second = Samples->findFunctionSamples(DIL);		it.first->second = Samples->findFunctionSamples(DIL);
return it.first->second;		return it.first->second;
}		}

bool SampleProfileLoader::inlineCallInstruction(CallBase &CB) {		bool SampleProfileLoader::inlineCallInstruction(CallBase &CB) {
		if (ExternalInlineAdvisor) {
		mtrofin: Why not define Advice within the if below?
		auto Advice = ExternalInlineAdvisor->getAdvice(CB);
		if (!Advice->isInliningRecommended()) {
		Advice->recordUnattemptedInlining();
		return false;
		}
		// Dummy record, we don't use it for replay.
		Advice->recordInlining();
		}

Function *CalledFunction = CB.getCalledFunction();		Function *CalledFunction = CB.getCalledFunction();
assert(CalledFunction);		assert(CalledFunction);
DebugLoc DLoc = CB.getDebugLoc();		DebugLoc DLoc = CB.getDebugLoc();
BasicBlock *BB = CB.getParent();		BasicBlock *BB = CB.getParent();
InlineParams Params = getInlineParams();		InlineParams Params = getInlineParams();
Params.ComputeFullInlineCost = true;		Params.ComputeFullInlineCost = true;
// Checks if there is anything in the reachable portion of the callee at		// Checks if there is anything in the reachable portion of the callee at
// this callsite that makes this inlining potentially illegal. Need to		// this callsite that makes this inlining potentially illegal. Need to
[91 lines not shown]	for (auto &BB : F) {
localNotInlinedCallSites.try_emplace(CB, FS);		localNotInlinedCallSites.try_emplace(CB, FS);
if (callsiteIsHot(FS, PSI))		if (callsiteIsHot(FS, PSI))
Hot = true;		Hot = true;
else if (shouldInlineColdCallee(*CB))		else if (shouldInlineColdCallee(*CB))
ColdCandidates.push_back(CB);		ColdCandidates.push_back(CB);
}		}
}		}
}		}
if (Hot) {		if (Hot || ExternalInlineAdvisor) {
CIS.insert(CIS.begin(), AllCandidates.begin(), AllCandidates.end());		CIS.insert(CIS.begin(), AllCandidates.begin(), AllCandidates.end());
emitOptimizationRemarksForInlineCandidates(AllCandidates, F, true);		emitOptimizationRemarksForInlineCandidates(AllCandidates, F, true);
} else {		} else {
CIS.insert(CIS.begin(), ColdCandidates.begin(), ColdCandidates.end());		CIS.insert(CIS.begin(), ColdCandidates.begin(), ColdCandidates.end());
emitOptimizationRemarksForInlineCandidates(ColdCandidates, F, false);		emitOptimizationRemarksForInlineCandidates(ColdCandidates, F, false);
}		}
}		}
for (CallBase *I : CIS) {		for (CallBase *I : CIS) {
[796 lines not shown]	while (!CGI.isAtEnd()) {
}		}
++CGI;		++CGI;
}		}

std::reverse(FunctionOrderList.begin(), FunctionOrderList.end());		std::reverse(FunctionOrderList.begin(), FunctionOrderList.end());
return FunctionOrderList;		return FunctionOrderList;
}		}

bool SampleProfileLoader::doInitialization(Module &M) {		bool SampleProfileLoader::doInitialization(Module &M,
		FunctionAnalysisManager *FAM) {
auto &Ctx = M.getContext();		auto &Ctx = M.getContext();

std::unique_ptr<SampleProfileReaderItaniumRemapper> RemapReader;		std::unique_ptr<SampleProfileReaderItaniumRemapper> RemapReader;
auto ReaderOrErr =		auto ReaderOrErr =
SampleProfileReader::create(Filename, Ctx, RemappingFilename);		SampleProfileReader::create(Filename, Ctx, RemappingFilename);
if (std::error_code EC = ReaderOrErr.getError()) {		if (std::error_code EC = ReaderOrErr.getError()) {
std::string Msg = "Could not open profile: " + EC.message();		std::string Msg = "Could not open profile: " + EC.message();
Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg));		Ctx.diagnose(DiagnosticInfoSampleProfile(Filename, Msg));
return false;		return false;
}		}
Reader = std::move(ReaderOrErr.get());		Reader = std::move(ReaderOrErr.get());
Reader->collectFuncsFrom(M);		Reader->collectFuncsFrom(M);
ProfileIsValid = (Reader->read() == sampleprof_error::success);		ProfileIsValid = (Reader->read() == sampleprof_error::success);
PSL = Reader->getProfileSymbolList();		PSL = Reader->getProfileSymbolList();

// While profile-sample-accurate is on, ignore symbol list.		// While profile-sample-accurate is on, ignore symbol list.
ProfAccForSymsInList =		ProfAccForSymsInList =
ProfileAccurateForSymsInList && PSL && !ProfileSampleAccurate;		ProfileAccurateForSymsInList && PSL && !ProfileSampleAccurate;
if (ProfAccForSymsInList) {		if (ProfAccForSymsInList) {
NamesInProfile.clear();		NamesInProfile.clear();
if (auto NameTable = Reader->getNameTable())		if (auto NameTable = Reader->getNameTable())
NamesInProfile.insert(NameTable->begin(), NameTable->end());		NamesInProfile.insert(NameTable->begin(), NameTable->end());
}		}

		if (FAM && !ProfileInlineReplayFile.empty()) {
		ExternalInlineAdvisor = std::make_unique<ReplayInlineAdvisor>(
		*FAM, Ctx, ProfileInlineReplayFile);
		if (!ExternalInlineAdvisor->areReplayRemarksLoaded())
		ExternalInlineAdvisor.reset();
		}

return true;		return true;
}		}

ModulePass *llvm::createSampleProfileLoaderPass() {		ModulePass *llvm::createSampleProfileLoaderPass() {
return new SampleProfileLoaderLegacyPass();		return new SampleProfileLoaderLegacyPass();
}		}

ModulePass *llvm::createSampleProfileLoaderPass(StringRef Name) {		ModulePass *llvm::createSampleProfileLoaderPass(StringRef Name) {
[136 lines not shown]	PreservedAnalyses SampleProfileLoaderPass::run(Module &M,
};		};

SampleProfileLoader SampleLoader(		SampleProfileLoader SampleLoader(
ProfileFileName.empty() ? SampleProfileFile : ProfileFileName,		ProfileFileName.empty() ? SampleProfileFile : ProfileFileName,
ProfileRemappingFileName.empty() ? SampleProfileRemappingFile		ProfileRemappingFileName.empty() ? SampleProfileRemappingFile
: ProfileRemappingFileName,		: ProfileRemappingFileName,
IsThinLTOPreLink, GetAssumptionCache, GetTTI, GetTLI);		IsThinLTOPreLink, GetAssumptionCache, GetTTI, GetTLI);

if (!SampleLoader.doInitialization(M))		if (!SampleLoader.doInitialization(M, &FAM))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

ProfileSummaryInfo *PSI = &AM.getResult<ProfileSummaryAnalysis>(M);		ProfileSummaryInfo *PSI = &AM.getResult<ProfileSummaryAnalysis>(M);
CallGraph &CG = AM.getResult<CallGraphAnalysis>(M);		CallGraph &CG = AM.getResult<CallGraphAnalysis>(M);
if (!SampleLoader.runOnModule(M, &AM, PSI, &CG))		if (!SampleLoader.runOnModule(M, &AM, PSI, &CG))
return PreservedAnalyses::all();		return PreservedAnalyses::all();

return PreservedAnalyses::none();		return PreservedAnalyses::none();
}		}

llvm/test/Transforms/SampleProfile/Inputs/inline-replay.txt

This file was added.

				remark: calls.cc:10:0: _Z3sumii inlined into main to match profiling context with (cost=45, threshold=337) at callsite main:3.1
				remark: calls.cc:4:0: _Z3subii inlined into main to match profiling context with (cost=-5, threshold=337) at callsite sum:1 @ main:3.1
				wmi: Why is it sum:1 instead of _Z3sumii:1 @ main:3.1?
				wenlei: Oh, this has to do with the contrived `!dbg` metadata where `linkageName` was missing. Changed both the remarks here and `!dbg` to include mangled names now.

llvm/test/Transforms/SampleProfile/inline-replay.ll

This file was added.

				;; Note that this needs new pass manager for now. Passing `-sample-profile-inline-replay` to legacy pass manager is a no-op.

				;; Check baseline inline decisions
				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-merge-inlinee -sample-profile-top-down-load -pass-remarks=inline -S 2>&1 | FileCheck -check-prefix=DEFAULT %s

				;; Check replay inline decisions
				; RUN: opt < %s -passes=sample-profile -sample-profile-file=%S/Inputs/inline-topdown.prof -sample-profile-inline-replay=%S/Inputs/inline-replay.txt -sample-profile-merge-inlinee -sample-profile-top-down-load -pass-remarks=inline -S 2>&1 | FileCheck -check-prefix=REPLAY %s

				@.str = private unnamed_addr constant [11 x i8] c"sum is %d\0A\00", align 1

				define i32 @_Z3sumii(i32 %x, i32 %y) #0 !dbg !6 {
				entry:
				%x.addr = alloca i32, align 4
				%y.addr = alloca i32, align 4
				store i32 %x, i32* %x.addr, align 4
				store i32 %y, i32* %y.addr, align 4
				%tmp = load i32, i32* %x.addr, align 4, !dbg !8
				%tmp1 = load i32, i32* %y.addr, align 4, !dbg !8
				%add = add nsw i32 %tmp, %tmp1, !dbg !8
				%tmp2 = load i32, i32* %x.addr, align 4, !dbg !8
				%tmp3 = load i32, i32* %y.addr, align 4, !dbg !8
				%call = call i32 @_Z3subii(i32 %tmp2, i32 %tmp3), !dbg !8
				ret i32 %add, !dbg !8
				}

				define i32 @_Z3subii(i32 %x, i32 %y) #0 !dbg !9 {
				entry:
				%x.addr = alloca i32, align 4
				%y.addr = alloca i32, align 4
				store i32 %x, i32* %x.addr, align 4
				store i32 %y, i32* %y.addr, align 4
				%tmp = load i32, i32* %x.addr, align 4, !dbg !10
				%tmp1 = load i32, i32* %y.addr, align 4, !dbg !10
				%add = sub nsw i32 %tmp, %tmp1, !dbg !10
				ret i32 %add, !dbg !11
				}

				define i32 @main() #0 !dbg !12 {
				entry:
				%retval = alloca i32, align 4
				%s = alloca i32, align 4
				%i = alloca i32, align 4
				store i32 0, i32* %retval
				store i32 0, i32* %i, align 4, !dbg !13
				br label %while.cond, !dbg !14

				while.cond: ; preds = %if.end, %entry
				%tmp = load i32, i32* %i, align 4, !dbg !15
				%inc = add nsw i32 %tmp, 1, !dbg !15
				store i32 %inc, i32* %i, align 4, !dbg !15
				%cmp = icmp slt i32 %tmp, 400000000, !dbg !15
				br i1 %cmp, label %while.body, label %while.end, !dbg !15

				while.body: ; preds = %while.cond
				%tmp1 = load i32, i32* %i, align 4, !dbg !17
				%cmp1 = icmp ne i32 %tmp1, 100, !dbg !17
				br i1 %cmp1, label %if.then, label %if.else, !dbg !17

				if.then: ; preds = %while.body
				%tmp2 = load i32, i32* %i, align 4, !dbg !19
				%tmp3 = load i32, i32* %s, align 4, !dbg !19
				%call = call i32 @_Z3sumii(i32 %tmp2, i32 %tmp3), !dbg !19
				store i32 %call, i32* %s, align 4, !dbg !19
				br label %if.end, !dbg !19

				if.else: ; preds = %while.body
				store i32 30, i32* %s, align 4, !dbg !21
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				br label %while.cond, !dbg !23

				while.end: ; preds = %while.cond
				%tmp4 = load i32, i32* %s, align 4, !dbg !25
				%call2 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @.str, i32 0, i32 0), i32 %tmp4), !dbg !25
				ret i32 0, !dbg !26
				}

				declare i32 @printf(i8*, ...)

				attributes #0 = { "use-sample-profile" }

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!3, !4}
				!llvm.ident = !{!5}

				!0 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !1, producer: "clang version 3.5 ", isOptimized: false, runtimeVersion: 0, emissionKind: NoDebug, enums: !2, retainedTypes: !2, globals: !2, imports: !2)
				!1 = !DIFile(filename: "calls.cc", directory: ".")
				!2 = !{}
				!3 = !{i32 2, !"Dwarf Version", i32 4}
				!4 = !{i32 1, !"Debug Info Version", i32 3}
				!5 = !{!"clang version 3.5 "}
				!6 = distinct !DISubprogram(name: "sum", scope: !1, file: !1, line: 3, type: !7, scopeLine: 3, virtualIndex: 6, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
				!7 = !DISubroutineType(types: !2)
				!8 = !DILocation(line: 4, scope: !6)
				!9 = distinct !DISubprogram(name: "sub", scope: !1, file: !1, line: 20, type: !7, scopeLine: 20, virtualIndex: 6, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
				!10 = !DILocation(line: 20, scope: !9)
				!11 = !DILocation(line: 21, scope: !9)
				!12 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 7, type: !7, scopeLine: 7, virtualIndex: 6, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition, unit: !0, retainedNodes: !2)
				!13 = !DILocation(line: 8, scope: !12)
				!14 = !DILocation(line: 9, scope: !12)
				!15 = !DILocation(line: 9, scope: !16)
				!16 = !DILexicalBlockFile(scope: !12, file: !1, discriminator: 2)
				!17 = !DILocation(line: 10, scope: !18)
				!18 = distinct !DILexicalBlock(scope: !12, file: !1, line: 10)
				!19 = !DILocation(line: 10, scope: !20)
				!20 = !DILexicalBlockFile(scope: !18, file: !1, discriminator: 2)
				!21 = !DILocation(line: 10, scope: !22)
				!22 = !DILexicalBlockFile(scope: !18, file: !1, discriminator: 4)
				!23 = !DILocation(line: 10, scope: !24)
				!24 = !DILexicalBlockFile(scope: !18, file: !1, discriminator: 6)
				!25 = !DILocation(line: 11, scope: !12)
				!26 = !DILocation(line: 12, scope: !12)


				; DEFAULT: _Z3sumii inlined into main
				; DEFAULT: _Z3subii inlined into _Z3sumii
				; DEFAULT-NOT: _Z3subii inlined into main

				; REPLAY: _Z3sumii inlined into main
				; REPLAY: _Z3subii inlined into main
				; REPLAY-NOT: _Z3subii inlined into _Z3sumii
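
The call site anchors in inline-replay.txt (`main:3.1` and `sum:1 @ main:3.1`) read as an inline stack, innermost frame first, where each frame is a function name, a line offset, and an optional discriminator. In the debug info above, `main` starts at line 7 and calls `sum` at line 10 (offset 3), and `sum` starts at line 3 and calls `sub` at line 4 (offset 1). The sketch below only illustrates that string layout. `InlineFrame` and `formatCallSite` are hypothetical names, the discriminator value 1 is taken from the remark text rather than derived here, and this is not the LLVM implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical frame description (not an LLVM type).
struct InlineFrame {
  std::string Function;   // function containing the call
  unsigned LineOffset;    // call line minus the function's first line
  unsigned Discriminator; // 0 means "do not print a discriminator"
};

// Formats frames, innermost first, as "f:off[.disc] @ g:off[.disc] ...".
std::string formatCallSite(const std::vector<InlineFrame> &Stack) {
  std::string Out;
  for (std::size_t I = 0; I < Stack.size(); ++I) {
    if (I != 0)
      Out += " @ ";
    Out += Stack[I].Function + ":" + std::to_string(Stack[I].LineOffset);
    if (Stack[I].Discriminator != 0)
      Out += "." + std::to_string(Stack[I].Discriminator);
  }
  return Out;
}
```

Because the outer frames are part of the string, the same call to `sub` inside `sum` yields a different key when `sum` is considered on its own than when it has already been inlined into `main`. That difference is what lets the REPLAY run inline `_Z3subii` into `main` without also inlining it into `_Z3sumii`.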


Unit Tests: Failed


Diff 278517
