This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/test/tools/llvm-profgen/
- inline-cs-pseudoprobe.test (3/3)
- noinline-cs-pseudoprobe.test (3/8)
llvm/tools/llvm-profgen/
- PerfReader.cpp
- ProfileGenerator.h (2/2)
- ProfileGenerator.cpp (14/17)
- ProfiledBinary.h
- PseudoProbe.h
- PseudoProbe.cpp

Differential D92998

[CSSPGO][llvm-profgen] Pseudo probe based CS profile generation
ClosedPublic

Authored by wlei on Dec 9 2020, 9:18 PM.


Details

Reviewers

wmi
davidxl
hoy
wenlei

Commits

rGc82b24f4756e: [CSSPGO][llvm-profgen] Pseudo probe based CS profile generation

Summary

This change implements the profile generation infrastructure for pseudo probes in llvm-profgen. During virtual unwinding, the raw profile is extracted into range counters and branch counters, which are aggregated into a sample counter map indexed by call stack context. This change adds the last step and produces the final profile. Specifically, function body samples are recorded by walking each probe within a range, and call site target samples are recorded by extracting the call site probe from the branch's source.

Please refer to https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.
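
As a rough illustration of the body-sample step described in the summary, the sketch below credits every probe whose address falls inside a sampled range with that range's count. `Probe`, `AddressRange`, `RangeCounts`, `ProbeCounts`, and `countProbesInRanges` are hypothetical stand-ins chosen for this example; they are not the llvm-profgen types or functions.

```cpp
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical stand-in for a pseudo probe; not the llvm-profgen type.
struct Probe {
  uint64_t Address;
  uint32_t Index;
};

using AddressRange = std::pair<uint64_t, uint64_t>; // inclusive [Begin, End]
using RangeCounts = std::map<AddressRange, uint64_t>;
using ProbeCounts = std::map<const Probe *, uint64_t>;

// Walk every sampled range and add the range's count to each probe whose
// address lies inside it. Overlapping ranges simply accumulate.
ProbeCounts countProbesInRanges(const std::map<uint64_t, Probe> &AddressToProbe,
                                const RangeCounts &Ranges) {
  ProbeCounts Counts;
  for (const auto &Entry : Ranges) {
    const AddressRange &Range = Entry.first;
    // First probe at or after the start of the range.
    auto It = AddressToProbe.lower_bound(Range.first);
    for (; It != AddressToProbe.end() && It->first <= Range.second; ++It)
      Counts[&It->second] += Entry.second;
  }
  return Counts;
}
```

The address-ordered map is an assumption made to keep the lookup simple; the real tool may organize its address-to-probe table differently.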

Implementation

Extended PseudoProbeProfileGenerator for pseudo probe based profile generation.
populateBodySamplesWithProbes reads the range counter and is responsible for recording function body samples and inferring the caller's body samples.
populateBoundarySamplesWithProbes reads the branch counter and is responsible for recording call site target samples.
Each sample is recorded with its calling context (named ContextId). Recall that the probe based context key doesn't include the leaf frame probe info, so the ContextId string is created from two parts: one from concatenating the probe stack strings and the other from the leaf frame probe.
Added regression tests.
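
The two-part ContextId construction can be pictured with a small sketch. It assumes caller frames are already rendered as `function:probe-id` strings and joined with ` @ `, which is the shape seen in the test expectations (for example `main:2 @ foo:8 @ bar`). `makeContextId` is a hypothetical helper name, not a function from this patch.

```cpp
#include <string>
#include <vector>

// Build a context id from the caller frames (first part) and the leaf
// function (second part). Hypothetical helper for illustration only.
std::string makeContextId(const std::vector<std::string> &CallerFrames,
                          const std::string &LeafFunction) {
  std::string Id;
  for (const std::string &Frame : CallerFrames) {
    Id += Frame;
    Id += " @ ";
  }
  Id += LeafFunction;
  return Id;
}
```

For instance, `makeContextId({"main:2", "foo:8"}, "bar")` yields `main:2 @ foo:8 @ bar`.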

Test Plan:

ninja & ninja check-llvm

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wlei created this revision.Dec 9 2020, 9:18 PM

Herald added subscribers: hoy, wenlei, lxfind. Dec 9 2020, 9:18 PM

wlei requested review of this revision.Dec 9 2020, 9:18 PM

Herald added a project: Restricted Project. Dec 9 2020, 9:18 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B81765: Diff 310761.Dec 9 2020, 9:19 PM

wlei edited the summary of this revision. (Show Details)Dec 10 2020, 3:50 PM

wlei added reviewers: wmi, davidxl, hoy, wenlei.

wlei added a parent revision: D92896: [CSSPGO][llvm-profgen] Virtual unwinding with pseudo probe.Dec 10 2020, 6:24 PM

hoy added inline comments.Dec 18 2020, 10:16 AM

llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
16	If you rebase on the latest, a checksum will be present here.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
415	Please add a comment about why a dangling probe gets a zero count.
422	Nit: function
473	You mean foo:2 @ bar?
481	Move this out of the loop to save the check?

wlei added a child revision: D93556: [CSSPGO][llvm-profgen] Compress recursive cycles in calling context.Dec 18 2020, 11:33 AM

Address Hongtao's feedback: support profile checksum and some other NFC work

wlei marked 5 inline comments as done.Dec 21 2020, 9:46 AM

wlei added inline comments.

llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
16	Thanks for the reminder. Code to print the checksum is added: `FunctionProfile.setFunctionHash(LeafFuncDesc->FuncHash);`
llvm/tools/llvm-profgen/ProfileGenerator.cpp
415	comments added
473	typo fixed
481	Yeah, moved the last elements out of the loop.

Harbormaster completed remote builds in B83166: Diff 313128.Dec 21 2020, 9:50 AM

hoy added inline comments.Dec 21 2020, 2:45 PM

llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
5	Nit: use `CHECK-NEXT`
llvm/tools/llvm-profgen/ProfileGenerator.cpp
42	The setting of the two flags should not be necessary on the profile generation side. They are used on the loader side. Do you see any issue without setting them?

wlei marked 4 inline comments as done.Dec 21 2020, 2:54 PM

wlei added inline comments.

llvm/tools/llvm-profgen/ProfileGenerator.cpp

42	Yeah, if they are not explicitly set here, the values will be false.

See llvm/lib/Transforms/IPO/SampleProfile.cpp:

// Apply tweaks if context-sensitive profile is available.
if (Reader->profileIsCS()) {
  ProfileIsCS = true;
  FunctionSamples::ProfileIsCS = true;

  // Tracker for profiles under different context
  ContextTracker =
      std::make_unique<SampleContextTracker>(Reader->getProfiles());
}

It's set when the reader knows it's a CS profile. But on the llvm-profgen side, nothing sets it.

hoy added inline comments.Dec 21 2020, 2:58 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
42	I see. So is `ContextTracker` used in profile generation?

wlei added inline comments.Dec 21 2020, 4:08 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp

42	Not `ContextTracker`; it's the SampleProfileWriter, which writes the text format of the profile, like:

if (FunctionSamples::ProfileIsCS)
  OS << "[" << S.getNameWithContext() << "]:" << S.getTotalSamples();
else
  OS << S.getName() << ":" << S.getTotalSamples();

if (FunctionSamples::ProfileIsProbeBased) {
  OS.indent(Indent + 1);
  OS << "!CFGChecksum: " << S.getFunctionHash() << "\n";
}
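
To make the effect of the CS flag on the text output concrete, here is a small self-contained sketch that formats a header line in the `name:total:head` shape seen in the test expectations of this revision. `ToyProfile`, `totalSamples`, and `formatHeader` are hypothetical; in particular, computing the total as the sum of body counts is an assumption of this toy, chosen because it matches the numbers in the inline test (15+15+14+1+15+14 = 74).

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical toy profile; not the FunctionSamples class.
struct ToyProfile {
  std::string Context;                      // e.g. "main:2 @ foo"
  uint64_t HeadSamples;                     // samples at function entry
  std::map<uint32_t, uint64_t> BodySamples; // probe id -> count
};

// Assumption of this sketch: total samples is the sum of body counts.
uint64_t totalSamples(const ToyProfile &P) {
  uint64_t Total = 0;
  for (const auto &Entry : P.BodySamples)
    Total += Entry.second;
  return Total;
}

// Context-sensitive profiles wrap the name in brackets; others do not.
std::string formatHeader(const ToyProfile &P, bool IsCS) {
  std::ostringstream OS;
  if (IsCS)
    OS << "[" << P.Context << "]";
  else
    OS << P.Context;
  OS << ":" << totalSamples(P) << ":" << P.HeadSamples;
  return OS.str();
}
```

With the body counts from the inline test, `formatHeader(P, true)` produces `[main:2 @ foo]:74:0`.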

hoy added inline comments.Dec 21 2020, 5:34 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
43	I see. Sounds like the initializations should be moved into `ProfileGenerator::generateProfile` or `ProfileGenerator::write`. What do you think?

wlei added inline comments.Dec 21 2020, 5:47 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
43	Sounds good! Since it's only used by the writer, I will move them to `ProfileGenerator::write`.

wlei added inline comments.Dec 21 2020, 5:51 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp

43	Just to confirm: it's also used in getEntrySamples, so ProfileGenerator::generateProfile might be the right place.

uint64_t getEntrySamples() const {
  if (FunctionSamples::ProfileIsCS && getHeadSamples()) {
    // For CS profile, if we already have more accurate head samples
    // counted by branch sample from caller, use them as entry samples.
    return getHeadSamples();
  }

hoy added inline comments.Dec 21 2020, 5:52 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
43	Yeah, `generateProfile` sounds like the right place.

added CHECK-NEXT and other refactoring work

fix lint

Harbormaster completed remote builds in B83321: Diff 313417.Dec 22 2020, 1:34 PM

Harbormaster completed remote builds in B83322: Diff 313419.Dec 22 2020, 2:07 PM

hoy accepted this revision.Jan 5 2021, 5:20 PM

This revision is now accepted and ready to land.Jan 5 2021, 5:20 PM

rebase and fix clang-tidy

Harbormaster completed remote builds in B84143: Diff 314760.Jan 5 2021, 5:34 PM

wlei mentioned this in D92584: [CSSPGO][llvm-profgen] Refactor to unify hashable interface for trace sample and context-sensitive counter.Jan 6 2021, 10:23 AM

wlei mentioned this in D92896: [CSSPGO][llvm-profgen] Virtual unwinding with pseudo probe.Jan 12 2021, 4:44 PM

rebase

Harbormaster completed remote builds in B84956: Diff 316292.Jan 12 2021, 6:44 PM

wmi added inline comments.Jan 14 2021, 10:04 PM

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14	In [main:2 @ foo], probes with 0 count are not dumped. In which case will a probe with 0 count be dumped?
llvm/tools/llvm-profgen/ProfileGenerator.cpp
350–352	Move it to header comment of extractPrefixContextId in case extractPrefixContextId is used elsewhere.
llvm/tools/llvm-profgen/ProfileGenerator.h
139	Make it an overload of getFunctionProfileForLeafProbe so it is clear that it is used for probes?

Addressing Wei's feedback

fix typo

wlei added inline comments.Jan 15 2021, 2:08 PM

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14	The 0 count is for the dangling probe case, to distinguish it from a missing count; see ProfileGenerator.cpp:420. The comment is copied here for reference:
// Drop the samples collected for a dangling probe since it's misleading.
// We still report the probe but with a special zero count. The compiler
// won't trust the zero count and will rely on the counts inference
// algorithm to get the probe a reasonable count. Note that a zero count is
// different from a missing count, where the latter really tells the
// compiler that a probe is never executed.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
350–352	Good catch, moved.
llvm/tools/llvm-profgen/ProfileGenerator.h
139	Good suggestion!

Harbormaster completed remote builds in B85418: Diff 317064.Jan 15 2021, 3:14 PM

Harbormaster completed remote builds in B85421: Diff 317070.Jan 15 2021, 3:27 PM

wmi added inline comments.Jan 21 2021, 2:35 PM

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14	> Note that a zero count is different from a missing count, where the latter really tells the compiler that a probe is never executed.
That is contrary to the debug info based profile, where a zero count for a line means the line is never executed and a missing count for a line means the compiler has to infer the count. Why is the probe based profile implemented this way?

hoy added inline comments.Jan 21 2021, 3:10 PM

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14	Good question. The current counts inference algorithm in the sample profile loader doesn't seem to differentiate a missing sample from a zero sample. For example, in `SampleProfileLoader::propagateThroughEdges`, accesses to `BlockWeights` are not preceded by a check, so a missing entry in `BlockWeights` will be created as a zero entry. With pseudo probes, a missing sample really means the probe is not executed. A zero sample, on the other hand, wouldn't be created by the profile generator. We are leveraging the zero sample to represent a special type of probe, called a `dangling` probe, which needs the compiler to infer its count. We were thinking about using UINT64_MAX instead, but UINT64_MAX is literally a legal number of samples. The compiler change that dangles probes hasn't been sent out yet. @wlei I think we should exclude this detail from this change for now. But it's better to discuss here whether this approach makes sense.

wmi added inline comments.Jan 21 2021, 5:29 PM

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14	> For example, in SampleProfileLoader::propagateThroughEdges, accesses to BlockWeights are not preceded with a check. Therefore a missing entry in BlockWeights will be created as a zero entry.
IIRC, VisitedBlocks is used to differentiate a missing count from a zero count BB.
> With pseudo probes, a missing sample really means the probe is not executed. We are leveraging zero sample to represent a special type of probe, called dangling probe, which will need the compiler to infer its count.
I think we discussed it in another patch, so I understand that for a probe, a missing sample means the probe is not executed. It is a little weird to leverage a zero sample to tell the compiler to infer the count. UINT64_MAX looks better to me because there cannot be any real sample counter with the value UINT64_MAX. In addition, using zero is misleading when we read the profile: for a location which has a zero count, it is actually not zero from the compiler's perspective.

hoy added inline comments.Jan 21 2021, 5:55 PM

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14	> IIRC, VisitedBlocks is used to differentiate missing count with zero count BB.
Thanks for pointing that out. I misread that code. Sounds like in order to reuse the existing inference algorithm, we will need to give probes without any samples an explicit zero count in `BlockWeights`. Agreed that using `UINT64_MAX` looks clearer than using zero.
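
The map behaviour debated in this thread can be reproduced with standard containers. The sketch below uses `std::map` and `std::set` as stand-ins for the loader's data structures (an assumption made for this example): subscripting a missing key creates a zero entry, so a separate visited set is what tells "known weight" apart from "no information". The helper names are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

using BlockWeightMap = std::map<std::string, uint64_t>;

// operator[] value-initializes a missing entry to 0, so this lookup alone
// cannot distinguish "weight is zero" from "weight was never recorded".
uint64_t weightViaSubscript(BlockWeightMap &Weights, const std::string &Block) {
  return Weights[Block];
}

// A separate visited set records which blocks really have a weight.
bool hasKnownWeight(const std::set<std::string> &Visited,
                    const std::string &Block) {
  return Visited.count(Block) != 0;
}
```

After `weightViaSubscript(Weights, "cold")` on a map that only held `entry`, the map has grown by one zero-valued entry, while `hasKnownWeight(Visited, "cold")` still returns false.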

wenlei added inline comments.Jan 21 2021, 11:42 PM

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14	> It is a little weird to leverage zero sample to tell compiler to infer the count.
Agreed.
> Sounds like in order to reuse the existing inference algorithm, we will need to give probes without any samples an explicit zero count in BlockWeights. Agreed that using UINT64_MAX looks more clear than using zero.
This is counter-intuitive. Regardless of the inference algorithm, it'd be good if we can avoid using zero to mark dangling probes. This is actually one of the TODOs on the internal patch; I think it's time to take care of it now.

Removed the dangling probe related code, solution will come in another patch

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14	Thanks for the discussion; it gave me a good understanding. I have removed the zero-count related code, and the dangling probe issue will be addressed in another diff.
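
For reference, the sentinel idea suggested by the reviewers in this thread could look roughly like the sketch below. It is purely illustrative: the dangling-probe code was removed from this revision, so none of this is what landed. `DanglingCount`, `ProbeState`, and `classify` are hypothetical names.

```cpp
#include <cstdint>
#include <limits>
#include <map>

// Hypothetical sentinel: a count value reserved to mean "dangling probe,
// let the compiler infer the count".
constexpr uint64_t DanglingCount = std::numeric_limits<uint64_t>::max();

enum class ProbeState { NeverExecuted, Dangling, Counted };

// A missing entry means the probe was never executed; the sentinel marks a
// dangling probe; any other value is a real sample count.
ProbeState classify(const std::map<uint32_t, uint64_t> &BodySamples,
                    uint32_t ProbeId) {
  auto It = BodySamples.find(ProbeId);
  if (It == BodySamples.end())
    return ProbeState::NeverExecuted;
  if (It->second == DanglingCount)
    return ProbeState::Dangling;
  return ProbeState::Counted;
}
```

Keeping the three states distinct avoids overloading zero, which is the concern raised above.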

Harbormaster completed remote builds in B86378: Diff 318706.Jan 22 2021, 6:30 PM

Fix one bug with the context id: the inferred call site context id should be generated by string splitting, because the callee's context id might be compressed and the two should share the same context prefix.
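
A minimal sketch of deriving the caller side from a callee context id by string splitting, assuming the ` @ ` separator and the `function:callsite` frame shape seen in the tests. `SplitContext` and `splitCalleeContext` are hypothetical names; the actual fix in the patch may differ in detail.

```cpp
#include <cstddef>
#include <string>

// Result of splitting "main:2 @ foo:8 @ bar" (hypothetical type).
struct SplitContext {
  std::string CallerContext; // "main:2 @ foo"
  std::string CallsiteId;    // "8"
  std::string Callee;        // "bar"
};

// Returns false for a top-level context that has no caller frame.
bool splitCalleeContext(const std::string &ContextId, SplitContext &Out) {
  const std::string Sep = " @ ";
  std::size_t SepPos = ContextId.rfind(Sep);
  if (SepPos == std::string::npos)
    return false;
  // Everything before the last separator ends with the caller frame.
  std::string CallerPart = ContextId.substr(0, SepPos);
  std::size_t ColonPos = CallerPart.rfind(':');
  if (ColonPos == std::string::npos)
    return false;
  Out.CallerContext = CallerPart.substr(0, ColonPos);
  Out.CallsiteId = CallerPart.substr(ColonPos + 1);
  Out.Callee = ContextId.substr(SepPos + Sep.size());
  return true;
}
```

Because the caller context is cut out of the callee's own string, both share the same prefix even if that prefix was compressed earlier.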

Harbormaster completed remote builds in B87377: Diff 320492.Feb 1 2021, 10:32 AM

@wmi Hi, could you take another look at this patch? Three other accepted patches depend on it. Thank you!

Thanks for pinging me. LGTM.

Closed by commit rGc82b24f4756e: [CSSPGO][llvm-profgen] Pseudo probe based CS profile generation (authored by wlei). Feb 3 2021, 4:22 PM

This revision was automatically updated to reflect the committed changes.

wlei added a commit: rGc82b24f4756e: [CSSPGO][llvm-profgen] Pseudo probe based CS profile generation.

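Revision Contents

Path	Size
llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test	17 lines
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test	16 lines
llvm/tools/llvm-profgen/PerfReader.cpp	4 lines
llvm/tools/llvm-profgen/ProfileGenerator.h	41 lines
llvm/tools/llvm-profgen/ProfileGenerator.cpp	198 lines
llvm/tools/llvm-profgen/ProfiledBinary.h	11 lines
llvm/tools/llvm-profgen/PseudoProbe.h	13 lines
llvm/tools/llvm-profgen/PseudoProbe.cpp	36 lines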

Diff 321259

llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test

  ; RUN: llvm-profgen --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --show-unwinder-output | FileCheck %s --check-prefix=CHECK-UNWINDER
+ ; RUN: FileCheck %s --input-file %t

+ ; CHECK: [main:2 @ foo]:74:0
+ ; CHECK-NEXT: 2: 15
+ ; CHECK-NEXT: 3: 15
+ ; CHECK-NEXT: 4: 14
+ ; CHECK-NEXT: 5: 1
+ ; CHECK-NEXT: 6: 15
+ ; CHECK-NEXT: 8: 14 bar:14
+ ; CHECK-NEXT: !CFGChecksum: 138950591924
+ ; CHECK-NEXT:[main:2 @ foo:8 @ bar]:56:14
+ ; CHECK-NEXT: 1: 14
+ ; CHECK-NEXT: 2: 14
+ ; CHECK-NEXT: 3: 14
+ ; CHECK-NEXT: 4: 14
+ ; CHECK-NEXT: !CFGChecksum: 72617220756


  ; CHECK-UNWINDER: Binary(inline-cs-pseudoprobe.perfbin)'s Range Counter:
  ; CHECK-UNWINDER-EMPTY:
  ; CHECK-UNWINDER-NEXT: (800, 858): 1
  ; CHECK-UNWINDER-NEXT: (80e, 82b): 1
  ; CHECK-UNWINDER-NEXT: (80e, 858): 13

  ; CHECK-UNWINDER: Binary(inline-cs-pseudoprobe.perfbin)'s Branch Counter:
(28 lines not shown)

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test

  ; RUN: llvm-profgen --perfscript=%S/Inputs/noinline-cs-pseudoprobe.perfscript --binary=%S/Inputs/noinline-cs-pseudoprobe.perfbin --output=%t --show-unwinder-output | FileCheck %s --check-prefix=CHECK-UNWINDER
+ ; RUN: FileCheck %s --input-file %t

+ ; CHECK: [main:2 @ foo]:75:0
+ ; CHECK-NEXT: 2: 15
+ ; CHECK-NEXT: 3: 15
+ ; CHECK-NEXT: 4: 15
+ ; CHECK-NEXT: 6: 15
+ ; CHECK-NEXT: 8: 15 bar:15
+ ; CHECK-NEXT: !CFGChecksum: 138950591924
+ ; CHECK-NEXT:[main:2 @ foo:8 @ bar]:60:15
+ ; CHECK-NEXT: 1: 15
+ ; CHECK-NEXT: 2: 15
+ ; CHECK-NEXT: 3: 15
+ ; CHECK-NEXT: 4: 15
+ ; CHECK-NEXT: !CFGChecksum: 72617220756


  ; CHECK-UNWINDER: Binary(noinline-cs-pseudoprobe.perfbin)'s Range Counter:
  ; CHECK-UNWINDER-NEXT: main:2
  ; CHECK-UNWINDER-NEXT: (79e, 7bf): 15
  ; CHECK-UNWINDER-NEXT: (7c4, 7cf): 15
  ; CHECK-UNWINDER-NEXT: main:2 @ foo:8
  ; CHECK-UNWINDER-NEXT: (760, 77f): 15

(32 lines not shown)

llvm/tools/llvm-profgen/PerfReader.cpp

(561 lines not shown; context: void PerfReader::checkAndSetPerfType()
    for (auto FileName : PerfTraceFilenames) {
      if (!isHybridPerfScript(FileName)) {
        HasHybridPerf = false;
        break;
      }
    }

    if (HasHybridPerf) {
-     // Set up ProfileIsCS to enable context-sensitive functionalities
-     // in SampleProf
-     FunctionSamples::ProfileIsCS = true;
      PerfType = PERF_LBR_STACK;
    } else {
      // TODO: Support other type of perf script
      PerfType = PERF_INVILID;
    }

    if (BinaryTable.size() > 1) {
      // TODO: remove this if everything is ready to support multiple binaries.
      exitWithError("Currently only support one input binary, multiple binaries' "
(27 lines not shown)

llvm/tools/llvm-profgen/ProfileGenerator.h

(19 lines not shown)
  namespace sampleprof {

  class ProfileGenerator {

  public:
    ProfileGenerator(){};
    virtual ~ProfileGenerator() = default;
    static std::unique_ptr<ProfileGenerator>
-   create(const BinarySampleCounterMap &SampleCounters,
+   create(const BinarySampleCounterMap &BinarySampleCounters,
           enum PerfScriptType SampleType);
    virtual void generateProfile() = 0;

    // Use SampleProfileWriter to serialize profile map
    void write();

  protected:
    /*
       For each region boundary point, mark if it is begin or end (or both) of
       the region. Boundary points are inclusive. Log the sample count as well
       so we can use it when we compute the sample count of each disjoint region
       later. Note that there might be multiple ranges with different sample
       count that share same begin/end point. We need to accumulate the sample
       count for the boundary point for such case, because for the example
       below,

       |<--100-->|
       |<------200------>|
       A         B       C

       sample count for disjoint region [A,B] would be 300.
    */
    void findDisjointRanges(RangeSample &DisjointRanges,
                            const RangeSample &Ranges);

    // Used by SampleProfileWriter
    StringMap<FunctionSamples> ProfileMap;
  };

  class CSProfileGenerator : public ProfileGenerator {
  protected:
    const BinarySampleCounterMap &BinarySampleCounters;

  public:
    CSProfileGenerator(const BinarySampleCounterMap &Counters)
        : BinarySampleCounters(Counters){};

  public:
    void generateProfile() override {
+     // Enable context-sensitive functionalities in SampleProf
+     FunctionSamples::ProfileIsCS = true;
      for (const auto &BI : BinarySampleCounters) {
        ProfiledBinary *Binary = BI.first;
        for (const auto &CI : BI.second) {
          const StringBasedCtxKey *CtxKey =
              dyn_cast<StringBasedCtxKey>(CI.first.getPtr());
          StringRef ContextId(CtxKey->Context);
          // Get or create function profile for the range
          FunctionSamples &FunctionProfile =
(9 lines not shown; context: void generateProfile() override {)
      }
      // Fill in call site value sample for inlined calls and also use context to
      // infer missing samples. Since we don't have call count for inlined
      // functions, we estimate it from inlinee's profile using the entry of the
      // body sample.
      populateInferredFunctionSamples();
    }

+ protected:
+   // Lookup or create FunctionSamples for the context
+   FunctionSamples &getFunctionProfileForContext(StringRef ContextId);
+
  private:
    // Helper function for updating body sample for a leaf location in
    // FunctionProfile
    void updateBodySamplesforFunctionProfile(FunctionSamples &FunctionProfile,
                                             const FrameLocation &LeafLoc,
                                             uint64_t Count);
-   // Lookup or create FunctionSamples for the context
-   FunctionSamples &getFunctionProfileForContext(StringRef ContextId);
    void populateFunctionBodySamples(FunctionSamples &FunctionProfile,
                                     const RangeSample &RangeCounters,
                                     ProfiledBinary *Binary);
    void populateFunctionBoundarySamples(StringRef ContextId,
                                         FunctionSamples &FunctionProfile,
                                         const BranchSample &BranchCounters,
                                         ProfiledBinary *Binary);
    void populateInferredFunctionSamples();
  };

+ using ProbeCounterMap = std::unordered_map<const PseudoProbe *, uint64_t>;
+
  class PseudoProbeCSProfileGenerator : public CSProfileGenerator {

  public:
    PseudoProbeCSProfileGenerator(const BinarySampleCounterMap &Counters)
        : CSProfileGenerator(Counters) {}
-   void generateProfile() override {
-     // TODO
-   }
+   void generateProfile() override;
+
+ private:
+   // Go through each address from range to extract the top frame probe by
+   // looking up in the Address2ProbeMap
+   void extractProbesFromRange(const RangeSample &RangeCounter,
+                               ProbeCounterMap &ProbeCounter,
+                               ProfiledBinary *Binary);
+   // Fill in function body samples from probes
+   void populateBodySamplesWithProbes(const RangeSample &RangeCounter,
+                                      StringRef PrefixContextId,
+                                      ProfiledBinary *Binary);
+   // Fill in boundary samples for a call probe
+   void populateBoundarySamplesWithProbes(const BranchSample &BranchCounter,
+                                          StringRef PrefixContextId,
+                                          ProfiledBinary *Binary);
+   // Helper function to get FunctionSamples for the leaf inlined context
+   FunctionSamples &getFunctionProfileForLeafProbe(
+       StringRef PrefixContextId,
+       SmallVector<std::string, 16> &LeafInlinedContext,
+       const PseudoProbeFuncDesc *LeafFuncDesc);
+   // Helper function to get FunctionSamples for the leaf probe
+   FunctionSamples &getFunctionProfileForLeafProbe(StringRef PrefixContextId,
+                                                   const PseudoProbe *LeafProbe,
+                                                   ProfiledBinary *Binary);
  };

  } // end namespace sampleprof
  } // end namespace llvm

  #endif
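
The comment on findDisjointRanges in ProfileGenerator.h describes splitting overlapping sampled ranges into disjoint regions whose counts add up. The sketch below reproduces the [A,B] = 300 example with a difference-map sweep. This is a swapped-in technique chosen for brevity, not necessarily how findDisjointRanges is implemented, and `findDisjointRangesSketch` is a hypothetical name. It assumes no range ends at the maximum 64-bit address, since it uses `End + 1`.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <utility>

using Range = std::pair<uint64_t, uint64_t>; // inclusive [Begin, End]
using RangeSample = std::map<Range, uint64_t>;

RangeSample findDisjointRangesSketch(const RangeSample &Ranges) {
  // Delta[X] is the change in the running count when the sweep reaches X.
  std::map<uint64_t, int64_t> Delta;
  for (const auto &Entry : Ranges) {
    assert(Entry.first.first <= Entry.first.second && "malformed range");
    Delta[Entry.first.first] += static_cast<int64_t>(Entry.second);
    Delta[Entry.first.second + 1] -= static_cast<int64_t>(Entry.second);
  }

  // Between two consecutive boundary points the running count is constant,
  // so each gap with a positive count becomes one disjoint region.
  RangeSample Disjoint;
  int64_t Running = 0;
  for (auto It = Delta.begin(); It != Delta.end(); ++It) {
    Running += It->second;
    auto Next = std::next(It);
    if (Next != Delta.end() && Running > 0)
      Disjoint[Range(It->first, Next->first - 1)] =
          static_cast<uint64_t>(Running);
  }
  return Disjoint;
}
```

With ranges [0x10, 0x1f] counted 100 and [0x10, 0x2f] counted 200, the result is [0x10, 0x1f] with 300 and [0x20, 0x2f] with 200, matching the example in the header comment.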

llvm/tools/llvm-profgen/ProfileGenerator.cpp

Show All 33 Lines	return BinarySampleCounters.size() &&
BinarySampleCounters.begin()->first->usePseudoProbes();		BinarySampleCounters.begin()->first->usePseudoProbes();
}		}

std::unique_ptr<ProfileGenerator>		std::unique_ptr<ProfileGenerator>
ProfileGenerator::create(const BinarySampleCounterMap &BinarySampleCounters,		ProfileGenerator::create(const BinarySampleCounterMap &BinarySampleCounters,
enum PerfScriptType SampleType) {		enum PerfScriptType SampleType) {
std::unique_ptr<ProfileGenerator> ProfileGenerator;		std::unique_ptr<ProfileGenerator> ProfileGenerator;
if (SampleType == PERF_LBR_STACK) {		if (SampleType == PERF_LBR_STACK) {
if (usePseudoProbes(BinarySampleCounters)) {		if (usePseudoProbes(BinarySampleCounters)) {
		hoy: The setting of the two flags should not be necessary on the profile generation side. They are used on the loader side. Do you see any issue without setting them?
		wlei: Yeah, if they are not explicitly set here, the value will be false. See llvm/lib/Transforms/IPO/SampleProfile.cpp:
		    // Apply tweaks if context-sensitive profile is available.
		    if (Reader->profileIsCS()) {
		      ProfileIsCS = true;
		      FunctionSamples::ProfileIsCS = true;
		      // Tracker for profiles under different context
		      ContextTracker = std::make_unique<SampleContextTracker>(Reader->getProfiles());
		    }
		The flags are set when the reader knows it is a CS profile, but llvm-profgen does not set them.
		hoy: I see. So `ContextTracker` is used in profile generation?
		wlei: Not `ContextTracker`; it is the SampleProfileWriter, which writes the text format of the profile, like:
		    if (FunctionSamples::ProfileIsCS)
		      OS << "[" << S.getNameWithContext() << "]:" << S.getTotalSamples();
		    else
		      OS << S.getName() << ":" << S.getTotalSamples();
		    if (FunctionSamples::ProfileIsProbeBased) {
		      OS.indent(Indent + 1);
		      OS << "!CFGChecksum: " << S.getFunctionHash() << "\n";
		    }
ProfileGenerator.reset(		ProfileGenerator.reset(
		hoy: I see. Sounds like the initializations should be moved into `ProfileGenerator::generateProfile` or `ProfileGenerator::write`. What do you think?
		wlei: Sounds good! Since it's only used for the writer, I will move them to `ProfileGenerator::write`.
		wlei: Just to confirm: it's also used in `getEntrySamples`, so `ProfileGenerator::generateProfile` might be the right place.
		    uint64_t getEntrySamples() const {
		      if (FunctionSamples::ProfileIsCS && getHeadSamples()) {
		        // For CS profile, if we already have more accurate head samples
		        // counted by branch sample from caller, use them as entry samples.
		        return getHeadSamples();
		      }
		hoy: Yeah, `generateProfile` sounds like the right place.
new PseudoProbeCSProfileGenerator(BinarySampleCounters));		new PseudoProbeCSProfileGenerator(BinarySampleCounters));
} else {		} else {
ProfileGenerator.reset(new CSProfileGenerator(BinarySampleCounters));		ProfileGenerator.reset(new CSProfileGenerator(BinarySampleCounters));
}		}
} else {		} else {
// TODO:		// TODO:
llvm_unreachable("Unsupported perfscript!");		llvm_unreachable("Unsupported perfscript!");
}		}
Show All 10 Lines
}		}

void ProfileGenerator::findDisjointRanges(RangeSample &DisjointRanges,		void ProfileGenerator::findDisjointRanges(RangeSample &DisjointRanges,
const RangeSample &Ranges) {		const RangeSample &Ranges) {

/*		/*
Regions may overlap with each other. Using the boundary info, find all		Regions may overlap with each other. Using the boundary info, find all
disjoint ranges and their sample count. BoundaryPoint contains the count		disjoint ranges and their sample count. BoundaryPoint contains the count
mutiple samples begin/end at this points.		multiple samples begin/end at this points.

|<--100-->| Sample1		|<--100-->| Sample1
|<------200------>| Sample2		|<------200------>| Sample2
A B C		A B C

In the example above,		In the example above,
Sample1 begins at A, ends at B, its value is 100.		Sample1 begins at A, ends at B, its value is 100.
Sample2 begins at A, ends at C, its value is 200.		Sample2 begins at A, ends at C, its value is 200.
▲ Show 20 Lines • Show All 180 Lines • ▼ Show 20 Lines	void CSProfileGenerator::populateFunctionBoundarySamples(
}		}
}		}
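The boundary-point idea described in the `findDisjointRanges` comment above can be sketched in isolation. This is a minimal illustration using standard containers, not the code from this revision: the function name `splitIntoDisjointRanges` and the plain `std::map` types are assumptions made for the example. Each range contributes its count to a begin point and an end point, and a running count is carried across the sorted boundaries.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical stand-ins for the RangeSample types used by llvm-profgen.
using Range = std::pair<uint64_t, uint64_t>;   // inclusive [begin, end]
using RangeSample = std::map<Range, uint64_t>; // range -> sample count

struct BoundaryPoint {
  uint64_t BeginCount; // total count of samples beginning at this address
  uint64_t EndCount;   // total count of samples ending at this address
};

// Split possibly overlapping ranges into disjoint ranges with summed counts.
static RangeSample splitIntoDisjointRanges(const RangeSample &Ranges) {
  std::map<uint64_t, BoundaryPoint> Boundaries;
  for (const auto &Item : Ranges) {
    Boundaries[Item.first.first].BeginCount += Item.second;
    Boundaries[Item.first.second].EndCount += Item.second;
  }

  RangeSample Disjoint;
  bool Open = false;  // true once the first begin point has been seen
  uint64_t Begin = 0; // start of the segment currently being built
  uint64_t Count = 0; // number of samples covering the current segment
  for (const auto &Item : Boundaries) {
    uint64_t Addr = Item.first;
    const BoundaryPoint &P = Item.second;
    if (P.BeginCount) {
      // Close the segment that ends just before this begin point.
      if (Open && Begin < Addr)
        Disjoint[Range(Begin, Addr - 1)] = Count;
      Count += P.BeginCount;
      Begin = Addr;
      Open = true;
    }
    if (P.EndCount) {
      // Close the segment that ends exactly at this end point.
      Disjoint[Range(Begin, Addr)] = Count;
      Count -= P.EndCount;
      Begin = Addr + 1;
    }
  }
  return Disjoint;
}
```

With A = 0, B = 9 and C = 19, the two samples from the comment yield [0, 9] with count 300 and [10, 19] with count 200. Two ranges that do not touch produce a zero-count segment between them, which is the kind of gap that `extractProbesFromRange` skips.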

static FrameLocation getCallerContext(StringRef CalleeContext,		static FrameLocation getCallerContext(StringRef CalleeContext,
StringRef &CallerNameWithContext) {		StringRef &CallerNameWithContext) {
StringRef CallerContext = CalleeContext.rsplit(" @ ").first;		StringRef CallerContext = CalleeContext.rsplit(" @ ").first;
CallerNameWithContext = CallerContext.rsplit(':').first;		CallerNameWithContext = CallerContext.rsplit(':').first;
auto ContextSplit = CallerContext.rsplit(" @ ");		auto ContextSplit = CallerContext.rsplit(" @ ");
		StringRef CallerFrameStr = ContextSplit.second.size() == 0
		? ContextSplit.first
		: ContextSplit.second;
FrameLocation LeafFrameLoc = {"", {0, 0}};		FrameLocation LeafFrameLoc = {"", {0, 0}};
StringRef Funcname;		StringRef Funcname;
SampleContext::decodeContextString(ContextSplit.second, Funcname,		SampleContext::decodeContextString(CallerFrameStr, Funcname,
LeafFrameLoc.second);		LeafFrameLoc.second);
LeafFrameLoc.first = Funcname.str();		LeafFrameLoc.first = Funcname.str();
return LeafFrameLoc;		return LeafFrameLoc;
}		}
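The string handling in `getCallerContext` is easier to follow on a concrete input. The sketch below re-creates it with `std::string`; `rsplitLast`, `CallerInfo` and `inferCaller` are hypothetical names, `rsplitLast` imitates `StringRef::rsplit` (split at the last separator, or return the whole string and an empty second half), and the frame parser only handles the `name:index` form, whereas `SampleContext::decodeContextString` also decodes a discriminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Imitates llvm::StringRef::rsplit: split at the last occurrence of Sep;
// if Sep is absent, return (whole string, "").
static std::pair<std::string, std::string>
rsplitLast(const std::string &S, const std::string &Sep) {
  std::size_t Pos = S.rfind(Sep);
  if (Pos == std::string::npos)
    return {S, ""};
  return {S.substr(0, Pos), S.substr(Pos + Sep.size())};
}

// Hypothetical result type for this example.
struct CallerInfo {
  std::string CallerNameWithContext; // caller context without the call site id
  std::string CallerFunc;            // name of the caller's leaf function
  uint32_t CallSiteIndex;            // line offset or probe index of the call
};

static CallerInfo inferCaller(const std::string &CalleeContext) {
  // Drop the callee frame: "main:1 @ foo:2 @ bar" -> "main:1 @ foo:2".
  std::string CallerContext = rsplitLast(CalleeContext, " @ ").first;
  CallerInfo Info;
  // Drop the call site id: "main:1 @ foo:2" -> "main:1 @ foo".
  Info.CallerNameWithContext = rsplitLast(CallerContext, ":").first;
  // Take the caller's own frame. When the caller is the outermost frame
  // there is no " @ ", so the frame is the first half of the split.
  auto Split = rsplitLast(CallerContext, " @ ");
  const std::string &Frame = Split.second.empty() ? Split.first : Split.second;
  auto NameAndIndex = rsplitLast(Frame, ":");
  Info.CallerFunc = NameAndIndex.first;
  Info.CallSiteIndex =
      NameAndIndex.second.empty()
          ? 0
          : static_cast<uint32_t>(std::stoul(NameAndIndex.second));
  return Info;
}
```

The second branch of the `Frame` selection corresponds to the `CallerFrameStr` change in this diff: for a two-frame context such as `foo:2 @ bar`, the split has an empty second half and the frame must come from the first.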

void CSProfileGenerator::populateInferredFunctionSamples() {		void CSProfileGenerator::populateInferredFunctionSamples() {
for (const auto &Item : ProfileMap) {		for (const auto &Item : ProfileMap) {
const StringRef CalleeContext = Item.first();		const StringRef CalleeContext = Item.first();
Show All 33 Lines	CallerProfile.addCalledTargetSamples(
EstimatedCallCount);		EstimatedCallCount);
CallerProfile.addBodySamples(CallerLeafFrameLoc.second.LineOffset,		CallerProfile.addBodySamples(CallerLeafFrameLoc.second.LineOffset,
CallerLeafFrameLoc.second.Discriminator,		CallerLeafFrameLoc.second.Discriminator,
EstimatedCallCount);		EstimatedCallCount);
CallerProfile.addTotalSamples(EstimatedCallCount);		CallerProfile.addTotalSamples(EstimatedCallCount);
}		}
}		}

		// Helper function to extract the context prefix.
		// PrefixContextId is the context id string excluding the leaf probe's
		// context; the final ContextId will be:
		// ContextId = PrefixContextId + LeafContextId;
		// Note that the strings in ContextStrStack are in callee-caller order,
		// so process the string vector in reverse.
		static std::string
		extractPrefixContextId(const SmallVector<const PseudoProbe *, 16> &Probes,
		ProfiledBinary *Binary) {
		SmallVector<std::string, 16> ContextStrStack;
		for (const auto *P : Probes) {
		Binary->getInlineContextForProbe(P, ContextStrStack, true);
		}
		std::ostringstream OContextStr;
		for (auto &CxtStr : ContextStrStack) {
		if (OContextStr.str().size())
		OContextStr << " @ ";
		OContextStr << CxtStr;
		}
		return OContextStr.str();
		}

		void PseudoProbeCSProfileGenerator::generateProfile() {
		// Enable CS and pseudo probe functionalities in SampleProf
		FunctionSamples::ProfileIsCS = true;
		FunctionSamples::ProfileIsProbeBased = true;
		for (const auto &BI : BinarySampleCounters) {
		ProfiledBinary *Binary = BI.first;
		for (const auto &CI : BI.second) {
		const ProbeBasedCtxKey *CtxKey =
		dyn_cast<ProbeBasedCtxKey>(CI.first.getPtr());
		wmi: Move it to header comment of extractPrefixContextId in case extractPrefixContextId is used elsewhere.
		wlei: Good catch, moved.
		std::string PrefixContextId =
		extractPrefixContextId(CtxKey->Probes, Binary);
		// Fill in function body samples from probes, also infer caller's samples
		// from callee's probe
		populateBodySamplesWithProbes(CI.second.RangeCounter, PrefixContextId,
		Binary);
		// Fill in boundary samples for a call probe
		populateBoundarySamplesWithProbes(CI.second.BranchCounter,
		PrefixContextId, Binary);
		}
		}
		}

		void PseudoProbeCSProfileGenerator::extractProbesFromRange(
		const RangeSample &RangeCounter, ProbeCounterMap &ProbeCounter,
		ProfiledBinary *Binary) {
		RangeSample Ranges;
		findDisjointRanges(Ranges, RangeCounter);
		for (const auto &Range : Ranges) {
		uint64_t RangeBegin = Binary->offsetToVirtualAddr(Range.first.first);
		uint64_t RangeEnd = Binary->offsetToVirtualAddr(Range.first.second);
		uint64_t Count = Range.second;
		// Disjoint ranges may introduce zero-filled gaps that
		// don't belong to the current context; filter them out.
		if (Count == 0)
		continue;

		InstructionPointer IP(Binary, RangeBegin, true);

		// A disjoint range may fall entirely between two instructions,
		// e.g. if Instr1 is at Addr1 and Instr2 is at Addr2, the disjoint range
		// can be Addr1+1 to Addr2-1. We should ignore such ranges.
		if (IP.Address > RangeEnd)
		continue;

		while (IP.Address <= RangeEnd) {
		const AddressProbesMap &Address2ProbesMap =
		Binary->getAddress2ProbesMap();
		auto It = Address2ProbesMap.find(IP.Address);
		if (It != Address2ProbesMap.end()) {
		for (const auto &Probe : It->second) {
		if (!Probe.isBlock())
		continue;
		ProbeCounter[&Probe] += Count;
		}
		}

		IP.advance();
		}
		}
		}

		void PseudoProbeCSProfileGenerator::populateBodySamplesWithProbes(
		const RangeSample &RangeCounter, StringRef PrefixContextId,
		ProfiledBinary *Binary) {
		ProbeCounterMap ProbeCounter;
		// Extract the top frame probes by looking up each address among the range in
		// the Address2ProbeMap
		extractProbesFromRange(RangeCounter, ProbeCounter, Binary);
		for (auto PI : ProbeCounter) {
		const PseudoProbe *Probe = PI.first;
		uint64_t Count = PI.second;
		FunctionSamples &FunctionProfile =
		hoy: Please add a comment about why a dangling probe gets a zero count.
		wlei: comments added
		getFunctionProfileForLeafProbe(PrefixContextId, Probe, Binary);

		FunctionProfile.addBodySamples(Probe->Index, 0, Count);
		FunctionProfile.addTotalSamples(Count);
		if (Probe->isEntry()) {
		FunctionProfile.addHeadSamples(Count);
		// Look up the caller's function profile
		hoy: Nit: function
		const auto *InlinerDesc = Binary->getInlinerDescForProbe(Probe);
		if (InlinerDesc != nullptr) {
		// Since the context id will be compressed, we have to use callee's
		// context id to infer caller's context id to ensure they share the
		// same context prefix.
		StringRef CalleeContextId =
		FunctionProfile.getContext().getNameWithContext(true);
		StringRef CallerContextId;
		FrameLocation &&CallerLeafFrameLoc =
		getCallerContext(CalleeContextId, CallerContextId);
		uint64_t CallerIndex = CallerLeafFrameLoc.second.LineOffset;
		assert(CallerIndex &&
		"Inferred caller's location index shouldn't be zero!");
		FunctionSamples &CallerProfile =
		getFunctionProfileForContext(CallerContextId);
		CallerProfile.setFunctionHash(InlinerDesc->FuncHash);
		CallerProfile.addBodySamples(CallerIndex, 0, Count);
		CallerProfile.addTotalSamples(Count);
		CallerProfile.addCalledTargetSamples(CallerIndex, 0,
		FunctionProfile.getName(), Count);
		}
		}
		}
		}

		void PseudoProbeCSProfileGenerator::populateBoundarySamplesWithProbes(
		const BranchSample &BranchCounter, StringRef PrefixContextId,
		ProfiledBinary *Binary) {
		for (auto BI : BranchCounter) {
		uint64_t SourceOffset = BI.first.first;
		uint64_t TargetOffset = BI.first.second;
		uint64_t Count = BI.second;
		uint64_t SourceAddress = Binary->offsetToVirtualAddr(SourceOffset);
		const PseudoProbe *CallProbe = Binary->getCallProbeForAddr(SourceAddress);
		if (CallProbe == nullptr)
		continue;
		FunctionSamples &FunctionProfile =
		getFunctionProfileForLeafProbe(PrefixContextId, CallProbe, Binary);
		FunctionProfile.addBodySamples(CallProbe->Index, 0, Count);
		FunctionProfile.addTotalSamples(Count);
		StringRef CalleeName = FunctionSamples::getCanonicalFnName(
		Binary->getFuncFromStartOffset(TargetOffset));
		if (CalleeName.size() == 0)
		continue;
		FunctionProfile.addCalledTargetSamples(CallProbe->Index, 0, CalleeName,
		Count);
		}
		}

		FunctionSamples &PseudoProbeCSProfileGenerator::getFunctionProfileForLeafProbe(
		StringRef PrefixContextId, SmallVector<std::string, 16> &LeafInlinedContext,
		hoy: You mean foo:2 @ bar?
		wlei: typo fixed
		const PseudoProbeFuncDesc *LeafFuncDesc) {
		assert(LeafInlinedContext.size() &&
		"Profile context must have the leaf frame");
		std::ostringstream OContextStr;
		OContextStr << PrefixContextId.str();

		for (uint32_t I = 0; I < LeafInlinedContext.size() - 1; I++) {
		if (OContextStr.str().size())
		hoy: Move this out of the loop to save the check?
		wlei: Yeah, moved the last elements out of the loop.
		OContextStr << " @ ";
		OContextStr << LeafInlinedContext[I];
		}
		// For leaf inlined context with the top frame, we should strip off the top
		// frame's probe id, like:
		// Inlined stack: [foo:1, bar:2], the ContextId will be "foo:1 @ bar"
		if (OContextStr.str().size())
		OContextStr << " @ ";
		StringRef LeafLoc = LeafInlinedContext.back();
		OContextStr << LeafLoc.split(":").first.str();

		FunctionSamples &FunctionProfile =
		getFunctionProfileForContext(OContextStr.str());
		FunctionProfile.setFunctionHash(LeafFuncDesc->FuncHash);
		return FunctionProfile;
		}

		FunctionSamples &PseudoProbeCSProfileGenerator::getFunctionProfileForLeafProbe(
		StringRef PrefixContextId, const PseudoProbe *LeafProbe,
		ProfiledBinary *Binary) {
		SmallVector<std::string, 16> LeafInlinedContext;
		Binary->getInlineContextForProbe(LeafProbe, LeafInlinedContext);
		// Note that the context from probe doesn't include leaf frame,
		// hence we need to retrieve and append the leaf frame.
		const auto *FuncDesc = Binary->getFuncDescForGUID(LeafProbe->GUID);
		LeafInlinedContext.emplace_back(FuncDesc->FuncName + ":" +
		Twine(LeafProbe->Index).str());
		return getFunctionProfileForLeafProbe(PrefixContextId, LeafInlinedContext,
		FuncDesc);
		}

} // end namespace sampleprof		} // end namespace sampleprof
} // end namespace llvm		} // end namespace llvm
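The way `getFunctionProfileForLeafProbe` assembles a ContextId (prefix from the probe stack, then the leaf probe's inlined frames, with the probe id stripped from the last frame) can be shown with plain strings. This is a sketch under assumptions: `buildContextId` is a hypothetical name, and it uses `std::string` and `std::vector` in place of `StringRef`, `SmallVector` and `std::ostringstream`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Join PrefixContextId and the leaf inlined context with " @ ", keeping
// "func:id" for every frame except the last, which keeps only "func".
static std::string
buildContextId(const std::string &PrefixContextId,
               const std::vector<std::string> &LeafInlinedContext) {
  assert(!LeafInlinedContext.empty() &&
         "Profile context must have the leaf frame");
  std::string Id = PrefixContextId;
  // All frames except the last keep their call site id.
  for (std::size_t I = 0; I + 1 < LeafInlinedContext.size(); ++I) {
    if (!Id.empty())
      Id += " @ ";
    Id += LeafInlinedContext[I];
  }
  if (!Id.empty())
    Id += " @ ";
  // Strip the probe id from the top frame: "bar:2" -> "bar".
  const std::string &Leaf = LeafInlinedContext.back();
  Id += Leaf.substr(0, Leaf.find(':'));
  return Id;
}
```

For the inlined stack `[foo:1, bar:2]` and an empty prefix this gives `foo:1 @ bar`, matching the example in the code comment; with a prefix of `main:3` it gives `main:3 @ foo:1 @ bar`.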

llvm/tools/llvm-profgen/ProfiledBinary.h

Show First 20 Lines • Show All 238 Lines • ▼ Show 20 Lines	public:
std::string getExpandedContextStr(const std::list<uint64_t> &stack) const;		std::string getExpandedContextStr(const std::list<uint64_t> &stack) const;

const PseudoProbe *getCallProbeForAddr(uint64_t Address) const {		const PseudoProbe *getCallProbeForAddr(uint64_t Address) const {
return ProbeDecoder.getCallProbeForAddr(Address);		return ProbeDecoder.getCallProbeForAddr(Address);
}		}
void		void
getInlineContextForProbe(const PseudoProbe *Probe,		getInlineContextForProbe(const PseudoProbe *Probe,
SmallVector<std::string, 16> &InlineContextStack,		SmallVector<std::string, 16> &InlineContextStack,
bool IncludeLeaf) const {		bool IncludeLeaf = false) const {
return ProbeDecoder.getInlineContextForProbe(Probe, InlineContextStack,		return ProbeDecoder.getInlineContextForProbe(Probe, InlineContextStack,
IncludeLeaf);		IncludeLeaf);
}		}
		const AddressProbesMap &getAddress2ProbesMap() const {
		return ProbeDecoder.getAddress2ProbesMap();
		}
		const PseudoProbeFuncDesc *getFuncDescForGUID(uint64_t GUID) {
		return ProbeDecoder.getFuncDescForGUID(GUID);
		}
		const PseudoProbeFuncDesc *getInlinerDescForProbe(const PseudoProbe *Probe) {
		return ProbeDecoder.getInlinerDescForProbe(Probe);
		}
};		};

} // end namespace sampleprof		} // end namespace sampleprof
} // end namespace llvm		} // end namespace llvm

#endif		#endif

llvm/tools/llvm-profgen/PseudoProbe.h

Show First 20 Lines • Show All 67 Lines • ▼ Show 20 Lines	PseudoProbeInlineTree *getOrAddNode(const InlineSite &Site) {
auto Ret =		auto Ret =
Children.emplace(Site, std::make_unique<PseudoProbeInlineTree>(Site));		Children.emplace(Site, std::make_unique<PseudoProbeInlineTree>(Site));
Ret.first->second->Parent = this;		Ret.first->second->Parent = this;
return Ret.first->second.get();		return Ret.first->second.get();
}		}

void addProbes(PseudoProbe *Probe) { ProbeVector.push_back(Probe); }		void addProbes(PseudoProbe *Probe) { ProbeVector.push_back(Probe); }
// Return false if it's a dummy inline site		// Return false if it's a dummy inline site
bool hasInlineSite() const { return !std::get<0>(ISite); }		bool hasInlineSite() const { return std::get<0>(ISite) != 0; }
};		};

// Function descriptor decoded from .pseudo_probe_desc section		// Function descriptor decoded from .pseudo_probe_desc section
struct PseudoProbeFuncDesc {		struct PseudoProbeFuncDesc {
uint64_t FuncGUID = 0;		uint64_t FuncGUID = 0;
uint64_t FuncHash = 0;		uint64_t FuncHash = 0;
std::string FuncName;		std::string FuncName;

▲ Show 20 Lines • Show All 113 Lines • ▼ Show 20 Lines	public:
void printGUID2FuncDescMap(raw_ostream &OS);		void printGUID2FuncDescMap(raw_ostream &OS);

// Print pseudo_probe section info, used along with show-disassembly		// Print pseudo_probe section info, used along with show-disassembly
void printProbeForAddress(raw_ostream &OS, uint64_t Address);		void printProbeForAddress(raw_ostream &OS, uint64_t Address);

// Look up the probe of a call for the input address		// Look up the probe of a call for the input address
const PseudoProbe *getCallProbeForAddr(uint64_t Address) const;		const PseudoProbe *getCallProbeForAddr(uint64_t Address) const;

		const PseudoProbeFuncDesc *getFuncDescForGUID(uint64_t GUID) const;

// Helper function to populate one probe's inline stack into		// Helper function to populate one probe's inline stack into
// \p InlineContextStack.		// \p InlineContextStack.
// Current leaf location info will be added if IncludeLeaf is true		// Current leaf location info will be added if IncludeLeaf is true
// Example:		// Example:
// Current probe(bar:3) inlined at foo:2 then inlined at main:1		// Current probe(bar:3) inlined at foo:2 then inlined at main:1
// IncludeLeaf = true, Output: [main:1, foo:2, bar:3]		// IncludeLeaf = true, Output: [main:1, foo:2, bar:3]
// IncludeLeaf = false, OUtput: [main:1, foo:2]		// IncludeLeaf = false, Output: [main:1, foo:2]
void		void
getInlineContextForProbe(const PseudoProbe *Probe,		getInlineContextForProbe(const PseudoProbe *Probe,
SmallVector<std::string, 16> &InlineContextStack,		SmallVector<std::string, 16> &InlineContextStack,
bool IncludeLeaf) const;		bool IncludeLeaf) const;

		const AddressProbesMap &getAddress2ProbesMap() const {
		return Address2ProbesMap;
		}

		const PseudoProbeFuncDesc *
		getInlinerDescForProbe(const PseudoProbe *Probe) const;
};		};

} // end namespace sampleprof		} // end namespace sampleprof
} // end namespace llvm		} // end namespace llvm

#endif		#endif

llvm/tools/llvm-profgen/PseudoProbe.cpp

Show All 35 Lines

void PseudoProbe::getInlineContext(SmallVector<std::string, 16> &ContextStack,		void PseudoProbe::getInlineContext(SmallVector<std::string, 16> &ContextStack,
const GUIDProbeFunctionMap &GUID2FuncMAP,		const GUIDProbeFunctionMap &GUID2FuncMAP,
bool ShowName) const {		bool ShowName) const {
uint32_t Begin = ContextStack.size();		uint32_t Begin = ContextStack.size();
PseudoProbeInlineTree *Cur = InlineTree;		PseudoProbeInlineTree *Cur = InlineTree;
// It will add the string of each node's inline site during iteration.		// It will add the string of each node's inline site during iteration.
// Note that it won't include the probe's belonging function(leaf location)		// Note that it won't include the probe's belonging function(leaf location)
while (!Cur->hasInlineSite()) {		while (Cur->hasInlineSite()) {
std::string ContextStr;		std::string ContextStr;
if (ShowName) {		if (ShowName) {
StringRef FuncName =		StringRef FuncName =
getProbeFNameForGUID(GUID2FuncMAP, std::get<0>(Cur->ISite));		getProbeFNameForGUID(GUID2FuncMAP, std::get<0>(Cur->ISite));
ContextStr += FuncName.str();		ContextStr += FuncName.str();
} else {		} else {
ContextStr += Twine(std::get<0>(Cur->ISite)).str();		ContextStr += Twine(std::get<0>(Cur->ISite)).str();
}		}
▲ Show 20 Lines • Show All 254 Lines • ▼ Show 20 Lines	if (Probe.isCall()) {
"There should be only one call probe corresponding to address "		"There should be only one call probe corresponding to address "
"which is a callsite.");		"which is a callsite.");
CallProbe = &Probe;		CallProbe = &Probe;
}		}
}		}
return CallProbe;		return CallProbe;
}		}

		const PseudoProbeFuncDesc *
		PseudoProbeDecoder::getFuncDescForGUID(uint64_t GUID) const {
		auto It = GUID2FuncDescMap.find(GUID);
		assert(It != GUID2FuncDescMap.end() && "Function descriptor doesn't exist");
		return &It->second;
		}

void PseudoProbeDecoder::getInlineContextForProbe(		void PseudoProbeDecoder::getInlineContextForProbe(
const PseudoProbe *Probe, SmallVector<std::string, 16> &InlineContextStack,		const PseudoProbe *Probe, SmallVector<std::string, 16> &InlineContextStack,
bool IncludeLeaf) const {		bool IncludeLeaf) const {
if (IncludeLeaf) {		Probe->getInlineContext(InlineContextStack, GUID2FuncDescMap, true);
		if (!IncludeLeaf)
		return;
// Note that the context from probe doesn't include leaf frame,		// Note that the context from probe doesn't include leaf frame,
// hence we need to retrieve and prepend leaf if requested.		// hence we need to retrieve and prepend leaf if requested.
auto It = GUID2FuncDescMap.find(Probe->GUID);		const auto *FuncDesc = getFuncDescForGUID(Probe->GUID);
assert(It != GUID2FuncDescMap.end() &&		InlineContextStack.emplace_back(FuncDesc->FuncName + ":" +
"Should have function descriptor for a valid GUID");
StringRef FuncName = It->second.FuncName;
// InlineContextStack is in callee-caller order, so push leaf in the front
InlineContextStack.emplace_back(FuncName.str() + ":" +
Twine(Probe->Index).str());		Twine(Probe->Index).str());
}		}

Probe->getInlineContext(InlineContextStack, GUID2FuncDescMap, true);		const PseudoProbeFuncDesc *
		PseudoProbeDecoder::getInlinerDescForProbe(const PseudoProbe *Probe) const {
		PseudoProbeInlineTree *InlinerNode = Probe->InlineTree;
		if (!InlinerNode->hasInlineSite())
		return nullptr;
		return getFuncDescForGUID(std::get<0>(InlinerNode->ISite));
}		}

} // end namespace sampleprof		} // end namespace sampleprof
} // end namespace llvm		} // end namespace llvm
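The inline-context walk in `PseudoProbe::getInlineContext` and the `hasInlineSite` fix can be illustrated with a small tree. The sketch below is an assumption-laden stand-in: `InlineNode` and `collectInlineContext` are hypothetical, function names replace GUIDs, and the final reversal to caller-callee order is assumed from the documented output `[main:1, foo:2]`, since that part of the function is elided in this view.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for PseudoProbeInlineTree. Each node records the
// inline site (caller function and call probe index) it was inlined at.
struct InlineNode {
  std::string InlinerName; // empty for the dummy root (no inline site)
  uint32_t CallSiteIndex;
  const InlineNode *Parent;
  // Mirrors the corrected check: a real inline site has a non-zero GUID,
  // which is modeled here as a non-empty name.
  bool hasInlineSite() const { return !InlinerName.empty(); }
};

// Append the inline context of a probe living in Leaf to Stack, in
// caller-callee order, without the probe's own (leaf) function.
static void collectInlineContext(const InlineNode *Leaf,
                                 std::vector<std::string> &Stack) {
  std::size_t Begin = Stack.size();
  // Walk towards the root while the node describes a real inline site.
  for (const InlineNode *Cur = Leaf; Cur && Cur->hasInlineSite();
       Cur = Cur->Parent)
    Stack.push_back(Cur->InlinerName + ":" +
                    std::to_string(Cur->CallSiteIndex));
  // The walk visits callee before caller, so flip the appended part.
  std::reverse(Stack.begin() + Begin, Stack.end());
}
```

For a probe in `bar` inlined at `foo:2`, itself inlined at `main:1`, the walk yields `[main:1, foo:2]`. With the old, inverted `hasInlineSite` the loop condition would be false at the first node and nothing would be collected.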

Revision Contents

Diff 321259