This is an archive of the discontinued LLVM Phabricator instance.

Differential D120335

[llvm-profgen] Generating probe-based non-CS profile.
ClosedPublic

Authored by hoy on Feb 22 2022, 8:54 AM.

Details

Reviewers

wenlei
wlei

Commits

rG23391febd877: [llvm-profgen] Generating probe-based non-CS profile.

Summary

I'm bringing up support for pseudo-probe-based non-CS profile generation. The approach is quite similar to generating a DWARF-based non-CS profile. The main difference is that, for a given linear instruction range, the pseudo probes covered by the range are processed instead of each disassembled instruction. The pseudo probe extraction code is shared with CS probe profile generation.

I'm seeing a 0.7% performance win for one of our large internal benchmarks compared to using a non-CS DWARF-based profile, and a 0.5% win for another large benchmark when combined with profi.
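To illustrate the range-to-probe step described in the summary, here is a minimal sketch. It is not the patch's code: `Probe`, `AddressToProbes`, `RangeCounts`, and `countProbesInRanges` are hypothetical stand-ins, and the input ranges are assumed to be disjoint and inclusive already.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the tool's real types (assumptions, not LLVM API).
struct Probe {
  uint32_t Index;
};
using AddressToProbes = std::map<uint64_t, std::vector<const Probe *>>;
// Inclusive [begin, end] address range -> sample count.
using RangeCounts = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;
using ProbeCounts = std::unordered_map<const Probe *, uint64_t>;

// Credit each range's count to every probe whose address lies inside the
// range, rather than to every disassembled instruction in the range.
ProbeCounts countProbesInRanges(const AddressToProbes &Map,
                                const RangeCounts &Ranges) {
  ProbeCounts Result;
  for (const auto &[Range, Count] : Ranges) {
    assert(Range.first <= Range.second && "malformed range");
    auto It = Map.lower_bound(Range.first);
    auto End = Map.upper_bound(Range.second);
    for (; It != End; ++It)
      for (const Probe *P : It->second)
        Result[P] += Count;
  }
  return Result;
}
```

Probes at addresses that no sampled range covers simply never appear in the result, so any zero-count reporting has to come from a separate step.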

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,130 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
60,090 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
60,090 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vloxseg.c
60,080 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vlsegff.c
60,100 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics::vluxseg.c

Full test results: 9 failed.

Event Timeline

hoy created this revision. Feb 22 2022, 8:54 AM

Herald added subscribers: modimo, wenlei. Feb 22 2022, 8:54 AM

hoy requested review of this revision. Feb 22 2022, 8:54 AM

Herald added a project: Restricted Project. Feb 22 2022, 8:54 AM

Herald added a subscriber: llvm-commits.

hoy added reviewers: wenlei, wlei. Feb 22 2022, 8:55 AM

Harbormaster completed remote builds in B150877: Diff 410554. Feb 22 2022, 9:47 AM

Thanks for adding support for probe-based non-CS profile.

0.5% win for another large benchmark when combined with profi.

Is this because, with probes, there is now good support for unknown (dangling) blocks, so PROFI can work well here?

llvm/test/tools/llvm-profgen/inline-pseudoprobe.test
5	Out of curiosity, for the text profile, how do we know the profile is a CS nested profile or a non-CS nested profile?
llvm/tools/llvm-profgen/ProfileGenerator.cpp
444	So here we don't generate the zero counts the way the CS profile does (ProfileGenerator.cpp:981); instead we reuse `preprocessRangeCounter` to initialize every function range with zero, so the probes inside the function naturally get a zero count. I guess this is the same as the CS-profile way, right?

In D120335#3339225, @wlei wrote:

Thanks for adding support for probe-based non-CS profile.

0.5% win for another large benchmark when combined with profi.

Is this because, with probes, there is now good support for unknown (dangling) blocks, so PROFI can work well here?

I think so. Profi should help probes more than line-based profile.

llvm/test/tools/llvm-profgen/inline-pseudoprobe.test
5	CS nested profile comes with a "shouldInline" attribute for each nested profile, see test/tools/llvm-profdata/cs-sample-nested-profile.test.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
444	Yes, we are using the non-CS way of reporting zero counted probes. There is a difference between CS and non-CS in that for CS, the non-executed probes are reported for its owner frame only, while for non-CS, such probes are reported for the whole inline nest.

wlei added inline comments. Feb 28 2022, 4:31 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
942	The `Probe.isBlock()` check was hoisted from `extractProbesFromRange` to here for the CS profile, but why isn't the check added back for the non-CS profile?

hoy added inline comments. Feb 28 2022, 4:54 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
942	So we don't count callsite probes here since they are handled in `populateBoundarySamplesWithProbes`. Otherwise they will be double counted. We don't do this for non-CS probe profile. This is to be consistent with non-CS dwarf implementation where we report zero count in `populateBoundarySamplesForAllFunctions`. I guess I can also do some refactoring to unify the two implementations.

hoy edited the summary of this revision. Feb 28 2022, 4:55 PM

wlei added inline comments. Feb 28 2022, 5:07 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
942	Yeah, I understand for `count==0`. Here I'm asking whether we should add `Probe->isBlock()` for the non-CS probe profile. My understanding is that for body samples, we only accumulate total samples for block probes? Otherwise, the counts from callsite probes will affect the total samples.

hoy added inline comments. Feb 28 2022, 6:23 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
942	For non-CS, I'm using `count==0` for probes to be consistent with DWARF, i.e., both block probes and call probes are counted by `populateBodySamples`. For CS, only block probes are counted by `populateBodySamples`; call probes are counted by `populateBoundarySamples`. That's the inconsistency we should probably fix.
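The inconsistency discussed in this thread can be sketched as follows. This is only an illustration of the two counting policies as the comments describe them; `ProbeKind`, `Probe`, `ProbeCount`, and `collectBodySamples` are hypothetical names, not the tool's API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical probe model (assumption): a probe is either a block probe or
// a call probe.
enum class ProbeKind { Block, Call };
struct Probe {
  uint32_t Index;
  ProbeKind Kind;
  bool isBlock() const { return Kind == ProbeKind::Block; }
};
using ProbeCount = std::pair<Probe, uint64_t>;

// Body-sample pass. In the CS flavor, call probes are skipped here because a
// separate boundary-sample pass records them. In the non-CS flavor, both
// block and call probes are recorded as body samples, mirroring the
// DWARF-based path.
std::map<uint32_t, uint64_t>
collectBodySamples(const std::vector<ProbeCount> &Counts,
                   bool ContextSensitive) {
  std::map<uint32_t, uint64_t> Body;
  for (const auto &[P, Count] : Counts) {
    if (ContextSensitive && !P.isBlock())
      continue; // left to the boundary-sample pass to avoid double counting
    Body[P.Index] += Count;
  }
  return Body;
}
```

With one block probe and one call probe that both saw 15 samples, the CS flavor yields a single body entry while the non-CS flavor yields two.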

wenlei added inline comments. Feb 28 2022, 11:19 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
444	Both preprocessRangeCounter and extractProbesFromRange call findDisjointRanges, which is duplicated. Re "for CS, the non-executed probes are reported for its owner frame only, while for non-CS, such probes are reported for the whole inline nest": what does the term "owner frame" refer to? Can you elaborate on the above?
495	Perhaps we don't need a parameter for this, just use `Binary->usePseudoProbes()` instead.
871–872	nit: move this closer to the functions definitions for ProfileGeneratorBase.

hoy added inline comments. Mar 1 2022, 9:20 AM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
444	By owner frame I mean the inlinee frame that originally directly contains the probe. E.g., given function A, where A inlines B and one original probe of B is sampled: for CS, all of the other original probes of B will be reported, and none of the original probes of A will be reported if none of A's probes is sampled. But for non-CS, all of A's and B's probes will be reported even if only one of B's probes is sampled. A real example is in the attached inline-pseudoprobe.test, where we have
	; CHECK: main:88:0
	; CHECK-NEXT: 1: 0
	; CHECK-NEXT: 2: foo:88
	; CHECK-NEXT: 1: 0
	; CHECK-NEXT: 2: 15
	The corresponding CS profile is in inline-cs-pseudoprobe.test, where there is no profile generated for the main function.
495	Sounds good.
871–872	done.
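The A-inlines-B example from the comment on line 444 can be sketched as below. This is a deliberate simplification for illustration: the real CS generator keys profiles by calling context rather than by function name alone, and `Probe`, `Owner`, `Outermost`, and `reportedProbes` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical probe record (assumption): Owner is the function whose source
// originally contained the probe, Outermost is the function at the root of
// the inline nest the probe ended up in.
struct Probe {
  std::string Owner;
  std::string Outermost;
  uint32_t Index;
};

// Returns the probes that would show up in the profile (with a real count or
// a zero count) when only the probes in Sampled were hit.
std::vector<const Probe *>
reportedProbes(const std::vector<Probe> &All,
               const std::set<const Probe *> &Sampled,
               bool ContextSensitive) {
  // CS: report per owner frame. Non-CS: report for the whole inline nest.
  std::set<std::string> Keys;
  for (const Probe *P : Sampled)
    Keys.insert(ContextSensitive ? P->Owner : P->Outermost);

  std::vector<const Probe *> Reported;
  for (const Probe &P : All)
    if (Keys.count(ContextSensitive ? P.Owner : P.Outermost))
      Reported.push_back(&P);
  return Reported;
}
```

If A has one probe, B (inlined into A) has two, and only one of B's probes is sampled, the CS flavor reports B's two probes, while the non-CS flavor reports all three.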

Addressing comments.

wenlei added inline comments. Mar 1 2022, 10:38 AM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
444	Ok, thanks for the clarification - that makes sense. Suggestion: use canonical terms like "inlinee frame" or "leaf inlinee frame" instead of nebulous terms like "owner frame" to avoid confusion. Re "Both preprocessRangeCounter and extractProbesFromRange call findDisjointRanges, which is duplicated": is there something we can do to avoid the redundant findDisjointRanges?

Harbormaster completed remote builds in B151981: Diff 412137. Mar 1 2022, 10:52 AM

hoy added inline comments. Mar 1 2022, 12:18 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
444	"Something we can do to avoid redundant findDisjointRanges?" Good point. Sorry for missing this earlier. I made it conditional.
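For readers unfamiliar with findDisjointRanges, the following is a minimal sketch of the idea, written from the header comment's example (overlapping ranges with counts 100 and 200 give 300 on the shared part). It is not the LLVM implementation; the sweep over boundary points and the `Range`/`RangeSample` aliases are assumptions, and ranges are taken to be inclusive with an end below the maximum 64-bit value.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <utility>

using Range = std::pair<uint64_t, uint64_t>; // inclusive [begin, end]
using RangeSample = std::map<Range, uint64_t>;

// Split possibly overlapping ranges into disjoint ones, summing the counts of
// all input ranges that cover each disjoint piece.
RangeSample findDisjointRanges(const RangeSample &Ranges) {
  // Per boundary address: change in the running count and in the number of
  // open ranges. Tracking open ranges keeps zero-count ranges in the output.
  std::map<uint64_t, std::pair<int64_t, int>> Delta;
  for (const auto &[R, Count] : Ranges) {
    assert(R.first <= R.second && "malformed range");
    Delta[R.first].first += static_cast<int64_t>(Count);
    Delta[R.first].second += 1;
    Delta[R.second + 1].first -= static_cast<int64_t>(Count);
    Delta[R.second + 1].second -= 1;
  }

  RangeSample Disjoint;
  int64_t Running = 0;
  int Open = 0;
  for (auto It = Delta.begin(); It != Delta.end(); ++It) {
    Running += It->second.first;
    Open += It->second.second;
    auto Next = std::next(It);
    if (Next == Delta.end())
      break;
    if (Open > 0)
      Disjoint[Range(It->first, Next->first - 1)] =
          static_cast<uint64_t>(Running);
  }
  return Disjoint;
}
```

In this sketch, feeding already-disjoint ranges back in returns them unchanged, which is consistent with the point that a second call on preprocessed input is redundant work.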

Herald added a project: Restricted Project. Mar 1 2022, 12:18 PM

Updating D120335: [llvm-profgen] Generating probe-based non-CS profile.

Harbormaster completed remote builds in B152024: Diff 412201. Mar 1 2022, 1:38 PM

What is the size overhead on binary with probe? What is the impact on source drifting tolerance level?

In D120335#3352701, @davidxl wrote:

What is the size overhead on binary with probe? What is the impact on source drifting tolerance level?

Pseudo probes have negligible impact on code size. When it comes to binary size, the two encoded probe sections can make the binary 10%-15% bigger as of now.

As for source drifting, probe-based profile should be resilient to source changes that do not affect CFG structure and the number of callsites.

lgtm, thanks.

This revision is now accepted and ready to land. Mar 1 2022, 6:27 PM

This revision was landed with ongoing or failed builds. Mar 1 2022, 6:49 PM

Closed by commit rG23391febd877: [llvm-profgen] Generating probe-based non-CS profile. (authored by hoy).

This revision was automatically updated to reflect the committed changes.

hoy added a commit: rG23391febd877: [llvm-profgen] Generating probe-based non-CS profile.

Revision Contents

Path	Size
llvm/test/tools/llvm-profgen/inline-pseudoprobe.test	46 lines
llvm/test/tools/llvm-profgen/noinline-pseudoprobe.test	48 lines
llvm/tools/llvm-profgen/ProfileGenerator.h	21 lines
llvm/tools/llvm-profgen/ProfileGenerator.cpp	142 lines

Diff 412137

llvm/test/tools/llvm-profgen/inline-pseudoprobe.test

This file was added.

				; RUN: llvm-profgen --format=text --ignore-stack-samples --perfscript=%S/Inputs/inline-cs-pseudoprobe.perfscript --binary=%S/Inputs/inline-cs-pseudoprobe.perfbin --output=%t --profile-summary-cold-count=0
				; RUN: FileCheck %s --input-file %t

				; CHECK: main:88:0
				; CHECK-NEXT: 1: 0
				; CHECK-NEXT: 2: foo:88
				; CHECK-NEXT: 1: 0
				; CHECK-NEXT: 2: 15
				; CHECK-NEXT: 3: 15
				; CHECK-NEXT: 4: 14
				; CHECK-NEXT: 5: 1
				; CHECK-NEXT: 6: 15
				; CHECK-NEXT: 7: 0
				; CHECK-NEXT: 9: 0
				; CHECK-NEXT: 8: bar:28
				; CHECK-NEXT: 1: 14
				; CHECK-NEXT: 4: 14
				; CHECK-NEXT: !CFGChecksum: 72617220756
				; CHECK-NEXT: !CFGChecksum: 563088904013236
				; CHECK-NEXT: !CFGChecksum: 281479271677951


				; clang -O3 -fexperimental-new-pass-manager -fuse-ld=lld -fpseudo-probe-for-profiling
				; -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Xclang -mdisable-tail-calls
				; -g test.c -o a.out

				#include <stdio.h>

				int bar(int x, int y) {
				  if (x % 3) {
				    return x - y;
				  }
				  return x + y;
				}

				void foo() {
				  int s, i = 0;
				  while (i++ < 4000 * 4000)
				    if (i % 91) s = bar(i, s); else s += 30;
				  printf("sum is %d\n", s);
				}

				int main() {
				  foo();
				  return 0;
				}

llvm/test/tools/llvm-profgen/noinline-pseudoprobe.test

This file was added.

				; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/noinline-cs-pseudoprobe.perfscript --binary=%S/Inputs/noinline-cs-pseudoprobe.perfbin --output=%t1 --ignore-stack-samples
				; RUN: FileCheck %s --input-file %t1
				; RUN: llvm-profgen --format=text --perfscript=%S/Inputs/noinline-cs-pseudoprobe.aggperfscript --binary=%S/Inputs/noinline-cs-pseudoprobe.perfbin --output=%t2 --ignore-stack-samples
				; RUN: FileCheck %s --input-file %t2


				; CHECK: foo:75:0
				; CHECK-NEXT: 1: 0
				; CHECK-NEXT: 2: 15
				; CHECK-NEXT: 3: 15
				; CHECK-NEXT: 4: 15
				; CHECK-NEXT: 5: 0
				; CHECK-NEXT: 6: 15
				; CHECK-NEXT: 7: 0
				; CHECK-NEXT: 8: 15 bar:15
				; CHECK-NEXT: 9: 0
				; CHECK-NEXT: !CFGChecksum: 563088904013236
				; CHECK-NEXT: bar:30:15
				; CHECK-NEXT: 1: 15
				; CHECK-NEXT: 4: 15
				; CHECK-NEXT: !CFGChecksum: 72617220756



				; clang -O3 -fexperimental-new-pass-manager -fuse-ld=lld -fpseudo-probe-for-profiling
				; -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Xclang -mdisable-tail-calls
				; -fno-inline-functions -g test.c -o a.out

				#include <stdio.h>

				int bar(int x, int y) {
				  if (x % 3) {
				    return x - y;
				  }
				  return x + y;
				}

				void foo() {
				  int s, i = 0;
				  while (i++ < 4000 * 4000)
				    if (i % 91) s = bar(i, s); else s += 30;
				  printf("sum is %d\n", s);
				}

				int main() {
				  foo();
				  return 0;
				}

llvm/tools/llvm-profgen/ProfileGenerator.h

Show All 16 Lines
#include <unordered_set>		#include <unordered_set>

using namespace llvm;		using namespace llvm;
using namespace sampleprof;		using namespace sampleprof;

namespace llvm {		namespace llvm {
namespace sampleprof {		namespace sampleprof {

		using ProbeCounterMap =
		std::unordered_map<const MCDecodedPseudoProbe *, uint64_t>;

// This base class for profile generation of sample-based PGO. We reuse all		// This base class for profile generation of sample-based PGO. We reuse all
// structures relating to function profiles and profile writers as seen in		// structures relating to function profiles and profile writers as seen in
// /ProfileData/SampleProf.h.		// /ProfileData/SampleProf.h.
class ProfileGeneratorBase {		class ProfileGeneratorBase {

public:		public:
ProfileGeneratorBase(ProfiledBinary *Binary,		ProfileGeneratorBase(ProfiledBinary *Binary,
const ContextSampleCounterMap &Counters)		const ContextSampleCounterMap &Counters)
Show All 39 Lines	protected:
\|<--100-->\|		\|<--100-->\|
\|<------200------>\|		\|<------200------>\|
A B C		A B C

sample count for disjoint region [A,B] would be 300.		sample count for disjoint region [A,B] would be 300.
*/		*/
void findDisjointRanges(RangeSample &DisjointRanges,		void findDisjointRanges(RangeSample &DisjointRanges,
const RangeSample &Ranges);		const RangeSample &Ranges);

		// Go through each address from range to extract the top frame probe by
		// looking up in the Address2ProbeMap
		void extractProbesFromRange(const RangeSample &RangeCounter,
		ProbeCounterMap &ProbeCounter);

// Helper function for updating body sample for a leaf location in		// Helper function for updating body sample for a leaf location in
// FunctionProfile		// FunctionProfile
void updateBodySamplesforFunctionProfile(FunctionSamples &FunctionProfile,		void updateBodySamplesforFunctionProfile(FunctionSamples &FunctionProfile,
const SampleContextFrame &LeafLoc,		const SampleContextFrame &LeafLoc,
uint64_t Count);		uint64_t Count);
void updateTotalSamples();		void updateTotalSamples();

StringRef getCalleeNameForOffset(uint64_t TargetOffset);		StringRef getCalleeNameForOffset(uint64_t TargetOffset);
Show All 25 Lines
public:		public:
ProfileGenerator(ProfiledBinary *Binary,		ProfileGenerator(ProfiledBinary *Binary,
const ContextSampleCounterMap &Counters)		const ContextSampleCounterMap &Counters)
: ProfileGeneratorBase(Binary, Counters){};		: ProfileGeneratorBase(Binary, Counters){};
void generateProfile() override;		void generateProfile() override;

private:		private:
void generateLineNumBasedProfile();		void generateLineNumBasedProfile();
		void generateProbeBasedProfile();
RangeSample preprocessRangeCounter(const RangeSample &RangeCounter);		RangeSample preprocessRangeCounter(const RangeSample &RangeCounter);
FunctionSamples &getTopLevelFunctionProfile(StringRef FuncName);		FunctionSamples &getTopLevelFunctionProfile(StringRef FuncName);
// Helper function to get the leaf frame's FunctionProfile by traversing the		// Helper function to get the leaf frame's FunctionProfile by traversing the
// inline stack and meanwhile it adds the total samples for each frame's		// inline stack and meanwhile it adds the total samples for each frame's
// function profile.		// function profile.
FunctionSamples &		FunctionSamples &
getLeafProfileAndAddTotalSamples(const SampleContextFrameVector &FrameVec,		getLeafProfileAndAddTotalSamples(const SampleContextFrameVector &FrameVec,
uint64_t Count);		uint64_t Count);
void populateBodySamplesForAllFunctions(const RangeSample &RangeCounter);		void populateBodySamplesForAllFunctions(const RangeSample &RangeCounter);
void		void
populateBoundarySamplesForAllFunctions(const BranchSample &BranchCounters);		populateBoundarySamplesForAllFunctions(const BranchSample &BranchCounters);
		void populateBodySamplesWithProbesForAllFunctions(const RangeSample &RangeCounter);
		void
		populateBoundarySamplesWithProbesForAllFunctions(const BranchSample &BranchCounters);
void postProcessProfiles();		void postProcessProfiles();
void trimColdProfiles(const SampleProfileMap &Profiles,		void trimColdProfiles(const SampleProfileMap &Profiles,
uint64_t ColdCntThreshold);		uint64_t ColdCntThreshold);
};		};

using ProbeCounterMap =
std::unordered_map<const MCDecodedPseudoProbe *, uint64_t>;

class CSProfileGenerator : public ProfileGeneratorBase {		class CSProfileGenerator : public ProfileGeneratorBase {
public:		public:
CSProfileGenerator(ProfiledBinary *Binary,		CSProfileGenerator(ProfiledBinary *Binary,
const ContextSampleCounterMap &Counters)		const ContextSampleCounterMap &Counters)
: ProfileGeneratorBase(Binary, Counters){};		: ProfileGeneratorBase(Binary, Counters){};

void generateProfile() override;		void generateProfile() override;

▲ Show 20 Lines • Show All 128 Lines • ▼ Show 20 Lines	private:
void populateBodySamplesForFunction(FunctionSamples &FunctionProfile,		void populateBodySamplesForFunction(FunctionSamples &FunctionProfile,
const RangeSample &RangeCounters);		const RangeSample &RangeCounters);
void populateBoundarySamplesForFunction(SampleContextFrames ContextId,		void populateBoundarySamplesForFunction(SampleContextFrames ContextId,
FunctionSamples &FunctionProfile,		FunctionSamples &FunctionProfile,
const BranchSample &BranchCounters);		const BranchSample &BranchCounters);
void populateInferredFunctionSamples();		void populateInferredFunctionSamples();

void generateProbeBasedProfile();		void generateProbeBasedProfile();
// Go through each address from range to extract the top frame probe by
// looking up in the Address2ProbeMap
void extractProbesFromRange(const RangeSample &RangeCounter,
ProbeCounterMap &ProbeCounter);
// Fill in function body samples from probes		// Fill in function body samples from probes
void populateBodySamplesWithProbes(const RangeSample &RangeCounter,		void populateBodySamplesWithProbes(const RangeSample &RangeCounter,
SampleContextFrames ContextStack);		SampleContextFrames ContextStack);
// Fill in boundary samples for a call probe		// Fill in boundary samples for a call probe
void populateBoundarySamplesWithProbes(const BranchSample &BranchCounter,		void populateBoundarySamplesWithProbes(const BranchSample &BranchCounter,
SampleContextFrames ContextStack);		SampleContextFrames ContextStack);
// Helper function to get FunctionSamples for the leaf probe		// Helper function to get FunctionSamples for the leaf probe
FunctionSamples &		FunctionSamples &
Show All 17 Lines

llvm/tools/llvm-profgen/ProfileGenerator.cpp

Show First 20 Lines • Show All 376 Lines • ▼ Show 20 Lines	if (Ret.second) {
FunctionSamples &FProfile = Ret.first->second;		FunctionSamples &FProfile = Ret.first->second;
FProfile.setContext(Context);		FProfile.setContext(Context);
}		}
return Ret.first->second;		return Ret.first->second;
}		}

void ProfileGenerator::generateProfile() {		void ProfileGenerator::generateProfile() {
if (Binary->usePseudoProbes()) {		if (Binary->usePseudoProbes()) {
// TODO: Support probe based profile generation		generateProbeBasedProfile();
exitWithError("Probe based profile generation not supported for AutoFDO, "
"consider dropping `--ignore-stack-samples` or adding `--use-dwarf-correlation`.");
} else {		} else {
generateLineNumBasedProfile();		generateLineNumBasedProfile();
}		}
postProcessProfiles();		postProcessProfiles();
}		}

void ProfileGenerator::postProcessProfiles() {		void ProfileGenerator::postProcessProfiles() {
computeSummaryAndThreshold();		computeSummaryAndThreshold();
Show All 25 Lines	void ProfileGenerator::generateLineNumBasedProfile() {
// Fill in function body samples		// Fill in function body samples
populateBodySamplesForAllFunctions(SC.RangeCounter);		populateBodySamplesForAllFunctions(SC.RangeCounter);
// Fill in boundary sample counts as well as call site samples for calls		// Fill in boundary sample counts as well as call site samples for calls
populateBoundarySamplesForAllFunctions(SC.BranchCounter);		populateBoundarySamplesForAllFunctions(SC.BranchCounter);

updateTotalSamples();		updateTotalSamples();
}		}

		void ProfileGenerator::generateProbeBasedProfile() {
		assert(SampleCounters.size() == 1 &&
		"Must have one entry for profile generation.");
		// Enable pseudo probe functionalities in SampleProf
		FunctionSamples::ProfileIsProbeBased = true;
		const SampleCounter &SC = SampleCounters.begin()->second;
		// Fill in function body samples
		populateBodySamplesWithProbesForAllFunctions(SC.RangeCounter);
		// Fill in boundary sample counts as well as call site samples for calls
		populateBoundarySamplesWithProbesForAllFunctions(SC.BranchCounter);

		updateTotalSamples();
		}

		void ProfileGenerator::populateBodySamplesWithProbesForAllFunctions(
		const RangeSample &RangeCounter) {
		ProbeCounterMap ProbeCounter;
		extractProbesFromRange(preprocessRangeCounter(RangeCounter), ProbeCounter);

		for (const auto &PI : ProbeCounter) {
		const MCDecodedPseudoProbe *Probe = PI.first;
		uint64_t Count = PI.second;
		SampleContextFrameVector FrameVec;
		Binary->getInlineContextForProbe(Probe, FrameVec, true);
		FunctionSamples &FunctionProfile = getLeafProfileAndAddTotalSamples(FrameVec, Count);
		FunctionProfile.addBodySamplesForProbe(Probe->getIndex(), Count);
		if (Probe->isEntry())
		FunctionProfile.addHeadSamples(Count);
		}
		}

		void ProfileGenerator::populateBoundarySamplesWithProbesForAllFunctions(
		const BranchSample &BranchCounters) {
		for (const auto &Entry : BranchCounters) {
		uint64_t SourceOffset = Entry.first.first;
		uint64_t TargetOffset = Entry.first.second;
		uint64_t Count = Entry.second;
		assert(Count != 0 && "Unexpected zero weight branch");

		StringRef CalleeName = getCalleeNameForOffset(TargetOffset);
		if (CalleeName.size() == 0)
		continue;

		uint64_t SourceAddress = Binary->offsetToVirtualAddr(SourceOffset);
		const MCDecodedPseudoProbe *CallProbe =
		Binary->getCallProbeForAddr(SourceAddress);
		if (CallProbe == nullptr)
		continue;

		// Record called target sample and its count.
		SampleContextFrameVector FrameVec;
		Binary->getInlineContextForProbe(CallProbe, FrameVec, true);

		if (!FrameVec.empty()) {
		FunctionSamples &FunctionProfile =
		getLeafProfileAndAddTotalSamples(FrameVec, 0);
		FunctionProfile.addCalledTargetSamples(
		FrameVec.back().Location.LineOffset, 0, CalleeName, Count);
		}
		}
		}

FunctionSamples &ProfileGenerator::getLeafProfileAndAddTotalSamples(		FunctionSamples &ProfileGenerator::getLeafProfileAndAddTotalSamples(
const SampleContextFrameVector &FrameVec, uint64_t Count) {		const SampleContextFrameVector &FrameVec, uint64_t Count) {
// Get top level profile		// Get top level profile
FunctionSamples *FunctionProfile =		FunctionSamples *FunctionProfile =
&getTopLevelFunctionProfile(FrameVec[0].FuncName);		&getTopLevelFunctionProfile(FrameVec[0].FuncName);
FunctionProfile->addTotalSamples(Count);		FunctionProfile->addTotalSamples(Count);
		if (Binary->usePseudoProbes()) {
		const auto *FuncDesc = Binary->getFuncDescForGUID(Function::getGUID(FunctionProfile->getName()));
		FunctionProfile->setFunctionHash(FuncDesc->FuncHash);
		}

for (size_t I = 1; I < FrameVec.size(); I++) {		for (size_t I = 1; I < FrameVec.size(); I++) {
LineLocation Callsite(		LineLocation Callsite(
FrameVec[I - 1].Location.LineOffset,		FrameVec[I - 1].Location.LineOffset,
getBaseDiscriminator(FrameVec[I - 1].Location.Discriminator));		getBaseDiscriminator(FrameVec[I - 1].Location.Discriminator));
FunctionSamplesMap &SamplesMap =		FunctionSamplesMap &SamplesMap =
FunctionProfile->functionSamplesAt(Callsite);		FunctionProfile->functionSamplesAt(Callsite);
auto Ret =		auto Ret =
SamplesMap.emplace(FrameVec[I].FuncName.str(), FunctionSamples());		SamplesMap.emplace(FrameVec[I].FuncName.str(), FunctionSamples());
if (Ret.second) {		if (Ret.second) {
SampleContext Context(FrameVec[I].FuncName);		SampleContext Context(FrameVec[I].FuncName);
Ret.first->second.setContext(Context);		Ret.first->second.setContext(Context);
}		}
FunctionProfile = &Ret.first->second;		FunctionProfile = &Ret.first->second;
FunctionProfile->addTotalSamples(Count);		FunctionProfile->addTotalSamples(Count);
		if (Binary->usePseudoProbes()) {
		const auto *FuncDesc = Binary->getFuncDescForGUID(Function::getGUID(FunctionProfile->getName()));
		FunctionProfile->setFunctionHash(FuncDesc->FuncHash);
		}
}		}

return *FunctionProfile;		return *FunctionProfile;
}		}

RangeSample		RangeSample
ProfileGenerator::preprocessRangeCounter(const RangeSample &RangeCounter) {		ProfileGenerator::preprocessRangeCounter(const RangeSample &RangeCounter) {
RangeSample Ranges(RangeCounter.begin(), RangeCounter.end());		RangeSample Ranges(RangeCounter.begin(), RangeCounter.end());
▲ Show 20 Lines • Show All 116 Lines • ▼ Show 20 Lines

void CSProfileGenerator::generateProfile() {		void CSProfileGenerator::generateProfile() {
FunctionSamples::ProfileIsCSFlat = true;		FunctionSamples::ProfileIsCSFlat = true;

if (Binary->getTrackFuncContextSize())		if (Binary->getTrackFuncContextSize())
computeSizeForProfiledFunctions();		computeSizeForProfiledFunctions();

if (Binary->usePseudoProbes()) {		if (Binary->usePseudoProbes()) {
// Enable pseudo probe functionalities in SampleProf
FunctionSamples::ProfileIsProbeBased = true;
generateProbeBasedProfile();		generateProbeBasedProfile();
} else {		} else {
generateLineNumBasedProfile();		generateLineNumBasedProfile();
}		}
postProcessProfiles();		postProcessProfiles();
}		}

void CSProfileGenerator::computeSizeForProfiledFunctions() {		void CSProfileGenerator::computeSizeForProfiledFunctions() {
▲ Show 20 Lines • Show All 205 Lines • ▼ Show 20 Lines
void ProfileGeneratorBase::computeSummaryAndThreshold() {		void ProfileGeneratorBase::computeSummaryAndThreshold() {
SampleProfileSummaryBuilder Builder(ProfileSummaryBuilder::DefaultCutoffs);		SampleProfileSummaryBuilder Builder(ProfileSummaryBuilder::DefaultCutoffs);
auto Summary = Builder.computeSummaryForProfiles(ProfileMap);		auto Summary = Builder.computeSummaryForProfiles(ProfileMap);
HotCountThreshold = ProfileSummaryBuilder::getHotCountThreshold(		HotCountThreshold = ProfileSummaryBuilder::getHotCountThreshold(
(Summary->getDetailedSummary()));		(Summary->getDetailedSummary()));
ColdCountThreshold = ProfileSummaryBuilder::getColdCountThreshold(		ColdCountThreshold = ProfileSummaryBuilder::getColdCountThreshold(
(Summary->getDetailedSummary()));		(Summary->getDetailedSummary()));
}		}

// Helper function to extract context prefix string stack		void ProfileGeneratorBase::extractProbesFromRange(
		wenlei: nit: move this closer to the function definitions for ProfileGeneratorBase.
		hoy: done.
// Extract context stack for reusing, leaf context stack will		const RangeSample &RangeCounter, ProbeCounterMap &ProbeCounter) {
// be added compressed while looking up function profile
static void extractPrefixContextStack(
SampleContextFrameVector &ContextStack,
const SmallVectorImpl<const MCDecodedPseudoProbe *> &Probes,
ProfiledBinary *Binary) {
for (const auto *P : Probes) {
Binary->getInlineContextForProbe(P, ContextStack, true);
}
}

void CSProfileGenerator::generateProbeBasedProfile() {
for (const auto &CI : SampleCounters) {
const auto *CtxKey = cast<ProbeBasedCtxKey>(CI.first.getPtr());
SampleContextFrameVector ContextStack;
extractPrefixContextStack(ContextStack, CtxKey->Probes, Binary);
// Fill in function body samples from probes, also infer caller's samples
// from callee's probe
populateBodySamplesWithProbes(CI.second.RangeCounter, ContextStack);
// Fill in boundary samples for a call probe
populateBoundarySamplesWithProbes(CI.second.BranchCounter, ContextStack);
}
}

void CSProfileGenerator::extractProbesFromRange(const RangeSample &RangeCounter,
ProbeCounterMap &ProbeCounter) {
RangeSample Ranges;		RangeSample Ranges;
findDisjointRanges(Ranges, RangeCounter);		findDisjointRanges(Ranges, RangeCounter);
for (const auto &Range : Ranges) {		for (const auto &Range : Ranges) {
uint64_t RangeBegin = Binary->offsetToVirtualAddr(Range.first.first);		uint64_t RangeBegin = Binary->offsetToVirtualAddr(Range.first.first);
uint64_t RangeEnd = Binary->offsetToVirtualAddr(Range.first.second);		uint64_t RangeEnd = Binary->offsetToVirtualAddr(Range.first.second);
uint64_t Count = Range.second;		uint64_t Count = Range.second;
// Disjoint ranges have introduce zero-filled gap that
// doesn't belong to current context, filter them out.
if (Count == 0)
continue;

InstructionPointer IP(Binary, RangeBegin, true);		InstructionPointer IP(Binary, RangeBegin, true);
// Disjoint ranges may have range in the middle of two instr,		// Disjoint ranges may have range in the middle of two instr,
// e.g. If Instr1 at Addr1, and Instr2 at Addr2, disjoint range		// e.g. If Instr1 at Addr1, and Instr2 at Addr2, disjoint range
// can be Addr1+1 to Addr2-1. We should ignore such range.		// can be Addr1+1 to Addr2-1. We should ignore such range.
if (IP.Address > RangeEnd)		if (IP.Address > RangeEnd)
continue;		continue;

do {		do {
const AddressProbesMap &Address2ProbesMap =		const AddressProbesMap &Address2ProbesMap =
Binary->getAddress2ProbesMap();		Binary->getAddress2ProbesMap();
auto It = Address2ProbesMap.find(IP.Address);		auto It = Address2ProbesMap.find(IP.Address);
if (It != Address2ProbesMap.end()) {		if (It != Address2ProbesMap.end()) {
for (const auto &Probe : It->second) {		for (const auto &Probe : It->second) {
if (!Probe.isBlock())
continue;
ProbeCounter[&Probe] += Count;		ProbeCounter[&Probe] += Count;
}		}
}		}
} while (IP.advance() && IP.Address <= RangeEnd);		} while (IP.advance() && IP.Address <= RangeEnd);
}		}
}		}
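
A minimal standalone sketch of the range-to-probe accumulation that `extractProbesFromRange` performs on the right-hand side of this diff, followed by the CS-style filter that the caller applies. The types `Probe`, `Range`, `AddressProbes`, and `ProbeCounts`, and the functions `extractProbeCounts` and `keepBlockProbesWithSamples`, are hypothetical stand-ins for illustration and are not the llvm-profgen API. The sketch also assumes ranges are already disjoint and looks probes up by address directly instead of advancing an instruction pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for a decoded pseudo probe (not MCDecodedPseudoProbe).
struct Probe {
  uint32_t Index;
  bool IsBlock; // true for block probes, false for call probes
};

// Hypothetical stand-in for one disjoint range with its sample count.
struct Range {
  uint64_t Begin; // inclusive
  uint64_t End;   // inclusive
  uint64_t Count;
};

using AddressProbes = std::map<uint64_t, std::vector<Probe>>;
using ProbeCounts = std::map<const Probe *, uint64_t>;

// Add each range's count to every probe whose address lies in the range.
// No filtering happens here; zero counts and call probes are kept.
ProbeCounts extractProbeCounts(const std::vector<Range> &Ranges,
                               const AddressProbes &ProbesByAddr) {
  ProbeCounts Counts;
  for (const Range &R : Ranges) {
    assert(R.Begin <= R.End && "range must be well formed");
    auto It = ProbesByAddr.lower_bound(R.Begin);
    auto EndIt = ProbesByAddr.upper_bound(R.End);
    for (; It != EndIt; ++It)
      for (const Probe &P : It->second)
        Counts[&P] += R.Count;
  }
  return Counts;
}

// CS-style caller filter: drop call probes and zero-count entries, since
// call probes are assumed to be accounted for by a separate boundary pass.
ProbeCounts keepBlockProbesWithSamples(const ProbeCounts &Counts) {
  ProbeCounts Kept;
  for (const auto &Entry : Counts)
    if (Entry.first->IsBlock && Entry.second != 0)
      Kept.insert(Entry);
  return Kept;
}
```

Keeping the extraction step free of filters lets both callers share it: a non-CS caller can consume the unfiltered map, while a CS caller applies the block-only filter itself.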

		// Helper function to extract context prefix string stack
		// Extract context stack for reusing, leaf context stack will
		// be added compressed while looking up function profile
		static void extractPrefixContextStack(
		SampleContextFrameVector &ContextStack,
		const SmallVectorImpl<const MCDecodedPseudoProbe *> &Probes,
		ProfiledBinary *Binary) {
		for (const auto *P : Probes) {
		Binary->getInlineContextForProbe(P, ContextStack, true);
		}
		}

		void CSProfileGenerator::generateProbeBasedProfile() {
		// Enable pseudo probe functionalities in SampleProf
		FunctionSamples::ProfileIsProbeBased = true;
		for (const auto &CI : SampleCounters) {
		const auto *CtxKey = cast<ProbeBasedCtxKey>(CI.first.getPtr());
		SampleContextFrameVector ContextStack;
		extractPrefixContextStack(ContextStack, CtxKey->Probes, Binary);
		// Fill in function body samples from probes, also infer caller's samples
		// from callee's probe
		populateBodySamplesWithProbes(CI.second.RangeCounter, ContextStack);
		// Fill in boundary samples for a call probe
		populateBoundarySamplesWithProbes(CI.second.BranchCounter, ContextStack);
		}
		}

void CSProfileGenerator::populateBodySamplesWithProbes(		void CSProfileGenerator::populateBodySamplesWithProbes(
const RangeSample &RangeCounter, SampleContextFrames ContextStack) {		const RangeSample &RangeCounter, SampleContextFrames ContextStack) {
ProbeCounterMap ProbeCounter;		ProbeCounterMap ProbeCounter;
// Extract the top frame probes by looking up each address among the range in		// Extract the top frame probes by looking up each address among the range in
// the Address2ProbeMap		// the Address2ProbeMap
extractProbesFromRange(RangeCounter, ProbeCounter);		extractProbesFromRange(RangeCounter, ProbeCounter);
std::unordered_map<MCDecodedPseudoProbeInlineTree *,		std::unordered_map<MCDecodedPseudoProbeInlineTree *,
std::unordered_set<FunctionSamples *>>		std::unordered_set<FunctionSamples *>>
FrameSamples;		FrameSamples;
for (const auto &PI : ProbeCounter) {		for (const auto &PI : ProbeCounter) {
const MCDecodedPseudoProbe *Probe = PI.first;		const MCDecodedPseudoProbe *Probe = PI.first;
uint64_t Count = PI.second;		uint64_t Count = PI.second;
		// Disjoint ranges have introduce zero-filled gap that
		// doesn't belong to current context, filter them out.
		if (!Probe->isBlock() || Count == 0)
		wlei: The `Probe.isBlock()` check is hoisted from `extractProbesFromRange` to here for the CS profile, but why isn't this check added back for the non-CS profile?
		hoy: We don't count callsite probes here since they are handled in `populateBoundarySamplesWithProbes`; otherwise they would be double counted. We don't do this for the non-CS probe profile. This is to be consistent with the non-CS dwarf implementation, where we report zero count in `populateBoundarySamplesForAllFunctions`. I guess I can also do some refactoring to unify the two implementations.
		wlei: Yeah, I understand for `count==0`. Here I'm asking whether we should add `Probe->isBlock()` for the non-CS probe profile. My understanding is that for body samples, we only accumulate total samples for block probes? Otherwise, the count from callsite probes will affect the total samples.
		hoy: For non-CS, I'm using `count==0` for probes to be consistent with dwarf, i.e., both block probes and call probes are counted by `populateBodySamples`. For CS, only block probes are counted by `populateBodySamples`; call probes are counted by `populateBoundarySamples`. That's the inconsistency we should probably fix.
		continue;
FunctionSamples &FunctionProfile =		FunctionSamples &FunctionProfile =
getFunctionProfileForLeafProbe(ContextStack, Probe);		getFunctionProfileForLeafProbe(ContextStack, Probe);
// Record the current frame and FunctionProfile whenever samples are		// Record the current frame and FunctionProfile whenever samples are
// collected for non-danglie probes. This is for reporting all of the		// collected for non-danglie probes. This is for reporting all of the
// zero count probes of the frame later.		// zero count probes of the frame later.
FrameSamples[Probe->getInlineTreeNode()].insert(&FunctionProfile);		FrameSamples[Probe->getInlineTreeNode()].insert(&FunctionProfile);
FunctionProfile.addBodySamplesForProbe(Probe->getIndex(), Count);		FunctionProfile.addBodySamplesForProbe(Probe->getIndex(), Count);
FunctionProfile.addTotalSamples(Count);		FunctionProfile.addTotalSamples(Count);
... (90 lines elided) ...