This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/include/llvm/MC/MCPseudoProbe.h
llvm/lib/MC/MCPseudoProbe.cpp
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test
llvm/tools/llvm-profgen/PerfReader.h
llvm/tools/llvm-profgen/PerfReader.cpp
llvm/tools/llvm-profgen/ProfileGenerator.h
llvm/tools/llvm-profgen/ProfileGenerator.cpp
llvm/tools/llvm-profgen/ProfiledBinary.h
llvm/tools/llvm-profgen/ProfiledBinary.cpp

Differential D121643

[llvm-profgen] Decoding pseudo probe for profiled function only.
Closed · Public

Authored by hoy on Mar 14 2022, 2:29 PM.

Details

Reviewers

wenlei
wlei

Commits

rG3f97016857b0: [llvm-profgen] Decoding pseudo probe for profiled function only.

Summary

Complete pseudo probe decoding can result in large memory usage. In practice only a small portion of the decoded probes is used in profile generation. I'm changing the full decoding mode to decode probes for profiled functions only. We still do a full scan of the .pseudo_probe section because it has no table of contents, but we no longer have to build the in-memory data structure for functions that are not sampled.

To build the in-memory data structure for profiled functions only, I'm rewriting the previous non-recursive probe decoding logic to be recursive. The recursive version is easier to read and maintain.
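As a rough illustration of the idea (not the code in this diff), the sketch below decodes a toy record stream recursively and drops top-level functions that are not in a GUID filter. The flat `uint64_t` layout and the names `ToyNode`, `decodeRecord` and `decodeAll` are assumptions made for the example; the real section uses ULEB128 fields and address deltas. The point it shows is that a discarded tree is still parsed so the cursor advances, but no nodes or probes are materialized for it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified stand-in for an inline tree node.
struct ToyNode {
  uint64_t Guid = 0;
  std::vector<uint64_t> ProbeAddrs;
  std::vector<std::unique_ptr<ToyNode>> Children;
};

// Toy record layout (NOT the real .pseudo_probe encoding):
//   GUID, NPROBES, NCHILDREN, NPROBES addresses, then NCHILDREN nested records.
// Parent == nullptr means "inside a discarded tree": the record is still
// consumed so the cursor advances, but nothing is built in memory.
inline bool decodeRecord(const std::vector<uint64_t> &Data, std::size_t &Pos,
                         ToyNode *Parent, bool IsTopLevel,
                         const std::unordered_set<uint64_t> &GuidFilter) {
  if (Pos + 3 > Data.size())
    return false;
  uint64_t Guid = Data[Pos++];
  uint64_t NumProbes = Data[Pos++];
  uint64_t NumChildren = Data[Pos++];

  // Only top-level functions are filtered; an empty filter keeps everything.
  if (IsTopLevel && !GuidFilter.empty() && !GuidFilter.count(Guid))
    Parent = nullptr;

  ToyNode *Cur = nullptr;
  if (Parent) {
    Parent->Children.push_back(std::make_unique<ToyNode>());
    Cur = Parent->Children.back().get();
    Cur->Guid = Guid;
  }

  for (uint64_t I = 0; I < NumProbes; ++I) {
    if (Pos >= Data.size())
      return false;
    uint64_t Addr = Data[Pos++];
    if (Cur)
      Cur->ProbeAddrs.push_back(Addr);
  }

  // Inlinees inherit the keep/discard decision of their top-level function.
  for (uint64_t I = 0; I < NumChildren; ++I)
    if (!decodeRecord(Data, Pos, Cur, /*IsTopLevel=*/false, GuidFilter))
      return false;
  return true;
}

// Scans the whole stream; each top-level record starts at the dummy root.
inline bool decodeAll(const std::vector<uint64_t> &Data, ToyNode &Root,
                      const std::unordered_set<uint64_t> &GuidFilter) {
  std::size_t Pos = 0;
  while (Pos < Data.size())
    if (!decodeRecord(Data, Pos, &Root, /*IsTopLevel=*/true, GuidFilter))
      return false;
  return true;
}
```

Unlike the recursive call in the diff, this sketch propagates a failed child decode to the caller; that is a choice made for the example.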

I also have to change the previous representation of unsymbolized context from a probe-based stack to an address-based stack, since the profiled functions are not yet known at the time of virtual unwinding. The address-based stack will be converted to a probe-based stack after virtual unwinding and on-demand probe decoding.
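A minimal sketch of that conversion, under stated assumptions: `ToyFrame`, `ToyProbeMap` and `symbolizeContext` are hypothetical names, and the lookup table stands in for whatever call probes were decoded. The walk starts at the leaf frame and stops at the first address with no decoded call probe, which is one way to model the context truncation discussed in the review; the example addresses are taken from the updated noinline-cs-pseudoprobe.test expectations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical frame: function name plus the call-site probe id.
struct ToyFrame {
  std::string Func;
  uint32_t ProbeId;
};

// Hypothetical table: call-site address -> call probe, populated only for
// functions whose probes were decoded.
using ToyProbeMap = std::map<uint64_t, ToyFrame>;

// Converts an address-based context (outermost caller first) recorded during
// unwinding into a probe-based one once decoding has happened.
inline std::vector<ToyFrame>
symbolizeContext(const std::vector<uint64_t> &AddrStack,
                 const ToyProbeMap &CallProbes) {
  std::vector<ToyFrame> Result;
  // Walk from the leaf towards the root so that a missing caller cuts off
  // only the outer part of the context.
  for (auto It = AddrStack.rbegin(); It != AddrStack.rend(); ++It) {
    auto Found = CallProbes.find(*It);
    if (Found == CallProbes.end())
      break; // No decoded probe for this caller: truncate here.
    Result.push_back(Found->second);
  }
  std::reverse(Result.begin(), Result.end());
  return Result;
}
```

With both call sites decoded, `{0x7f4, 0x7bf}` maps to `main:2 @ foo:8`; if only `0x7bf` has a probe, the result is just `foo:8`.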

I'm seeing 20GB of memory saved for one of our large internal services.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hoy created this revision. Mar 14 2022, 2:29 PM

Herald added a project: Restricted Project. Mar 14 2022, 2:29 PM

Herald added subscribers: modimo, wenlei, hiraditya.

hoy requested review of this revision. Mar 14 2022, 2:29 PM

Herald added a project: Restricted Project. Mar 14 2022, 2:29 PM

Herald added a subscriber: llvm-commits.

hoy edited the summary of this revision. Mar 14 2022, 2:53 PM

hoy added reviewers: wenlei, wlei.

Harbormaster completed remote builds in B154197: Diff 415234. Mar 14 2022, 4:15 PM

Some linter warnings seem legit; please format/fix.

llvm/lib/MC/MCPseudoProbe.cpp
363	`Guids` -> `GuildFilter` or `ProfiledProbeGuids`
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe-on-demand.test
18 ↗	(On Diff #415234)	This change is because main->foo call is not profiled, right?
llvm/tools/llvm-profgen/PerfReader.h
424–425	Should this and ProbeBasedCtxKey be removed?
llvm/tools/llvm-profgen/ProfileGenerator.cpp
394–397	Why is this needed?
975	CallProbe can also be null if the caller isn't profiled? Does the comment need to be updated here?
986	Inline this call here and move into the loop? there's no other calls to this callee now.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
58	how about `-decode-probe-for-profiled-functions-only`?
360	can we assert `!ProfiledFunctions.empty()` to make sure profile functions are collected already?
381	How about completely unifying the workflow of on-demand vs. full decoding? The only difference would be: for full decoding, we just need to provide a full Guid set; and for on-demand, we provide ProfiledGuids. Correct me if I'm wrong, but I think this way we keep the same behavior for full decoding as of today (no context truncation before preinliner), and we can also remove ProbeStack and replace it with AddressStack everywhere.

hoy added inline comments. Mar 22 2022, 4:56 PM

llvm/lib/MC/MCPseudoProbe.cpp
363	`GuildFilter` sounds better.
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe-on-demand.test
18 ↗	(On Diff #415234)	Yes, main is not profiled so the calling context is truncated.
llvm/tools/llvm-profgen/PerfReader.h
424–425	Yes, this is no longer needed.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
394–397	Oh, this should be a change for D121655. With llvm-profgen reading in the final profile directly, profiled functions should be extracted from the input ProfileMap. Let me remove it here.
975	Comment updated.
986	Inlined. Not merging the loops since they iterate in different directions; note the std::reverse in between.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
58	Sounds good.

Updating D121643: [llvm-profgen] On-demand pseudo probe decoding

wenlei added inline comments. Mar 22 2022, 5:45 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
939–940	Now remove this function?
975	nit: "We may not find a probe for functions that are not sampled." -> "We may not find a probe for functions that are not sampled when --decode-probe-for-profiled-functions-only is on." It would also be clearer to list the three scenarios with bullets, as the comment is growing larger.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
360	missed this one?
381	"The only difference would be - for full decoding, we just need to provide a full Guid set; and for on-demand, we provide ProfiledGuids." Missed this one? By "only difference", I meant the only place that needs to check `DecodeProbeForProfiledFunctionsOnly`.

Updating D121643: [llvm-profgen] On-demand pseudo probe decoding

wlei added inline comments. Mar 22 2022, 6:20 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
373	For CS profile, the input contains CallStack + LBRStack, but here it only considers the addresses from LBRStack(Range + Branch) to compute profiledFunctions. Does it miss some functions whose address(frame) is only from CallStack? For example, if we only have one following entry: [main @ foo] : {rangeFrom : rangeTo} Supposing rangeFrom and rangeTo only belong to `foo`(foo is not inlined to main), then the result of ProfiledFunctions only contains `foo` and misses `main`, right? From the term "ProfiledFunctions", it should be `foo`, but for CS inlining, I guess we still need the frame `main`?

wlei added inline comments. Mar 22 2022, 6:33 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
373	Sorry, I missed your notes on the summary. "Note that in on-demand mode, a calling context may be truncated at a caller which does not have sample. This is a discrepancy from the complete mode. However, with the CS preinliner such context will always be truncated from the compiler inlining point of view." So in on-demand mode, it will implicitly trim some contexts, even before CS-preinliner or cold-context-trimming.

hoy added inline comments. Mar 22 2022, 6:42 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
373	Yes, and those contexts, if not truncated by the CS preinliner, will also be truncated by the compiler since the sample loader wouldn't consider any inlining in a non-profiled function.
975	Sounds good.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
360	assert added.
381	Right now for full decoding we provide an empty `ProfiledGuids` to the decoder instead of computing Guids for all disassembled functions, which could be expensive. The only place that requires full decoding is when --show-assembly-only is on. Otherwise it is kind of unified? I'm adding an assert like this: `assert((!ProfiledFunctions.empty() || !DecodeProbeForProfiledFunctionsOnly || ShowDisassemblyOnly) && "Profiled functions should not be empty in on-demand probe decoding " "mode");` We can also use 0 as a guid to represent all functions. WDYT?

Updating D121643: [llvm-profgen] On-demand pseudo probe decoding

wlei added inline comments. Mar 22 2022, 7:08 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
373	Thanks for the clarification. Then it looks like there can be future work to compute the ProfiledFunctions earlier and truncate the call stack during unwinding, so that `SampleCounters` is reduced to save parsing time and memory.

Harbormaster completed remote builds in B155752: Diff 417462. Mar 22 2022, 7:21 PM

wenlei added inline comments. Mar 22 2022, 10:21 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
373	"if not truncated by the CS preinliner, will also be truncated by the compiler since the sample loader wouldn't consider any inlining in a non-profiled function." Actually this is arguable. Top-down inlining could choose to also inline cold calls in certain cases. E.g. even if A->B is cold, inlining A->B->C provides an opportunity for B->C to specialize under A->B->C, without affecting the D->B->C path. What you described could be benign at the moment, but I'm wondering, if we keep main like Lei suggested, what's the extra memory overhead? Semantically, keeping the old behavior might be better because the generated profile is supposed to have complete context before preinliner/trimming. Ofc, if that behavior bloats memory too much, the way you optimize it here by trimming earlier helps.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
381	Passing empty guid for full decoding works too. Then can we only check `DecodeProbeForProfiledFunctionsOnly` within `decodePseudoProbe` and pass empty Guid when the flag is false? Currently `DecodeProbeForProfiledFunctionsOnly` is checked at many places before calling `decodePseudoProbe`. What I meant was that regardless of that flag, `decodePseudoProbe` should always be called at the same place (except for `ShowDisassemblyOnly`), but the flag only controls what is passed as GuidFilter.

hoy added inline comments. Mar 22 2022, 11:58 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
373	For now the sample inliner only processes functions with profiles. So for a cold function call like A->B, both A and B should have a profile in order for the call to be inlined. With this on-demand approach, if A is not sampled, we truncate the context to B only. Without this diff, we would generate an `A:1 @ B`-like profile, but then during top-down inlining, context promotion would truncate the context as well. So for now we wouldn't lose any opportunity. This will be a problem in the future when we extend the top-down inliner to handle non-sampled functions. Good point on moving the profiled function computation earlier so that we could do more truncation during the unwinding to further save some memory. So far our memory usage looks good. I don't have numbers for how many more functions will have their probes decoded if we want to keep all contexts. Maybe we could revisit this once those contexts become useful?
llvm/tools/llvm-profgen/ProfiledBinary.cpp
381	I see what you mean. Moved the flag checks into decodePseudoProbe.
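The unification discussed in this thread can be sketched as follows, assuming (as stated above) that an empty filter means "decode every function". `ToyOptions` and `selectGuidFilter` are hypothetical names for illustration, not llvm-profgen APIs; the idea is that the decode call site stays the same and the flags only decide which filter is passed.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Hypothetical option bundle mirroring the two flags named in the thread.
struct ToyOptions {
  bool DecodeProfiledOnly;
  bool ShowDisassemblyOnly;
};

// Picks the GUID filter handed to the decoder. An empty set is assumed to
// mean "no filtering", i.e. full decoding.
inline std::unordered_set<uint64_t>
selectGuidFilter(const ToyOptions &Opts,
                 const std::unordered_set<uint64_t> &ProfiledGuids) {
  if (Opts.ShowDisassemblyOnly || !Opts.DecodeProfiledOnly)
    return std::unordered_set<uint64_t>();
  return ProfiledGuids;
}
```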

Updating D121643: [llvm-profgen] On-demand pseudo probe decoding

Harbormaster completed remote builds in B155788: Diff 417513. Mar 23 2022, 1:08 AM

wenlei added inline comments. Mar 23 2022, 9:55 AM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
373	"So for now we wouldn't lose any opportunity. This will be a problem in the future when we extend the top-down inliner to handle non-sampled functions." Yes, we don't lose inlining now, but that's not the main concern. "I don't have number for how many more functions will have their probes decoded if we want to keep all contexts. Maybe we could revisit this once those contexts are becoming useful?" What I was thinking was that, if we keep all context probes decoded, this change will strictly not affect output in any way, and we can remove the switch, which simplifies things and always saves memory. I think that having full context also helps with verification even if we don't utilize that info for optimization now. So if the extra probes don't cost us much memory, I'm leaning towards keeping them and removing the switch to make the new behavior the default and only choice. There's another alternative: we always only decode profiled functions (except for show-assembly), and then there's a switch to control whether extra context probes are decoded.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
293–294	just fyi, you can run linter on your changes only instead of the whole file. Not a big deal though, and thanks for fixing it for entire files.
363–365	When `DecodeProbeForProfiledFunctionsOnly` is off, are we supposed to keep `ProfiledGuids` empty here for buildAddress2ProbeMap, so all functions will be decoded?
367	Others may not be aware what "on-demand" mode means, because with the current version, there's no concept of on-demand - it's decoding all functions or decoding profiled functions only. Suggest rephrasing the wording "on-demand".

hoy added inline comments. Mar 23 2022, 12:14 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
373	okay, keeping full contexts doesn't seem to incur more cost, only 5% more functions are decoded. I'm including the change here.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
293–294	arc lint didn't seem to work. I ended up formatting the whole file with the editor.
360	I removed the assert since there could be an outlier case where no function is probed, e.g., test/tools/llvm-profgen/mmapEvent.test
363–365	The function `decodePseudoProbe` was run in two places, one was in the very early binary loading, the other was in profile generation. The first time the function is executed `ProfiledFunctions` should always be empty. The second time it's run, the container may not be empty. This is confusing, I'm now changing it to only run in profile generation except for ShowDisassemblyOnly.
367	The comment seems not needed after latest refactoring.

Decoding for functions included in call stacks as well.
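To illustrate the difference this update makes, here is a small sketch using wlei's `[main @ foo]` example: collecting profiled functions from LBR ranges alone yields only `foo`, while also walking the call-stack addresses adds `main`. The types and names (`ToySample`, `ToySymbols`, `collectProfiledFunctions`) and the addresses are made up for the example and are not llvm-profgen's.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical sample: call-stack addresses plus LBR-derived ranges.
struct ToySample {
  std::vector<uint64_t> CallStack;
  std::vector<std::pair<uint64_t, uint64_t>> Ranges;
};

// Hypothetical symbol table: function start address -> function name.
using ToySymbols = std::map<uint64_t, std::string>;

// Returns the function whose start address is the closest one at or below Addr.
inline const std::string *findFunc(const ToySymbols &Syms, uint64_t Addr) {
  auto It = Syms.upper_bound(Addr);
  if (It == Syms.begin())
    return nullptr;
  return &std::prev(It)->second;
}

// Collects the set of functions whose probes should be decoded.
inline std::unordered_set<std::string>
collectProfiledFunctions(const std::vector<ToySample> &Samples,
                         const ToySymbols &Syms, bool IncludeCallStack) {
  std::unordered_set<std::string> Out;
  auto Add = [&](uint64_t Addr) {
    if (const std::string *F = findFunc(Syms, Addr))
      Out.insert(*F);
  };
  for (const ToySample &S : Samples) {
    for (const auto &R : S.Ranges) {
      Add(R.first);
      Add(R.second);
    }
    // Functions that appear only as callers in the stack have no samples of
    // their own but are still needed to keep the full calling context.
    if (IncludeCallStack)
      for (uint64_t A : S.CallStack)
        Add(A);
  }
  return Out;
}
```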

Updating D121643: [llvm-profgen] On-demand pseudo probe decoding

wenlei added inline comments. Mar 23 2022, 12:23 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
373	Cool, in that case, maybe remove the new switch altogether? Then you can also eliminate many test changes for extra switch to keep old behavior.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
363–365	Maybe I missed something, but the only place I see `decodePseudoProbe` run earlier is for `ShowDisassemblyOnly`. When ShowDisassemblyOnly is off, when do we call decodePseudoProbe with empty Guid? And why do we need to run decodePseudoProbe early except for ShowDisassemblyOnly? That was part of the unification I mentioned.

hoy added inline comments. Mar 23 2022, 12:38 PM

llvm/tools/llvm-profgen/ProfileGenerator.cpp
373	Sounds good.
llvm/tools/llvm-profgen/ProfiledBinary.cpp
363–365	That was in the previous iterations. In the latest iteration decodePseudoProbe should be called with empty ProfiledFunctions for ShowDisassemblyOnly only.

Removing decode-probe-for-profiled-functions-only switch.

Harbormaster completed remote builds in B155931: Diff 417718. Mar 23 2022, 1:24 PM

Thanks for working through the changes, this looks great now. Please update the summary to reflect the latest version (we can also make it clear that this is not "on-demand" but "decode probes for profiled context only").

This revision is now accepted and ready to land. Mar 23 2022, 1:49 PM

hoy retitled this revision from "[llvm-profgen] On-demand pseudo probe decoding" to "[llvm-profgen] Decoding pseudo probe for profiled function only.". Mar 23 2022, 1:58 PM

hoy edited the summary of this revision.

This revision was landed with ongoing or failed builds. Mar 23 2022, 2:15 PM

Closed by commit rG3f97016857b0: [llvm-profgen] Decoding pseudo probe for profiled function only. (authored by hoy).

This revision was automatically updated to reflect the committed changes.

hoy added a commit: rG3f97016857b0: [llvm-profgen] Decoding pseudo probe for profiled function only.

Revision Contents

Path	Size
llvm/include/llvm/MC/MCPseudoProbe.h	10 lines
llvm/lib/MC/MCPseudoProbe.cpp	164 lines
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test	4 lines
llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test	12 lines
llvm/tools/llvm-profgen/	60 lines, 38 lines, 2 lines, 111 lines, 16 lines, 61 lines

Diff 417744

llvm/include/llvm/MC/MCPseudoProbe.h

[49 lines not shown]
 #include "llvm/Support/ErrorOr.h"
 #include <list>
 #include <map>
 #include <memory>
 #include <string>
 #include <tuple>
 #include <type_traits>
 #include <unordered_map>
+#include <unordered_set>
 #include <vector>

 namespace llvm {

 class MCSection;
 class MCSymbol;
 class MCObjectStreamer;
 class raw_ostream;
[282 lines not shown]

 public:
   // Decode pseudo_probe_desc section to build GUID to PseudoProbeFuncDesc map.
   bool buildGUID2FuncDescMap(const uint8_t *Start, std::size_t Size);

   // Decode pseudo_probe section to build address to probes map.
   bool buildAddress2ProbeMap(const uint8_t *Start, std::size_t Size);

+  // Decode pseudo_probe section to build address to probes map for specifed
+  // functions only.
+  bool buildAddress2ProbeMap(const uint8_t *Start, std::size_t Size,
+                             std::unordered_set<uint64_t> &GuildFilter);
+
+  bool buildAddress2ProbeMap(MCDecodedPseudoProbeInlineTree *Cur,
+                             uint64_t &LastAddr,
+                             std::unordered_set<uint64_t> &GuildFilter);
+
   // Print pseudo_probe_desc section info
   void printGUID2FuncDescMap(raw_ostream &OS);

   // Print pseudo_probe section info, used along with show-disassembly
   void printProbeForAddress(raw_ostream &OS, uint64_t Address);

   // do printProbeForAddress for all addresses
   void printProbesForAllAddresses(raw_ostream &OS);
[39 lines not shown]

llvm/lib/MC/MCPseudoProbe.cpp

[352 lines not shown]

     // Initialize PseudoProbeFuncDesc and populate it into GUID2FuncDescMap
     GUID2FuncDescMap.emplace(GUID, MCPseudoProbeFuncDesc(GUID, Hash, Name));
   }
   assert(Data == End && "Have unprocessed data in pseudo_probe_desc section");
   return true;
 }

-bool MCPseudoProbeDecoder::buildAddress2ProbeMap(const uint8_t *Start,
-                                                 std::size_t Size) {
+bool MCPseudoProbeDecoder::buildAddress2ProbeMap(
+    MCDecodedPseudoProbeInlineTree *Cur, uint64_t &LastAddr,
+    std::unordered_set<uint64_t> &GuildFilter) {
   // The pseudo_probe section encodes an inline forest and each tree has a
   // format like:
   // FUNCTION BODY (one for each uninlined function present in the text
   // section)
   // GUID (uint64)
   // GUID of the function
   // NPROBES (ULEB128)
   // Number of probes originating from this function.
[14 lines not shown]
   // INLINED FUNCTION RECORDS
   // A list of NUM_INLINED_FUNCTIONS entries describing each of the
   // inlined callees. Each record contains:
   // INLINE SITE
   // Index of the callsite probe (ULEB128)
   // FUNCTION BODY
   // A FUNCTION BODY entry describing the inlined function.

-  Data = Start;
-  End = Data + Size;
-
-  MCDecodedPseudoProbeInlineTree *Root = &DummyInlineRoot;
-  MCDecodedPseudoProbeInlineTree *Cur = &DummyInlineRoot;
-  uint64_t LastAddr = 0;
   uint32_t Index = 0;
-  // A DFS-based decoding
-  while (Data < End) {
-    if (Root == Cur) {
+  if (Cur == &DummyInlineRoot) {
     // Use a sequential id for top level inliner.
-      Index = Root->getChildren().size();
+    Index = Cur->getChildren().size();
   } else {
     // Read inline site for inlinees
     auto ErrorOrIndex = readUnsignedNumber<uint32_t>();
     if (!ErrorOrIndex)
       return false;
     Index = std::move(*ErrorOrIndex);
   }
-    // Switch/add to a new tree node(inlinee)
-    Cur = Cur->getOrAddNode(std::make_tuple(Cur->Guid, Index));
   // Read guid
   auto ErrorOrCurGuid = readUnencodedNumber<uint64_t>();
   if (!ErrorOrCurGuid)
     return false;
-    Cur->Guid = std::move(*ErrorOrCurGuid);
+  uint64_t Guid = std::move(*ErrorOrCurGuid);
+
+  // Decide if top-level node should be disgarded.
+  if (Cur == &DummyInlineRoot && !GuildFilter.empty() &&
+      !GuildFilter.count(Guid))
+    Cur = nullptr;
+
+  // If the incoming node is null, all its children nodes should be disgarded.
+  if (Cur) {
+    // Switch/add to a new tree node(inlinee)
+    Cur = Cur->getOrAddNode(std::make_tuple(Cur->Guid, Index));
+    Cur->Guid = Guid;
+  }
+
   // Read number of probes in the current node.
   auto ErrorOrNodeCount = readUnsignedNumber<uint32_t>();
   if (!ErrorOrNodeCount)
     return false;
   uint32_t NodeCount = std::move(*ErrorOrNodeCount);
   // Read number of direct inlinees
   auto ErrorOrCurChildrenToProcess = readUnsignedNumber<uint32_t>();
   if (!ErrorOrCurChildrenToProcess)
     return false;
-    Cur->ChildrenToProcess = std::move(*ErrorOrCurChildrenToProcess);
   // Read all probes in this node
   for (std::size_t I = 0; I < NodeCount; I++) {
     // Read index
     auto ErrorOrIndex = readUnsignedNumber<uint32_t>();
     if (!ErrorOrIndex)
       return false;
     uint32_t Index = std::move(*ErrorOrIndex);
     // Read type | flag.
     auto ErrorOrValue = readUnencodedNumber<uint8_t>();
     if (!ErrorOrValue)
       return false;
     uint8_t Value = std::move(*ErrorOrValue);
     uint8_t Kind = Value & 0xf;
     uint8_t Attr = (Value & 0x70) >> 4;
     // Read address
     uint64_t Addr = 0;
     if (Value & 0x80) {
       auto ErrorOrOffset = readSignedNumber<int64_t>();
       if (!ErrorOrOffset)
         return false;
       int64_t Offset = std::move(*ErrorOrOffset);
       Addr = LastAddr + Offset;
     } else {
       auto ErrorOrAddr = readUnencodedNumber<int64_t>();
       if (!ErrorOrAddr)
         return false;
       Addr = std::move(*ErrorOrAddr);
     }
+
+    if (Cur) {
       // Populate Address2ProbesMap
       auto &Probes = Address2ProbesMap[Addr];
       Probes.emplace_back(Addr, Cur->Guid, Index, PseudoProbeType(Kind), Attr,
                           Cur);
       Cur->addProbes(&Probes.back());
+    }
     LastAddr = Addr;
   }

-    // Look for the parent for the next node by subtracting the current
-    // node count from tree counts along the parent chain. The first node
-    // in the chain that has a non-zero tree count is the target.
-    while (Cur != Root) {
-      if (Cur->ChildrenToProcess == 0) {
-        Cur = static_cast<MCDecodedPseudoProbeInlineTree *>(Cur->Parent);
-        if (Cur != Root) {
-          assert(Cur->ChildrenToProcess > 0 &&
-                 "Should have some unprocessed nodes");
-          Cur->ChildrenToProcess -= 1;
-        }
-      } else {
-        break;
-      }
+  uint32_t ChildrenToProcess = std::move(*ErrorOrCurChildrenToProcess);
+  for (uint32_t I = 0; I < ChildrenToProcess; I++) {
+    buildAddress2ProbeMap(Cur, LastAddr, GuildFilter);
   }
+
+  return true;
 }
+
+bool MCPseudoProbeDecoder::buildAddress2ProbeMap(
+    const uint8_t *Start, std::size_t Size,
+    std::unordered_set<uint64_t> &GuildFilter) {
+  Data = Start;
+  End = Data + Size;
+  uint64_t LastAddr = 0;
+  while (Data < End)
+    buildAddress2ProbeMap(&DummyInlineRoot, LastAddr, GuildFilter);
   assert(Data == End && "Have unprocessed data in pseudo_probe section");
-  assert(Cur == Root &&
-         " Cur should point to root when the forest is fully built up");
   return true;
 }

+bool MCPseudoProbeDecoder::buildAddress2ProbeMap(const uint8_t *Start,
+                                                 std::size_t Size) {
+  std::unordered_set<uint64_t> GuildFilter;
+  return buildAddress2ProbeMap(Start, Size, GuildFilter);
+}
+
 void MCPseudoProbeDecoder::printGUID2FuncDescMap(raw_ostream &OS) {
   OS << "Pseudo Probe Desc:\n";
   // Make the output deterministic
   std::map<uint64_t, MCPseudoProbeFuncDesc> OrderedMap(GUID2FuncDescMap.begin(),
                                                        GUID2FuncDescMap.end());
   for (auto &I : OrderedMap) {
     I.second.print(OS);
   }
[73 lines not shown]

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test

[18 lines not shown]
 ; CHECK-NEXT: 8: 15 bar:15
 ; CHECK-NEXT: 9: 0
 ; CHECK-NEXT: !CFGChecksum: 563088904013236
 ; CHECK:[main:2 @ foo:8 @ bar]:30:15
 ; CHECK-NEXT: 1: 15
 ; CHECK-NEXT: 4: 15
 ; CHECK-NEXT: !CFGChecksum: 72617220756

-; CHECK-UNWINDER: [main:2]
+; CHECK-UNWINDER: [0x7f4]
 ; CHECK-UNWINDER-NEXT: 2
 ; CHECK-UNWINDER-NEXT: 79e-7bf:15
 ; CHECK-UNWINDER-NEXT: 7c4-7cf:15
 ; CHECK-UNWINDER-NEXT: 2
 ; CHECK-UNWINDER-NEXT: 7bf->760:15
 ; CHECK-UNWINDER-NEXT: 7cf->79e:16
-; CHECK-UNWINDER-NEXT: [main:2 @ foo:8]
+; CHECK-UNWINDER-NEXT: [0x7f4 @ 0x7bf]
 ; CHECK-UNWINDER-NEXT: 1
 ; CHECK-UNWINDER-NEXT: 760-77f:15
 ; CHECK-UNWINDER-NEXT: 1
 ; CHECK-UNWINDER-NEXT: 77f->7c4:17


 ; clang -O3 -fexperimental-new-pass-manager -fuse-ld=lld -fpseudo-probe-for-profiling
 ; -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -Xclang -mdisable-tail-calls
[22 lines not shown]

llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test

	Show First 20 Lines • Show All 117 Lines • ▼ Show 20 Lines
	; CHECK: 6: 1 fa:1			; CHECK: 6: 1 fa:1
	; CHECK: !CFGChecksum: 563022570642068			; CHECK: !CFGChecksum: 563022570642068
	; CHECK: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb:6 @ fa:8 @ fa:7 @ fb:6 @ fa:7 @ fb]:3:1			; CHECK: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb:6 @ fa:8 @ fa:7 @ fb:6 @ fa:7 @ fb]:3:1
	; CHECK: 1: 1			; CHECK: 1: 1
	; CHECK: 3: 1			; CHECK: 3: 1
	; CHECK: 6: 1 fa:1			; CHECK: 6: 1 fa:1
	; CHECK: !CFGChecksum: 563022570642068			; CHECK: !CFGChecksum: 563022570642068

	; CHECK-UNWINDER: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5]			; CHECK-UNWINDER: [0x842 @ 0x7d4 @ 0x7e0 @ 0x7ab]
	; CHECK-UNWINDER-NEXT: 3			; CHECK-UNWINDER-NEXT: 3
	; CHECK-UNWINDER-NEXT: 7a0-7a7:1			; CHECK-UNWINDER-NEXT: 7a0-7a7:1
	; CHECK-UNWINDER-NEXT: 7a0-7ab:3			; CHECK-UNWINDER-NEXT: 7a0-7ab:3
	; CHECK-UNWINDER-NEXT: 7b2-7b5:1			; CHECK-UNWINDER-NEXT: 7b2-7b5:1
	; CHECK-UNWINDER-NEXT: 3			; CHECK-UNWINDER-NEXT: 3
	; CHECK-UNWINDER-NEXT: 7a7->7b2:1			; CHECK-UNWINDER-NEXT: 7a7->7b2:1
	; CHECK-UNWINDER-NEXT: 7ab->7a0:4			; CHECK-UNWINDER-NEXT: 7ab->7a0:4
	; CHECK-UNWINDER-NEXT: 7b5->7c0:1			; CHECK-UNWINDER-NEXT: 7b5->7c0:1
	; CHECK-UNWINDER-NEXT: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb:6]			; CHECK-UNWINDER-NEXT: [0x842 @ 0x7d4 @ 0x7e0 @ 0x7ab @ 0x7b5]
	; CHECK-UNWINDER-NEXT: 1			; CHECK-UNWINDER-NEXT: 1
	; CHECK-UNWINDER-NEXT: 7c0-7d4:1			; CHECK-UNWINDER-NEXT: 7c0-7d4:1
	; CHECK-UNWINDER-NEXT: 1			; CHECK-UNWINDER-NEXT: 1
	; CHECK-UNWINDER-NEXT: 7d4->7c0:1			; CHECK-UNWINDER-NEXT: 7d4->7c0:1
	; CHECK-UNWINDER-NEXT: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb:6 @ fa:8]			; CHECK-UNWINDER-NEXT: [0x842 @ 0x7d4 @ 0x7e0 @ 0x7ab @ 0x7b5 @ 0x7d4]
	; CHECK-UNWINDER-NEXT: 2			; CHECK-UNWINDER-NEXT: 2
	; CHECK-UNWINDER-NEXT: 7c0-7cd:1			; CHECK-UNWINDER-NEXT: 7c0-7cd:1
	; CHECK-UNWINDER-NEXT: 7db-7e0:1			; CHECK-UNWINDER-NEXT: 7db-7e0:1
	; CHECK-UNWINDER-NEXT: 2			; CHECK-UNWINDER-NEXT: 2
	; CHECK-UNWINDER-NEXT: 7cd->7db:1			; CHECK-UNWINDER-NEXT: 7cd->7db:1
	; CHECK-UNWINDER-NEXT: 7e0->7a0:1			; CHECK-UNWINDER-NEXT: 7e0->7a0:1
	; CHECK-UNWINDER-NEXT: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb:6 @ fa:8 @ fa:7]			; CHECK-UNWINDER-NEXT: [0x842 @ 0x7d4 @ 0x7e0 @ 0x7ab @ 0x7b5 @ 0x7d4 @ 0x7e0]
	; CHECK-UNWINDER-NEXT: 2			; CHECK-UNWINDER-NEXT: 2
	; CHECK-UNWINDER-NEXT: 7a0-7a7:1			; CHECK-UNWINDER-NEXT: 7a0-7a7:1
	; CHECK-UNWINDER-NEXT: 7b2-7b5:1			; CHECK-UNWINDER-NEXT: 7b2-7b5:1
	; CHECK-UNWINDER-NEXT: 2			; CHECK-UNWINDER-NEXT: 2
	; CHECK-UNWINDER-NEXT: 7a7->7b2:1			; CHECK-UNWINDER-NEXT: 7a7->7b2:1
	; CHECK-UNWINDER-NEXT: 7b5->7c0:1			; CHECK-UNWINDER-NEXT: 7b5->7c0:1
	; CHECK-UNWINDER-NEXT: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb:6 @ fa:8 @ fa:7 @ fb:6]			; CHECK-UNWINDER-NEXT: [0x842 @ 0x7d4 @ 0x7e0 @ 0x7ab @ 0x7b5 @ 0x7d4 @ 0x7e0 @ 0x7b5]
	; CHECK-UNWINDER-NEXT: 2			; CHECK-UNWINDER-NEXT: 2
	; CHECK-UNWINDER-NEXT: 7c0-7cd:2			; CHECK-UNWINDER-NEXT: 7c0-7cd:2
	; CHECK-UNWINDER-NEXT: 7db-7e0:1			; CHECK-UNWINDER-NEXT: 7db-7e0:1
	; CHECK-UNWINDER-NEXT: 2			; CHECK-UNWINDER-NEXT: 2
	; CHECK-UNWINDER-NEXT: 7cd->7db:2			; CHECK-UNWINDER-NEXT: 7cd->7db:2
	; CHECK-UNWINDER-NEXT: 7e0->7a0:1			; CHECK-UNWINDER-NEXT: 7e0->7a0:1
	; CHECK-UNWINDER-NEXT: [main:2 @ foo:5 @ fa:8 @ fa:7 @ fb:5 @ fb:6 @ fa:8 @ fa:7 @ fb:6 @ fa:7]			; CHECK-UNWINDER-NEXT: [0x842 @ 0x7d4 @ 0x7e0 @ 0x7ab @ 0x7b5 @ 0x7d4 @ 0x7e0 @ 0x7b5 @ 0x7e0]
	; CHECK-UNWINDER-NEXT: 2			; CHECK-UNWINDER-NEXT: 2
	; CHECK-UNWINDER-NEXT: 7a0-7a7:1			; CHECK-UNWINDER-NEXT: 7a0-7a7:1
	; CHECK-UNWINDER-NEXT: 7b2-7b5:1			; CHECK-UNWINDER-NEXT: 7b2-7b5:1
	; CHECK-UNWINDER-NEXT: 2			; CHECK-UNWINDER-NEXT: 2
	; CHECK-UNWINDER-NEXT: 7a7->7b2:1			; CHECK-UNWINDER-NEXT: 7a7->7b2:1
	; CHECK-UNWINDER-NEXT: 7b5->7c0:1			; CHECK-UNWINDER-NEXT: 7b5->7c0:1



llvm/tools/llvm-profgen/PerfReader.h

Show First 20 Lines • Show All 327 Lines • ▼ Show 20 Lines	uint64_t getHashCode() {
return HashCode;		return HashCode;
}		}
virtual void genHashCode() = 0;		virtual void genHashCode() = 0;
virtual bool isEqual(const ContextKey *K) const {		virtual bool isEqual(const ContextKey *K) const {
return HashCode == K->HashCode;		return HashCode == K->HashCode;
};		};

// Utilities for LLVM-style RTTI		// Utilities for LLVM-style RTTI
enum ContextKind { CK_StringBased, CK_ProbeBased };		enum ContextKind { CK_StringBased, CK_AddrBased };
const ContextKind Kind;		const ContextKind Kind;
ContextKind getKind() const { return Kind; }		ContextKind getKind() const { return Kind; }
ContextKey(ContextKind K) : Kind(K){};		ContextKey(ContextKind K) : Kind(K){};
};		};

// String based context id		// String based context id
struct StringBasedCtxKey : public ContextKey {		struct StringBasedCtxKey : public ContextKey {
SampleContextFrameVector Context;		SampleContextFrameVector Context;
Show All 9 Lines	bool isEqual(const ContextKey *K) const override {
return Context == Other->Context;		return Context == Other->Context;
}		}

void genHashCode() override {		void genHashCode() override {
HashCode = hash_value(SampleContextFrames(Context));		HashCode = hash_value(SampleContextFrames(Context));
}		}
};		};

// Probe based context key as the intermediate key of context		// Address-based context id
// String based context key will introduce redundant string handling		struct AddrBasedCtxKey : public ContextKey {
// since the callee context is inferred from the context string which		SmallVector<uint64_t, 16> Context;
// need to be splitted by '@' to get the last location frame, so we
// can just use probe instead and generate the string in the end.
struct ProbeBasedCtxKey : public ContextKey {
SmallVector<const MCDecodedPseudoProbe *, 16> Probes;

ProbeBasedCtxKey() : ContextKey(CK_ProbeBased) {}		bool WasLeafInlined;
		AddrBasedCtxKey() : ContextKey(CK_AddrBased), WasLeafInlined(false){};
static bool classof(const ContextKey *K) {		static bool classof(const ContextKey *K) {
return K->getKind() == CK_ProbeBased;		return K->getKind() == CK_AddrBased;
}		}

bool isEqual(const ContextKey *K) const override {		bool isEqual(const ContextKey *K) const override {
const ProbeBasedCtxKey *O = dyn_cast<ProbeBasedCtxKey>(K);		const AddrBasedCtxKey *Other = dyn_cast<AddrBasedCtxKey>(K);
assert(O != nullptr && "Probe based key shouldn't be null in isEqual");		return Context == Other->Context;
return std::equal(Probes.begin(), Probes.end(), O->Probes.begin(),
O->Probes.end());
}		}

void genHashCode() override {		void genHashCode() override {
for (const auto *P : Probes) {		HashCode = hash_combine_range(Context.begin(), Context.end());
HashCode = hash_combine(HashCode, P);
}
if (HashCode == 0) {
// Avoid zero value of HashCode when it's an empty list
HashCode = 1;
}
}		}
};		};

// The counter of branch samples for one function indexed by the branch,		// The counter of branch samples for one function indexed by the branch,
// which is represented as the source and target offset pair.		// which is represented as the source and target offset pair.
using BranchSample = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;		using BranchSample = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;
// The counter of range samples for one function indexed by the range,		// The counter of range samples for one function indexed by the range,
// which is represented as the start and end offset pair.		// which is represented as the start and end offset pair.
Show All 29 Lines	struct FrameStack {
}		}

void popFrame() {		void popFrame() {
if (!Stack.empty())		if (!Stack.empty())
Stack.pop_back();		Stack.pop_back();
}		}
std::shared_ptr<StringBasedCtxKey> getContextKey();		std::shared_ptr<StringBasedCtxKey> getContextKey();
};		};

struct ProbeStack {		struct AddressStack {
		wenlei: Should this and ProbeBasedCtxKey be removed?
		hoy (author): Yes, this is no longer needed.
SmallVector<const MCDecodedPseudoProbe *, 16> Stack;		SmallVector<uint64_t, 16> Stack;
ProfiledBinary *Binary;		ProfiledBinary *Binary;
ProbeStack(ProfiledBinary *B) : Binary(B) {}		AddressStack(ProfiledBinary *B) : Binary(B) {}
bool pushFrame(UnwindState::ProfiledFrame *Cur) {		bool pushFrame(UnwindState::ProfiledFrame *Cur) {
assert(!Cur->isExternalFrame() &&		assert(!Cur->isExternalFrame() &&
"External frame's not expected for context stack.");		"External frame's not expected for context stack.");
const MCDecodedPseudoProbe *CallProbe =		Stack.push_back(Cur->Address);
Binary->getCallProbeForAddr(Cur->Address);
// We may not find a probe for a merged or external callsite.
// Callsite merging may cause the loss of original probe IDs.
// Cutting off the context from here since the inliner will
// not know how to consume a context with unknown callsites.
if (!CallProbe)
return false;
Stack.push_back(CallProbe);
return true;		return true;
}		}

void popFrame() {		void popFrame() {
if (!Stack.empty())		if (!Stack.empty())
Stack.pop_back();		Stack.pop_back();
}		}
// Use pseudo probe based context key to get the sample counter		std::shared_ptr<AddrBasedCtxKey> getContextKey();
// A context stands for a call path from 'main' to an uninlined
// callee with all inline frames recovered on that path. The probes
// belonging to that call path is the probes either originated from
// the callee or from any functions inlined into the callee. Since
// pseudo probes are organized in a tri-tree style after decoded,
// the tree path from the tri-tree root (which is the uninlined
// callee) to the probe node forms an inline context.
// Here we use a list of probe(pointer) as the context key to speed up
// aggregation and the final context string will be generate in
// ProfileGenerator
std::shared_ptr<ProbeBasedCtxKey> getContextKey();
};		};
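The switch from a probe-pointer key to an address key can be illustrated without LLVM. The following is only a minimal sketch using the standard library: `AddrKeySketch` is a hypothetical stand-in for `AddrBasedCtxKey`, and the hash mixing is an assumption (LLVM's `hash_combine_range` mixes differently). It shows that two stacks with the same raw addresses compare equal and hash identically, which is what sample aggregation needs before any probe is decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for AddrBasedCtxKey (standard library only).
struct AddrKeySketch {
  std::vector<uint64_t> Context; // call-site addresses, outermost caller first
  std::size_t HashCode = 0;

  // Fold every address into a single hash value. The mixing constant is
  // an illustrative choice, not what LLVM uses.
  void genHashCode() {
    std::size_t H = 0;
    for (uint64_t Addr : Context)
      H ^= std::hash<uint64_t>{}(Addr) + 0x9e3779b9u + (H << 6) + (H >> 2);
    HashCode = H;
  }

  // Two keys are equal when their address sequences match exactly.
  bool isEqual(const AddrKeySketch &Other) const {
    return Context == Other.Context;
  }
};
```

Because the key holds plain integers, it can be built during virtual unwinding, before the set of profiled functions is known.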

/*		/*
As in hybrid sample we have a group of LBRs and the most recent sampling call		As in hybrid sample we have a group of LBRs and the most recent sampling call
stack, we can walk through those LBRs to infer more call stacks which would be		stack, we can walk through those LBRs to infer more call stacks which would be
used as context for profile. VirtualUnwinder is the class to do the call stack		used as context for profile. VirtualUnwinder is the class to do the call stack
unwinding based on LBR state. Two types of unwinding are processed here:		unwinding based on LBR state. Two types of unwinding are processed here:
1) LBR unwinding and 2) linear range unwinding.		1) LBR unwinding and 2) linear range unwinding.

llvm/tools/llvm-profgen/PerfReader.cpp

Show First 20 Lines • Show All 173 Lines • ▼ Show 20 Lines	std::shared_ptr<StringBasedCtxKey> FrameStack::getContextKey() {
std::shared_ptr<StringBasedCtxKey> KeyStr =		std::shared_ptr<StringBasedCtxKey> KeyStr =
std::make_shared<StringBasedCtxKey>();		std::make_shared<StringBasedCtxKey>();
KeyStr->Context = Binary->getExpandedContext(Stack, KeyStr->WasLeafInlined);		KeyStr->Context = Binary->getExpandedContext(Stack, KeyStr->WasLeafInlined);
if (KeyStr->Context.empty())		if (KeyStr->Context.empty())
return nullptr;		return nullptr;
return KeyStr;		return KeyStr;
}		}

std::shared_ptr<ProbeBasedCtxKey> ProbeStack::getContextKey() {		std::shared_ptr<AddrBasedCtxKey> AddressStack::getContextKey() {
std::shared_ptr<ProbeBasedCtxKey> ProbeBasedKey =		std::shared_ptr<AddrBasedCtxKey> KeyStr = std::make_shared<AddrBasedCtxKey>();
std::make_shared<ProbeBasedCtxKey>();		KeyStr->Context = Stack;
for (auto CallProbe : Stack) {		CSProfileGenerator::compressRecursionContext<uint64_t>(KeyStr->Context);
ProbeBasedKey->Probes.emplace_back(CallProbe);		CSProfileGenerator::trimContext<uint64_t>(KeyStr->Context);
}		return KeyStr;
CSProfileGenerator::compressRecursionContext<const MCDecodedPseudoProbe *>(
ProbeBasedKey->Probes);
CSProfileGenerator::trimContext<const MCDecodedPseudoProbe *>(
ProbeBasedKey->Probes);
return ProbeBasedKey;
}		}

template <typename T>		template <typename T>
void VirtualUnwinder::collectSamplesFromFrame(UnwindState::ProfiledFrame *Cur,		void VirtualUnwinder::collectSamplesFromFrame(UnwindState::ProfiledFrame *Cur,
T &Stack) {		T &Stack) {
if (Cur->RangeSamples.empty() && Cur->BranchSamples.empty())		if (Cur->RangeSamples.empty() && Cur->BranchSamples.empty())
return;		return;

▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	void VirtualUnwinder::collectSamplesFromFrameTrie(
}		}
// Recover the call stack		// Recover the call stack
Stack.popFrame();		Stack.popFrame();
}		}

void VirtualUnwinder::collectSamplesFromFrameTrie(		void VirtualUnwinder::collectSamplesFromFrameTrie(
UnwindState::ProfiledFrame *Cur) {		UnwindState::ProfiledFrame *Cur) {
if (Binary->usePseudoProbes()) {		if (Binary->usePseudoProbes()) {
ProbeStack Stack(Binary);		AddressStack Stack(Binary);
collectSamplesFromFrameTrie<ProbeStack>(Cur, Stack);		collectSamplesFromFrameTrie<AddressStack>(Cur, Stack);
} else {		} else {
FrameStack Stack(Binary);		FrameStack Stack(Binary);
collectSamplesFromFrameTrie<FrameStack>(Cur, Stack);		collectSamplesFromFrameTrie<FrameStack>(Cur, Stack);
}		}
}		}

void VirtualUnwinder::recordBranchCount(const LBREntry &Branch,		void VirtualUnwinder::recordBranchCount(const LBREntry &Branch,
UnwindState &State, uint64_t Repeat) {		UnwindState &State, uint64_t Repeat) {
▲ Show 20 Lines • Show All 191 Lines • ▼ Show 20 Lines	if (Event.Offset == Binary->getTextSegmentOffset()) {
}		}
}		}
}		}

static std::string getContextKeyStr(ContextKey *K,		static std::string getContextKeyStr(ContextKey *K,
const ProfiledBinary *Binary) {		const ProfiledBinary *Binary) {
if (const auto *CtxKey = dyn_cast<StringBasedCtxKey>(K)) {		if (const auto *CtxKey = dyn_cast<StringBasedCtxKey>(K)) {
return SampleContext::getContextString(CtxKey->Context);		return SampleContext::getContextString(CtxKey->Context);
} else if (const auto *CtxKey = dyn_cast<ProbeBasedCtxKey>(K)) {		} else if (const auto *CtxKey = dyn_cast<AddrBasedCtxKey>(K)) {
SampleContextFrameVector ContextStack;		std::ostringstream OContextStr;
for (const auto *Probe : CtxKey->Probes) {		for (uint32_t I = 0; I < CtxKey->Context.size(); I++) {
Binary->getInlineContextForProbe(Probe, ContextStack, true);		if (OContextStr.str().size())
}		OContextStr << " @ ";
// Probe context key at this point does not have leaf probe, so do not		OContextStr << "0x"
// include the leaf inline location.		<< to_hexString(
return SampleContext::getContextString(ContextStack, true);		Binary->virtualAddrToOffset(CtxKey->Context[I]),
		false);
		}
		return OContextStr.str();
} else {		} else {
llvm_unreachable("unexpected key type");		llvm_unreachable("unexpected key type");
}		}
}		}
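The address-based branch above prints each context entry as a hexadecimal offset joined by " @ ", which is the form the updated CHECK-UNWINDER lines expect. A minimal sketch of that formatting, assuming the addresses have already been converted to offsets (`formatContext` is a hypothetical helper, not part of llvm-profgen):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: join offsets as "0x... @ 0x...", outermost first.
std::string formatContext(const std::vector<uint64_t> &Offsets) {
  std::ostringstream OS;
  for (std::size_t I = 0; I < Offsets.size(); ++I) {
    if (I != 0)
      OS << " @ "; // separator only between entries
    OS << "0x" << std::hex << Offsets[I];
  }
  return OS.str();
}
```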

void HybridPerfReader::unwindSamples() {		void HybridPerfReader::unwindSamples() {
if (Binary->useFSDiscriminator())		if (Binary->useFSDiscriminator())
exitWithError("FS discriminator is not supported in CS profile.");		exitWithError("FS discriminator is not supported in CS profile.");

llvm/tools/llvm-profgen/ProfileGenerator.h

Show First 20 Lines • Show All 100 Lines • ▼ Show 20 Lines	protected:

void calculateAndShowDensity(const SampleProfileMap &Profiles);		void calculateAndShowDensity(const SampleProfileMap &Profiles);

double calculateDensity(const SampleProfileMap &Profiles,		double calculateDensity(const SampleProfileMap &Profiles,
uint64_t HotCntThreshold);		uint64_t HotCntThreshold);

void showDensitySuggestion(double Density);		void showDensitySuggestion(double Density);

		void collectProfiledFunctions();

// Thresholds from profile summary to answer isHotCount/isColdCount queries.		// Thresholds from profile summary to answer isHotCount/isColdCount queries.
uint64_t HotCountThreshold;		uint64_t HotCountThreshold;

uint64_t ColdCountThreshold;		uint64_t ColdCountThreshold;

// Used by SampleProfileWriter		// Used by SampleProfileWriter
SampleProfileMap ProfileMap;		SampleProfileMap ProfileMap;


llvm/tools/llvm-profgen/ProfileGenerator.cpp

//===-- ProfileGenerator.cpp - Profile Generator ---------------- C++ --===//		//===-- ProfileGenerator.cpp - Profile Generator ---------------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "ProfileGenerator.h"		#include "ProfileGenerator.h"
#include "ErrorHandling.h"		#include "ErrorHandling.h"
#include "ProfiledBinary.h"		#include "ProfiledBinary.h"
#include "llvm/DebugInfo/Symbolize/SymbolizableModule.h"		#include "llvm/DebugInfo/Symbolize/SymbolizableModule.h"
#include "llvm/ProfileData/ProfileCommon.h"		#include "llvm/ProfileData/ProfileCommon.h"
		#include <algorithm>
#include <float.h>		#include <float.h>
#include <unordered_set>		#include <unordered_set>

cl::opt<std::string> OutputFilename("output", cl::value_desc("output"),		cl::opt<std::string> OutputFilename("output", cl::value_desc("output"),
cl::Required,		cl::Required,
cl::desc("Output profile file"));		cl::desc("Output profile file"));
static cl::alias OutputA("o", cl::desc("Alias for --output"),		static cl::alias OutputA("o", cl::desc("Alias for --output"),
cl::aliasopt(OutputFilename));		cl::aliasopt(OutputFilename));
▲ Show 20 Lines • Show All 343 Lines • ▼ Show 20 Lines	if (!UpdateTotalSamples)
return;		return;

for (auto &Item : ProfileMap) {		for (auto &Item : ProfileMap) {
FunctionSamples &FunctionProfile = Item.second;		FunctionSamples &FunctionProfile = Item.second;
FunctionProfile.updateTotalSamples();		FunctionProfile.updateTotalSamples();
}		}
}		}

		void ProfileGeneratorBase::collectProfiledFunctions() {
		wlei: For CS profile, the input contains CallStack + LBRStack, but here it only considers the addresses from LBRStack (Range + Branch) to compute ProfiledFunctions. Does it miss some functions whose address (frame) is only from CallStack? For example, if we only have the following entry: `[main @ foo] : {rangeFrom : rangeTo}`. Supposing rangeFrom and rangeTo only belong to `foo` (foo is not inlined into main), then the result of ProfiledFunctions only contains `foo` and misses `main`, right? From the term "ProfiledFunctions", it should be `foo`, but for CS inlining, I guess we still need the frame `main`?
		wlei: Sorry, I missed your notes on the summary: "Note that in on-demand mode, a calling context may be truncated at a caller which does not have sample. This is a discrepancy from the complete mode. However, with the CS preinliner such context will always be truncated from the compiler inlining point of view." So in on-demand mode, it will implicitly trim some contexts, even before CS-preinliner or cold-context-trimming.
		hoy (author): Yes, and those contexts, if not truncated by the CS preinliner, will also be truncated by the compiler since the sample loader wouldn't consider any inlining in a non-profiled function.
		wlei: Thanks for the clarification. Then it looks like there can be future work to compute the ProfiledFunctions earlier and truncate the call stack during unwinding, so that `SampleCounters` is reduced to save parsing time and memory.
		wenlei: > if not truncated by the CS preinliner, will also be truncated by the compiler since the sample loader wouldn't consider any inlining in a non-profiled function.
		wenlei: Actually this is arguable. Top-down inlining could choose to also inline cold calls in certain cases. E.g. even if A->B is cold, inlining A->B->C provides an opportunity for B->C to specialize under A->B->C, without affecting the D->B->C path. What you described could be benign at the moment, but I'm wondering, if we keep main like Lei suggested, what's the extra memory overhead? Semantically, keeping the old behavior might be better because the generated profile is supposed to have complete context before preinliner/trimming. Ofc, if that behavior bloats memory too much, the way you optimize it here by trimming earlier helps.
		hoy (author): For now the sample inliner only processes functions with profiles. So for a cold function call like A->B, both A and B should have a profile in order for the call to be inlined. With this on-demand approach, if A is not sampled, we truncate the context to B only. Without this diff, we would generate a profile like `A:1 @ B`, but then during top-down inlining, context promotion would truncate the context as well. So for now we wouldn't lose any opportunity. This will be a problem in the future when we extend the top-down inliner to handle non-sampled functions. Good point on moving the profiled function computation earlier so that we could do more truncation during the unwinding to further save some memory. So far our memory usage looks good. I don't have a number for how many more functions will have their probes decoded if we want to keep all contexts. Maybe we could revisit this once those contexts become useful?
		wenlei: > So for now we wouldn't lose any opportunity. This will be a problem in the future when we extend the top-down inliner to handle non-sampled functions.
		wenlei: Yes, we don't lose inlining now, but that's not the main concern.
		wenlei: > I don't have number for how many more functions will have their probes decoded if we want to keep all contexts. Maybe we could revisit this once those contexts are becoming useful?
		wenlei: What I was thinking was that, if we keep all context probes decoded, this change will strictly not affect output in any way, and we can remove the switch, which simplifies things and always saves memory. I think that having full context also helps with verification even if we don't utilize that info for optimization now. So if the extra probes don't cost us much memory, I'm leaning towards keeping them and removing the switch to make the new behavior the default and only choice. There's another alternative: we always decode only profiled functions (except for show-assembly), and there's a switch to control whether extra context probes are decoded.
		hoy (author): Okay, keeping full contexts doesn't seem to incur more cost; only 5% more functions are decoded. I'm including the change here.
		wenlei: Cool, in that case, maybe remove the new switch altogether? Then you can also eliminate many test changes for the extra switch to keep old behavior.
		hoy (author): Sounds good.
		std::unordered_set<const BinaryFunction *> ProfiledFunctions;
		// Go through all the stacks, ranges and branches in sample counters, use the
		// start of the range to look up the function it belongs to and record the
		// function.
		for (const auto &CI : SampleCounters) {
		if (const auto *CtxKey = dyn_cast<AddrBasedCtxKey>(CI.first.getPtr())) {
		for (auto Addr : CtxKey->Context) {
		if (FuncRange *FRange = Binary->findFuncRangeForOffset(
		Binary->virtualAddrToOffset(Addr)))
		ProfiledFunctions.insert(FRange->Func);
		}
		}

		for (auto Item : CI.second.RangeCounter) {
		uint64_t StartOffset = Item.first.first;
		if (FuncRange *FRange = Binary->findFuncRangeForOffset(StartOffset))
		ProfiledFunctions.insert(FRange->Func);
		}

		for (auto Item : CI.second.BranchCounter) {
		uint64_t SourceOffset = Item.first.first;
		uint64_t TargetOffset = Item.first.second;
		if (FuncRange *FRange = Binary->findFuncRangeForOffset(SourceOffset))
		ProfiledFunctions.insert(FRange->Func);
		wenlei: Why is this needed?
		hoy (author): Oh, this should be a change for D121655. With llvm-profgen reading in the final profile directly, profiled functions should be extracted from the input ProfileMap. Let me remove it here.
		if (FuncRange *FRange = Binary->findFuncRangeForOffset(TargetOffset))
		ProfiledFunctions.insert(FRange->Func);
		}
		}

		Binary->setProfiledFunctions(ProfiledFunctions);
		}
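The collection step can be sketched outside llvm-profgen as an interval lookup over a function table. Everything below is hypothetical scaffolding (`FuncTable`, `findFunc`, `collectProfiled`), assumed only for illustration: every context address, every range start, and both ends of every branch are mapped to the function that covers them, and the resulting set is what on-demand probe decoding would later consult.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical function table: start offset -> (end offset, name).
using FuncTable = std::map<uint64_t, std::pair<uint64_t, std::string>>;
// Same shape as RangeSample/BranchSample: (first, second) offsets -> count.
using Counter = std::map<std::pair<uint64_t, uint64_t>, uint64_t>;

// Return the name of the function covering Offset, or "" if none does.
std::string findFunc(const FuncTable &Funcs, uint64_t Offset) {
  auto It = Funcs.upper_bound(Offset); // first function starting after Offset
  if (It == Funcs.begin())
    return "";
  --It;
  return Offset < It->second.first ? It->second.second : "";
}

// Record every function touched by a context address, a range start,
// or either end of a branch.
std::set<std::string> collectProfiled(const FuncTable &Funcs,
                                      const std::vector<uint64_t> &Context,
                                      const Counter &Ranges,
                                      const Counter &Branches) {
  std::set<std::string> Profiled;
  auto Record = [&](uint64_t Offset) {
    std::string Name = findFunc(Funcs, Offset);
    if (!Name.empty())
      Profiled.insert(Name);
  };
  for (uint64_t Addr : Context)
    Record(Addr);
  for (const auto &Item : Ranges)
    Record(Item.first.first);
  for (const auto &Item : Branches) {
    Record(Item.first.first);  // branch source
    Record(Item.first.second); // branch target
  }
  return Profiled;
}
```

Including the context addresses is what keeps callers such as `main` in the set even when no range or branch sample lands inside them.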

FunctionSamples &		FunctionSamples &
ProfileGenerator::getTopLevelFunctionProfile(StringRef FuncName) {		ProfileGenerator::getTopLevelFunctionProfile(StringRef FuncName) {
SampleContext Context(FuncName);		SampleContext Context(FuncName);
auto Ret = ProfileMap.emplace(Context, FunctionSamples());		auto Ret = ProfileMap.emplace(Context, FunctionSamples());
if (Ret.second) {		if (Ret.second) {
FunctionSamples &FProfile = Ret.first->second;		FunctionSamples &FProfile = Ret.first->second;
FProfile.setContext(Context);		FProfile.setContext(Context);
}		}
return Ret.first->second;		return Ret.first->second;
}		}

void ProfileGenerator::generateProfile() {		void ProfileGenerator::generateProfile() {
		collectProfiledFunctions();
if (Binary->usePseudoProbes()) {		if (Binary->usePseudoProbes()) {
generateProbeBasedProfile();		generateProbeBasedProfile();
} else {		} else {
generateLineNumBasedProfile();		generateLineNumBasedProfile();
}		}
postProcessProfiles();		postProcessProfiles();
}		}

Show All 30 Lines	void ProfileGenerator::generateLineNumBasedProfile() {
populateBoundarySamplesForAllFunctions(SC.BranchCounter);		populateBoundarySamplesForAllFunctions(SC.BranchCounter);

updateTotalSamples();		updateTotalSamples();
}		}

void ProfileGenerator::generateProbeBasedProfile() {		void ProfileGenerator::generateProbeBasedProfile() {
assert(SampleCounters.size() == 1 &&		assert(SampleCounters.size() == 1 &&
"Must have one entry for profile generation.");		"Must have one entry for profile generation.");
		Binary->decodePseudoProbe();
// Enable pseudo probe functionalities in SampleProf		// Enable pseudo probe functionalities in SampleProf
FunctionSamples::ProfileIsProbeBased = true;		FunctionSamples::ProfileIsProbeBased = true;
const SampleCounter &SC = SampleCounters.begin()->second;		const SampleCounter &SC = SampleCounters.begin()->second;
// Fill in function body samples		// Fill in function body samples
populateBodySamplesWithProbesForAllFunctions(SC.RangeCounter);		populateBodySamplesWithProbesForAllFunctions(SC.RangeCounter);
// Fill in boundary sample counts as well as call site samples for calls		// Fill in boundary sample counts as well as call site samples for calls
populateBoundarySamplesWithProbesForAllFunctions(SC.BranchCounter);		populateBoundarySamplesWithProbesForAllFunctions(SC.BranchCounter);

updateTotalSamples();		updateTotalSamples();
}		}

void ProfileGenerator::populateBodySamplesWithProbesForAllFunctions(		void ProfileGenerator::populateBodySamplesWithProbesForAllFunctions(
const RangeSample &RangeCounter) {		const RangeSample &RangeCounter) {
ProbeCounterMap ProbeCounter;		ProbeCounterMap ProbeCounter;
// preprocessRangeCounter returns disjoint ranges, so no longer to redo it inside		// preprocessRangeCounter returns disjoint ranges, so no longer to redo it
// extractProbesFromRange.		// inside extractProbesFromRange.
extractProbesFromRange(preprocessRangeCounter(RangeCounter), ProbeCounter, false);		extractProbesFromRange(preprocessRangeCounter(RangeCounter), ProbeCounter,
		false);

for (const auto &PI : ProbeCounter) {		for (const auto &PI : ProbeCounter) {
const MCDecodedPseudoProbe *Probe = PI.first;		const MCDecodedPseudoProbe *Probe = PI.first;
uint64_t Count = PI.second;		uint64_t Count = PI.second;
SampleContextFrameVector FrameVec;		SampleContextFrameVector FrameVec;
Binary->getInlineContextForProbe(Probe, FrameVec, true);		Binary->getInlineContextForProbe(Probe, FrameVec, true);
FunctionSamples &FunctionProfile = getLeafProfileAndAddTotalSamples(FrameVec, Count);		FunctionSamples &FunctionProfile =
		getLeafProfileAndAddTotalSamples(FrameVec, Count);
FunctionProfile.addBodySamplesForProbe(Probe->getIndex(), Count);		FunctionProfile.addBodySamplesForProbe(Probe->getIndex(), Count);
if (Probe->isEntry())		if (Probe->isEntry())
FunctionProfile.addHeadSamples(Count);		FunctionProfile.addHeadSamples(Count);
}		}
}		}

void ProfileGenerator::populateBoundarySamplesWithProbesForAllFunctions(		void ProfileGenerator::populateBoundarySamplesWithProbesForAllFunctions(
const BranchSample &BranchCounters) {		const BranchSample &BranchCounters) {
Show All 28 Lines

FunctionSamples &ProfileGenerator::getLeafProfileAndAddTotalSamples(		FunctionSamples &ProfileGenerator::getLeafProfileAndAddTotalSamples(
const SampleContextFrameVector &FrameVec, uint64_t Count) {		const SampleContextFrameVector &FrameVec, uint64_t Count) {
// Get top level profile		// Get top level profile
FunctionSamples *FunctionProfile =		FunctionSamples *FunctionProfile =
&getTopLevelFunctionProfile(FrameVec[0].FuncName);		&getTopLevelFunctionProfile(FrameVec[0].FuncName);
FunctionProfile->addTotalSamples(Count);		FunctionProfile->addTotalSamples(Count);
if (Binary->usePseudoProbes()) {		if (Binary->usePseudoProbes()) {
const auto *FuncDesc = Binary->getFuncDescForGUID(Function::getGUID(FunctionProfile->getName()));		const auto *FuncDesc = Binary->getFuncDescForGUID(
		Function::getGUID(FunctionProfile->getName()));
FunctionProfile->setFunctionHash(FuncDesc->FuncHash);		FunctionProfile->setFunctionHash(FuncDesc->FuncHash);
}		}

for (size_t I = 1; I < FrameVec.size(); I++) {		for (size_t I = 1; I < FrameVec.size(); I++) {
LineLocation Callsite(		LineLocation Callsite(
FrameVec[I - 1].Location.LineOffset,		FrameVec[I - 1].Location.LineOffset,
getBaseDiscriminator(FrameVec[I - 1].Location.Discriminator));		getBaseDiscriminator(FrameVec[I - 1].Location.Discriminator));
FunctionSamplesMap &SamplesMap =		FunctionSamplesMap &SamplesMap =
FunctionProfile->functionSamplesAt(Callsite);		FunctionProfile->functionSamplesAt(Callsite);
auto Ret =		auto Ret =
SamplesMap.emplace(FrameVec[I].FuncName.str(), FunctionSamples());		SamplesMap.emplace(FrameVec[I].FuncName.str(), FunctionSamples());
if (Ret.second) {		if (Ret.second) {
SampleContext Context(FrameVec[I].FuncName);		SampleContext Context(FrameVec[I].FuncName);
Ret.first->second.setContext(Context);		Ret.first->second.setContext(Context);
}		}
FunctionProfile = &Ret.first->second;		FunctionProfile = &Ret.first->second;
FunctionProfile->addTotalSamples(Count);		FunctionProfile->addTotalSamples(Count);
if (Binary->usePseudoProbes()) {		if (Binary->usePseudoProbes()) {
const auto *FuncDesc = Binary->getFuncDescForGUID(Function::getGUID(FunctionProfile->getName()));		const auto *FuncDesc = Binary->getFuncDescForGUID(
		Function::getGUID(FunctionProfile->getName()));
FunctionProfile->setFunctionHash(FuncDesc->FuncHash);		FunctionProfile->setFunctionHash(FuncDesc->FuncHash);
}		}
}		}

return *FunctionProfile;		return *FunctionProfile;
}		}

RangeSample		RangeSample
▲ Show 20 Lines • Show All 114 Lines • ▼ Show 20 Lines	if (I == ProfileMap.end()) {
return Ret.first->second;		return Ret.first->second;
}		}
return I->second;		return I->second;
}		}

void CSProfileGenerator::generateProfile() {		void CSProfileGenerator::generateProfile() {
FunctionSamples::ProfileIsCSFlat = true;		FunctionSamples::ProfileIsCSFlat = true;

if (Binary->getTrackFuncContextSize())		collectProfiledFunctions();
computeSizeForProfiledFunctions();

if (Binary->usePseudoProbes()) {		if (Binary->usePseudoProbes()) {
generateProbeBasedProfile();		generateProbeBasedProfile();
} else {		} else {
generateLineNumBasedProfile();		generateLineNumBasedProfile();
}		}

		if (Binary->getTrackFuncContextSize())
		computeSizeForProfiledFunctions();

postProcessProfiles();		postProcessProfiles();
}		}

void CSProfileGenerator::computeSizeForProfiledFunctions() {		void CSProfileGenerator::computeSizeForProfiledFunctions() {
std::unordered_set<const BinaryFunction *> ProfiledFunctions;		std::unordered_set<const BinaryFunction *> ProfiledFunctions;
		for (auto *Func : Binary->getProfiledFunctions())
// Go through all the ranges in the CS counters, use the start of the range to
// look up the function it belongs and record the function.
for (const auto &CI : SampleCounters) {
for (const auto &Item : CI.second.RangeCounter) {
// FIXME: Filter the bogus crossing function range.
uint64_t StartOffset = Item.first.first;
if (FuncRange *FRange = Binary->findFuncRangeForOffset(StartOffset))
ProfiledFunctions.insert(FRange->Func);
}
}

for (auto *Func : ProfiledFunctions)
Binary->computeInlinedContextSizeForFunc(Func);		Binary->computeInlinedContextSizeForFunc(Func);

// Flush the symbolizer to save memory.		// Flush the symbolizer to save memory.
Binary->flushSymbolizer();		Binary->flushSymbolizer();
}		}

void CSProfileGenerator::generateLineNumBasedProfile() {		void CSProfileGenerator::generateLineNumBasedProfile() {
for (const auto &CI : SampleCounters) {		for (const auto &CI : SampleCounters) {
▲ Show 20 Lines • Show All 218 Lines • ▼ Show 20 Lines	do {
if (It != Address2ProbesMap.end()) {		if (It != Address2ProbesMap.end()) {
for (const auto &Probe : It->second) {		for (const auto &Probe : It->second) {
ProbeCounter[&Probe] += Count;		ProbeCounter[&Probe] += Count;
}		}
}		}
} while (IP.advance() && IP.Address <= RangeEnd);		} while (IP.advance() && IP.Address <= RangeEnd);
}		}
}		}

// Helper function to extract context prefix string stack		static void
		wenlei: Now remove this function?
// Extract context stack for reusing, leaf context stack will		extractPrefixContextStack(SampleContextFrameVector &ContextStack,
// be added compressed while looking up function profile		const SmallVectorImpl<uint64_t> &Addresses,
static void extractPrefixContextStack(
SampleContextFrameVector &ContextStack,
const SmallVectorImpl<const MCDecodedPseudoProbe *> &Probes,
ProfiledBinary *Binary) {		ProfiledBinary *Binary) {
		SmallVector<const MCDecodedPseudoProbe *, 16> Probes;
		for (auto Addr : reverse(Addresses)) {
		const MCDecodedPseudoProbe *CallProbe = Binary->getCallProbeForAddr(Addr);
		// These could be the cases when a probe is not found at a callsite. Cutting
		// off the context from here since the inliner will not know how to consume
		// a context with unknown callsites.
		// 1. for functions that are not sampled when
		// --decode-probe-for-profiled-functions-only is on.
		// 2. for a merged callsite. Callsite merging may cause the loss of original
		// probe IDs.
		// 3. for an external callsite.
		if (!CallProbe)
		break;
		Probes.push_back(CallProbe);
		}

		std::reverse(Probes.begin(), Probes.end());

		// Extract context stack for reusing, leaf context stack will be added
		// compressed while looking up function profile.
for (const auto *P : Probes) {		for (const auto *P : Probes) {
Binary->getInlineContextForProbe(P, ContextStack, true);		Binary->getInlineContextForProbe(P, ContextStack, true);
}		}
}		}
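
The new version of `extractPrefixContextStack` converts an address-based stack into a probe-based one and cuts the context at the first call site without a probe. The sketch below illustrates that truncation with standard containers. It assumes the address stack is ordered caller first; `ProbeSketch`, `CallProbeMap`, and `toProbeStack` are hypothetical names and not part of llvm-profgen.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a decoded call probe.
struct ProbeSketch {
  uint32_t Id;
};

// Hypothetical lookup table: call address -> decoded call probe.
using CallProbeMap = std::unordered_map<uint64_t, ProbeSketch>;

// Addresses are assumed to be ordered caller first, leaf-most call last.
std::vector<const ProbeSketch *>
toProbeStack(const std::vector<uint64_t> &Addresses, const CallProbeMap &Map) {
  std::vector<const ProbeSketch *> Probes;
  // Walk from the leaf-most call site towards the root.
  for (auto It = Addresses.rbegin(); It != Addresses.rend(); ++It) {
    auto Found = Map.find(*It);
    if (Found == Map.end())
      break; // Unknown call site: drop this frame and all outer callers.
    Probes.push_back(&Found->second);
  }
  // Restore caller-first order.
  std::reverse(Probes.begin(), Probes.end());
  return Probes;
}
```

The two passes run in opposite directions, so only the frames closest to the leaf survive a missing probe, and the result still reads from caller to callee.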

void CSProfileGenerator::generateProbeBasedProfile() {		void CSProfileGenerator::generateProbeBasedProfile() {
		Binary->decodePseudoProbe();
// Enable pseudo probe functionalities in SampleProf		// Enable pseudo probe functionalities in SampleProf
FunctionSamples::ProfileIsProbeBased = true;		FunctionSamples::ProfileIsProbeBased = true;
for (const auto &CI : SampleCounters) {		for (const auto &CI : SampleCounters) {
const auto *CtxKey = cast<ProbeBasedCtxKey>(CI.first.getPtr());		const AddrBasedCtxKey *CtxKey =
		dyn_cast<AddrBasedCtxKey>(CI.first.getPtr());
		wenlei (Not Done): CallProbe can also be null if the caller isn't profiled? Does the comment need to be updated here?
		hoy (Author, Done): Comment updated.
		wenlei (Not Done): nit: "We may not find a probe for functions that are not sampled." -> "We may not find a probe for functions that are not sampled when --decode-probe-for-profiled-functions-only is on." It would also be clearer to list the three scenarios as bullets, since the list is growing.
		hoy (Author, Done): Sounds good.
SampleContextFrameVector ContextStack;		SampleContextFrameVector ContextStack;
extractPrefixContextStack(ContextStack, CtxKey->Probes, Binary);		extractPrefixContextStack(ContextStack, CtxKey->Context, Binary);
// Fill in function body samples from probes, also infer caller's samples		// Fill in function body samples from probes, also infer caller's samples
// from callee's probe		// from callee's probe
populateBodySamplesWithProbes(CI.second.RangeCounter, ContextStack);		populateBodySamplesWithProbes(CI.second.RangeCounter, ContextStack);
// Fill in boundary samples for a call probe		// Fill in boundary samples for a call probe
populateBoundarySamplesWithProbes(CI.second.BranchCounter, ContextStack);		populateBoundarySamplesWithProbes(CI.second.BranchCounter, ContextStack);
}		}
}		}

void CSProfileGenerator::populateBodySamplesWithProbes(		void CSProfileGenerator::populateBodySamplesWithProbes(
		wenlei (Not Done): Inline this call here and move into the loop? There are no other calls to this callee now.
		hoy (Author, Done): Inlined. Not merging the loops since they iterate in different directions; note the std::reverse in between.
const RangeSample &RangeCounter, SampleContextFrames ContextStack) {		const RangeSample &RangeCounter, SampleContextFrames ContextStack) {
ProbeCounterMap ProbeCounter;		ProbeCounterMap ProbeCounter;
// Extract the top frame probes by looking up each address among the range in		// Extract the top frame probes by looking up each address among the range in
// the Address2ProbeMap		// the Address2ProbeMap
extractProbesFromRange(RangeCounter, ProbeCounter);		extractProbesFromRange(RangeCounter, ProbeCounter);
std::unordered_map<MCDecodedPseudoProbeInlineTree *,		std::unordered_map<MCDecodedPseudoProbeInlineTree *,
std::unordered_set<FunctionSamples *>>		std::unordered_set<FunctionSamples *>>
FrameSamples;		FrameSamples;
[105 lines not shown]

llvm/tools/llvm-profgen/ProfiledBinary.h

[212 lines not shown]	class ProfiledBinary {
std::unique_ptr<MCInstPrinter> IPrinter;		std::unique_ptr<MCInstPrinter> IPrinter;
// A list of text sections sorted by start RVA and size. Used to check		// A list of text sections sorted by start RVA and size. Used to check
// if a given RVA is a valid code address.		// if a given RVA is a valid code address.
std::set<std::pair<uint64_t, uint64_t>> TextSections;		std::set<std::pair<uint64_t, uint64_t>> TextSections;

// A map of mapping function name to BinaryFunction info.		// A map of mapping function name to BinaryFunction info.
std::unordered_map<std::string, BinaryFunction> BinaryFunctions;		std::unordered_map<std::string, BinaryFunction> BinaryFunctions;

		// A list of binary functions that have samples.
		std::unordered_set<const BinaryFunction *> ProfiledFunctions;

// An ordered map of mapping function's start offset to function range		// An ordered map of mapping function's start offset to function range
// relevant info. Currently to determine if the offset of ELF is the start of		// relevant info. Currently to determine if the offset of ELF is the start of
// a real function, we leverage the function range info from DWARF.		// a real function, we leverage the function range info from DWARF.
std::map<uint64_t, FuncRange> StartOffset2FuncRangeMap;		std::map<uint64_t, FuncRange> StartOffset2FuncRangeMap;

// Offset to context location map. Used to expand the context.		// Offset to context location map. Used to expand the context.
std::unordered_map<uint64_t, SampleContextFrameVector> Offset2LocStackMap;		std::unordered_map<uint64_t, SampleContextFrameVector> Offset2LocStackMap;

[44 lines not shown]	class ProfiledBinary {
// Use to avoid redundant warning.		// Use to avoid redundant warning.
bool MissingMMapWarned = false;		bool MissingMMapWarned = false;

void setPreferredTextSegmentAddresses(const ELFObjectFileBase *O);		void setPreferredTextSegmentAddresses(const ELFObjectFileBase *O);

template <class ELFT>		template <class ELFT>
void setPreferredTextSegmentAddresses(const ELFFile<ELFT> &Obj, StringRef FileName);		void setPreferredTextSegmentAddresses(const ELFFile<ELFT> &Obj, StringRef FileName);

		void checkPseudoProbe(const ELFObjectFileBase *Obj);

void decodePseudoProbe(const ELFObjectFileBase *Obj);		void decodePseudoProbe(const ELFObjectFileBase *Obj);

void		void
checkUseFSDiscriminator(const ELFObjectFileBase *Obj,		checkUseFSDiscriminator(const ELFObjectFileBase *Obj,
std::map<SectionRef, SectionSymbolsTy> &AllSymbols);		std::map<SectionRef, SectionSymbolsTy> &AllSymbols);

// Set up disassembler and related components.		// Set up disassembler and related components.
void setUpDisassembler(const ELFObjectFileBase *Obj);		void setUpDisassembler(const ELFObjectFileBase *Obj);
[37 lines not shown]	ProfiledBinary(const StringRef ExeBinPath, const StringRef DebugBinPath)
: Path(ExeBinPath), DebugBinaryPath(DebugBinPath), ProEpilogTracker(this),		: Path(ExeBinPath), DebugBinaryPath(DebugBinPath), ProEpilogTracker(this),
TrackFuncContextSize(EnableCSPreInliner &&		TrackFuncContextSize(EnableCSPreInliner &&
UseContextCostForPreInliner) {		UseContextCostForPreInliner) {
// Point to executable binary if debug info binary is not specified.		// Point to executable binary if debug info binary is not specified.
SymbolizerPath = DebugBinPath.empty() ? ExeBinPath : DebugBinPath;		SymbolizerPath = DebugBinPath.empty() ? ExeBinPath : DebugBinPath;
setupSymbolizer();		setupSymbolizer();
load();		load();
}		}

		void decodePseudoProbe();

uint64_t virtualAddrToOffset(uint64_t VirtualAddress) const {		uint64_t virtualAddrToOffset(uint64_t VirtualAddress) const {
return VirtualAddress - BaseAddress;		return VirtualAddress - BaseAddress;
}		}
uint64_t offsetToVirtualAddr(uint64_t Offset) const {		uint64_t offsetToVirtualAddr(uint64_t Offset) const {
return Offset + BaseAddress;		return Offset + BaseAddress;
}		}
StringRef getPath() const { return Path; }		StringRef getPath() const { return Path; }
StringRef getName() const { return llvm::sys::path::filename(Path); }		StringRef getName() const { return llvm::sys::path::filename(Path); }
[106 lines not shown]	RangesTy getRangesForOffset(uint64_t Offset) {
return FRange->Func->Ranges;		return FRange->Func->Ranges;
}		}

const std::unordered_map<std::string, BinaryFunction> &		const std::unordered_map<std::string, BinaryFunction> &
getAllBinaryFunctions() {		getAllBinaryFunctions() {
return BinaryFunctions;		return BinaryFunctions;
}		}

		std::unordered_set<const BinaryFunction *> &getProfiledFunctions() {
		return ProfiledFunctions;
		}

		void setProfiledFunctions(std::unordered_set<const BinaryFunction *> &Funcs) {
		ProfiledFunctions = Funcs;
		}

BinaryFunction *getBinaryFunction(StringRef FName) {		BinaryFunction *getBinaryFunction(StringRef FName) {
auto I = BinaryFunctions.find(FName.str());		auto I = BinaryFunctions.find(FName.str());
if (I == BinaryFunctions.end())		if (I == BinaryFunctions.end())
return nullptr;		return nullptr;
return &I->second;		return &I->second;
}		}

uint32_t getFuncSizeForContext(SampleContext &Context) {		uint32_t getFuncSizeForContext(SampleContext &Context) {
[89 lines not shown]

llvm/tools/llvm-profgen/ProfiledBinary.cpp

[49 lines not shown]	DWPPath("dwp", cl::init(""), cl::ZeroOrMore,
"<binary>.dwp in the same directory as the main binary."));		"<binary>.dwp in the same directory as the main binary."));

static cl::list<std::string> DisassembleFunctions(		static cl::list<std::string> DisassembleFunctions(
"disassemble-functions", cl::CommaSeparated,		"disassemble-functions", cl::CommaSeparated,
cl::desc("List of functions to print disassembly for. Accept demangled "		cl::desc("List of functions to print disassembly for. Accept demangled "
"names only. Only work with show-disassembly-only"));		"names only. Only work with show-disassembly-only"));

extern cl::opt<bool> ShowDetailedWarning;		extern cl::opt<bool> ShowDetailedWarning;

		wenlei (Not Done): How about `-decode-probe-for-profiled-functions-only`?
		hoy (Author, Done): Sounds good.
namespace llvm {		namespace llvm {
namespace sampleprof {		namespace sampleprof {

static const Target *getTarget(const ObjectFile *Obj) {		static const Target *getTarget(const ObjectFile *Obj) {
Triple TheTriple = Obj->makeTriple();		Triple TheTriple = Obj->makeTriple();
std::string Error;		std::string Error;
std::string ArchName;		std::string ArchName;
const Target *TheTarget =		const Target *TheTarget =
[84 lines not shown]	if (!ProbeNode.getProbes().empty()) {
// Add 0 size to make known.		// Add 0 size to make known.
SizeContext->addFunctionSize(0);		SizeContext->addFunctionSize(0);
}		}

// DFS down the probe inline tree		// DFS down the probe inline tree
for (const auto &ChildNode : ProbeNode.getChildren()) {		for (const auto &ChildNode : ProbeNode.getChildren()) {
InlineSite Location = ChildNode.first;		InlineSite Location = ChildNode.first;
ProbeContext.back().second = std::get<1>(Location);		ProbeContext.back().second = std::get<1>(Location);
trackInlineesOptimizedAway(ProbeDecoder, *ChildNode.second.get(), ProbeContext);		trackInlineesOptimizedAway(ProbeDecoder, *ChildNode.second.get(),
		ProbeContext);
}		}

ProbeContext.pop_back();		ProbeContext.pop_back();
}		}

void ProfiledBinary::warnNoFuncEntry() {		void ProfiledBinary::warnNoFuncEntry() {
uint64_t NoFuncEntryNum = 0;		uint64_t NoFuncEntryNum = 0;
for (auto &F : BinaryFunctions) {		for (auto &F : BinaryFunctions) {
[35 lines not shown]	void ProfiledBinary::load() {
// Current only support X86		// Current only support X86
if (!TheTriple.isX86())		if (!TheTriple.isX86())
exitWithError("unsupported target", TheTriple.getTriple());		exitWithError("unsupported target", TheTriple.getTriple());
LLVM_DEBUG(dbgs() << "Loading " << Path << "\n");		LLVM_DEBUG(dbgs() << "Loading " << Path << "\n");

// Find the preferred load address for text sections.		// Find the preferred load address for text sections.
setPreferredTextSegmentAddresses(Obj);		setPreferredTextSegmentAddresses(Obj);

// Decode pseudo probe related section		checkPseudoProbe(Obj);

		if (ShowDisassemblyOnly)
decodePseudoProbe(Obj);		decodePseudoProbe(Obj);

// Load debug info of subprograms from DWARF section.		// Load debug info of subprograms from DWARF section.
// If path of debug info binary is specified, use the debug info from it,		// If path of debug info binary is specified, use the debug info from it,
// otherwise use the debug info from the executable binary.		// otherwise use the debug info from the executable binary.
if (!DebugBinaryPath.empty()) {		if (!DebugBinaryPath.empty()) {
OwningBinary<Binary> DebugPath =		OwningBinary<Binary> DebugPath =
unwrapOrError(createBinary(DebugBinaryPath), DebugBinaryPath);		unwrapOrError(createBinary(DebugBinaryPath), DebugBinaryPath);
loadSymbolsFromDWARF(*cast<ObjectFile>(DebugPath.getBinary()));		loadSymbolsFromDWARF(*cast<ObjectFile>(DebugPath.getBinary()));
[61 lines not shown]	ProfiledBinary::getExpandedContext(const SmallVectorImpl<uint64_t> &Stack,
ContextVec.pop_back();		ContextVec.pop_back();
CSProfileGenerator::compressRecursionContext(ContextVec);		CSProfileGenerator::compressRecursionContext(ContextVec);
CSProfileGenerator::trimContext(ContextVec);		CSProfileGenerator::trimContext(ContextVec);
ContextVec.push_back(LeafFrame);		ContextVec.push_back(LeafFrame);
return ContextVec;		return ContextVec;
}		}

template <class ELFT>		template <class ELFT>
void ProfiledBinary::setPreferredTextSegmentAddresses(const ELFFile<ELFT> &Obj, StringRef FileName) {		void ProfiledBinary::setPreferredTextSegmentAddresses(const ELFFile<ELFT> &Obj,
		StringRef FileName) {
		wenlei (Not Done): Just FYI, you can run the linter on your changes only instead of the whole file. Not a big deal though, and thanks for fixing it for entire files.
		hoy (Author, Done): arc lint didn't seem to work. I ended up formatting the whole file with the editor.
const auto &PhdrRange = unwrapOrError(Obj.program_headers(), FileName);		const auto &PhdrRange = unwrapOrError(Obj.program_headers(), FileName);
// FIXME: This should be the page size of the system running profiling.		// FIXME: This should be the page size of the system running profiling.
// However such info isn't available at post-processing time, assuming		// However such info isn't available at post-processing time, assuming
// 4K page now. Note that we don't use EXEC_PAGESIZE from <linux/param.h>		// 4K page now. Note that we don't use EXEC_PAGESIZE from <linux/param.h>
// because we may build the tools on non-linux.		// because we may build the tools on non-linux.
uint32_t PageSize = 0x1000;		uint32_t PageSize = 0x1000;
for (const typename ELFT::Phdr &Phdr : PhdrRange) {		for (const typename ELFT::Phdr &Phdr : PhdrRange) {
if (Phdr.p_type == ELF::PT_LOAD) {		if (Phdr.p_type == ELF::PT_LOAD) {
if (!FirstLoadableAddress)		if (!FirstLoadableAddress)
FirstLoadableAddress = Phdr.p_vaddr & ~(PageSize - 1U);		FirstLoadableAddress = Phdr.p_vaddr & ~(PageSize - 1U);
if (Phdr.p_flags & ELF::PF_X) {		if (Phdr.p_flags & ELF::PF_X) {
// Segments will always be loaded at a page boundary.		// Segments will always be loaded at a page boundary.
PreferredTextSegmentAddresses.push_back(Phdr.p_vaddr &		PreferredTextSegmentAddresses.push_back(Phdr.p_vaddr &
~(PageSize - 1U));		~(PageSize - 1U));
TextSegmentOffsets.push_back(Phdr.p_offset & ~(PageSize - 1U));		TextSegmentOffsets.push_back(Phdr.p_offset & ~(PageSize - 1U));
}		}
}		}
}		}

if (PreferredTextSegmentAddresses.empty())		if (PreferredTextSegmentAddresses.empty())
exitWithError("no executable segment found", FileName);		exitWithError("no executable segment found", FileName);
}		}
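
The function above rounds segment addresses and file offsets down to a page boundary with `& ~(PageSize - 1U)`, assuming a 4K page. A small sketch of that rounding is shown here; `alignDownToPage` is a hypothetical helper, not a function in the tool. The sketch widens the page size to 64 bits before building the mask, because a mask computed in 32 bits and then zero-extended would also clear the upper half of a 64-bit operand.

```cpp
#include <cassert>
#include <cstdint>

// Round Addr down to the start of its page.
// PageSize must be a power of two; 0x1000 matches the 4K assumption above.
constexpr uint64_t alignDownToPage(uint64_t Addr, uint32_t PageSize = 0x1000) {
  // Widen before negating so the mask keeps the upper 32 address bits.
  return Addr & ~(static_cast<uint64_t>(PageSize) - 1);
}
```

For example, `alignDownToPage(0x401234)` yields `0x401000`, and an address that is already page aligned is returned unchanged.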

void ProfiledBinary::setPreferredTextSegmentAddresses(const ELFObjectFileBase *Obj) {		void ProfiledBinary::setPreferredTextSegmentAddresses(
		const ELFObjectFileBase *Obj) {
if (const auto *ELFObj = dyn_cast<ELF32LEObjectFile>(Obj))		if (const auto *ELFObj = dyn_cast<ELF32LEObjectFile>(Obj))
setPreferredTextSegmentAddresses(ELFObj->getELFFile(), Obj->getFileName());		setPreferredTextSegmentAddresses(ELFObj->getELFFile(), Obj->getFileName());
else if (const auto *ELFObj = dyn_cast<ELF32BEObjectFile>(Obj))		else if (const auto *ELFObj = dyn_cast<ELF32BEObjectFile>(Obj))
setPreferredTextSegmentAddresses(ELFObj->getELFFile(), Obj->getFileName());		setPreferredTextSegmentAddresses(ELFObj->getELFFile(), Obj->getFileName());
else if (const auto *ELFObj = dyn_cast<ELF64LEObjectFile>(Obj))		else if (const auto *ELFObj = dyn_cast<ELF64LEObjectFile>(Obj))
setPreferredTextSegmentAddresses(ELFObj->getELFFile(), Obj->getFileName());		setPreferredTextSegmentAddresses(ELFObj->getELFFile(), Obj->getFileName());
else if (const auto *ELFObj = cast<ELF64BEObjectFile>(Obj))		else if (const auto *ELFObj = cast<ELF64BEObjectFile>(Obj))
setPreferredTextSegmentAddresses(ELFObj->getELFFile(), Obj->getFileName());		setPreferredTextSegmentAddresses(ELFObj->getELFFile(), Obj->getFileName());
else		else
llvm_unreachable("invalid ELF object format");		llvm_unreachable("invalid ELF object format");
}		}

void ProfiledBinary::decodePseudoProbe(const ELFObjectFileBase *Obj) {		void ProfiledBinary::checkPseudoProbe(const ELFObjectFileBase *Obj) {
if (UseDwarfCorrelation)		if (UseDwarfCorrelation)
return;		return;

		bool HasProbeDescSection = false;
		bool HasPseudoProbeSection = false;

		StringRef FileName = Obj->getFileName();
		for (section_iterator SI = Obj->section_begin(), SE = Obj->section_end();
		SI != SE; ++SI) {
		const SectionRef &Section = *SI;
		StringRef SectionName = unwrapOrError(Section.getName(), FileName);
		if (SectionName == ".pseudo_probe_desc") {
		HasProbeDescSection = true;
		} else if (SectionName == ".pseudo_probe") {
		HasPseudoProbeSection = true;
		}
		}

		// set UsePseudoProbes flag, used for PerfReader
		UsePseudoProbes = HasProbeDescSection && HasPseudoProbeSection;
		}
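
`checkPseudoProbe` only detects whether both probe sections exist and leaves decoding for later. The sketch below shows the same check over a plain list of section names; `hasPseudoProbeSections` is a hypothetical helper that stands in for the ELF section iteration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Return true only when both pseudo probe sections are present.
bool hasPseudoProbeSections(const std::vector<std::string> &SectionNames) {
  bool HasProbeDescSection = false;
  bool HasPseudoProbeSection = false;
  for (const std::string &Name : SectionNames) {
    if (Name == ".pseudo_probe_desc")
      HasProbeDescSection = true;
    else if (Name == ".pseudo_probe")
      HasPseudoProbeSection = true;
  }
  return HasProbeDescSection && HasPseudoProbeSection;
}
```

Requiring both sections differs from the removed code, which set `UsePseudoProbes` as soon as the `.pseudo_probe` section had been decoded.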

		void ProfiledBinary::decodePseudoProbe(const ELFObjectFileBase *Obj) {
		if (!UsePseudoProbes)
		return;

		std::unordered_set<uint64_t> ProfiledGuids;
		if (!ShowDisassemblyOnly)
		wenlei (Not Done): Can we assert `!ProfiledFunctions.empty()` to make sure profiled functions are collected already?
		wenlei (Not Done): Missed this one?
		hoy (Author, Done): assert added.
		hoy (Author, Done): I removed the assert since there could be an outlier case where no function is probed, e.g., test/tools/llvm-profgen/mmapEvent.test
		for (auto *F : ProfiledFunctions)
		ProfiledGuids.insert(Function::getGUID(F->FuncName));

StringRef FileName = Obj->getFileName();		StringRef FileName = Obj->getFileName();
for (section_iterator SI = Obj->section_begin(), SE = Obj->section_end();		for (section_iterator SI = Obj->section_begin(), SE = Obj->section_end();
		wenlei (Not Done): When `DecodeProbeForProfiledFunctionsOnly` is off, are we supposed to keep `ProfiledGuids` empty here for buildAddress2ProbeMap, so all functions will be decoded?
		hoy (Author, Done): The function `decodePseudoProbe` was run in two places: one was in the very early binary loading, the other was in profile generation. The first time the function is executed, `ProfiledFunctions` should always be empty. The second time it's run, the container may not be empty. This is confusing, so I'm now changing it to only run in profile generation except for ShowDisassemblyOnly.
		wenlei (Not Done): Maybe I missed something, but the only place I see `decodePseudoProbe` run earlier is for `ShowDisassemblyOnly`. When ShowDisassemblyOnly is off, when do we call decodePseudoProbe with empty Guid? And why do we need to run decodePseudoProbe early except for ShowDisassemblyOnly? That was part of the unification I mentioned.
		hoy (Author, Done): That was in the previous iterations. In the latest iteration decodePseudoProbe should be called with empty ProfiledFunctions for ShowDisassemblyOnly only.
SI != SE; ++SI) {		SI != SE; ++SI) {
const SectionRef &Section = *SI;		const SectionRef &Section = *SI;
		wenlei (Not Done): Others may not be aware of what "on-demand" mode means, because with the current version there's no concept of on-demand: it's either decoding all functions or decoding profiled functions only. Suggest rephrasing the wording "on-demand".
		hoy (Author, Done): The comment seems not needed after the latest refactoring.
StringRef SectionName = unwrapOrError(Section.getName(), FileName);		StringRef SectionName = unwrapOrError(Section.getName(), FileName);

if (SectionName == ".pseudo_probe_desc") {		if (SectionName == ".pseudo_probe_desc") {
StringRef Contents = unwrapOrError(Section.getContents(), FileName);		StringRef Contents = unwrapOrError(Section.getContents(), FileName);
if (!ProbeDecoder.buildGUID2FuncDescMap(		if (!ProbeDecoder.buildGUID2FuncDescMap(
reinterpret_cast<const uint8_t *>(Contents.data()),		reinterpret_cast<const uint8_t *>(Contents.data()),
Contents.size()))		Contents.size()))
exitWithError("Pseudo Probe decoder fail in .pseudo_probe_desc section");		exitWithError(
		"Pseudo Probe decoder fail in .pseudo_probe_desc section");
} else if (SectionName == ".pseudo_probe") {		} else if (SectionName == ".pseudo_probe") {
StringRef Contents = unwrapOrError(Section.getContents(), FileName);		StringRef Contents = unwrapOrError(Section.getContents(), FileName);
if (!ProbeDecoder.buildAddress2ProbeMap(		if (!ProbeDecoder.buildAddress2ProbeMap(
reinterpret_cast<const uint8_t *>(Contents.data()),		reinterpret_cast<const uint8_t *>(Contents.data()),
Contents.size()))		Contents.size(), ProfiledGuids))
		wenlei (Not Done): How about completely unifying the workflow of on-demand vs full decoding? The only difference would be: for full decoding, we just need to provide a full Guid set; and for on-demand, we provide ProfiledGuids. Correct me if I'm wrong, but I think this way we keep the same behavior for full decoding as of today (no context truncation before preinliner), and we can also remove ProbeStack and replace it with AddressStack everywhere.
		wenlei (Not Done): "The only difference would be: for full decoding, we just need to provide a full Guid set; and for on-demand, we provide ProfiledGuids." Missed this one? By only difference, I meant the only place that needs to check `DecodeProbeForProfiledFunctionsOnly`.
		hoy (Author, Done): Right now for full decoding we provide an empty `ProfiledGuids` to the decoder instead of computing Guids for all disassembled functions, which could be expensive. The only place that requires full decoding is when --show-assembly-only is on. Otherwise it is kind of unified? I'm adding an assert like this: assert((!ProfiledFunctions.empty() || !DecodeProbeForProfiledFunctionsOnly || ShowDisassemblyOnly) && "Profiled functions should not be empty in on-demand probe decoding " "mode"); We can also use a 0 as a guid to represent all functions. WDYT?
		wenlei (Not Done): Passing empty guid for full decoding works too. Then can we only check `DecodeProbeForProfiledFunctionsOnly` within `decodePseudoProbe` and pass empty Guid when the flag is false? Currently `DecodeProbeForProfiledFunctionsOnly` is checked at many places before calling `decodePseudoProbe`. What I meant was that regardless of that flag, `decodePseudoProbe` should always be called at the same place (except for `ShowDisassemblyOnly`), but the flag only controls what is passed as GuidFilter.
		hoy (Author, Done): I see what you mean. Moved the flag checks into decodePseudoProbe.
exitWithError("Pseudo Probe decoder fail in .pseudo_probe section");		exitWithError("Pseudo Probe decoder fail in .pseudo_probe section");
// set UsePseudoProbes flag, used for PerfReader
UsePseudoProbes = true;
}		}
}		}

// Build TopLevelProbeFrameMap to track size for optimized inlinees when probe		// Build TopLevelProbeFrameMap to track size for optimized inlinees when probe
// is available		// is available
if (UsePseudoProbes && TrackFuncContextSize) {		if (TrackFuncContextSize) {
for (const auto &Child : ProbeDecoder.getDummyInlineRoot().getChildren()) {		for (const auto &Child : ProbeDecoder.getDummyInlineRoot().getChildren()) {
auto *Frame = Child.second.get();		auto *Frame = Child.second.get();
StringRef FuncName =		StringRef FuncName =
ProbeDecoder.getFuncDescForGUID(Frame->Guid)->FuncName;		ProbeDecoder.getFuncDescForGUID(Frame->Guid)->FuncName;
TopLevelProbeFrameMap[FuncName] = Frame;		TopLevelProbeFrameMap[FuncName] = Frame;
}		}
}		}

if (ShowPseudoProbe)		if (ShowPseudoProbe)
ProbeDecoder.printGUID2FuncDescMap(outs());		ProbeDecoder.printGUID2FuncDescMap(outs());
}		}
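
The review thread settles on passing an empty GUID set to request full decoding and a non-empty set to restrict decoding to profiled functions, while the section is still scanned in full either way. A minimal sketch of that filter convention follows; `shouldDecode` and `decodeSelected` are hypothetical helpers, and the real decoder builds probe trees rather than a list of GUIDs.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Decide whether a top-level function should be decoded.
// An empty filter means "decode everything".
bool shouldDecode(uint64_t Guid,
                  const std::unordered_set<uint64_t> &GuidFilter) {
  return GuidFilter.empty() || GuidFilter.count(Guid) != 0;
}

// Hypothetical scan: every GUID in the section is visited, but only the
// selected ones are kept in memory.
std::vector<uint64_t>
decodeSelected(const std::vector<uint64_t> &SectionGuids,
               const std::unordered_set<uint64_t> &GuidFilter) {
  std::vector<uint64_t> Kept;
  for (uint64_t Guid : SectionGuids)
    if (shouldDecode(Guid, GuidFilter))
      Kept.push_back(Guid);
  return Kept;
}
```

One consequence of this convention is that a run with no profiled functions is indistinguishable from a request for full decoding, which is the outlier case discussed around the removed assert.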

		void ProfiledBinary::decodePseudoProbe() {
		OwningBinary<Binary> OBinary = unwrapOrError(createBinary(Path), Path);
		Binary &ExeBinary = *OBinary.getBinary();
		auto *Obj = dyn_cast<ELFObjectFileBase>(&ExeBinary);
		decodePseudoProbe(Obj);
		}

void ProfiledBinary::setIsFuncEntry(uint64_t Offset, StringRef RangeSymName) {		void ProfiledBinary::setIsFuncEntry(uint64_t Offset, StringRef RangeSymName) {
// Note that the start offset of each ELF section can be a non-function		// Note that the start offset of each ELF section can be a non-function
// symbol, we need to binary search for the start of a real function range.		// symbol, we need to binary search for the start of a real function range.
auto *FuncRange = findFuncRangeForOffset(Offset);		auto *FuncRange = findFuncRangeForOffset(Offset);
// Skip external function symbol.		// Skip external function symbol.
if (!FuncRange)		if (!FuncRange)
return;		return;

[472 lines not shown]

Revision Contents

Diff 417744