This is an archive of the discontinued LLVM Phabricator instance.


Differential D128859

[llvm-profgen] Do not cache the frame location stack during computing inlined context size
Closed · Public

Authored by wlei on Jun 29 2022, 2:56 PM.


Details

Reviewers

hoy
wenlei

Commits

rG91cc53d5a455: [llvm-profgen] Do not cache the frame location stack during computing inlined…

Summary

In computeInlinedContextSizeForRange, each instruction offset in a range is used only once, so there is no need to cache its frame location stack.
Measured on one internal service binary, this saves 2GB of memory and slightly reduces run time (it avoids one hash lookup).
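
The trade-off in the summary can be illustrated with a small standalone model. This is a sketch under assumptions: FrameStack, CountingSymbolizer and walkRange are hypothetical stand-ins, not llvm-profgen types, and symbolize only fakes the expensive symbolization step. When every offset of a range is visited exactly once, the cache never gets a hit; it only keeps one stack alive per offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a symbolized inline frame stack.
using FrameStack = std::vector<std::string>;

// Hypothetical model, not llvm-profgen code: counts how often the
// "expensive" symbolization runs and how many stacks stay cached.
struct CountingSymbolizer {
  std::size_t SymbolizeCalls = 0;
  std::unordered_map<uint64_t, FrameStack> Cache;

  // Fake symbolization of one instruction offset.
  FrameStack symbolize(uint64_t Offset) {
    ++SymbolizeCalls;
    return {"caller", "callee@" + std::to_string(Offset)};
  }

  // Uncached: compute and return by value; nothing is retained.
  FrameStack lookup(uint64_t Offset) { return symbolize(Offset); }

  // Cached: compute once per offset and keep the result in the map.
  const FrameStack &cachedLookup(uint64_t Offset) {
    auto [It, Inserted] = Cache.try_emplace(Offset);
    if (Inserted)
      It->second = symbolize(Offset);
    return It->second;
  }
};

// Visit every offset in [Begin, End) exactly once, the single-use access
// pattern the summary describes.
void walkRange(CountingSymbolizer &S, uint64_t Begin, uint64_t End,
               bool UseCache) {
  for (uint64_t Off = Begin; Off < End; ++Off) {
    if (UseCache)
      (void)S.cachedLookup(Off);
    else
      (void)S.lookup(Off);
  }
}
```

Walking offsets 0 to 99 once symbolizes 100 times in both modes, but only the cached mode leaves 100 entries in Cache; in this model, caching pays off only for offsets that are queried repeatedly.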

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests Failed

	Time	Test
	60,360 ms	x64 debian > Clang.Driver::arm-cortex-cpus-2.c
	60,060 ms	x64 debian > Clang.Driver::emit-reproducer.c
	60,380 ms	x64 debian > Clang.Driver::fsanitize.c
	60,920 ms	x64 debian > Clang.OpenMP::target_defaultmap_codegen_01.cpp
	60,910 ms	x64 debian > Clang.OpenMP::target_update_codegen.cpp

Event Timeline

wlei created this revision. Jun 29 2022, 2:56 PM

Herald added a project: Restricted Project. Jun 29 2022, 2:56 PM

Herald added subscribers: hoy, wenlei.

wlei requested review of this revision. Jun 29 2022, 2:56 PM

Herald added a subscriber: llvm-commits.

wlei edited the summary of this revision. Jun 29 2022, 3:05 PM

wlei added reviewers: hoy, wenlei.

Harbormaster completed remote builds in B172879: Diff 441180. Jun 29 2022, 4:32 PM

Nice catch!

llvm/tools/llvm-profgen/ProfiledBinary.h
496–502	How about extending this function with a parameter for caching or not?

wlei added inline comments. Jul 11 2022, 9:30 AM

llvm/tools/llvm-profgen/ProfiledBinary.h
496–502	It seems the return types of the two functions should be different: the one without caching should return by value instead of by reference, as we don't store the result in a map.

hoy added inline comments. Jul 11 2022, 11:15 AM

llvm/tools/llvm-profgen/ProfiledBinary.h
496–502	I see. Then it makes sense to leave them separate. How about flipping the names, i.e. getFrameLocationStackWithoutCaching -> getFrameLocationStack and getFrameLocationStack -> getCachedFrameLocation? Caching is expensive, so we want users to be explicitly aware of it. Also, perhaps in the future we will want to use two containers for cached contexts, one for UseProbeDiscriminator==true and one for UseProbeDiscriminator==false.
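
The two-container idea can be made concrete. In the diff, the cache is keyed by offset alone, so whichever UseProbeDiscriminator value reaches an offset first determines what later callers get for that offset. A minimal sketch of separate containers per mode, assuming a hypothetical ModeAwareFrameCache class and a fake symbolize that only tags the mode (neither is llvm-profgen code):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using FrameStack = std::vector<std::string>;

// Hypothetical cache with one container per discriminator mode, so a stack
// symbolized in one mode is never handed back for a query in the other.
class ModeAwareFrameCache {
public:
  const FrameStack &get(uint64_t Offset, bool UseProbeDiscriminator) {
    // Pick the container that matches the requested mode.
    auto &Map = UseProbeDiscriminator ? ProbeCache : DefaultCache;
    auto [It, Inserted] = Map.try_emplace(Offset);
    if (Inserted)
      It->second = symbolize(Offset, UseProbeDiscriminator);
    return It->second;
  }

private:
  // Fake symbolization: the mode only changes a tag in the result.
  static FrameStack symbolize(uint64_t Offset, bool UseProbeDiscriminator) {
    return {std::to_string(Offset) +
            (UseProbeDiscriminator ? ":probe" : ":default")};
  }

  std::unordered_map<uint64_t, FrameStack> ProbeCache;
  std::unordered_map<uint64_t, FrameStack> DefaultCache;
};
```

Querying offset 42 in one mode and then the other yields two distinct cached stacks, where a single offset-keyed map would return the first one for both.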

wlei added inline comments. Jul 11 2022, 4:01 PM

llvm/tools/llvm-profgen/ProfiledBinary.h
496–502	Renamed, thanks for the suggestion!

Renaming according to the reviewer's suggestion.

hoy accepted this revision. Jul 11 2022, 4:36 PM

This revision is now accepted and ready to land. Jul 11 2022, 4:36 PM

hoy added inline comments. Jul 11 2022, 4:37 PM

llvm/tools/llvm-profgen/ProfiledBinary.h
502	nit: getCachedFrameLocation -> getCachedFrameLocationStack. Sorry for missing it previously.

Harbormaster completed remote builds in B174763: Diff 443782. Jul 11 2022, 4:57 PM

Updating D128859: [llvm-profgen] Do not cache the frame location stack during computing inlined context size

Harbormaster completed remote builds in B174772: Diff 443793. Jul 11 2022, 6:15 PM

Good find, thanks for the memory improvement!

This revision was landed with ongoing or failed builds. Oct 25 2022, 9:11 PM

Closed by commit rG91cc53d5a455: [llvm-profgen] Do not cache the frame location stack during computing inlined… (authored by wlei).

This revision was automatically updated to reflect the committed changes.

wlei added a commit: rG91cc53d5a455: [llvm-profgen] Do not cache the frame location stack during computing inlined….

Revision Contents

Path	Size
llvm/tools/llvm-profgen/ProfileGenerator.cpp	4 lines
llvm/tools/llvm-profgen/ProfiledBinary.h	16 lines
llvm/tools/llvm-profgen/ProfiledBinary.cpp	8 lines

Diff 443782

llvm/tools/llvm-profgen/ProfileGenerator.cpp

[664 lines hidden; context: for (const auto &Range : preprocessRangeCounter(RangeCounter)) {]
     // Disjoint ranges may have range in the middle of two instr,
     // e.g. If Instr1 at Addr1, and Instr2 at Addr2, disjoint range
     // can be Addr1+1 to Addr2-1. We should ignore such range.
     if (IP.Address > RangeEnd)
       continue;

     do {
       uint64_t Offset = Binary->virtualAddrToOffset(IP.Address);
-      const SampleContextFrameVector &FrameVec =
+      const SampleContextFrameVector FrameVec =
           Binary->getFrameLocationStack(Offset);
       if (!FrameVec.empty()) {
         // FIXME: As accumulating total count per instruction caused some
         // regression, we changed to accumulate total count per byte as a
         // workaround. Tuning hotness threshold on the compiler side might be
         // necessary in the future.
         FunctionSamples &FunctionProfile = getLeafProfileAndAddTotalSamples(
             FrameVec, Count * Binary->getInstSize(Offset));
[24 lines hidden; context: for (const auto &Entry : BranchCounters) {]
     uint64_t Count = Entry.second;
     assert(Count != 0 && "Unexpected zero weight branch");

     StringRef CalleeName = getCalleeNameForOffset(TargetOffset);
     if (CalleeName.size() == 0)
       continue;
     // Record called target sample and its count.
     const SampleContextFrameVector &FrameVec =
-        Binary->getFrameLocationStack(SourceOffset);
+        Binary->getCachedFrameLocation(SourceOffset);
     if (!FrameVec.empty()) {
       FunctionSamples &FunctionProfile =
           getLeafProfileAndAddTotalSamples(FrameVec, 0);
       FunctionProfile.addCalledTargetSamples(
           FrameVec.back().Location.LineOffset,
           getBaseDiscriminator(FrameVec.back().Location.Discriminator),
           CalleeName, Count);
     }
[521 lines hidden]
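
In the first hunk, FrameVec changes from a const reference to a const value because getFrameLocationStack now returns by value. As a general C++ point (not something stated in the review), both spellings are safe: binding a returned temporary to a const reference extends the temporary's lifetime, and since C++17 initializing a value from the returned prvalue involves no extra copy. A minimal sketch with a hypothetical makeStack function standing in for the uncached getter:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using FrameStack = std::vector<std::string>;

// Hypothetical uncached getter: builds a fresh stack and returns it by value.
FrameStack makeStack() { return {"main", "foo", "bar"}; }

std::size_t depthViaConstRef() {
  // The returned temporary is bound to a const reference, which extends its
  // lifetime to the end of this scope, so Stack does not dangle.
  const FrameStack &Stack = makeStack();
  return Stack.size();
}

std::size_t depthViaValue() {
  // Since C++17, Stack is initialized directly from the returned prvalue
  // (guaranteed copy elision), so no extra copy is made.
  const FrameStack Stack = makeStack();
  return Stack.size();
}
```

Both functions return 3; the by-value spelling simply states ownership directly at the call site.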

llvm/tools/llvm-profgen/ProfiledBinary.h

[486 lines hidden; context: public:]

   uint32_t getFuncSizeForContext(const ContextTrieNode *ContextNode) {
     return FuncSizeTracker.getFuncSizeForContext(ContextNode);
   }

   // Load the symbols from debug table and populate into symbol list.
   void populateSymbolListFromDWARF(ProfileSymbolList &SymbolList);

-  const SampleContextFrameVector &
+  SampleContextFrameVector
   getFrameLocationStack(uint64_t Offset, bool UseProbeDiscriminator = false) {
-    auto I = Offset2LocStackMap.emplace(Offset, SampleContextFrameVector());
-    if (I.second) {
-      InstructionPointer IP(this, Offset);
-      I.first->second = symbolize(IP, true, UseProbeDiscriminator);
-    }
+    InstructionPointer IP(this, Offset);
+    return symbolize(IP, true, UseProbeDiscriminator);
+  }
+
+  const SampleContextFrameVector &
+  getCachedFrameLocation(uint64_t Offset, bool UseProbeDiscriminator = false) {
+    auto I = Offset2LocStackMap.emplace(Offset, SampleContextFrameVector());
+    if (I.second)
+      I.first->second = getFrameLocationStack(Offset, UseProbeDiscriminator);
     return I.first->second;
   }

   Optional<SampleContextFrame> getInlineLeafFrameLoc(uint64_t Offset) {
-    const auto &Stack = getFrameLocationStack(Offset);
+    const auto &Stack = getCachedFrameLocation(Offset);
     if (Stack.empty())
       return {};
     return Stack.back();
   }

   void flushSymbolizer() { Symbolizer.reset(); }

   // Compare two addresses' inline context
[63 lines hidden]
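
The header change splits one function into an uncached getter that returns by value and a cached getter that returns a reference into the map, as wlei explained in the review. The sketch below mirrors that shape with a hypothetical BinaryModel class backed by std::unordered_map; the real class, its symbolizer and its map type may differ. Filling only a freshly inserted slot means the uncached getter runs at most once per offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

using FrameStack = std::vector<std::string>;

// Hypothetical model of the split in the diff, not llvm-profgen code.
class BinaryModel {
public:
  // Uncached: built on demand and returned by value, since there is no
  // longer-lived storage for a reference to point into.
  FrameStack getFrameLocationStack(uint64_t Offset) const {
    return {"frame@" + std::to_string(Offset)};
  }

  // Cached: insert an empty slot first; only a fresh slot is filled by the
  // uncached getter. The returned reference points into the map.
  const FrameStack &getCachedFrameLocation(uint64_t Offset) {
    auto [It, Inserted] = Cache.emplace(Offset, FrameStack());
    if (Inserted)
      It->second = getFrameLocationStack(Offset);
    return It->second;
  }

  std::size_t cacheSize() const { return Cache.size(); }

private:
  // std::unordered_map keeps references to elements valid across later
  // insertions and rehashing, so earlier returned references stay usable.
  std::unordered_map<uint64_t, FrameStack> Cache;
};
```

In this sketch, a reference obtained for offset 1 stays valid while 998 more offsets are inserted. A caller that holds one cached stack while requesting another, as inlineContextEqual does, would rely on this kind of reference stability.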

llvm/tools/llvm-profgen/ProfiledBinary.cpp

[229 lines hidden; context: void ProfiledBinary::load() {]
   warnNoFuncEntry();

   // TODO: decode other sections.
 }

 bool ProfiledBinary::inlineContextEqual(uint64_t Address1, uint64_t Address2) {
   uint64_t Offset1 = virtualAddrToOffset(Address1);
   uint64_t Offset2 = virtualAddrToOffset(Address2);
-  const SampleContextFrameVector &Context1 = getFrameLocationStack(Offset1);
-  const SampleContextFrameVector &Context2 = getFrameLocationStack(Offset2);
+  const SampleContextFrameVector &Context1 = getCachedFrameLocation(Offset1);
+  const SampleContextFrameVector &Context2 = getCachedFrameLocation(Offset2);
   if (Context1.size() != Context2.size())
     return false;
   if (Context1.empty())
     return false;
   // The leaf frame contains location within the leaf, and it
   // needs to be remove that as it's not part of the calling context
   return std::equal(Context1.begin(), Context1.begin() + Context1.size() - 1,
                     Context2.begin(), Context2.begin() + Context2.size() - 1);
 }

 SampleContextFrameVector
 ProfiledBinary::getExpandedContext(const SmallVectorImpl<uint64_t> &Stack,
                                    bool &WasLeafInlined) {
   SampleContextFrameVector ContextVec;
   if (Stack.empty())
     return ContextVec;
   // Process from frame root to leaf
   for (auto Address : Stack) {
     uint64_t Offset = virtualAddrToOffset(Address);
     const SampleContextFrameVector &ExpandedContext =
-        getFrameLocationStack(Offset);
+        getCachedFrameLocation(Offset);
     // An instruction without a valid debug line will be ignored by sample
     // processing
     if (ExpandedContext.empty())
       return SampleContextFrameVector();
     // Set WasLeafInlined to the size of inlined frame count for the last
     // address which is leaf
     WasLeafInlined = (ExpandedContext.size() > 1);
     ContextVec.append(ExpandedContext);
[543 lines hidden; context: if (IP.Address != RangeBegin)]
     WithColor::warning() << "Invalid start instruction at "
                          << format("%8" PRIx64, RangeBegin) << "\n";

   if (IP.Address >= RangeEnd)
     return;

   do {
     uint64_t Offset = virtualAddrToOffset(IP.Address);
-    const SampleContextFrameVector &SymbolizedCallStack =
+    const SampleContextFrameVector SymbolizedCallStack =
         getFrameLocationStack(Offset, UsePseudoProbes);
     uint64_t Size = Offset2InstSizeMap[Offset];

     // Record instruction size for the corresponding context
     FuncSizeTracker.addInstructionForContext(SymbolizedCallStack, Size);

   } while (IP.advance() && IP.Address < RangeEnd);
 }
[61 lines hidden]
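
inlineContextEqual treats two addresses as having the same inline context when their frame stacks have equal length and match on every frame except the leaf, which only records a location inside the innermost function. A standalone sketch of just that comparison, with hypothetical names (FrameStack, sameInlineContext) and plain strings standing in for SampleContextFrame:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

using FrameStack = std::vector<std::string>;

// Hypothetical re-statement of the check in inlineContextEqual: same depth,
// non-empty, and identical frames from the root up to (not including) the
// leaf frame.
bool sameInlineContext(const FrameStack &A, const FrameStack &B) {
  if (A.size() != B.size())
    return false;
  if (A.empty())
    return false;
  // Exclude the last (leaf) frame from both ranges before comparing.
  return std::equal(A.begin(), A.end() - 1, B.begin(), B.end() - 1);
}
```

In this sketch, two single-frame stacks always compare equal, since removing the leaf leaves nothing to compare, while stacks that differ in any caller frame or in depth do not.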