This is an archive of the discontinued LLVM Phabricator instance.

[memprof] Symbolize and cache stack frames.
ClosedPublic

Authored by snehasish on Feb 23 2022, 12:44 PM.

Details

Summary

Currently, symbolization of stack frames occurs on demand when the instrprof writer
iterates over all the records in the raw memprof reader. With this
change we symbolize and cache the frames immediately after reading the
raw profiles. For a large internal binary this results in a runtime
reduction of ~50% (2m -> 48s) when merging a memprof raw profile with a
raw instr profile to generate an indexed profile. This change also makes
it simpler in the future to generate additional calling context
metadata to attach to each memprof record.
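To illustrate the approach described in the summary, here is a minimal sketch of symbolizing every unique PC once, right after the raw profile is read, and serving later lookups from a cache. The Symbolizer, Frame, and FrameCache names below are hypothetical stand-ins, not the actual RawMemProfReader API; in LLVM the symbolization itself is done by llvm::symbolize::LLVMSymbolizer.

    // Hedged sketch of symbolize-and-cache; types and names are hypothetical.
    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Frame {
      std::string Function;
      uint32_t Line = 0;
      uint32_t Column = 0;
    };

    // Stand-in for the real symbolizer (llvm::symbolize::LLVMSymbolizer).
    struct Symbolizer {
      std::vector<Frame> symbolize(uint64_t PC) {
        // Placeholder result; the real implementation resolves PC via DWARF.
        Frame F;
        F.Function = "<placeholder>";
        return {F};
      }
    };

    class FrameCache {
    public:
      // Symbolize each unique PC exactly once, immediately after the raw
      // records are read.
      void symbolizeAll(const std::vector<uint64_t> &UniquePCs, Symbolizer &S) {
        for (uint64_t PC : UniquePCs)
          Cache.try_emplace(PC, S.symbolize(PC));
      }

      // When the writer later iterates over records, frames come from the
      // cache instead of being symbolized on demand.
      const std::vector<Frame> &lookup(uint64_t PC) const { return Cache.at(PC); }

    private:
      std::unordered_map<uint64_t, std::vector<Frame>> Cache;
    };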

Diff Detail

Event Timeline

snehasish created this revision.Feb 23 2022, 12:44 PM
snehasish requested review of this revision.Feb 23 2022, 12:44 PM
Herald added a project: Restricted Project.Feb 23 2022, 12:44 PM
davidxl added inline comments.Feb 27 2022, 4:25 PM
llvm/lib/ProfileData/RawMemProfReader.cpp
354

Using a map from addr to Frames for caching can also avoid redundant symbolization computation. Is there an advantage to doing eager symbolization?

What is the memory impact of caching? Hopefully not too onerous since this is a big speedup!

llvm/include/llvm/ProfileData/RawMemProfReader.h
81

I take it initialize() does not get called in this case? Maybe add a comment.

llvm/lib/ProfileData/RawMemProfReader.cpp
354

I suspect it is because it makes it easier to add additional records, such as identifying interior call stack nodes within their functions (we need to mark these with metadata as well).

Herald added a project: Restricted Project.Mar 2 2022, 5:47 PM
snehasish updated this revision to Diff 412584.Mar 2 2022, 5:57 PM

Address comment.

snehasish marked an inline comment as done.Mar 2 2022, 6:27 PM

Updated the patch with comments. PTAL, thanks!

What is the memory impact of caching? Hopefully not too onerous since this is a big speedup!

There is hardly any change in peak memory consumption; Valgrind Massif shows 8.618G (now) vs 8.615G (before). In this patch we introduce a layer of indirection, and the keys in the PC->Frame map account for the small increase in memory.
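A small sketch of why the memory cost is essentially just the map keys: records store 64-bit frame ids into a single shared table rather than holding Frame objects inline. The FrameId and AllocationRecord names below are hypothetical illustrations, not the actual data layout in the patch.

    // Hedged illustration of the indirection; types are hypothetical.
    #include <cstdint>
    #include <map>
    #include <vector>

    struct Frame { /* function name, line, column, ... */ };

    // Shared table: one entry per unique PC, no matter how many call
    // stacks reference it.
    using FrameId = uint64_t; // e.g. the PC itself
    std::map<FrameId, Frame> FrameTable;

    // Each record stores ids instead of inline Frame copies, so the added
    // per-record cost of the cache is roughly 8 bytes per frame.
    struct AllocationRecord {
      std::vector<FrameId> CallStack;
      // ... allocation statistics ...
    };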

llvm/lib/ProfileData/RawMemProfReader.cpp
354

Caching the frames outside the local context allows us to decouple the record generation from the symbolization and makes it easier to extend for pruning (D120860) and marking interior call stack nodes (as Teresa noted). Note that the symbolization here isn't eager in the sense that more work is performed; the number of unique addresses symbolized remains the same before and after this patch.
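As one way to picture the decoupling mentioned above: once all frames live in a single cached table, a later pass can annotate them without re-symbolizing or touching record generation. The markInteriorFrames helper and the IsInteriorFrame flag below are hypothetical, a sketch of the kind of extension being discussed rather than code from this patch.

    // Hedged sketch: annotate cached frames in a separate pass.
    #include <cstdint>
    #include <map>
    #include <vector>

    struct Frame {
      uint64_t PC = 0;
      bool IsInteriorFrame = false; // hypothetical: non-leaf frame in a stack
    };

    using FrameId = uint64_t;

    void markInteriorFrames(const std::vector<std::vector<FrameId>> &CallStacks,
                            std::map<FrameId, Frame> &FrameTable) {
      for (const auto &Stack : CallStacks) {
        // Assume index 0 is the leaf (allocation site); every frame above it
        // is an interior node of the calling context.
        for (size_t I = 1; I < Stack.size(); ++I)
          FrameTable[Stack[I]].IsInteriorFrame = true;
      }
    }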

This revision is now accepted and ready to land.Mar 3 2022, 9:45 AM
davidxl accepted this revision.Mar 3 2022, 10:19 AM

lgtm

This revision was landed with ongoing or failed builds.Mar 3 2022, 11:01 AM
This revision was automatically updated to reflect the committed changes.