This is an archive of the discontinued LLVM Phabricator instance.

[memprof] Symbolize and cache stack frames.
ClosedPublic

Authored by snehasish on Feb 23 2022, 12:44 PM.

Details

Summary

Currently, symbolization of stack frames occurs on demand when the instrprof writer
iterates over all the records in the raw memprof reader. With this
change we symbolize and cache the frames immediately after reading the
raw profiles. For a large internal binary this results in a runtime
reduction of ~50% (2m -> 48s) when merging a memprof raw profile with a
raw instr profile to generate an indexed profile. This change also makes
it simpler in the future to generate additional calling context
metadata to attach to each memprof record.
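To illustrate the approach described in the summary, here is a minimal sketch of symbolizing every unique PC once, right after the raw profile is read, and serving later lookups from a cache. The Symbolizer, Frame, and FrameCache names below are hypothetical stand-ins, not the actual RawMemProfReader API; in LLVM the symbolization itself is done by llvm::symbolize::LLVMSymbolizer.

    // Hedged sketch of symbolize-and-cache; types and names are hypothetical.
    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct Frame {
      std::string Function;
      uint32_t Line = 0;
      uint32_t Column = 0;
    };

    // Stand-in for the real symbolizer (llvm::symbolize::LLVMSymbolizer).
    struct Symbolizer {
      std::vector<Frame> symbolize(uint64_t PC) {
        // Placeholder result; the real implementation resolves PC via DWARF.
        Frame F;
        F.Function = "<placeholder>";
        return {F};
      }
    };

    class FrameCache {
    public:
      // Symbolize each unique PC exactly once, immediately after the raw
      // records are read.
      void symbolizeAll(const std::vector<uint64_t> &UniquePCs, Symbolizer &S) {
        for (uint64_t PC : UniquePCs)
          Cache.try_emplace(PC, S.symbolize(PC));
      }

      // When the writer later iterates over records, frames come from the
      // cache instead of being symbolized on demand.
      const std::vector<Frame> &lookup(uint64_t PC) const { return Cache.at(PC); }

    private:
      std::unordered_map<uint64_t, std::vector<Frame>> Cache;
    };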

Diff Detail

Event Timeline

snehasish created this revision.Feb 23 2022, 12:44 PM
snehasish requested review of this revision.Feb 23 2022, 12:44 PM
Herald added a project: Restricted Project.Feb 23 2022, 12:44 PM
davidxl added inline comments.Feb 27 2022, 4:25 PM
llvm/lib/ProfileData/RawMemProfReader.cpp
354

Using a map from addr to Frames for caching can also avoid redundant symbolization computation. Is there an advantage to doing eager symbolization?

What is the memory impact of caching? Hopefully not too onerous since this is a big speedup!

llvm/include/llvm/ProfileData/RawMemProfReader.h
81

I take it initialize() does not get called in this case? Maybe add a comment.

llvm/lib/ProfileData/RawMemProfReader.cpp
354

I suspect it is because it makes it easier to add additional records, such as identifying interior call stack nodes within their functions (we need to mark these with metadata as well).

Herald added a project: Restricted Project.Mar 2 2022, 5:47 PM
snehasish updated this revision to Diff 412584.Mar 2 2022, 5:57 PM

Address comment.

snehasish marked an inline comment as done.Mar 2 2022, 6:27 PM

Updated the patch with comments. PTAL, thanks!

What is the memory impact of caching? Hopefully not too onerous since this is a big speedup!

There is hardly any change in peak memory consumption; Valgrind Massif shows 8.618G (now) vs 8.615G (before). In this patch we introduce a layer of indirection, and the keys in the PC->Frame map account for the small increase in memory.
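A small sketch of why the memory cost is essentially just the map keys: records store 64-bit frame ids into a single shared table rather than holding Frame objects inline. The FrameId and AllocationRecord names below are hypothetical illustrations, not the actual data layout in the patch.

    // Hedged illustration of the indirection; types are hypothetical.
    #include <cstdint>
    #include <map>
    #include <vector>

    struct Frame { /* function name, line, column, ... */ };

    // Shared table: one entry per unique PC, no matter how many call
    // stacks reference it.
    using FrameId = uint64_t; // e.g. the PC itself
    std::map<FrameId, Frame> FrameTable;

    // Each record stores ids instead of inline Frame copies, so the added
    // per-record cost of the cache is roughly 8 bytes per frame.
    struct AllocationRecord {
      std::vector<FrameId> CallStack;
      // ... allocation statistics ...
    };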

llvm/lib/ProfileData/RawMemProfReader.cpp
354

Caching the frames outside the local context allows us to decouple the record generation from the symbolization and makes it easier to extend for pruning (D120860) and marking interior call stack nodes (as Teresa noted). Note that the symbolization here isn't eager in the sense that more work is performed; the number of unique addresses symbolized remains the same before and after this patch.
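As one way to picture the decoupling mentioned above: once all frames live in a single cached table, a later pass can annotate them without re-symbolizing or touching record generation. The markInteriorFrames helper and the IsInteriorFrame flag below are hypothetical, a sketch of the kind of extension being discussed rather than code from this patch.

    // Hedged sketch: annotate cached frames in a separate pass.
    #include <cstdint>
    #include <map>
    #include <vector>

    struct Frame {
      uint64_t PC = 0;
      bool IsInteriorFrame = false; // hypothetical: non-leaf frame in a stack
    };

    using FrameId = uint64_t;

    void markInteriorFrames(const std::vector<std::vector<FrameId>> &CallStacks,
                            std::map<FrameId, Frame> &FrameTable) {
      for (const auto &Stack : CallStacks) {
        // Assume index 0 is the leaf (allocation site); every frame above it
        // is an interior node of the calling context.
        for (size_t I = 1; I < Stack.size(); ++I)
          FrameTable[Stack[I]].IsInteriorFrame = true;
      }
    }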

This revision is now accepted and ready to land.Mar 3 2022, 9:45 AM
davidxl accepted this revision.Mar 3 2022, 10:19 AM

lgtm

This revision was landed with ongoing or failed builds.Mar 3 2022, 11:01 AM
This revision was automatically updated to reflect the committed changes.