Currently, symbolization of stack frames occurs on demand when the instrprof writer
iterates over all the records in the raw memprof reader. With this
change we symbolize and cache the frames immediately after reading the
raw profiles. For a large internal binary this results in a runtime
reduction of ~50% (2m -> 48s) when merging a memprof raw profile with a
raw instr profile to generate an indexed profile. This change also makes
it simpler in the future to generate additional calling context
metadata to attach to each memprof record.
Details
- Reviewers
davidxl tejohnson - Commits
- rGdda7b74967cc: [memprof] Symbolize and cache stack frames.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/lib/ProfileData/RawMemProfReader.cpp | ||
---|---|---|
355 | Using a map from addr to Frames for caching can also avoid redundant symbolization computation. Is there an advantage of doing eager symbolization? |
What is the memory impact of caching? Hopefully not too onerous since this is a big speedup!
llvm/include/llvm/ProfileData/RawMemProfReader.h | ||
---|---|---|
81 | I take it initialize() not get called in this case? Maybe add a comment. | |
llvm/lib/ProfileData/RawMemProfReader.cpp | ||
355 | I suspect because it makes it easier for adding additional records, such as identifying interior call stack nodes within their functions (we need to mark these with metadata as well). |
Updated the patch with comments. PTAL, thanks!
There is hardly any change in peak memory consumption, valgrind massif shows 8.618G (now) vs 8.615G (before). In this patch we have introduced a layer of indirection to where the indexing (keys in map of PC->Frame) account for the small increase in memory.
llvm/lib/ProfileData/RawMemProfReader.cpp | ||
---|---|---|
355 | Caching the frames outside the local context allows us to decouple the record generation from the symbolization and makes it easier to extend for pruning (D120860) and marking interior callstack nodes (as Teresa noted). Note that the symbolization here isn't eager in the sense that more work is performed, the number of unique addresses symbolized remains the same before and after this patch. |
I take it initialize() not get called in this case? Maybe add a comment.