This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] On-demand track optimized-away inlinees for preinliner.
ClosedPublic

Authored by hoy on Jan 28 2022, 3:54 PM.

Details

Summary

Tracking optimized-away inlinees based on all probes in a binary is expansive in terms of memory usage I'm making the tracking on-demand based on profiled functions only. This saves about 10% memory overall for a medium-sized benchmark.

Before:

note: After parsePerfTraces
note: Thu Jan 27 18:42:09 2022
note: VM: 8.68 GB   RSS: 8.39 GB
note: After computeSizeForProfiledFunctions
note: Thu Jan 27 18:42:41 2022
note: **VM: 10.63 GB   RSS: 10.20 GB**
note: After generateProbeBasedProfile
note: Thu Jan 27 18:45:49 2022
note: VM: 25.00 GB   RSS: 24.95 GB
note: After postProcessProfiles
note: Thu Jan 27 18:49:29 2022
note: VM: 26.34 GB   RSS: 26.27 GB

After:

note: After parsePerfTraces
note: Fri Jan 28 12:04:49 2022
note: VM: 8.68 GB   RSS: 7.65 GB
note: After computeSizeForProfiledFunctions
note: Fri Jan 28 12:05:26 2022
note: **VM: 8.68 GB   RSS: 8.42 GB**
note: After generateProbeBasedProfile
note: Fri Jan 28 12:08:03 2022
note: VM: 22.93 GB   RSS: 22.89 GB
note: After postProcessProfiles
note: Fri Jan 28 12:11:30 2022
note: VM: 24.27 GB   RSS: 24.22 GB

This should be a no-diff change in terms of profile quality.

Diff Detail

Event Timeline

hoy created this revision.Jan 28 2022, 3:54 PM
hoy requested review of this revision.Jan 28 2022, 3:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 28 2022, 3:54 PM
hoy updated this revision to Diff 404189.Jan 28 2022, 3:55 PM

Updating D118515: [llvm-profgen] On-demand track optimized-away inlinees for preinliner.

hoy edited the summary of this revision. (Show Details)Jan 28 2022, 4:03 PM
hoy added reviewers: wenlei, wlei.
hoy edited the summary of this revision. (Show Details)
wenlei added inline comments.Feb 5 2022, 10:13 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
608–609

It is weird to have profile generator flush symbolizer of ProfileBinary, how much saving is this comparing to on-demand context size tracking?

I assume the symbolization here is only needed for dwarf base profile, can we free symbolizer for probe profile generation right after ProfiledBinary::load?

llvm/tools/llvm-profgen/ProfiledBinary.cpp
351

Actually ProbeDecoder.getDummyInlineRoot().getChildren() is already a map, wondering can we make it possible to look up by function name without building an extra map?

I can see that the key is Guid, ProbeId pair - we can get Guid from names, but ProbeId under dummy root is index. What was the reason for top level nodes to have different probe Id instead of a dummy probe Id? For top level, we don't expect same name to appear more than once.

llvm/tools/llvm-profgen/ProfiledBinary.h
483

If this invalidates symbolizer, i.e making it not functional, we should also reset the pointer to null.

hoy added inline comments.Feb 7 2022, 8:56 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
608–609

The symbolizer consumes quite some memory and it's needed in both probe and dwarf case to calculate code sizes for inlinees. Perhaps I should just reset the symbolizer pointer instead of flushing it.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
351

Top level nodes should have a dummy probe id, i.e, 0. The key in ProbeDecoder.getDummyInlineRoot().getChildren() is the guid. and we need a guid to look up a node in the children map. Unfortunately, there isn't a func name to guid map available currently. What we have is the GUID2FuncDescMap. So I'm building a name to top level node map. Alternatively we can build a name to guid map here but that'll incur two hash lookups to get the node. Since either map is only used by llvm-profgen, I'm not placing the map to MCPseudoProbe which is also shared by the Bolt probe decorder.

hoy updated this revision to Diff 406492.Feb 7 2022, 9:11 AM

Updating D118515: [llvm-profgen] On-demand track optimized-away inlinees for preinliner.

wenlei added inline comments.Feb 7 2022, 10:17 PM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
351

Unfortunately, there isn't a func name to guid map available currently

I thought function guid can be computed from names?

Top level nodes should have a dummy probe id, i.e, 0

if i read the code correctly in MCPseudoProbeDecoder::buildAddress2ProbeMap, that's not the case:

while (Data < End) {
    if (Root == Cur) {
      // Use a sequential id for top level inliner.
      Index = Root->getChildren().size();
    }
hoy added inline comments.Feb 7 2022, 11:00 PM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
351

Ah, you are right on both. Now I completely remember why using guid based lookup didn't work. The key of the children map is <caller's guid, callsite probe id> and I was using callee's guid to lookup.

// A DFS-based decoding
  while (Data < End) {
    if (Root == Cur) {
      // Use a sequential id for top level inliner.
      Index = Root->getChildren().size();
    } else {
      ...
    }
    // Switch/add to a new tree node(inlinee)
    Cur = Cur->getOrAddNode(std::make_tuple(Cur->Guid, Index));

We are now building callee name to callee node map here.

wenlei added inline comments.Feb 7 2022, 11:07 PM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
351

I guess it then goes back to my original question.. why the index for top level node needs to be non-zero? (I remember it was all zero initially, and then changed to use index). But there's no actual call site calling into top level frames, so the probe id isn't meaningful. If we have 0 probe id for them, name/guid would be sufficient for looking up the probe frame.

hoy added inline comments.Feb 7 2022, 11:15 PM
llvm/tools/llvm-profgen/ProfiledBinary.cpp
351

The sequential id was made intentionally in https://reviews.llvm.org/D100235 for the use of reporting zero samples towards non-executed probes in a frame. Using zero index caused all top-level inliners to share the same probe inline frame.

Even with the zero index, guid based hash lookup still won't work. Note that the guid part of the key of the Children map is the caller guid, which is the guid of the dummy root, which is zero.

wenlei accepted this revision.Feb 7 2022, 11:20 PM

lgtm, thanks.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
608–609

Ok, sounds good. It works for saving memory, but just wanted to note that this is hacky to flush symbolizer owned by ProfiledBinary at a random place by its user and yet APIs of ProfiledBinary depend on the availability of symbolizer..

This revision is now accepted and ready to land.Feb 7 2022, 11:20 PM
This revision was landed with ongoing or failed builds.Feb 8 2022, 8:33 AM
This revision was automatically updated to reflect the committed changes.