This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] Decoding pseudo probe for profiled function only.
ClosedPublic

Authored by hoy on Mar 14 2022, 2:29 PM.

Details

Summary

Complete pseudo probes decoding can result in large memory usage. In practice only a small porting of the decoded probes are used in profile generation. I'm changing the full decoding mode to be decoding for profiled functions only, though we still do a full scan of the .pseudoprobe section due to a missing table-of-content but we don't have to build the in-memory data structure for functions not sampled.

To build the in-memory data structure for profiled functions only, I'm rewriting the previous non-recursive probe decoding logic to be recursive. This is easy to read and maintain.

I also have to change the previous representation of unsymbolized context from probe-based stack to address-based stack since the profiled functions are unknown yet by the time of virtual unwinding. The address-based stack will be converted to probe-based stack after virtual unwinding and on-demand probe decoding.

I'm seeing 20GB memory is saved for one of our internal large service.

Diff Detail

Event Timeline

hoy created this revision.Mar 14 2022, 2:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2022, 2:29 PM
hoy requested review of this revision.Mar 14 2022, 2:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2022, 2:29 PM
hoy edited the summary of this revision. (Show Details)Mar 14 2022, 2:53 PM
hoy added reviewers: wenlei, wlei.

Some linter warning seems legit, please format/fix.

llvm/lib/MC/MCPseudoProbe.cpp
363

Guids -> GuildFilter or ProfiledProbeGuids

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe-on-demand.test
18 ↗(On Diff #415234)

This change is because main->foo call is not profiled, right?

llvm/tools/llvm-profgen/PerfReader.h
424–425

Should this and ProbeBasedCtxKey be removed?

llvm/tools/llvm-profgen/ProfileGenerator.cpp
394–397

Why is this needed?

975

CallProbe can also be null if the caller isn't profiled? Does the comment need to be updated here?

986

Inline this call here and move into the loop? there's no other calls to this callee now.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
58

how about -decode-probe-for-profiled-functions-only?

360

can we assert !ProfiledFunctions.empty() to make sure profile functions are collected already?

381

How about completely unify the workflow of on-demand vs full decoding?

The only difference would be - for full decoding, we just need to provide a full Guid set; and for on-demand, we provide ProfiledGuids.

Correct me if I'm wrong, but I think this was we keep the same behavior for full decoding as of today (no context truncation before preinliner), and we can also remove ProbeStack and replace it with AddressStack everywhere.

hoy added inline comments.Mar 22 2022, 4:56 PM
llvm/lib/MC/MCPseudoProbe.cpp
363

GuildFilter sounds better.

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe-on-demand.test
18 ↗(On Diff #415234)

Yes, main is not profiled so the calling context is truncated.

llvm/tools/llvm-profgen/PerfReader.h
424–425

Yes, this is no longer needed.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
394–397

Oh, this should be a change for D121655. With llvm-profgen reading in the final profile directly, profiled functions should be extracted from the input ProfileMap. Let me remove it here.

975

Comment updated.

986

Inlined. Not merging the loops since they iteration in different direction, note that the std::reverse in between.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
58

Sounds good.

hoy updated this revision to Diff 417440.Mar 22 2022, 4:57 PM

i
Updating D121643: [llvm-profgen] On-demand pseudo probe decoding

wenlei added inline comments.Mar 22 2022, 5:45 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
939–940

Now remove this function?

975

nit: We may not find a probe for functions that are not sampled. -> We may not find a probe for functions that are not sampled when --decode-probe-for-profiled-functions-only is on.

also clearer if you list the three scenario with bullets as it now grow larger.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
360

missed this one?

381

The only difference would be - for full decoding, we just need to provide a full Guid set; and for on-demand, we provide ProfiledGuids.

missed this one? By only difference, I meant the only place that need to check DecodeProbeForProfiledFunctionsOnly.

hoy updated this revision to Diff 417451.Mar 22 2022, 5:46 PM

Updating D121643: [llvm-profgen] On-demand pseudo probe decoding

wlei added inline comments.Mar 22 2022, 6:20 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
373

For CS profile, the input contains CallStack + LBRStack, but here it only considers the addresses from LBRStack(Range + Branch) to compute profiledFunctions. Does it miss some functions whose address(frame) is only from CallStack?

For example, if we only have one following entry:
[main @ foo] : {rangeFrom : rangeTo}

Supposing rangeFrom and rangeTo only belong to foo(foo is not inlined to main), then the result of ProfiledFunctions only contains foo and misses main, right?

From the term "ProfiledFunctions", it should be foo, but for CS inlining, I guess we still need the frame main?

wlei added inline comments.Mar 22 2022, 6:33 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
373

Sorry, I missed your notes on the summary.

"Note that in on-demand mode, a calling context may be truncated at a caller which does not have sample. This is a discrepancy from the complete mode. However, with the CS preinliner such context will always be truncated from the compiler inlining point of view."

So in on-demand mode, it will implicitly trim some contexts, even before CS-preinliner or cold-context-trimming.

hoy added inline comments.Mar 22 2022, 6:42 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
373

Yes, and those contexts, if not truncated by the CS preinliner, will also be truncated by the compiler since the sample loader wouldn't consider any inlining in a non-profiled function.

975

Sounds good.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
360

assert added.

381

Right now for full decoding we provide an empty ProfiledGuids to the decoder instead of computing Guids for all disassembled functions which could be expansive. The only place that requires full decoding is when --show-assembly-only is on. Otherwise it is kind of unified?

I'm adding an assert like this

assert((!ProfiledFunctions.empty() || !DecodeProbeForProfiledFunctionsOnly ||
        ShowDisassemblyOnly) &&
       "Profiled functions should not be empty in on-demand probe decoding "
       "mode");

We can also use a 0 as a guid to represent all functions. WDYT?

hoy updated this revision to Diff 417462.Mar 22 2022, 6:43 PM

Updating D121643: [llvm-profgen] On-demand pseudo probe decoding

wlei added inline comments.Mar 22 2022, 7:08 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
373

Thanks for the clarification. Then it looks like there can be a future work to compute the ProfiledFunctions in early time and truncate the callStack during unwinding so that SampleCounters is reduced to save parsing time and memory.

wenlei added inline comments.Mar 22 2022, 10:21 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
373

if not truncated by the CS preinliner, will also be truncated by the compiler since the sample loader wouldn't consider any inlining in a non-profiled function.

Actually this is arguable. Top-down inlining could chose to also inline cold calls in certain cases. E.g. even if A->B is cold, inlining A->B->C provides an opportunity for B->C to specialize under A->B->C, without affecting D->B->C path.

What you described could be benign at the moment, but I'm wondering if we keep main like Lei suggested, what's the extra memory overhead? Semantically, keeping the old behavior might be better because the generated profile is supposed to have complete context before preinliner/trimming. Ofc, if that behavior bloats memory too much, the way you optimize it here by trimming earlier helps.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
381

Passing empty guid for full decoding works too. Then can we only check DecodeProbeForProfiledFunctionsOnly within decodePseudoProbe and pass empty Guid when the flag is false?

Currently DecodeProbeForProfiledFunctionsOnly is checked at many places before calling decodePseudoProbe. What I meant was that regardless of that flag, decodePseudoProbe should always be called at the same place (except for ShowDisassemblyOnly), but the flag only control what is passed as GuidFilter.

hoy added inline comments.Mar 22 2022, 11:58 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
373

For now the sample inliner only processes functions with profiles. So for cold function call like A->B, both A and B should have a profile in order for the call to be inlined. With this on-demand approach, if A is not sampled, we truncate the context to B only. Without this diff, we would generate a A:1 @ B like profile, but then during top-down inlining, context promotion would truncate the context as well. So for now we wouldn't lose any opportunity.

This will be a problem In the future when we extend the top-down inliner to handle non-sampled functions.

Good point on moving the profile function computation earlier so that we could do more truncation during the unwinding to further save some memory. So far our memory usage looks good.

I don't have number for how many more functions will have their probes decoded if we want to keep all contexts. Maybe we could revisit this once those contexts are becoming useful?

llvm/tools/llvm-profgen/ProfiledBinary.cpp
381

I see what you mean. Moved the flag checks into decodePseudoProbe.

hoy updated this revision to Diff 417513.Mar 22 2022, 11:59 PM

Updating D121643: [llvm-profgen] On-demand pseudo probe decoding

wenlei added inline comments.Mar 23 2022, 9:55 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
373

So for now we wouldn't lose any opportunity. This will be a problem In the future when we extend the top-down inliner to handle non-sampled functions.

Yes, we don't lose inlining now, but that's not the main concern.

I don't have number for how many more functions will have their probes decoded if we want to keep all contexts. Maybe we could revisit this once those contexts are becoming useful?

What I was thinking was that, if we keep all context probe decoded, this change will strictly not affect output in anyways and we can remove the switch which simplifies things and always same memory.

I think that having full context also helps with verification even if we don't utilize that info for optimization now. So if the extra probes doesn't cost us much for memory, I'm leaning towards keeping them and remove the switch to make the new behavior the default and only choice.

There's another alternative: we always only decode profiled functions (except for show-assembly), then there's a switch to control whether extra context probe is decoded.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
293–294

just fyi, you can run linter on your changes only instead of the whole file. Not a big deal though, and thanks for fixing it for entire files.

363–365

When DecodeProbeForProfiledFunctionsOnly is off, are we supposed to keep ProfiledGuids empty here for buildAddress2ProbeMap, so all functions will be decoded?

367

Others may not be aware what "on-demand" mode means, because with the current version, there's no concept of on-demand - it's decoding all functions or decoding profile functions only. Suggest rephase the wording "on-demand"

hoy added inline comments.Mar 23 2022, 12:14 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
373

okay, keeping full contexts doesn't seem to incur more cost, only 5% more functions are decoded. I'm including the change here.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
293–294

arc lint didn't seem to work. I ended up formatting the whole file with the editor.

360

I removed the assert since there could be an outlier case where no function is probed, eg., test/tools/llvm-profgen/mmapEvent.test

363–365

The function decodePseudoProbe was run in two places, one was in the very early binary loading, the other was in profile generation. The first time the function is executed ProfiledFunctions should always be empty. The second time it's run, the container may not be empty. This is confusing, I'm now changing it to only run in profile generation except for ShowDisassemblyOnly.

367

The comment seems not needed after latest refactoring.

hoy updated this revision to Diff 417707.Mar 23 2022, 12:14 PM

Decoding for functions included in call stacks as well.

hoy updated this revision to Diff 417708.Mar 23 2022, 12:18 PM

Updating D121643: [llvm-profgen] On-demand pseudo probe decoding

wenlei added inline comments.Mar 23 2022, 12:23 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
373

Cool, in that case, maybe remove the new switch altogether? Then you can also eliminate many test changes for extra switch to keep old behavior.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
363–365

Maybe I missed something, but the only place I see decodePseudoProbe run earlier is for ShowDisassemblyOnly. When ShowDisassemblyOnly is off, when do we call decodePseudoProbe with empty Guid?

And why do we need to run decodePseudoProbe early except for ShowDisassemblyOnly? That was part of the unification I mentioned.

hoy added inline comments.Mar 23 2022, 12:38 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
373

Sounds good.

llvm/tools/llvm-profgen/ProfiledBinary.cpp
363–365

That was in the previous iterations. In the latest iteration decodePseudoProbe should be called with empty ProfiledFunctions for ShowDisassemblyOnly only.

hoy updated this revision to Diff 417718.Mar 23 2022, 12:39 PM

Removing decode-probe-for-profiled-functions-only switch.

wenlei accepted this revision.Mar 23 2022, 1:49 PM

Thanks for working through the changes, this looks great now. Please update the summary to reflect the latest version (we can also make it clear that this is not "on-demand" but "decode probes for profiled context only").

This revision is now accepted and ready to land.Mar 23 2022, 1:49 PM
hoy retitled this revision from [llvm-profgen] On-demand pseudo probe decoding to [llvm-profgen] Decoding pseudo probe for profiled function only..Mar 23 2022, 1:58 PM
hoy edited the summary of this revision. (Show Details)
This revision was landed with ongoing or failed builds.Mar 23 2022, 2:15 PM
This revision was automatically updated to reflect the committed changes.