This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][llvm-profgen] Virtual unwinding with pseudo probe
ClosedPublic

Authored by wlei on Dec 8 2020, 3:55 PM.

Details

Summary

This change extends virtual unwinder to support pseudo probe in llvm-profgen. Please refer https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.

Implementation

  • Added ProbeBasedCtxKey derived from ContextKey for sample counter aggregation. As we need string splitting to infer the profile for callee function, string based context introduces more string handling overhead, here we just use probe pointer based context.
  • For linear unwinding, as inline context is encoded in each pseudo probe, we don't need to go through each instruction to extract range sharing same inliner. So just record the range for the context.
  • For probe based context, we should ignore the top frame probe since it will be extracted from the address range. we defer the extraction in ProfileGeneration.
  • Added PseudoProbeProfileGenerator for pseudo probe based profile generation.
  • Some helper function to get pseduo probe info(call probe, inline context) from profiled binary.
  • Added regression test for unwinder's output

The pseudo probe based profile generation will be in the upcoming patch.

Test Plan:

ninja & ninja check-llvm

Diff Detail

Event Timeline

wlei created this revision.Dec 8 2020, 3:55 PM
wlei requested review of this revision.Dec 8 2020, 3:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2020, 3:55 PM
wlei edited the summary of this revision. (Show Details)Dec 8 2020, 5:01 PM
wlei added reviewers: wmi, davidxl, hoy, wenlei.
wlei updated this revision to Diff 310737.Dec 9 2020, 6:05 PM

rebase

hoy added inline comments.Dec 10 2020, 6:20 PM
llvm/test/tools/llvm-profgen/Inputs/inline-cs-pseudoprobe.perfscript
2

Please remove this line here and in other .perfscript files.

llvm/tools/llvm-profgen/PerfReader.cpp
115

for (auto Address : reverse(CallStack)) ...

273–293

This can return StringRef when a ProbeBasedCtxKey holds a string data field.

llvm/tools/llvm-profgen/PerfReader.h
285

How about make a string field here to avoid string copying like in getContextKeyStr? Also the pseudo probe related code in getContextKeyStr can be moved here as an toString() API.

297

use std::equal?

309

assert Probes are empty here?

llvm/tools/llvm-profgen/ProfileGenerator.h
111

Name it PseudoProbeCSProfileGenerator ?

wlei added inline comments.Dec 11 2020, 5:36 PM
llvm/tools/llvm-profgen/PerfReader.cpp
115

seems it's a bug, it should not be reversed, let me fix.

llvm/tools/llvm-profgen/PerfReader.h
285

Good point. One question I got is we might not use string here directly due to context compression. The compression will implements on the vector of string/probe, but here we miss the top frame probe. So I just want to redesign to use string vector as the output of virtual unwinding. In ProfileGenerator, the top frame context string will be pushed and fed for the compression to generate the string. e.g:

[a:1, b:2, c:3]               virtual unwinding
[a:1, b:2, c:3, b2: d:4]   push back top frame context
[a:1, b:2, d:4]               compression
"a:1 @ b:2 @ d:4"       string joining
309

Same as above, the Probes can be empty since it doesn't include the top frame probe.

llvm/tools/llvm-profgen/ProfileGenerator.h
111

Changed. I see that in our internal tool, we use a switch variable to decide whether to gen CS profile. Here not sure whether we need to merge them in a general class or we should separate in different class. but I guess we can leave it when come to non-cs implementation.

wlei added inline comments.Dec 11 2020, 5:55 PM
llvm/tools/llvm-profgen/PerfReader.cpp
115

it's not a bug, the call stack is leaf to root order, should be reversed.
But here note that I need to skip the leaf frame. so the code will be like:

bool IsLeaf = true;
for (auto Address : reverse(CallStack)) {
    if (IsLeaf) {
        IsLeaf = false;
        continue;
   }
   ...
}

Btw, I didn't use reverse like this way before, so it will act as a iterator wrapper, won't reverse the vector's inside elements, right?

wlei updated this revision to Diff 311361.Dec 11 2020, 7:51 PM

Address some feedback from hongtao. The string handling part will be redesigned and come later

hoy added inline comments.Dec 13 2020, 10:48 PM
llvm/tools/llvm-profgen/PerfReader.cpp
115

Yes, see llvm/include/llvm/ADT/STLExtras.h

// Returns an iterator_range over the given container which iterates in reverse.
// Note that the container must have rbegin()/rend() methods for this to work.
template <typename ContainerTy>
auto reverse(ContainerTy &&C,
             std::enable_if_t<has_rbegin<ContainerTy>::value> * = nullptr) {
  return make_range(C.rbegin(), C.rend());
}
llvm/tools/llvm-profgen/PerfReader.h
285

I see. Yeah, compression should be done on vectors. Will the string form computed only once after compression is done? I was wondering if you ever need the string more than once, making a field and fill it out after compression is probably good. But if the sting is only needed once, what you have now is good enough.

309

Sure, I mean to add an assertion for it. Sorry for the confusion.

wlei updated this revision to Diff 313144.Dec 21 2020, 10:26 AM
wlei marked 6 inline comments as done.

rebase and fix clang-tidy

wlei added inline comments.Dec 21 2020, 10:28 AM
llvm/tools/llvm-profgen/PerfReader.cpp
115

As we discussed offline, we need to skip the top frame for this, so just keep using the iterator way.

llvm/tools/llvm-profgen/PerfReader.h
285

Yeah, after compression the string is only used once, so I will keep this.

309

Oh, I mean the probes doesn't include the top frame probe, so the probes will have the change to be empty.
See line 98, it has the assertion to check whether the Hashcode is 0

hoy accepted this revision.Dec 21 2020, 12:32 PM
This revision is now accepted and ready to land.Dec 21 2020, 12:32 PM
wmi added inline comments.Jan 8 2021, 5:18 PM
llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
4

Add CHECK for context similar as noinline-cs-pseudoprobe.test below? It is good to make sure we get the context right w/wo inline.

llvm/tools/llvm-profgen/PerfReader.cpp
102

Why it needs to be a shared_ptr?

122

Nit: !CallProbe

138–139

How about move the conditional statement into getOrCreateCounter?

llvm/tools/llvm-profgen/PerfReader.h
292

Add an assertion for O != nullptr?

379

pseduo => pseudo

llvm/tools/llvm-profgen/PseudoProbe.cpp
306

Nit: CallProbe == nullptr --> !CallProbe

322

Add an assertion message.

wlei updated this revision to Diff 315850.Jan 11 2021, 10:20 AM

Address Wei's feedback

wlei added inline comments.Jan 11 2021, 10:22 AM
llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
4

Yeah, this is because the context is empty. added the CHECK-UNWINDER-EMPTY for this.

llvm/tools/llvm-profgen/PerfReader.cpp
102

This is because the Hashable interface is using shared_ptr to allow being the key of unordered_map, I haven't found a workable way for unique_ptr yet, could we leave for later refining? I will keep an eye on this.

122

fixed!

138–139

Good suggestion, moved getOrCreateCounterForProbe into getOrCreateCounter

llvm/tools/llvm-profgen/PerfReader.h
292

assertion added

379

fixed

llvm/tools/llvm-profgen/PseudoProbe.cpp
306

fixed

322

assertion added

wmi added inline comments.Jan 12 2021, 4:25 PM
llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
4

Why the context is empty? To my understanding, for CSSPGO profile, the profile context will be the same no matter how the inlining is happening for the profiled binary, so I expect to see the context just like noinline-cs-pseudoprobe.test. What am I missing?

main:2
...
main:2 @ foo:8
...

llvm/tools/llvm-profgen/PerfReader.cpp
102

I see. Thanks.

wlei added inline comments.Jan 12 2021, 4:44 PM
llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
4

This is not the final context, the final context won't be empty. This is the output for the unwinder, has the context except for the leaf context. For unwinder, we record the leaf frame probe into the ranges, so the leaf context is not generated. In the profile generation patch(https://reviews.llvm.org/D92998), the range will be extract and the leaf context will be generated and appended.

Here for this case, because of inline, the contexts like [main @ foo @ bar] are all the leaf context, so the unwinder's context is empty.

wmi accepted this revision.Jan 12 2021, 5:45 PM

LGTM.

This revision was automatically updated to reflect the committed changes.