Page MenuHomePhabricator

[CSSPGO][llvm-profgen] Pseudo probe based CS profile generation
ClosedPublic

Authored by wlei on Dec 9 2020, 9:18 PM.

Details

Summary

This change implements profile generation infra for pseudo probe in llvm-profgen. During virtual unwinding, the raw profile is extracted into range counter and branch counter and aggregated to sample counter map indexed by the call stack context. This change introduces the last step and produces the eventual profile. Specifically, the body of function sample is recorded by going through each probe among the range and callsite target sample is recorded by extracting the callsite probe from branch's source.

Please refer https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.

Implementation

  • Extended PseudoProbeProfileGenerator for pseudo probe based profile generation.
  • populateBodySamplesWithProbes reading range counter is responsible for recording function body samples and inferring caller's body samples.
  • populateBoundarySamplesWithProbes reading branch counter is responsible for recording call site target samples.
  • Each sample is recorded with its calling context(named ContextId). Remind that the probe based context key doesn't include the leaf frame probe info, so the ContextId string is created from two part: one from the probe stack strings' concatenation and other one from the leaf frame probe.
  • Added regression test

Test Plan:

ninja & ninja check-llvm

Diff Detail

Event Timeline

wlei created this revision.Dec 9 2020, 9:18 PM
wlei requested review of this revision.Dec 9 2020, 9:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2020, 9:18 PM
wlei edited the summary of this revision. (Show Details)Dec 10 2020, 3:50 PM
wlei added reviewers: wmi, davidxl, hoy, wenlei.
hoy added inline comments.Dec 18 2020, 10:16 AM
llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
16

If you rebase on the latest, a checksum will be in presence here.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
415

Please add a comment about why a dangling probe gets a zero count.

422

Nit: function

473

You mean foo:2 @ bar?

481

Move this out of the loop to save the check?

wlei updated this revision to Diff 313128.Dec 21 2020, 9:38 AM

Address Hongtao's feedback: support profile checksum and some other NFC work

wlei marked 5 inline comments as done.Dec 21 2020, 9:46 AM
wlei added inline comments.
llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
16

Thanks for reminding this, code to print checksum is added:
FunctionProile.setFunctionHash(LeafFuncDesc->FuncHash);

llvm/tools/llvm-profgen/ProfileGenerator.cpp
415

comments added

473

typo fixed

481

Yeah, moved the last elements out of the loop.

hoy added inline comments.Dec 21 2020, 2:45 PM
llvm/test/tools/llvm-profgen/inline-cs-pseudoprobe.test
5

Nit: use CHECK-NEXT

llvm/tools/llvm-profgen/ProfileGenerator.cpp
42

The setting of the two flags should not be necessary on the profile generation side. They are used on the loader side. Do you see any issue without setting them?

wlei marked 4 inline comments as done.Dec 21 2020, 2:54 PM
wlei added inline comments.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
42

Yeah, if not explicitly setting here, the value will be false.

you see in llvm/lib/Transforms/IPO/SampleProfile.cpp:

// Apply tweaks if context-sensitive profile is available.
if (Reader->profileIsCS()) {
  ProfileIsCS = true;
  FunctionSamples::ProfileIsCS = true;

  // Tracker for profiles under different context
  ContextTracker =
      std::make_unique<SampleContextTracker>(Reader->getProfiles());
}

It's set when the reader know it's a CS profile. But for llvm-profgen side, it doesn't set this.

hoy added inline comments.Dec 21 2020, 2:58 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
42

I see. So ContextTracker used in profile generation?

wlei added inline comments.Dec 21 2020, 4:08 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
42

Not ContextTracker, it's the SampleProfileWriter to write the text format of profile, like:

if (FunctionSamples::ProfileIsCS)
  OS << "[" << S.getNameWithContext() << "]:" << S.getTotalSamples();
else
  OS << S.getName() << ":" << S.getTotalSamples();
if (FunctionSamples::ProfileIsProbeBased) {
  OS.indent(Indent + 1);
  OS << "!CFGChecksum: " << S.getFunctionHash() << "\n";
}
hoy added inline comments.Dec 21 2020, 5:34 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
43

I see. Sounds like the initializations should be moved into ProfileGenerator::generateProfile or ProfileGenerator::write. What do you think?

wlei added inline comments.Dec 21 2020, 5:47 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
43

Sounds good! So since it's only used for the writer, I will move them to ProfileGenerator::write

wlei added inline comments.Dec 21 2020, 5:51 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
43

Just confirm, it's also used in getEntrySamples, so ProfileGenerator::generateProfile might be the right place.

uint64_t getEntrySamples() const {
  if (FunctionSamples::ProfileIsCS && getHeadSamples()) {
    // For CS profile, if we already have more accurate head samples
    // counted by branch sample from caller, use them as entry samples.
    return getHeadSamples();
  }
hoy added inline comments.Dec 21 2020, 5:52 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
43

Yeah, generateProfile sounds the right place.

wlei updated this revision to Diff 313417.Dec 22 2020, 12:45 PM

added CHECK-NEXT and other refactoring work

wlei updated this revision to Diff 313419.Dec 22 2020, 12:47 PM

fix lint

hoy accepted this revision.Jan 5 2021, 5:20 PM
This revision is now accepted and ready to land.Jan 5 2021, 5:20 PM
wlei updated this revision to Diff 314760.Jan 5 2021, 5:33 PM

rebase and fix clang-tidy

wmi added inline comments.Jan 14 2021, 10:04 PM
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14

In [main:2 @ foo], probes with 0 count are not dumped. In which case probe with 0 count will be dumped?

llvm/tools/llvm-profgen/ProfileGenerator.cpp
350–352

Move it to header comment of extractPrefixContextId in case extractPrefixContextId is used elsewhere.

llvm/tools/llvm-profgen/ProfileGenerator.h
139

Make it an overload function of getFunctionProfileForLeafProbe so it is known to be used for probe?

wlei updated this revision to Diff 317064.Jan 15 2021, 1:55 PM

Addressing Wei's feedback

wlei updated this revision to Diff 317070.Jan 15 2021, 2:05 PM

fix typo

wlei added inline comments.Jan 15 2021, 2:08 PM
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14

The 0 count is for a dangling probe case to distinguish a missing count, see the ProfileGenerator.cpp:420

Copied the comments for you to refer:

// Drop the samples collected for a dangling probe since it's misleading.
// We still report the probe but with a special zero count. The compiler
// won't trust the zero count and will rely on the counts inference
// algorithm to get the probe a reasonable count. Note that a zero count is
// different from a missing count, where the latter really tells the
// compiler that a probe is never executed.
llvm/tools/llvm-profgen/ProfileGenerator.cpp
350–352

Good catch, moved.

llvm/tools/llvm-profgen/ProfileGenerator.h
139

Good suggestion!

wmi added inline comments.Jan 21 2021, 2:35 PM
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14

Note that a zero count is different from a missing count, where the latter really tells the compiler that a probe is never executed.

That is contrary to the debug info based profile where zero count for a line means the line is never executed. Missing count for a line means compiler has to infer the count. Why the probe based profile is implemented in such way?

hoy added inline comments.Jan 21 2021, 3:10 PM
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14

Good question. The current counts inference algorithm in sample profile loader doesn't seem to differentiate a missing sample from a zero sample. For example, in SampleProfileLoader::propagateThroughEdges, accesses to BlockWeights are not preceded with a check. Therefore a missing entry in BlockWeights will be created as a zero entry.

With pseudo probes, a missing sample really means the probe is not executed. A zero sample, on the other hand, wouldn't be created by the profile generator. We are leveraging zero sample to represent a special type of probe, called dangling probe, which will need the compiler to infer its count. We were thinking about using UINT64_MAX instead, but UINT64_MAX literally is a legal number of samples.

The compiler change that dangles probes haven't been sent out yet. @wlei I think we should exclude this detail from this change for now. But it's better to discuss here if this approach makes sense.

wmi added inline comments.Jan 21 2021, 5:29 PM
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14

For example, in SampleProfileLoader::propagateThroughEdges, accesses to BlockWeights are not preceded with a check. Therefore a missing entry in BlockWeights will be created as a zero entry.

IIRC, VisitedBlocks is used to differentiate missing count with zero count BB.

With pseudo probes, a missing sample really means the probe is not executed.
We are leveraging zero sample to represent a special type of probe, called dangling probe, which will need the compiler to infer its count.

I think we discussed it in another patch so I understand for probe, a missing sample means the probe is not executed. It is a little weird to leverage zero sample to tell compiler to infer the count. UINT64_MAX looks better to me because there cannot be any real sample counter with value being UINT64_MAX. In addition, using zero is misleading when we read the profile -- for a location which has zero count, it is actually not zero from compiler's perspective.

hoy added inline comments.Jan 21 2021, 5:55 PM
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14

IIRC, VisitedBlocks is used to differentiate missing count with zero count BB.

Thanks for pointing that out. I misread that code.

Sounds like in order to reuse the existing inference algorithm, we will need to give probes without any samples an explicit zero count in BlockWeights.

Agreed that using UINT64_MAX looks more clear than using zero.

wenlei added inline comments.Jan 21 2021, 11:42 PM
llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14

It is a little weird to leverage zero sample to tell compiler to infer the count.

Agreed.

Sounds like in order to reuse the existing inference algorithm, we will need to give probes without any samples an explicit zero count in BlockWeights.
Agreed that using UINT64_MAX looks more clear than using zero.

This is counter-intuitive, regardless of the inference algorithm, it'd be good if we can avoid using zero to mark dangling. This is actually one of the TODOs on internal patch, I think it's time to take care of it now.

wlei updated this revision to Diff 318706.Jan 22 2021, 5:44 PM

Removed the dangling probe related code, solution will come in another patch

llvm/test/tools/llvm-profgen/noinline-cs-pseudoprobe.test
13–14

Thanks for your conversation to let me have a good understanding. So I just removed the zero count related code and the dangling probe issue will be addressed in other diff.

wlei updated this revision to Diff 320492.Mon, Feb 1, 9:08 AM

Fix one bug with context id: the inferred callsite context id should be generated from
string splitting since the callee's context id might be compressed they should share the
same context prefix

wlei marked 5 inline comments as done.Wed, Feb 3, 10:24 AM

@wmi Hi, could you take a further look at this patch? another three accepted patches depend on this, thank you!

wmi accepted this revision.Wed, Feb 3, 11:51 AM

Thanks for pinging me. LGTM.