This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profgen] Generating probe-based non-CS profile.
ClosedPublic

Authored by hoy on Feb 22 2022, 8:54 AM.

Details

Summary

I'm bring up the support of pseudo-probe-based non-CS profile generation. The approach is quite similar to generating dwarf-based non-CS profile. The main difference is for a given linear instruction range, instead of each disassembled instruction, pseudo probes that are covered by the range are processed. The pseudo probe extraction code is shared with CS probe profile generation.

I'm seeing 0.7% performance win for one of our internal large benchmark compared to using non-CS dwarf-based profile, and 0.5% win for another large benchmark when combined with profi.

Diff Detail

Event Timeline

hoy created this revision.Feb 22 2022, 8:54 AM
hoy requested review of this revision.Feb 22 2022, 8:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2022, 8:54 AM
wlei added a comment.Feb 22 2022, 6:53 PM

Thanks for adding support for probe-based non-CS profile.

0.5% win for another large benchmark when combined with profi.

is this because with probe, it now have a good support for the unknown(dangling) block, so PROFI can work well for this?

llvm/test/tools/llvm-profgen/inline-pseudoprobe.test
4

Out of curiosity, for the text profile, how do we know the profile is a CS nested profile or a non-CS nested profile?

llvm/tools/llvm-profgen/ProfileGenerator.cpp
444

So here we don't use the way like CS-profile to generate the zero-count(in ProfileGenerator.cpp: 981), instead we reuse the preprocessRangeCounter to initialize all function range with zero, the probe inside the function will naturally be added with zero count. I guess this is the same to the way of CS-profile, right?

hoy added a comment.Feb 24 2022, 11:18 AM

Thanks for adding support for probe-based non-CS profile.

0.5% win for another large benchmark when combined with profi.

is this because with probe, it now have a good support for the unknown(dangling) block, so PROFI can work well for this?

I think so. Profi should help probes more than line-based profile.

llvm/test/tools/llvm-profgen/inline-pseudoprobe.test
4

CS nested profile comes with a "shouldInline" attribute for each nested profile, see test/tools/llvm-profdata/cs-sample-nested-profile.test.

llvm/tools/llvm-profgen/ProfileGenerator.cpp
444

Yes, we are using the non-CS way of reporting zero counted probes. There is a difference between CS and non-CS in that for CS, the non-executed probes are reported for its owner frame only, while for non-CS, such probes are reported for the whole inline nest.

wlei added inline comments.Feb 28 2022, 4:31 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
944

Probe.isBlock() check is hoisted from extractProbesFromRange to this for CS profile but Why doesn't add this check back for non-CS profile?

hoy added inline comments.Feb 28 2022, 4:54 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
944

So we don't count callsite probes here since they are handled in populateBoundarySamplesWithProbes. Otherwise they will be double counted.

We don't do this for non-CS probe profile. This is to be consistent with non-CS dwarf implementation where we report zero count in populateBoundarySamplesForAllFunctions. I guess I can also do some refactoring to unify the two implementations.

hoy edited the summary of this revision. (Show Details)Feb 28 2022, 4:55 PM
wlei added inline comments.Feb 28 2022, 5:07 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
944

Yeah, I understand for count==0. Here I'm asking for whether we should add Probe->isBlock() for non-CS probe profile. My understanding is for body sample, we only accumulate total sample for Block profile? otherwise, the count from callsite probes will affect the total samples.

hoy added inline comments.Feb 28 2022, 6:23 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
944

For non-CS, I'm using count==0 for probe to be consistent with dwarf, i.e, both block probes and call probes are counted by populateBodySamples.

For CS, only block probes are counted by populateBodySamples, call probes are counted by populateBoundarySamples.

That's the inconsistency we should probably fix.

wenlei added inline comments.Feb 28 2022, 11:19 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
444

Both preprocessRangeCounter and extractProbesFromRange calls findDisjointRanges, which is duplicated.

for CS, the non-executed probes are reported for its owner frame only, while for non-CS, such probes are reported for the whole inline nest.

what does the term "owner frame" refer to? can you elaborate the above?

497

Perhaps we don't need a parameter for this, just use Binary->usePseudoProbes() instead.

901

nit: move this closer to the functions definitions for ProfileGeneratorBase.

hoy added inline comments.Mar 1 2022, 9:20 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
444

By owner frame I mean the inlinee frame that originally directly contains the probe. Eg., given function A, A inlines B and one original probe of B is sampled, for CS, all of other original probes of B will be reported. None of the original probes of A will be reported if none of A's probe is sampled. But for non-CS, all of A's and B's probes will be reported even if only one B's probe is sampled.

A real example is in the attached inline-pseudoprobe.test where we have

; CHECK:     main:88:0
; CHECK-NEXT: 1: 0
; CHECK-NEXT: 2: foo:88
; CHECK-NEXT:  1: 0
; CHECK-NEXT:  2: 15

the corresponding CS profile is in inline-cs-pseudoprobe.test where there is no profile generated for the main function.

497

Sounds good.

901

done.

hoy updated this revision to Diff 412137.Mar 1 2022, 9:21 AM

Addressing comments.

wenlei added inline comments.Mar 1 2022, 10:38 AM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
444

Ok, thanks for clarification - that makes sense. Suggestion: use canonical terms like "inlinee frame" or "leaf inlinee frame" instead of nebulous terms like "owner frame" to avoid confusion.

Both preprocessRangeCounter and extractProbesFromRange calls findDisjointRanges, which is duplicated.

Something we can do to avoid redundant findDisjointRanges?

hoy added inline comments.Mar 1 2022, 12:18 PM
llvm/tools/llvm-profgen/ProfileGenerator.cpp
444

Something we can do to avoid redundant findDisjointRanges?

Good point. Sorry for missing this earliser. I made it conditional.

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 12:18 PM
hoy updated this revision to Diff 412201.Mar 1 2022, 12:18 PM

Updating D120335: [llvm-profgen] Generating probe-based non-CS profile.

davidxl added a subscriber: davidxl.Mar 1 2022, 1:39 PM

What is the size overhead on binary with probe? What is the impact on source drifting tolerance level?

hoy added a comment.Mar 1 2022, 2:33 PM

What is the size overhead on binary with probe? What is the impact on source drifting tolerance level?

Pseudo probes have negligible impact on code size. When it comes to binary size, the two encoded probe sections can cause the binary 10% - 15% bigger as of now.

As for source drifting, probe-based profile should be resilient to source changes that do not affect CFG structure and the number of callsites.

wenlei accepted this revision.Mar 1 2022, 6:27 PM

lgtm, thanks.

This revision is now accepted and ready to land.Mar 1 2022, 6:27 PM
This revision was landed with ongoing or failed builds.Mar 1 2022, 6:49 PM
This revision was automatically updated to reflect the committed changes.