This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation.
ClosedPublic

Authored by hoy on Aug 24 2020, 6:23 PM.

Details

Summary

This change introduces a new clang switch -fpseudo-probe-for-profiling to enable AutoFDO with pseudo instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

One implication from pseudo-probe instrumentation is that the profile is now sensitive to CFG changes. We perform the pseudo instrumentation very early in the pre-LTO pipeline, before any CFG transformation. This ensures that the CFG instrumented and annotated is stable and optimization-resilient.

The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.

Diff Detail

Event Timeline

hoy created this revision.Aug 24 2020, 6:23 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 24 2020, 6:23 PM
hoy requested review of this revision.Aug 24 2020, 6:23 PM
hoy retitled this revision from [CSSPGO] Pseudo probe instrumentation for basic blocks to [CSSPGO] A Clang switch -fpseudo-probe-for-profiling to enable pseudo-probe instrumentation..Aug 24 2020, 6:41 PM
hoy retitled this revision from [CSSPGO] A Clang switch -fpseudo-probe-for-profiling to enable pseudo-probe instrumentation. to [CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation..
wmi added a comment.Aug 28 2020, 2:36 PM

The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.

Just get a question from reading the above. Suppose A only has one BB and the BB has one PseudoProbe in it. If function A is inlined into B1 and B2 and both B1 and B2 inlined into C, the PseudoProbe from A will have two copies in C both carrying GUID of A. How the samples collected from A inlined into B1 inlined into C are categorized differently from A inlined into B2 inlined into C, especially when debug information is not enabled (so no inline stack information in the binary)?

llvm/include/llvm/Passes/PassBuilder.h
67–69

Need it to work with more types of action for example instrumentation FDO or cs instrumentation FDO. For instrumentation FDO optimized binary, we may want to collect AutoFDO profile for it for performance comparison, enhance the intrumentation profile with AutoFDO profile to make the profile more production representative, ...

Currently debug information based AutoFDO supports it.

hoy marked an inline comment as done.Aug 28 2020, 3:19 PM
In D86502#2245460, @wmi wrote:

The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.

Just get a question from reading the above. Suppose A only has one BB and the BB has one PseudoProbe in it. If function A is inlined into B1 and B2 and both B1 and B2 inlined into C, the PseudoProbe from A will have two copies in C both carrying GUID of A. How the samples collected from A inlined into B1 inlined into C are categorized differently from A inlined into B2 inlined into C, especially when debug information is not enabled (so no inline stack information in the binary)?

This is a very good question. Inlined functions are differentiated by their original callsites. A pseudo probe is allocated for each callsite in the SampleProfileProbe pass. Nested inlining will produce a stack of pseudo probes, similar with the Dwarf inline stack. The work is not included in the first set of patches.

llvm/include/llvm/Passes/PassBuilder.h
67–69

I see. I just removed this assert and the let assert above handle both DebugInfoForProfiling and PseudoProbeForProfiling.

hoy updated this revision to Diff 288716.Aug 28 2020, 3:20 PM
hoy marked an inline comment as done.

Updating D86502: [CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation.

wmi added a comment.Aug 28 2020, 4:38 PM
In D86502#2245578, @hoy wrote:
In D86502#2245460, @wmi wrote:

The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.

Just get a question from reading the above. Suppose A only has one BB and the BB has one PseudoProbe in it. If function A is inlined into B1 and B2 and both B1 and B2 inlined into C, the PseudoProbe from A will have two copies in C both carrying GUID of A. How the samples collected from A inlined into B1 inlined into C are categorized differently from A inlined into B2 inlined into C, especially when debug information is not enabled (so no inline stack information in the binary)?

This is a very good question. Inlined functions are differentiated by their original callsites. A pseudo probe is allocated for each callsite in the SampleProfileProbe pass. Nested inlining will produce a stack of pseudo probes, similar with the Dwarf inline stack. The work is not included in the first set of patches.

Thanks, then how does the pseudo probe for a callsite after inline to represent the inline scope it covers?

hoy added a comment.Nov 18 2020, 10:46 AM
In D86502#2245734, @wmi wrote:
In D86502#2245578, @hoy wrote:
In D86502#2245460, @wmi wrote:

The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.

Just get a question from reading the above. Suppose A only has one BB and the BB has one PseudoProbe in it. If function A is inlined into B1 and B2 and both B1 and B2 inlined into C, the PseudoProbe from A will have two copies in C both carrying GUID of A. How the samples collected from A inlined into B1 inlined into C are categorized differently from A inlined into B2 inlined into C, especially when debug information is not enabled (so no inline stack information in the binary)?

This is a very good question. Inlined functions are differentiated by their original callsites. A pseudo probe is allocated for each callsite in the SampleProfileProbe pass. Nested inlining will produce a stack of pseudo probes, similar with the Dwarf inline stack. The work is not included in the first set of patches.

Thanks, then how does the pseudo probe for a callsite after inline to represent the inline scope it covers?

It is represented by the dwarf inline stack for that inlined pseudo probe. I'm about to send out a patch for it.

hoy updated this revision to Diff 306488.Nov 19 2020, 11:21 AM

Updating D86502: [CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation.

Changing test corresponding to intrinsic attributes.

wmi added inline comments.Nov 20 2020, 10:57 AM
clang/lib/Frontend/CompilerInvocation.cpp
953–958

Should it emit an error if DebugInfoForProfiling and PseudoProbeForProfiling are enabled at the same time? I see that from the other patch related with pseudoprobe discriminator, these two flags are incompatible.

hoy added inline comments.Nov 20 2020, 11:02 AM
clang/lib/Frontend/CompilerInvocation.cpp
953–958

Yes, we should. It's done in D91756 passbuilder.h : https://reviews.llvm.org/D91756#change-Bsibk2p32T1c for both clang and opt.

wmi accepted this revision.Nov 20 2020, 11:46 AM

LGTM.

clang/lib/Frontend/CompilerInvocation.cpp
953–958

Ok, thanks.

This revision is now accepted and ready to land.Nov 20 2020, 11:46 AM
This revision was landed with ongoing or failed builds.Nov 30 2020, 10:17 AM
This revision was automatically updated to reflect the committed changes.