
[CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation.
Needs Review · Public

Authored by hoy on Aug 24 2020, 6:23 PM.

Details

Reviewers
davidxl
wmi
wenlei
Summary

This change introduces a new clang switch -fpseudo-probe-for-profiling to enable AutoFDO with pseudo instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

One implication from pseudo-probe instrumentation is that the profile is now sensitive to CFG changes. We perform the pseudo instrumentation very early in the pre-LTO pipeline, before any CFG transformation. This ensures that the CFG instrumented and annotated is stable and optimization-resilient.

The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.
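As a rough illustration of the paragraph above, the following C++ sketch models probes that keep their callee's GUID when inlined. The `Probe` and `Function` types and the `inlineInto` and `attributeSamples` helpers are hypothetical names invented for this example; they are not the data structures or APIs used by the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model of a pseudo probe (illustrative only).
struct Probe {
  std::uint64_t FuncGUID; // GUID of the function the probe was created in
  std::uint32_t Index;    // probe index within that function
};

// Hypothetical model of a function; only its probes are represented.
struct Function {
  std::uint64_t GUID;
  std::vector<Probe> Body;
};

// Inlining copies the callee's probes into the caller unchanged, so every
// copy still carries the callee's GUID.
void inlineInto(Function &Caller, const Function &Callee) {
  assert(&Caller != &Callee && "self-inlining is not modeled");
  Caller.Body.insert(Caller.Body.end(), Callee.Body.begin(),
                     Callee.Body.end());
}

using ProbeKey = std::pair<std::uint64_t, std::uint32_t>;

// Pretend each probe in F was sampled once and report the count under the
// (GUID, index) pair the probe carries, not under F's own GUID.
std::map<ProbeKey, std::uint64_t> attributeSamples(const Function &F) {
  std::map<ProbeKey, std::uint64_t> Counts;
  for (const Probe &P : F.Body)
    ++Counts[{P.FuncGUID, P.Index}];
  return Counts;
}
```

In this model, samples hit on inlined copies are reported under the callee's GUID. Two copies of the same callee in one caller fold into a single entry.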

Event Timeline

hoy created this revision. Aug 24 2020, 6:23 PM
hoy requested review of this revision. Aug 24 2020, 6:23 PM
hoy retitled this revision from "[CSSPGO] Pseudo probe instrumentation for basic blocks" to "[CSSPGO] A Clang switch -fpseudo-probe-for-profiling to enable pseudo-probe instrumentation." Aug 24 2020, 6:41 PM
hoy retitled this revision from "[CSSPGO] A Clang switch -fpseudo-probe-for-profiling to enable pseudo-probe instrumentation." to "[CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation."
wmi added a comment. Aug 28 2020, 2:36 PM

The early instrumentation also allows the inliner to duplicate probes for inlined instances. When a probe along with the other instructions of a callee function are inlined into its caller function, the GUID of the callee function goes with the probe. This allows samples collected on inlined probes to be reported for the original callee function.

I have a question from reading the above. Suppose A has only one BB and that BB has one PseudoProbe in it. If function A is inlined into B1 and B2, and both B1 and B2 are inlined into C, the PseudoProbe from A will have two copies in C, both carrying the GUID of A. How are the samples collected from A inlined into B1 inlined into C categorized differently from those from A inlined into B2 inlined into C, especially when debug information is not enabled (so there is no inline stack information in the binary)?

llvm/include/llvm/Passes/PassBuilder.h
67–69

This needs to work with more types of action, for example instrumentation FDO or CS instrumentation FDO. For an instrumentation-FDO-optimized binary, we may want to collect an AutoFDO profile for performance comparison, or enhance the instrumentation profile with the AutoFDO profile to make it more representative of production, ...

Debug-information-based AutoFDO currently supports this.

hoy marked an inline comment as done. Aug 28 2020, 3:19 PM

This is a very good question. Inlined functions are differentiated by their original callsites. A pseudo probe is allocated for each callsite in the SampleProfileProbe pass. Nested inlining will produce a stack of pseudo probes, similar to the DWARF inline stack. That work is not included in the first set of patches.
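A minimal sketch of that idea, assuming a probe records the chain of callsite probes it was inlined through. The `CallsiteFrame` and `InlinedProbe` types and the `inlineThrough`, `outermostSite`, and `contextKey` helpers are hypothetical names for illustration; the actual encoding is not part of this patch and may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// One frame of inline context: the function containing a call and the id of
// the callsite probe allocated for that call. Hypothetical names.
struct CallsiteFrame {
  std::uint64_t CallerGUID;
  std::uint32_t CallsiteProbeId;
  bool operator==(const CallsiteFrame &O) const {
    return CallerGUID == O.CallerGUID && CallsiteProbeId == O.CallsiteProbeId;
  }
  bool operator<(const CallsiteFrame &O) const {
    return std::tie(CallerGUID, CallsiteProbeId) <
           std::tie(O.CallerGUID, O.CallsiteProbeId);
  }
};

// A probe plus the stack of callsites it was inlined through.
struct InlinedProbe {
  std::uint64_t FuncGUID; // GUID of the function that originally owned it
  std::uint32_t Index;    // probe index within that function
  std::vector<CallsiteFrame> InlineStack; // outermost caller first
};

// Inlining through a callsite adds that callsite as the new outermost frame.
InlinedProbe inlineThrough(InlinedProbe P, CallsiteFrame Site) {
  P.InlineStack.insert(P.InlineStack.begin(), Site);
  return P;
}

// The outermost callsite a probe was inlined through.
const CallsiteFrame &outermostSite(const InlinedProbe &P) {
  assert(!P.InlineStack.empty() && "probe was not inlined");
  return P.InlineStack.front();
}

// Key that separates samples by inline context as well as by owning function.
using ContextKey =
    std::tuple<std::vector<CallsiteFrame>, std::uint64_t, std::uint32_t>;

ContextKey contextKey(const InlinedProbe &P) {
  return std::make_tuple(P.InlineStack, P.FuncGUID, P.Index);
}
```

With this model, the copy of A's probe that reached C through B1 and the copy that reached C through B2 keep the same GUID and index but have different inline stacks, so their samples can be kept apart without DWARF inline information.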

llvm/include/llvm/Passes/PassBuilder.h
67–69

I see. I just removed this assert and let the assert above handle both DebugInfoForProfiling and PseudoProbeForProfiling.
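One way to read this change, sketched with a hypothetical, simplified options struct: a single check accepts any configuration that either selects a profile action or sets one of the two profiling flags, so pseudo probes are not tied to a particular action. `ProfileAction`, `ProfileOptions`, and `isValid` are invented for illustration and are not the code in PassBuilder.h; only the two flag names come from this discussion.

```cpp
#include <cassert>

// Hypothetical stand-in for the profile action kinds.
enum class ProfileAction { None, InstrGen, InstrUse, SampleUse };

// Hypothetical, simplified options struct (not the one in PassBuilder.h).
struct ProfileOptions {
  ProfileAction Action;
  bool DebugInfoForProfiling;
  bool PseudoProbeForProfiling;

  // A configuration is meaningful if it selects an action or enables at
  // least one of the two profiling flags.
  static bool isValid(ProfileAction Action, bool DebugInfo, bool PseudoProbe) {
    return Action != ProfileAction::None || DebugInfo || PseudoProbe;
  }

  ProfileOptions(ProfileAction Action, bool DebugInfo, bool PseudoProbe)
      : Action(Action), DebugInfoForProfiling(DebugInfo),
        PseudoProbeForProfiling(PseudoProbe) {
    // Single check covering both flags; there is no separate assert that
    // restricts pseudo probes to one kind of action.
    assert(isValid(Action, DebugInfo, PseudoProbe) &&
           "expected a profile action or a profiling flag");
  }
};
```

Under this sketch, pseudo probes combined with instrumentation FDO pass the check, which matches the use case raised in the inline comment.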

hoy updated this revision to Diff 288716. Aug 28 2020, 3:20 PM
hoy marked an inline comment as done.

Updating D86502: [CSSPGO] A Clang switch -fpseudo-probe-for-profiling for pseudo-probe instrumentation.

wmi added a comment. Aug 28 2020, 4:38 PM
In D86502#2245578, @hoy wrote:

This is a very good question. Inlined functions are differentiated by their original callsites. A pseudo probe is allocated for each callsite in the SampleProfileProbe pass. Nested inlining will produce a stack of pseudo probes, similar to the DWARF inline stack. That work is not included in the first set of patches.

Thanks. Then how does the pseudo probe for a callsite represent the inline scope it covers after inlining?