Page MenuHomePhabricator

[CSSPGO] IR instrinsic for pseudo-probe block instrumentation
Needs ReviewPublic

Authored by hoy on Aug 24 2020, 2:52 PM.

Details

Reviewers
davidxl
wmi
wenlei
Summary

This change introduces a new IR intrinsic named llvm.pseudoprobe for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:

  1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
  2. The counter atomics may not be fully cleaned up from the code stream eventually.
  3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is IntrInaccessibleMemOnly, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}

The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:                                             
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:                                              
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:                                              
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

Diff Detail

Event Timeline

hoy created this revision.Aug 24 2020, 2:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2020, 2:52 PM
hoy requested review of this revision.Aug 24 2020, 2:52 PM
hoy edited the summary of this revision. (Show Details)Aug 24 2020, 2:59 PM
hoy edited the summary of this revision. (Show Details)Aug 24 2020, 3:54 PM
hoy added reviewers: davidxl, wmi, wenlei.
wmi added a comment.Aug 26 2020, 2:33 PM

Thanks for splitting the patch into smaller ones. Can you have a separate test just to check the pseudo probe can be parsed and not deleted by any pass?

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1986

Can we check CallBase::onlyAccessesInaccessibleMemory instead of checking PseudoProbeInst here?

hoy added inline comments.Aug 26 2020, 4:08 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1986

Good question. Checking against PseudoProbeInst is more than onlyAccessesInaccessibleMemory. The instruction identified here will be either converted to a select or deleted. I probably should move the PseudoProbeInst check into BrBB->instructionsWithoutDebug().

hoy added inline comments.Aug 26 2020, 4:56 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1986

On the second thought, it might not be a good idea to fuse the check into instructionsWithoutDebug since API is used in quite some places. We sometimes like a PseudoProbeInst treated like a debug intrinsic but sometimes not. It is specific to the transform we'd like to not block.

wmi added inline comments.Aug 28 2020, 2:40 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1986

I think it is a good idea to fuse the check into instructionsWithoutDebug. That will make psuedoProbe intrinsic behave more like debug intrinsic and block less transformations, like we discussed in D86193. What is the case you have concern about?

hoy added inline comments.Aug 28 2020, 3:27 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1986

I was thinking that some passes may just use instructionsWithoutDebug to ignore and remove debug instructions. We may want that happening to pseudo probe selectively. By searching where the API is used, it looks like it's fine have them handle pseudo probes as well. I'm testing to see if there's a regression to the profile quality.

hoy added inline comments.Aug 29 2020, 11:10 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1986

Test result looked OK. Moved the check into instructionsWithoutDebug .

hoy updated this revision to Diff 288838.Aug 29 2020, 11:11 PM

Updating D86490: [CSSPGO] IR instrinsic for pseudo-probe block instrumentation