Page MenuHomePhabricator

[CSSPGO] IR intrinsic for pseudo-probe block instrumentation
ClosedPublic

Authored by hoy on Aug 24 2020, 2:52 PM.

Details

Summary

This change introduces a new IR intrinsic named llvm.pseudoprobe for pseudo-probe block instrumentation. Please refer to https://reviews.llvm.org/D86193 for the whole story.

A pseudo probe is used to collect the execution count of the block where the probe is instrumented. This requires a pseudo probe to be persisting. The LLVM PGO instrumentation also instruments in similar places by placing a counter in the form of atomic read/write operations or runtime helper calls. While these operations are very persisting or optimization-resilient, in theory we can borrow the atomic read/write implementation from PGO counters and cut it off at the end of compilation with all the atomics converted into binary data. This was our initial design and we’ve seen promising sample correlation quality with it. However, the atomics approach has a couple issues:

  1. IR Optimizations are blocked unexpectedly. Those atomic instructions are not going to be physically present in the binary code, but since they are on the IR till very end of compilation, they can still prevent certain IR optimizations and result in lower code quality.
  2. The counter atomics may not be fully cleaned up from the code stream eventually.
  3. Extra work is needed for re-targeting.

We choose to implement pseudo probes based on a special LLVM intrinsic, which is expected to have most of the semantics that comes with an atomic operation but does not block desired optimizations as much as possible. More specifically the semantics associated with the new intrinsic enforces a pseudo probe to be virtually executed exactly the same number of times before and after an IR optimization. The intrinsic also comes with certain flags that are carefully chosen so that the places they are probing are not going to be messed up by the optimizer while most of the IR optimizations still work. The core flags given to the special intrinsic is IntrInaccessibleMemOnly, which means the intrinsic accesses memory and does have a side effect so that it is not removable, but is does not access memory locations that are accessible by any original instructions. This way the intrinsic does not alias with any original instruction and thus it does not block optimizations as much as an atomic operation does. We also assign a function GUID and a block index to an intrinsic so that they are uniquely identified and not merged in order to achieve good correlation quality.

Let's now look at an example. Given the following LLVM IR:

define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
  %cmp = icmp eq i32 %x, 0
   br i1 %cmp, label %bb1, label %bb2
bb1:
   br label %bb3
bb2:
   br label %bb3
bb3:
   ret void
}

The instrumented IR will look like below. Note that each llvm.pseudoprobe intrinsic call represents a pseudo probe at a block, of which the first parameter is the GUID of the probe’s owner function and the second parameter is the probe’s ID.

define internal void @foo2(i32 %x, void (i32)* %f) !dbg !4 {
bb0:
   %cmp = icmp eq i32 %x, 0
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 1)
   br i1 %cmp, label %bb1, label %bb2
bb1:                                             
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 2)
   br label %bb3
bb2:                                              
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 3)
   br label %bb3
bb3:                                              
   call void @llvm.pseudoprobe(i64 837061429793323041, i64 4)
   ret void
}

Diff Detail

Event Timeline

hoy created this revision.Aug 24 2020, 2:52 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 24 2020, 2:52 PM
hoy requested review of this revision.Aug 24 2020, 2:52 PM
hoy edited the summary of this revision. (Show Details)Aug 24 2020, 2:59 PM
hoy edited the summary of this revision. (Show Details)Aug 24 2020, 3:54 PM
hoy added reviewers: davidxl, wmi, wenlei.
wmi added a comment.Aug 26 2020, 2:33 PM

Thanks for splitting the patch into smaller ones. Can you have a separate test just to check the pseudo probe can be parsed and not deleted by any pass?

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1995

Can we check CallBase::onlyAccessesInaccessibleMemory instead of checking PseudoProbeInst here?

hoy added inline comments.Aug 26 2020, 4:08 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1995

Good question. Checking against PseudoProbeInst is more than onlyAccessesInaccessibleMemory. The instruction identified here will be either converted to a select or deleted. I probably should move the PseudoProbeInst check into BrBB->instructionsWithoutDebug().

hoy added inline comments.Aug 26 2020, 4:56 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1995

On the second thought, it might not be a good idea to fuse the check into instructionsWithoutDebug since API is used in quite some places. We sometimes like a PseudoProbeInst treated like a debug intrinsic but sometimes not. It is specific to the transform we'd like to not block.

wmi added inline comments.Aug 28 2020, 2:40 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1995

I think it is a good idea to fuse the check into instructionsWithoutDebug. That will make psuedoProbe intrinsic behave more like debug intrinsic and block less transformations, like we discussed in D86193. What is the case you have concern about?

hoy added inline comments.Aug 28 2020, 3:27 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1995

I was thinking that some passes may just use instructionsWithoutDebug to ignore and remove debug instructions. We may want that happening to pseudo probe selectively. By searching where the API is used, it looks like it's fine have them handle pseudo probes as well. I'm testing to see if there's a regression to the profile quality.

hoy added inline comments.Aug 29 2020, 11:10 PM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
1995

Test result looked OK. Moved the check into instructionsWithoutDebug .

hoy updated this revision to Diff 288838.Aug 29 2020, 11:11 PM

Updating D86490: [CSSPGO] IR instrinsic for pseudo-probe block instrumentation

How about inline cost analysis? It needs to skip the new instructions. Similarly for the Partial inliner, the static cost of this should be set to zero.

llvm/include/llvm/IR/BasicBlock.h
190

Is it possible to also need to skip both PseudoProbe and Lifetime Markers?

llvm/lib/CodeGen/CodeGenPrepare.cpp
2248–2249

Perhaps introduce a helper function to skip non-code instructions

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
243–247

why does it access memory?

hoy added a comment.Oct 28 2020, 5:18 PM

How about inline cost analysis? It needs to skip the new instructions. Similarly for the Partial inliner, the static cost of this should be set to zero.

Good point. Yes, pseudo probes should be excluded from inline cost analysis. We were planning to include the change in upcoming patches. Now I'm moving it here.

llvm/include/llvm/IR/BasicBlock.h
190

Good point. I just checked some uses of getFirstNonPHIOrDbgOrLifetime where pseudo probe should also be needed. I'm making a helper for that. I'm thinking about deferring the actual replacement work to when pseudoprobe is used in combination with those cases so that we have good testing there. What do you think?

llvm/lib/CodeGen/CodeGenPrepare.cpp
2248–2249

Changed to using getFirstNonPHIOrDbgOrPseudoProbe.

llvm/lib/Transforms/Scalar/TailRecursionElimination.cpp
243–247

Because it has the IntrInaccessibleMemOnly flag. I changed the comment to be less confusing.

hoy updated this revision to Diff 301487.Oct 28 2020, 5:19 PM

Updating D86490: [CSSPGO] IR instrinsic for pseudo-probe block instrumentation

hoy added a comment.Oct 28 2020, 6:21 PM

Added a new Instruction::getNextNonDebugOrPseudoProbeInstruction() helper.

hoy updated this revision to Diff 301501.Oct 28 2020, 6:21 PM

Updating D86490: [CSSPGO] IR instrinsic for pseudo-probe block instrumentation

davidxl added inline comments.Oct 28 2020, 8:19 PM
llvm/include/llvm/IR/BasicBlock.h
190

I have concerns on the proliferation of interfaces. How about extending exiting two interfaces (NonDebug, NonDebugOrLifemaker) with an optional argument 'bool SkipPseudoOp = true'?

llvm/lib/IR/BasicBlock.cpp
100–103

There are still quite a few such predicates (debug || pseudoop), or !(debug||pseudoop) in the patch. Perhaps commonize them.

hoy marked 2 inline comments as done.Oct 28 2020, 9:38 PM
hoy added inline comments.
llvm/include/llvm/IR/BasicBlock.h
190

Good idea, thanks for the suggestion!

SkipPseudoOp is set to false by default. The APIs are called with SkipPseudoOp=true where we are sure to skip pseudo probes.

hoy updated this revision to Diff 301513.Oct 28 2020, 9:38 PM
hoy marked an inline comment as done.

Updating D86490: [CSSPGO] IR instrinsic for pseudo-probe block instrumentation

hoy retitled this revision from [CSSPGO] IR instrinsic for pseudo-probe block instrumentation to [CSSPGO] IR intrinsic for pseudo-probe block instrumentation.Oct 29 2020, 3:20 PM
asbirlea removed a subscriber: asbirlea.Oct 30 2020, 3:06 PM
hoy added a comment.Nov 4 2020, 9:13 AM

@davidxl I'm wondering if the current changes look good to you. Please let me know if you have more comments. Thanks!

hoy updated this revision to Diff 306443.EditedThu, Nov 19, 9:29 AM

Updating D86490: [CSSPGO] IR intrinsic for pseudo-probe block instrumentation

Adding an attribute field for use later.

wmi accepted this revision.Thu, Nov 19, 10:03 PM

LGTM.

llvm/lib/CodeGen/CodeGenPrepare.cpp
2249–2250

Nit:

if (BB->getFirstNonPHIOrDbg(true) != RetI)
  return false;
This revision is now accepted and ready to land.Thu, Nov 19, 10:03 PM
wmi added a comment.Thu, Nov 19, 10:07 PM

By the way, David has gone through the patches and talked to me offline saying he is ok with the patches generally. Thanks for your great work and patience!

hoy added a comment.Fri, Nov 20, 8:47 AM
In D86490#2407293, @wmi wrote:

By the way, David has gone through the patches and talked to me offline saying he is ok with the patches generally. Thanks for your great work and patience!

A lot thanks to you and David for reviewing CSSPGO patches and being supportive to us all the time!

hoy updated this revision to Diff 306705.Fri, Nov 20, 8:48 AM

Updating D86490: [CSSPGO] IR intrinsic for pseudo-probe block instrumentation

Addressing Wei's feedback.

hoy marked an inline comment as done.Fri, Nov 20, 8:49 AM
This revision was landed with ongoing or failed builds.Fri, Nov 20, 10:40 AM
This revision was automatically updated to reflect the committed changes.