This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Unblocking optimizations by dangling pseudo probes.
ClosedPublic

Authored by hoy on Feb 25 2021, 9:05 AM.

Details

Summary

This change fixes a couple places where the pseudo probe intrinsic blocks optimizations because they are not naturally removable. To unblock those optimizations, the blocking pseudo probes are moved out of the original blocks and tagged dangling, instead of allowing pseudo probes to be literally removed. The reason is that when the original block is removed, we won't be able to sample it. Instead of assigning it a zero weight, moving all its pseudo probes into another block and marking them dangling should allow the counts inference a chance to assign them a more reasonable weight. We have not seen counts quality degradation from our experiments.

The optimizations being unblocked are:

  1. Removing conditional probes for if-converted branches. Conditional probes are tagged dangling when their homing branch arms are folded so that they will not be over-counted.
  2. Unblocking jump threading from removing empty blocks. Pseudo probe prevents jump threading from removing logically empty blocks that only has one unconditional jump instructions.
  3. Unblocking SimplifyCFG and MIR tail duplicate to thread empty blocks and blocks with redundant branch checks.

Since dangling probes are logically deleted, they should not consume any samples in LTO postLink. This can be achieved by setting their distribution factors to zero when dangled.

Diff Detail

Event Timeline

hoy created this revision.Feb 25 2021, 9:05 AM
hoy requested review of this revision.Feb 25 2021, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 25 2021, 9:05 AM
hoy updated this revision to Diff 326552.Feb 25 2021, 4:45 PM

Removing test code that belongs to another patch.

wmi added a comment.Feb 25 2021, 6:22 PM

A general question, do we care about where we should put dangling probe? Currently dangling probes are scattered in some predecessor/successor blocks they don't belong to. Looks like they can be anywhere. Wondering whether it is better to collect all of them to some place (entry block?). I am not sure about it either. Just want to know what you think.

llvm/include/llvm/CodeGen/MachineBasicBlock.h
673–690

In which case SkipPseudoOp needs to be true and which case to be false?

hoy added a comment.Feb 25 2021, 9:23 PM
In D97481#2589255, @wmi wrote:

A general question, do we care about where we should put dangling probe? Currently dangling probes are scattered in some predecessor/successor blocks they don't belong to. Looks like they can be anywhere. Wondering whether it is better to collect all of them to some place (entry block?). I am not sure about it either. Just want to know what you think.

That's a very good question. We don't care about where a dangling probe is placed technically. But in reality placing dangling probes around the original place they are from helps IR readability. It can help reason about why they are dangled.

llvm/include/llvm/CodeGen/MachineBasicBlock.h
673–690

Good question. SkipPseudoOp should be true when we are sure that the probes should not be ignored. For example, we don't want to merge blocks that look the same except for their pseudo probes because the merge will make it hard to infer the counts for the original edges. On the contrary, SkipPseudoOp should be false when we are sure probes are movable or removable.

wmi added a comment.Feb 26 2021, 3:42 PM

A general question, do we care about where we should put dangling probe? Currently dangling probes are scattered in some predecessor/successor blocks they don't belong to. Looks like they can be anywhere. Wondering whether it is better to collect all of them to some place (entry block?). I am not sure about it either. Just want to know what you think.

That's a very good question. We don't care about where a dangling probe is placed technically. But in reality placing dangling probes around the original place they are from helps IR readability. It can help reason about why they are dangled.

Ok, that is reasonable.

llvm/include/llvm/CodeGen/MachineBasicBlock.h
673–690

Feel a little hesitant to add a sampleFDO specific param to a common used interface because people may not understand when the param should be true or when it should be false. Even using the default version sometimes can block optimizations unintentionally in sampleFDO mode, like what the change tries to fix for taildup or branchfolding.

pseudo hook itself has some ambiguity here. When compiler is trying to merge blocks with different pseudo hook, they are essential instructions which could affect the profile result. When compiler is trying to remove or thread empty block, pseudo hook is non essential. Could you give some actionable guidance as comments about when SkipPseudoOp should be true?

wmi added inline comments.Feb 26 2021, 3:58 PM
llvm/include/llvm/CodeGen/MachineBasicBlock.h
673–690

Think about it a little more. If tail merge (merge blocks that look the same except for their pseudo probes) is causing trouble for pseudo hook based profile accuracy, is it possible to just directly turn off tail merge if pseudo hook is in use? In that way, we eliminate the ambiguity of pseudo hook and we can always treat it the same way as debug instructions in getFirstNonDebugInstr.

hoy added inline comments.Feb 26 2021, 5:26 PM
llvm/include/llvm/CodeGen/MachineBasicBlock.h
673–690

That's a valid concern. Yeah, the semantics of pseudo probes as far as whether optimizations should be blocked is ambiguous and requires user to be judicious. We are trying to make a trade off between profile quality and code quality. I went through all the uses of these APIs and changed some of them to skip probes for better quality while not damaging profile quality. I'll put some comments based on my experience.

Regarding tail merge, sometimes it is useful to blocks with exactly the same code including probes. A pass typically does not handle a single pattern like block merge or block duplicate. For example, both the merge and duplicate is done by the jump threading pass. Disabling a part of a pass (especially a future pass) may not be done automatically and still requires user's judgement.

I agree that letting optimization developer be aware of sample profile quality may not be practical. So far the approach I'm taking is to keep a good profile quality while fixing code quality regression. I expect at some time we can flip over the default value of SkipPseudoOp for maximum code quality and we keep an eye on the profile quality. We can also make a knob that automatically changes the default value to favor either factor.

hoy updated this revision to Diff 326868.Feb 26 2021, 6:41 PM

Adding comments for the usage of SkipPseudoOp.

wmi added inline comments.Feb 28 2021, 10:22 PM
llvm/include/llvm/CodeGen/MachineBasicBlock.h
675–677

This should work most of the time when ..., except for certain cases ...

It sounds like "SkipPseudoOp = true" is the more common cases and only certain cases will use the default value. It will lead to the confusion whether user should use the default value or not for now. I know you mentioned you would flip the default value in the future. If you think user is better to choose "SkipPseudoOp = true" in common cases, it may be good to suggest that in the front and add a TODO for flipping the default value.

llvm/lib/Transforms/IPO/SampleProfile.cpp
538–539

Nit: should not consume

hoy marked an inline comment as done.Feb 28 2021, 10:49 PM
hoy added inline comments.
llvm/include/llvm/CodeGen/MachineBasicBlock.h
675–677

Sure, comment added. I'd like to turn it on by default after a few more rounds until all optimizations that could hurt profile quality are understood.

llvm/lib/Transforms/IPO/SampleProfile.cpp
538–539

Fixed.

hoy updated this revision to Diff 327030.Feb 28 2021, 10:50 PM
hoy marked an inline comment as done.

Addressing Wei's feedbacks.

wmi accepted this revision.Mar 1 2021, 8:43 AM

LGTM.

This revision is now accepted and ready to land.Mar 1 2021, 8:43 AM
This revision was landed with ongoing or failed builds.Mar 3 2021, 10:45 PM
This revision was automatically updated to reflect the committed changes.