This is an archive of the discontinued LLVM Phabricator instance.

[SelectOpti][3/5] Base Heuristics
ClosedPublic

Authored by apostolakis on Feb 20 2022, 10:44 PM.

Details

Summary

This patch adds the base heuristics for determining whether branches are more profitable than conditional moves.
Base heuristics apply to all code apart from inner-most loops.

Depends on D122259

Diff Detail

Event Timeline

apostolakis created this revision.Feb 20 2022, 10:44 PM
apostolakis edited the summary of this revision. (Show Details)Feb 20 2022, 10:45 PM

Minor tweaks

apostolakis published this revision for review.Feb 23 2022, 10:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 23 2022, 10:14 PM

Skip selects with vector condition

Herald added a project: Restricted Project. · View Herald TranscriptMar 13 2022, 5:21 AM
bsmith added a subscriber: bsmith.Mar 21 2022, 7:59 AM

Rebase and move tests

apostolakis retitled this revision from [SelectOpti][2/4] Base Heuristics to [SelectOpti][3/5] Base Heuristics.Mar 22 2022, 2:04 PM
apostolakis edited the summary of this revision. (Show Details)
modimo added a subscriber: modimo.Mar 23 2022, 2:44 PM

Use opt tests instead of llc ones

davidxl added inline comments.
llvm/lib/CodeGen/SelectOptimize.cpp
109

add some comments to the methods.

385

is less profitable?

446

This function deserves more comments in the code.

464

non-loop regions?

477

should the relative hotness of II be checked instead of looking at the loop context?

Also It might worth check if the instructions in slice can actually be wrapped in the cold branch -- checking hasOneUse is for this, but is it more restricted?

apostolakis marked an inline comment as done.Apr 1 2022, 2:14 PM
apostolakis added inline comments.
llvm/lib/CodeGen/SelectOptimize.cpp
109

Done.

385

Yes. Fixed.

446

Added more comments.

477

That's a good idea. It is better to check the relative hotness rather than looking at loop contexts. Changed the code. Skipping instructions less hot than the source instruction (the value operand of the select).

Checking for hasOneUse is a bit conservative (e.g., an instruction could have two uses that are both meant for the computation of the cold value operand of the target select) but it is much simpler than accurately identifying all the instructions used solely for the computation of the select’s true/false value operands. I originally implemented the latter but the experimental results did not show improved performance and thus the extra implementation complexity was not justified.

apostolakis marked an inline comment as done.

Address review comments

davidxl added inline comments.Apr 5 2022, 2:31 PM
llvm/lib/CodeGen/SelectOptimize.cpp
394

If the SI is in a cold basic block (as determined by profile summary), it may not be worth converting.

419

When profile data is available, but the select instruction does not have a meta data attached, we may want to emit a missed optimization warning.

435

assert ColdI != NULL?

439

what if all defs in the chain are cheap, but they add up to be expensive?

450

Perhaps make the name of the method more general to match its intention: get instructions that can be wrapped into the cold branch when converted to control flow. Also make it clear that current heuristic chooses only oneUse defs.

469

Is this possible?

apostolakis added inline comments.Apr 9 2022, 8:16 PM
llvm/lib/CodeGen/SelectOptimize.cpp
394

That's true.
Selects in cold functions will not be considered for conversion since this is tested earlier by calling llvm::shouldOptimizeForSize. But cold basic blocks within a non-cold function are not checked. Added a check for coldness at the basic block level to prevent any conversion.

419

Sure, added.

435

The value operands of the select instructions are not necessarily instructions, could be for example, a function argument.

439

Yes, many cheap ones could amount to be as costly as one expensive one. Tweaked the heuristic to account for that.

450

Good point. Changed the function name and the comments.

469

This branch can go both ways:

Example where freq(II) < freq(I):

II = ...
for () {
    I = II  + ...
    x = select c, I, ...
}

Example where freq(II) > freq(I):

for () {
   II = ...
}
I = II  + ...
x = select c, I, ...

Address review.

davidxl added inline comments.Apr 11 2022, 11:07 PM
llvm/lib/CodeGen/SelectOptimize.cpp
453

Should the cost also consider the frequency difference between the SI and the cold operand?

basically the colder the operand, the more expensive it is to use CMOV.

455

Suggest adding an option which is the multiplier of the TCC_expensive

apostolakis added inline comments.Apr 18 2022, 9:51 PM
llvm/lib/CodeGen/SelectOptimize.cpp
453

Yes that would seem a good idea.
Tested a couple of different ways of adjusting the cost and evaluated perf impact on search workload.
Added the one that showed some positive (although only slight) effect.

455

Good idea to make it customizable. Added.

Address comments.

davidxl added inline comments.Apr 18 2022, 10:15 PM
llvm/lib/CodeGen/SelectOptimize.cpp
470

Is there a concern on the overflow of the multiplication? Probably not if the meta data weight is 32 bit.

apostolakis added inline comments.Apr 18 2022, 10:27 PM
llvm/lib/CodeGen/SelectOptimize.cpp
470

The weight metadata indeed contains 32-bit values and the SliceCost is expected to be small (these cold slices contain a handful of instructions with typical costs of 1 to 4 for each instruction). So, no real concern of overflowing here.

This revision is now accepted and ready to land.Apr 18 2022, 10:30 PM
bsmith added inline comments.Apr 20 2022, 8:01 AM
llvm/lib/CodeGen/SelectOptimize.cpp
179–181

Is this the correct thing to check for here?

Even if a select is cheap, the true/false values feeding it may not be, and given the later change you have that can sink these values when converted to a branch, this check may cause us to miss some cases where this pass is still useful. (Although perhaps changing this should be deferred to the patch that does the sinking).

davidxl added inline comments.Apr 20 2022, 9:10 AM
llvm/lib/CodeGen/SelectOptimize.cpp
179–181

This is a good point. Perhaps the target specific behavior should be modeled using a cost value (SelectCost) instead of using the binary knob.

apostolakis added inline comments.Apr 20 2022, 2:19 PM
llvm/lib/CodeGen/SelectOptimize.cpp
179–181

You are right that there are still cases where it is worth considering converting to a branch even for architectures with cheap selects. I left this check since it was used in CodeGenPrepare when optimizing selects and I wanted to avoid unexpected optimizations for non-x86 architectures. However, since this pass will be, at least initially, opt-in and this check is not always useful, I will remove it. It will allow for easier non-x86 testing of the pass. Actually, when I tested some small examples on AArch64, I had to comment out this check since this pass got skipped otherwise.

The cost of the select is already taken into account in the loop-level heuristics.
For the base heuristics, the heuristic looking for the expensive cold operand applies regardless of the cost of the select.
Yet, for the heuristic that converts highly predictable selects to branches, this check still applies. If a predictable select is not expensive, then the high predictability of the select does not suffice on its own for conversion to a branch. So, I added this check for this heuristic.

By the way, @bsmith let me know if you do any perfomance testing of this pass on ARM architectures. I will be happy to incorporate feedback and refine this pass to profitably target both x86 and ARM archs.

Limit the isPredictableSelectExpensive check to only the highly-predictable heuristic.

This revision was landed with ongoing or failed builds.May 23 2022, 7:02 PM
This revision was automatically updated to reflect the committed changes.