This is an archive of the discontinued LLVM Phabricator instance.

[SelectOpti][4/5] Loop Heuristics
ClosedPublic

Authored by apostolakis on Feb 20 2022, 10:49 PM.

Details

Summary

This patch adds the loop-level heuristics for determining whether branches are more profitable than conditional moves.
These heuristics apply to only inner-most loops.

Depends on D120231

Diff Detail

Event Timeline

apostolakis created this revision.Feb 20 2022, 10:49 PM
apostolakis published this revision for review.Feb 23 2022, 10:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 23 2022, 10:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 13 2022, 5:21 AM
bsmith added a subscriber: bsmith.Mar 21 2022, 7:59 AM

Rebase and move tests

apostolakis retitled this revision from [SelectOpti][3/4] Loop Heuristics to [SelectOpti][4/5] Loop Heuristics.Mar 22 2022, 2:33 PM
modimo added a subscriber: modimo.Mar 23 2022, 2:44 PM

Use opt tests instead of llc ones

Rebase and add more comments for methods

A high level comment. I saw ScalingUpFactor is used in many places. It may be better to directly use Scaled64 data type instead of uint64 to avoid that.

Use Scaled64 instead of using ScalingUpFactor.

A high level comment. I saw ScalingUpFactor is used in many places. It may be better to directly use Scaled64 data type instead of uint64 to avoid that.

I used ScalingUpFactor to get a decent precision without increasing compile time (due to non-integer ops), but it is a bit hacky and it seems better to directly use Scaled64.

davidxl added inline comments.Apr 22 2022, 11:22 PM
llvm/lib/CodeGen/SelectOptimize.cpp
469

Not clear about what the second condition actually means. What is the cost of select group?

479

Should this be missed optimization or Analysis. The message probably also needs more context to be useful.

672

Add a parameter?

677

Add explanation of this heuristic.

687

Document the return value.

690

Perhaps passing by pointer explicitly?

694

why only 2 iterations?

722

Should this exclude the selection instruction itself for non-pred case?

750

should it weighted with misprediction rate?

754

Should Sum (of cost in a SI group) be used instead of max?

796

Should it be scaled by CondCost instead of taking the max?

816

Should there be an early return here?

apostolakis added inline comments.Apr 28 2022, 8:27 AM
llvm/lib/CodeGen/SelectOptimize.cpp
469

Yes that was unclear. Clarified the second condition.

479

I would prefer to stick to miss and pass remarks and not add analysis. It seems to me more natural and easier to know what to query for.
Changed though the code to emit more fine-grained and informative remarks for these cases.

672

Added.

677

Added.

687

Done.

690

Sure. Changed it.

694

Two iterations suffice to determine if the the loop's critical path involves loop-carried dependences and compute the rate that the critical path increases from one iteration to another.

722

It would be correct to exclude the select instructions from here but for the select instructions the INonPredCost is overwritten later on. So, we can skip adding a conditional here.

750

Actually it is weighted with the misprediction rate:
MispredictCost = max(MispredictPenalty, CondCost) * MispredictRate

754

This computes the critical path which essentially is the cost of the last instruction of the critical path (and thus the instruction with the highest latency). This is regardless of how many select groups are included in the loop. These groups may or may not be part of the path.

796

I do not think it should be scaled. The cost we want to compute are the cycles lost due to flushing the pipeline after a mispredict. If the CondCost is small, the operands of the condition are ready when needed and the MispredictPenalty (which depends on the pipeline depth and where the branch condition is determined in the pipeline) captures the mispredict cost . If the CondCost is very big, then the cycles lost end up being the latency of determining the condition, which is essentially CondCost. For in-between cases, the correct answer is not really the max but there is still some overlap. Therefore, max seemed a reasonable choice and it also seemed to be experimentally the best option among other versions I tested.

816

Good catch! Yes, erased by mistake the early return when I converted to Scaled64.

Address review comments.

davidxl added inline comments.Apr 30 2022, 10:05 PM
llvm/lib/CodeGen/SelectOptimize.cpp
471

The second heuristic is similar to the first one -- they are related to latency. Does it make more sense to consider the total pressure on resources?

Also it will be helpful to document in which function herusitc i and ii) are implemented.

678

Perhaps name Diff to Gain to indicate it is the gain from converting Pred to NonPred.

llvm/test/CodeGen/X86/select-optimize.ll
215

Add more test cases to cover other heuristics like gradiant

apostolakis added inline comments.May 2 2022, 10:16 PM
llvm/lib/CodeGen/SelectOptimize.cpp
471

Yes both heuristics are related to latency. But the first one is at a loop-level and all select groups are considered together (if all remain selects or all converted to branches; would be too expensive to try all different combinations), while the second one examines each select group separately. There are cases where one particular select group is unprofitable (or unnecessary) to convert while all the other ones should be converted. The second check allows us to prevent a conversion on the unprofitable one or save some code size if there is no perf diff.

I think considering total pressure on resources would be a separate patch that needs some exploration.

Improved the comments to clarify and added some pointers as suggested.

678

Good point. Changed it.
I actually also added more info in the opti remarks for this part of the code to make tuning of all the thresholds easier.

llvm/test/CodeGen/X86/select-optimize.ll
215

Yep I should have more testing coverage. Added.

Address comments with more testing and tweaks.

bsmith added inline comments.May 3 2022, 5:57 AM
llvm/lib/CodeGen/SelectOptimize.cpp
754

Is this really what we want though? As an example, a workload I'm looking at has a structure akin to:

----------------------------critical path
     \------------select-store
     \------------select-store
     \------------select-store
     \------------select-store
     \------------select-store

None of the selects are on the critical path, however they are still significant in terms of cost. This may be for a future patch that looks at resources, but it may be better to cost each path with a select on it individually, rather than costing the loop as a whole.

apostolakis added inline comments.May 3 2022, 9:35 AM
llvm/lib/CodeGen/SelectOptimize.cpp
754

There are definitely more cases that we might want to consider beyond the critical path. Yet, focusing on the critical path for this initial series of this new optimization will help catch the most obvious patterns and reduce the chances of regressions. Cases such as the one you mention will indeed need modeling of the resources and more testing, since there are definitely cases where you want to avoid a branch for non-critical-path selects.

I could add an option to disable the loop-level heuristics (that checks the reduction in the critical path, see line 462) and thus reduce the decision to the cost of each individual select (see line 465 and 478). This will allow experimenting with some of these cases. These costs will capture the gains for all the paths but it will not explicitly capture the total cost of each path. For your case, it will report the gain of converting the selects to branches (new vs old select cost) but it will not report the total cost of each path, i.e., select_cost + store_cost. Could try to add monitoring of all the local maximum in the loop and report them if you want.

bsmith added inline comments.May 4 2022, 3:19 AM
llvm/lib/CodeGen/SelectOptimize.cpp
754

An option to disable the critical path check would be good in the meantime I think. If I manually remove that check then this patch series, alongside the other patch you have to sink instructions, provide a very significant speedup for my case.

apostolakis added inline comments.May 4 2022, 8:55 AM
llvm/lib/CodeGen/SelectOptimize.cpp
754

Added the option to disable the critical path check.

Add option to disable the critical path check.

davidxl accepted this revision.May 5 2022, 1:48 PM

lgtm

This revision is now accepted and ready to land.May 5 2022, 1:48 PM
This revision was landed with ongoing or failed builds.May 23 2022, 8:03 PM
This revision was automatically updated to reflect the committed changes.