This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Use VScaleForTuning in costing of operations whose cost depends on VL
Closed · Public

Authored by reames on Aug 9 2022, 1:36 PM.

Details

Summary

On known hardware, reductions, gather, and scatter operations have execution latencies which correlate with the vector length (VL) of the operation. Most other operations (e.g. simple arithmetic) don't correlate in this way, and instead have an essentially fixed cost as VL varies.

When I implemented the initial scalable cost model support for reductions, gather, and scatter operations, I used an upper bound on the statically unknown VL. The argument at the time was that this prevented falsely low costs, and biased the vectorizer away from generating bad (on some hardware) code. Unfortunately, practical experience shows we were a bit too effective at that goal, and the high costs de facto prevent vectorization using these constructs at all.

This patch reverses course, and ties the returned cost not to the maximum possible VL, but to the VL which would correspond to VScaleForTuning. This parameter is the same one the vectorizer uses when normalizing loop costs, so the term effectively cancels out. The result is that the vectorizer now sees these constructs as comparable in cost to their fixed length variants.
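The cancellation can be illustrated with a small standalone sketch. This is not the code from the patch: `ScalableVecTy`, `estimatedVL`, `vlDependentCost`, and `costPerLane` are hypothetical names, the "one cost unit per element" rule is an assumed toy model, and the values used in the usage note (a tuning vscale of 2, a maximum vscale of 1024) are illustrative assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in for a scalable type <vscale x MinElts x elt>.
struct ScalableVecTy {
  unsigned MinElts; // element count when vscale == 1
};

// Number of elements the type would hold under an assumed vscale.
inline unsigned estimatedVL(ScalableVecTy Ty, unsigned VScale) {
  return Ty.MinElts * VScale;
}

// Toy cost of an operation whose latency grows linearly with VL
// (e.g. a reduction, gather, or scatter). One unit per element is an
// assumption made for illustration only.
inline unsigned vlDependentCost(ScalableVecTy Ty, unsigned VScale) {
  return estimatedVL(Ty, VScale);
}

// Cost per lane, mimicking a vectorizer that normalizes a vector cost by
// the number of lanes it expects at the tuning vscale.
inline double costPerLane(unsigned Cost, ScalableVecTy Ty,
                          unsigned VScaleForTuning) {
  unsigned Lanes = estimatedVL(Ty, VScaleForTuning);
  assert(Lanes != 0 && "type and tuning vscale must be non-zero");
  return static_cast<double>(Cost) / Lanes;
}
```

With `MinElts = 4`, costing against an assumed maximum vscale of 1024 gives a cost of 4096, which normalizes to 512 per lane at a tuning vscale of 2. Costing against the tuning vscale itself gives 8, which normalizes to 1 per lane: the vscale term appears in both numerator and denominator and drops out.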

This does introduce the possibility of the cost for these operations being a significant underestimate on platforms where the actual VLEN is far from that implied by VScaleForTuning. On such platforms, we might make poor heuristic choices. This is probably not an issue in LV (the loop vectorizer) itself, due to the cancellation mentioned above, but possibly during e.g. lowering. I'm not currently aware of any concrete examples of this, but this patch does open a concern which did not previously exist.

Previously, overestimated costs caused the same kind of poor heuristic choices, and did so on machines much closer to the default value of VScaleForTuning. With this patch, we potentially still have that overestimation problem if VScaleForTuning is set high (manually) and the code is then run on a narrow VLEN machine.

Event Timeline

reames created this revision. Aug 9 2022, 1:36 PM
Herald added a project: Restricted Project. Aug 9 2022, 1:36 PM
reames requested review of this revision. Aug 9 2022, 1:36 PM
reames updated this revision to Diff 451264. Aug 9 2022, 1:54 PM

Forgot to include all the test changes in the diff.

This revision is now accepted and ready to land. Aug 18 2022, 11:42 AM
This revision was landed with ongoing or failed builds. Aug 18 2022, 1:10 PM
This revision was automatically updated to reflect the committed changes.