This is an archive of the discontinued LLVM Phabricator instance.

[Analysis][SVE] Make the costs for gathers/scatters/ordered reductions less pessimistic
Abandoned · Public

Authored by david-arm on Sep 1 2021, 8:32 AM.

Details

Summary

The cost model for gathers, scatters and ordered reductions is based on
a pessimistic algorithm that involves scalarising the operations. We use
the architectural maximum value of vscale (16, i.e. 2048-bit vectors) to
determine the worst-case number of elements. However, in practice the
maximum vector length in available hardware is currently 512 bits, so I've
modified AArch64TargetTransform::getMaxNumElements to allow callers to set
the worst-case vscale value to something more pragmatic.

In the long term we want to come up with a more realistic cost model to
reflect the fact that the operations are unlikely to be completely
scalarised. However, for now this minor tweak will permit many more loops
to be vectorised using scalable vectors.


Event Timeline

david-arm created this revision · Sep 1 2021, 8:32 AM
david-arm requested review of this revision · Sep 1 2021, 8:32 AM
Herald added a project: Restricted Project · Sep 1 2021, 8:32 AM
david-arm retitled this revision from [SVE] Make the costs for gathers/scatters/ordered reductions less pessimistic to [Analysis][SVE] Make the costs for gathers/scatters/ordered reductions less pessimistic · Sep 1 2021, 8:34 AM
Matt added a subscriber: Matt · Sep 2 2021, 7:01 AM
david-arm updated this revision to Diff 370505 · Sep 3 2021, 1:31 AM
  • Rebase.
sdesmalen added inline comments · Sep 6 2021, 5:02 AM
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h:142

In practice, Clang now adds the vscale_range attribute to all functions, so this change is artificial: it only changes the cost for the unit tests (which don't specify vscale_range).

I think what you're actually after is a function that returns the median value between min and max, e.g. for vscale_range(0, 16) it chooses 8, and for vscale_range(8, 8) it also chooses 8. It would be better not to start tuning for specific bit-widths based on the implementations available today.

david-arm abandoned this revision · Sep 22 2021, 8:23 AM

Abandoning the patch in favour of a different approach: see D110259.