The cost model for gathers, scatters and ordered reductions is based on
a pessimistic algorithm that involves scalarising the operations. We use
the architectural maximum value of vscale to determine the worst-case
number of elements. However, in practice the maximum vector length in
available hardware is currently 512 bits, so I've modified
AArch64TargetTransform::getMaxNumElements
to allow callers to set the worst-case vscale value to something more
pragmatic.
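
As a rough illustration of why the vscale bound matters, here is a
minimal sketch of the pessimistic arithmetic. It is not the code in this
patch: the helper names, the per-element cost of 1 and the example type
<vscale x 4 x i32> are assumptions made for the example. It relies only
on SVE's 128-bit granule, which gives vscale = 16 for the 2048-bit
architectural maximum and vscale = 4 for a 512-bit implementation.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): worst-case number of elements in
// a scalable vector <vscale x MinElts x Ty> for an assumed vscale bound.
constexpr uint64_t worstCaseNumElements(uint64_t MinElts, uint64_t MaxVScale) {
  return MinElts * MaxVScale;
}

// Hypothetical pessimistic cost: assume the operation is fully scalarised,
// so each element pays the scalar operation cost once.
constexpr uint64_t scalarisedCost(uint64_t MinElts, uint64_t MaxVScale,
                                  uint64_t ScalarOpCost) {
  return worstCaseNumElements(MinElts, MaxVScale) * ScalarOpCost;
}

// SVE vector lengths are multiples of 128 bits.
constexpr uint64_t ArchMaxVScale = 2048 / 128;     // architectural maximum
constexpr uint64_t PragmaticMaxVScale = 512 / 128; // widest hardware today

// <vscale x 4 x i32> gather with an assumed scalar load cost of 1.
static_assert(scalarisedCost(4, ArchMaxVScale, 1) == 64,
              "architectural bound");
static_assert(scalarisedCost(4, PragmaticMaxVScale, 1) == 16,
              "pragmatic bound");
```

Under these assumptions the estimated cost drops by a factor of four,
which can be enough for the vectoriser to prefer the scalable-vector
loop over the scalar one.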
In the long term we want a more realistic cost model that reflects the
fact that these operations are unlikely to be completely scalarised.
However, for now this minor tweak will permit many more loops to be
vectorised using scalable vectors.