The cost model for gathers, scatters, and ordered reductions is based on
a pessimistic algorithm that scalarises the operations, using the
architectural maximum value of vscale to determine the worst-case
number of elements. However, the maximum vector length in currently
available hardware is 512 bits, so I've modified
AArch64TargetTransform::getMaxNumElements
to allow callers to set the worst-case vscale value to something more
pragmatic.
In the long term we want to come up with a more realistic cost model
that reflects the fact that these operations are unlikely to be
completely scalarised. For now, however, this minor tweak permits many
more loops to be vectorised using scalable vectors.
In practice, Clang now adds the vscale_range attribute to all functions, so this change is artificial in that it only changes the cost for the unit tests (which don't specify vscale_range).
I think what you're actually after is a function that returns the median value between min and max, e.g. for vscale_range(0, 16) it chooses 8, and for vscale_range(8, 8) it also chooses 8. It would be better if we don't start tuning for specific bit-widths based on the implementations that are available today.