This patch is part of a proof of concept for vectorising a loop using
scalable vectors. It is shared for reference only and is not expected
to land in its current form.
For fixed-width vectors, the loop vectorizer assumes that certain operations
can be scalarized. For example, unmasked loads/stores through uniform
pointers are scalarized. That is not possible for scalable vectors, because
the number of lanes is not known at compile time. For these, use
gather/scatter instructions instead until we've found a way to properly
widen these types.
    void loop(int N, double *a, double *b) {
      #pragma clang loop vectorize_width(4, scalable)
      for (int i = 0; i < N; i++) {
        a[42] = b[i] + 1.0; // uses llvm.masked.scatter for the store
      }
    }
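To make the intended semantics concrete, here is a rough scalar model in C of what a scatter to a uniform address does. It is a sketch under stated assumptions, not the code the patch generates. The helper names (`model_masked_scatter`, `loop_scatter_model`), the `MAX_VL` bound and the tail mask are all hypothetical and only serve the illustration. The model relies on the LangRef guarantee that a scatter with overlapping addresses is ordered from the least-significant to the most-significant element, so the last active lane wins and the result matches the scalar loop.

```c
#include <stddef.h>

/* Hypothetical scalar model of a masked scatter over `vl` lanes: active lanes
 * are stored in order from lane 0 upwards, so with overlapping addresses the
 * highest active lane wins. */
static void model_masked_scatter(const double *values, double *const *ptrs,
                                 const unsigned char *mask, size_t vl) {
    for (size_t lane = 0; lane < vl; ++lane)
        if (mask[lane])
            *ptrs[lane] = values[lane];
}

/* Reference: the scalar loop from the example above. */
static void loop_scalar(int n, double *a, const double *b) {
    for (int i = 0; i < n; i++)
        a[42] = b[i] + 1.0;
}

/* Model of the vectorised loop. `vl` stands in for the runtime vector length;
 * the tail mask is an assumption made here so that the model handles any n. */
static void loop_scatter_model(int n, double *a, const double *b, size_t vl) {
    enum { MAX_VL = 64 };            /* arbitrary bound for this sketch */
    double vals[MAX_VL];
    double *ptrs[MAX_VL];
    unsigned char mask[MAX_VL];

    if (vl == 0 || vl > MAX_VL)
        return;

    for (int i = 0; i < n; i += (int)vl) {
        for (size_t lane = 0; lane < vl; ++lane) {
            int idx = i + (int)lane;
            mask[lane] = (unsigned char)(idx < n);
            vals[lane] = mask[lane] ? b[idx] + 1.0 : 0.0;
            ptrs[lane] = &a[42];     /* uniform pointer: same address per lane */
        }
        model_masked_scatter(vals, ptrs, mask, vl);
    }
}
```

For any `vl` the model leaves the same value in `a[42]` as the scalar loop, which is why a scatter can stand in for the scalarized store even though it is likely more expensive than a single scalar store.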
I suppose this is why you don't want to actually merge this currently? What happens if it's not AArch64?