The LoopVectorizer/LAA can add runtime checks for memory accesses that look like they may be unit-stride (stride of 1) accesses, so that vectorized code can still be run. This can happen in a simple matrix multiply kernel, for example, where the stride of B[k*m + j] in the innermost loop is the runtime value m:
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      int sum = 0;
      for (int k = 0; k < l; k++)
        sum += A[i*l + k] * B[k*m + j];
      C[i*m + j] = sum;
    }
  }
However, if the target has efficient vector gather loads, they should be a much better option than vectorizing with runtime checks for a stride of 1.
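To make the trade-off concrete, here is a source-level sketch of the two strategies for the inner dot product of the kernel. It is only an illustration in plain C, not what the vectorizer emits: `dot_versioned` models versioning the loop on `m == 1`, and `dot_gather` models a gather by collecting `VF` strided elements of B through computed indices without any check on m. The names `dot_versioned`, `dot_gather`, `matmul` and the vector factor `VF = 4` are hypothetical.

```c
/* Hypothetical vector factor for the gather model. */
#define VF 4

/* Model of stride versioning: a fast path guarded by a runtime check
   on the symbolic stride m, with the original loop as the fallback. */
static int dot_versioned(const int *A, const int *B, int i, int j, int l, int m) {
    int sum = 0;
    if (m == 1) {
        /* With m == 1, B[k*m + j] == B[k + j]: the accesses are
           contiguous, so ordinary vector loads could be used. */
        for (int k = 0; k < l; k++)
            sum += A[i*l + k] * B[k + j];
    } else {
        /* Fallback: the original strided loop. */
        for (int k = 0; k < l; k++)
            sum += A[i*l + k] * B[k*m + j];
    }
    return sum;
}

/* Model of the gather alternative: VF strided elements of B are
   loaded via computed indices, with no runtime check on m. */
static int dot_gather(const int *A, const int *B, int i, int j, int l, int m) {
    int sum = 0;
    int k = 0;
    for (; k + VF <= l; k += VF) {
        int lanes[VF];
        for (int v = 0; v < VF; v++)        /* emulates one gather */
            lanes[v] = B[(k + v) * m + j];
        for (int v = 0; v < VF; v++)
            sum += A[i*l + k + v] * lanes[v];
    }
    for (; k < l; k++)                       /* scalar remainder */
        sum += A[i*l + k] * B[k*m + j];
    return sum;
}

/* The kernel from the description, parameterized on the strategy. */
void matmul(const int *A, const int *B, int *C, int n, int m, int l, int use_gather) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            C[i*m + j] = use_gather ? dot_gather(A, B, i, j, l, m)
                                    : dot_versioned(A, B, i, j, l, m);
}
```

Both strategies compute the same result. The versioned form only benefits when m happens to be 1 at run time, while the gather form handles any m with the same code path.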
This patch adds a check to LAA, which appears to be where this decision is made, to test whether a MaskedGather or MaskedScatter would be legal for the access and, if so, to avoid adding the stride runtime check.
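The decision can be sketched as a small predicate. This is a hypothetical model, not the patch itself: `target_info`, `access_kind` and `should_speculate_unit_stride` are made-up names, and the two legality flags stand in for the target queries that LLVM exposes through TargetTransformInfo (e.g. `isLegalMaskedGather` / `isLegalMaskedScatter`), which are not modelled here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the target legality queries. */
typedef struct {
    bool legal_masked_gather;
    bool legal_masked_scatter;
} target_info;

typedef enum { ACCESS_LOAD, ACCESS_STORE } access_kind;

/* Returns true if it is still worth versioning the loop on
   "stride == 1" for this access, i.e. only when the target cannot
   handle the strided access as a masked gather (for a load) or a
   masked scatter (for a store) instead. */
bool should_speculate_unit_stride(const target_info *tti, access_kind kind) {
    if (kind == ACCESS_LOAD)
        return !tti->legal_masked_gather;
    return !tti->legal_masked_scatter;
}
```

Keying the check on the access kind reflects that a load needs gather support and a store needs scatter support; a target may support one without the other.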
Suggestion: change "represent \p V" to "represent a vectorized version of \p V"?
(This is the original comment, but its original context was LV.)