The LoopVectorizer/LAA has the ability to add runtime checks for memory accesses whose stride is a runtime value that may be 1, in an attempt to still run vectorized code. This can happen in a boring matrix multiply kernel, for example:
for (int i = 0; i < n; i++) {
  for (int j = 0; j < m; j++) {
    int sum = 0;
    for (int k = 0; k < l; k++)
      sum += A[i*l + k] * B[k*m + j];
    C[i*m + j] = sum;
  }
}
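Here the inner loop's B[k*m + j] access has a runtime stride of m, so the vectorizer versions the loop on m == 1 and falls back to the original loop otherwise. A minimal hand-written sketch of that versioning (illustrative only, not the actual generated IR; dotColumn is a made-up name):

```cpp
#include <vector>

// Illustrative sketch of stride speculation: a runtime check that the
// symbolic stride equals 1 selects a contiguous (vectorizable) loop body;
// otherwise the original strided, gather-like loop runs.
int dotColumn(const std::vector<int> &B, int m, int j, int l) {
  int sum = 0;
  if (m == 1) {
    // Speculated version: stride known to be 1, so the accesses are
    // contiguous and this loop is safe to vectorize.
    for (int k = 0; k < l; ++k)
      sum += B[k + j];
  } else {
    // Fallback: original strided access pattern.
    for (int k = 0; k < l; ++k)
      sum += B[k * m + j];
  }
  return sum;
}
```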
However, if we have access to efficient vector gather loads, they are a much better option than vectorizing with runtime checks for a stride of 1.
This adds a check into the place that appears to dictate this, LAA, to skip the stride speculation if a MaskedGather or MaskedScatter would be legal for the access.
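A minimal sketch of the intent, with a stubbed-out TTI: the names mirror the real TargetTransformInfo hooks (isLegalMaskedGather / isLegalMaskedScatter), but the signatures are simplified for illustration and shouldSpeculateUnitStride is an invented helper name:

```cpp
// Stub standing in for llvm::TargetTransformInfo; only the two legality
// hooks relevant here are modeled, with simplified signatures.
struct TargetTransformInfoStub {
  bool GatherLegal = false;
  bool ScatterLegal = false;
  bool isLegalMaskedGather() const { return GatherLegal; }
  bool isLegalMaskedScatter() const { return ScatterLegal; }
};

// Sketch of the proposed filter in LAA's stride collection: if the target
// can handle the access natively as a gather (load) or scatter (store),
// skip speculating that the stride is 1, avoiding the runtime check.
bool shouldSpeculateUnitStride(const TargetTransformInfoStub *TTI,
                               bool IsLoad) {
  if (TTI && (IsLoad ? TTI->isLegalMaskedGather()
                     : TTI->isLegalMaskedScatter()))
    return false; // Prefer the native gather/scatter; no runtime check.
  return true;    // No efficient gather/scatter: version on stride == 1.
}
```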
Should the existing Value *Ptr = getLoadStorePointerOperand(MemAccess); if (!Ptr) return; part be separated from the new gather/scatter consideration?
It would have been nice to reuse LV's isLegalGatherOrScatter(Value *V), or perhaps refactor it so this becomes if (TTI && TTI->isLegalGatherOrScatter(MemAccess)) return;?
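One sketch of what that refactor could look like, assuming isLegalGatherOrScatter were hoisted onto TTI (today that helper lives in the LoopVectorizer, not TTI, so this is hypothetical; TTIStub and skipStrideSpeculation are invented names):

```cpp
// Hypothetical TTI with isLegalGatherOrScatter hoisted onto it;
// illustration only, not the current LLVM API.
struct TTIStub {
  bool GatherLegal = false, ScatterLegal = false;
  // Mirrors LV's isLegalGatherOrScatter(Value *V): loads map to
  // gathers, stores to scatters.
  bool isLegalGatherOrScatter(bool IsLoad) const {
    return IsLoad ? GatherLegal : ScatterLegal;
  }
};

// The early exit the review suggests, collapsed to a single call.
bool skipStrideSpeculation(const TTIStub *TTI, bool IsLoad) {
  return TTI && TTI->isLegalGatherOrScatter(IsLoad);
}
```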
It would be worth emitting LLVM_DEBUG messages when strides are filtered out.
(Can check if Ptr is already in SymbolicStrides and exit early; unrelated change.)