This patch enables vectorization of loops that contain a __builtin_prefetch call, such as the following:

    void foo(double * restrict a, double * restrict b, int n) {
      int i;
      for (i = 0; i < n; ++i) {
        a[i] = a[i] + b[i];
        __builtin_prefetch(&(b[i+8]));
      }
    }
Two intrinsics are added: masked_prefetch for prefetching contiguous addresses, and masked_gather_prefetch for prefetching a gathered (non-contiguous) set of addresses.
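
To make the difference between the two intrinsics concrete, here is a scalar model in C of what each one is intended to do per vector lane. This is only an illustrative sketch, not the patch's implementation: the function names, the vectorization factor `VF`, and the returned lane count are all hypothetical and exist only so the behavior can be checked.

```c
#include <stddef.h>

enum { VF = 4 }; /* hypothetical vectorization factor, chosen for illustration */

/* Scalar model of a masked contiguous prefetch: lane k touches base + k
 * when mask[k] is nonzero. Returns the number of lanes prefetched. */
static int model_masked_prefetch(const double *base,
                                 const unsigned char mask[VF])
{
    int active = 0;
    for (int lane = 0; lane < VF; ++lane) {
        if (mask[lane]) {
            __builtin_prefetch(base + lane);
            ++active;
        }
    }
    return active;
}

/* Scalar model of a masked gather prefetch: every lane carries its own
 * address, so the prefetched locations need not be adjacent. */
static int model_masked_gather_prefetch(const double *ptrs[VF],
                                        const unsigned char mask[VF])
{
    int active = 0;
    for (int lane = 0; lane < VF; ++lane) {
        if (mask[lane]) {
            __builtin_prefetch(ptrs[lane]);
            ++active;
        }
    }
    return active;
}
```

In both models, lanes whose mask bit is clear issue no prefetch at all, which is the property the mask operand is meant to provide.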
In the vectorizer, prefetch calls are handled largely along the same processing path as loads and stores.
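
As a rough picture of what treating the prefetch like a load or store means for the example loop, the sketch below writes out the vectorized shape in plain C: a vector body that covers several consecutive elements per iteration (including their prefetches), followed by a scalar remainder. This is a conceptual model under assumptions, not the code the patch generates; `foo_vectorized_model` and `MODEL_VF` are hypothetical names, and the fixed factor of 4 is arbitrary.

```c
#include <stddef.h>

enum { MODEL_VF = 4 }; /* hypothetical vectorization factor */

/* Conceptual shape of foo after vectorization, written as scalar C.
 * Each inner loop stands in for one wide (vector) operation. */
static void foo_vectorized_model(double *restrict a, double *restrict b, int n)
{
    int i = 0;

    /* Vector body: each iteration covers MODEL_VF consecutive elements. */
    for (; i + MODEL_VF <= n; i += MODEL_VF) {
        /* Stands in for a wide load of a and b, an add, and a wide store. */
        for (int lane = 0; lane < MODEL_VF; ++lane)
            a[i + lane] = a[i + lane] + b[i + lane];

        /* Stands in for one wide prefetch of contiguous addresses. */
        for (int lane = 0; lane < MODEL_VF; ++lane)
            __builtin_prefetch(&b[i + lane + 8]);
    }

    /* Scalar remainder for the last n % MODEL_VF elements. */
    for (; i < n; ++i) {
        a[i] = a[i] + b[i];
        __builtin_prefetch(&b[i + 8]);
    }
}
```

The model computes the same values in `a` as the original scalar loop; the prefetches are hints and do not affect the result.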
https://discourse.llvm.org/t/rfc-loop-vectorization-for-builtin-prefetch/72234