While working on D50665, I came across a latent bug in the vectorizer, which generates
incorrect code for uniform memory accesses that are executed conditionally.
This affects architectures that have masked gather/scatter support.
See the added test case for X86. Without this patch, we were unconditionally
executing the load in the vectorized version. This can introduce a SEGFAULT
that never occurs in the scalar version.
The fix here is to avoid scalarizing uniform loads that are executed
conditionally. On architectures with masked gather support, these loads
should use the masked gather instruction.
"non-zero" >> "non one": VF is always non-zero. VF=1 usually stands for the original scalar instructions (e.g., possibly unrolled by UF). Here, iiuc, VF>1 corresponds to asking willBeScalarWithPredication() - after per-VF decisions have been taken, whereas VF=1 corresponds to asking mustBeScalarWithPredication() - before per-VF decisions have been taken; as called by the assert of useEmulatedMaskMemRefHack() in one context. But tryToWiden() and handleReplication() may also pass VF=1, and they call after decisions have been made.
Best to always use isScalarWithPredication() after per-VF decisions have been made. In any case, better to have each caller specify the VF explicitly, to make sure it's passed whenever available; e.g., see below.
"Such instructions include" also conditional loads, with this patch.