While working on D50665, I came across a latent bug in the loop vectorizer that
generates incorrect code for uniform memory accesses that are executed
conditionally. This affects architectures that have masked gather/scatter support.
See the test case added for X86. Without this patch, we unconditionally execute
the load in the vectorized version. This can introduce a segfault that never
occurs in the scalar version.
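For illustration, here is a hypothetical reduced C loop of the affected shape (it is not the X86 test case itself). The address `p` is loop-invariant, i.e. uniform, but the load through it sits behind a per-iteration condition. In the scalar code, `p` is never dereferenced when no condition is true, so an invalid `p` is harmless there. The names `cond_uniform_load`, `a`, `cond`, and `p` are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced example: p is uniform (the same address on every
   iteration), but the load through it is guarded by cond[i]. */
void cond_uniform_load(int *a, const int *cond, const int *p, size_t n) {
    /* p is deliberately not checked: it may be invalid as long as no
       cond[i] is nonzero, because the scalar loop then never reads it. */
    assert(n == 0 || (a != NULL && cond != NULL));
    for (size_t i = 0; i < n; i++) {
        if (cond[i])
            a[i] = *p; /* executed only when cond[i] is nonzero */
    }
}
```

Calling this with an all-zero `cond` and `p == NULL` is well defined in the scalar form; a vectorized form that hoists or scalarizes `*p` without the guard would dereference `NULL`.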
The fix is to avoid scalarizing uniform loads that are executed conditionally.
On architectures with masked gather support, these loads should instead use the
masked gather instruction.
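The difference between the two strategies can be sketched by emulating the per-lane semantics in plain C. This is not the vectorizer's code: it assumes a hypothetical vectorization factor `VF` of 4, and the helper names (`load`, `vec_scalarized_load`, `vec_masked_gather`) and the load counter are invented for illustration. The scalarized version issues one unconditional load per vector iteration and splats it; the masked-gather version reads memory only for active lanes, even though every lane holds the same uniform address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VF 4 /* hypothetical vectorization factor */

/* Counts memory reads so the difference between the strategies is observable. */
static int loads_performed = 0;

static int load(const int *p) {
    loads_performed++;
    return *p;
}

/* Buggy strategy (emulated): the uniform load is scalarized into one
   unconditional load per vector iteration, then splatted to active lanes. */
void vec_scalarized_load(int *dst, const bool *mask, const int *p, size_t n) {
    assert(n % VF == 0);
    for (size_t i = 0; i < n; i += VF) {
        int splat = load(p); /* runs even when every lane is masked off */
        for (size_t l = 0; l < VF; l++)
            if (mask[i + l])
                dst[i + l] = splat;
    }
}

/* Fixed strategy (emulated masked gather): every lane holds the same address
   p, but memory is read only for lanes whose mask bit is set; masked-off
   lanes keep the pass-through value 0. */
void vec_masked_gather(int *dst, const bool *mask, const int *p, size_t n) {
    assert(n % VF == 0);
    for (size_t i = 0; i < n; i += VF) {
        int gathered[VF] = {0};
        for (size_t l = 0; l < VF; l++)
            if (mask[i + l])
                gathered[l] = load(p);
        for (size_t l = 0; l < VF; l++)
            if (mask[i + l])
                dst[i + l] = gathered[l];
    }
}
```

With an all-false mask over 8 elements, `vec_scalarized_load` performs 2 loads while `vec_masked_gather` performs none, so only the former would fault if `p` were invalid.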