Intel SDM, VPMASKMOV — Conditional SIMD Integer Packed Loads and Stores:
Faults occur only due to mask-bit required memory accesses that caused the faults. Faults will not occur due to referencing any memory location if the corresponding mask bit for that memory location is 0. For example, no faults will be detected if the mask bits are all zero.
I.e., if mask is all-zeros, any address is fine.
Masked load/store's prime use-case is e.g. tail masking the loop remainder,
where for the last iteration, only first some few elements of a vector exist.
So much similarly, i don't see why must we scalarize non-power-of-two vectors,
iff the element type is something we can masked- store/load.
We simply need to legalize it, widen the mask, and be done with it.
And we even already count the cost of widening the mask.