Implement a TODO from rL371452 and handle loop-invariant addresses in predicated blocks. If we can prove that the load is safe to speculate into the header, we can use a normal load instead of a masked.load.
This is mostly about vectorization robustness. In the common case, it's expected that LICM/LoadStorePromotion would have eliminated such loads entirely.
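For illustration only, here is a scalar source-level sketch of the kind of loop this is about. It is not taken from the patch or its tests. The function names and the `cond`/`inv` parameters are hypothetical, and the second form assumes `inv` is known to be dereferenceable whenever the loop body runs, which is the property the patch has to prove.

```cpp
#include <cstddef>

// Hypothetical example: the address `inv` is loop-invariant, but the load
// sits in a predicated block, so it only executes when cond[i] is nonzero.
// Vectorizing this as written calls for a masked load of *inv.
int sumIfPredicated(const int *cond, const int *inv, std::size_t n) {
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (cond[i])
      sum += *inv; // conditional load from an invariant address
  }
  return sum;
}

// Same computation with the load speculated to the top of the loop body,
// so it runs on every iteration. ASSUMPTION: `inv` is dereferenceable on
// every iteration; without that guarantee the rewrite could fault.
// With the load unconditional, a plain load is enough.
int sumIfSpeculated(const int *cond, const int *inv, std::size_t n) {
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    const int v = *inv; // unconditional load
    if (cond[i])
      sum += v;
  }
  return sum;
}
```

Both functions return the same value whenever `inv` points to valid memory. The difference is only in whether the load depends on the predicate.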
nit: simply check if (EltSize != Step->getAPInt())? Admittedly this relates to the previous patch, but it seems more evident now.