As noted in D93229, the transform from scalar load to vector load potentially leaks poison from the extra vector elements that are being loaded.
We could use freeze here (and x86 codegen, at least, appears to be the same either way), but this logic already has a shuffle that optionally changes the vector size, so let that instruction serve both purposes.
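
For illustration, a minimal IR sketch of the idea, not copied from the patch or its tests. The function names @before and @after, the value names, and the <4 x float> type are assumptions, and the typed-pointer and undef-mask syntax matches LLVM of this era rather than current releases:

```
; Scalar load inserted into lane 0 of an otherwise undef vector.
define <4 x float> @before(float* align 16 dereferenceable(16) %p) {
  %s = load float, float* %p, align 4
  %r = insertelement <4 x float> undef, float %s, i32 0
  ret <4 x float> %r
}

; Widened load. Lanes 1-3 of %v come from memory the original code
; never read, so they may be poison. The shuffle keeps lane 0 and
; marks the remaining lanes undef in its mask, so that poison is not
; passed on to %r.
define <4 x float> @after(float* align 16 dereferenceable(16) %p) {
  %bc = bitcast float* %p to <4 x float>*
  %v = load <4 x float>, <4 x float>* %bc, align 16
  %r = shufflevector <4 x float> %v, <4 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  ret <4 x float> %r
}
```

The freeze alternative would instead apply "freeze <4 x float> %v" to the loaded vector. The shuffle form reuses an instruction that the transform may already need for resizing.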