By the magic of masked loads, a widened MLOAD is almost identical to the original MLOAD: the extra lanes are simply masked off.
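Here is a rough way to see why widening is cheap. Lanes whose mask bit is false never read memory and only contribute the passthru value. Padding the mask with false lanes therefore leaves the original lanes unchanged. The Python sketch models this with fixed-width lists. `masked_load` and `widen_masked_load` are hypothetical helpers, not LLVM APIs, and scalable (vscale) lane counts are not modeled.

```python
from typing import List, Sequence


def masked_load(memory: Sequence[int], mask: Sequence[bool],
                passthru: Sequence[int]) -> List[int]:
    """Model of a masked load: active lanes read memory, inactive lanes
    take the passthru value and never touch memory."""
    return [memory[i] if active else passthru[i]
            for i, active in enumerate(mask)]


def widen_masked_load(memory: Sequence[int], mask: Sequence[bool],
                      passthru: Sequence[int], wide_len: int,
                      fill: int = 0) -> List[int]:
    """Hypothetical widening: pad the mask with False lanes and the
    passthru with an arbitrary fill value, then issue one wider load."""
    pad = wide_len - len(mask)
    wide_mask = list(mask) + [False] * pad
    wide_passthru = list(passthru) + [fill] * pad
    return masked_load(memory, wide_mask, wide_passthru)


# Only 3 elements are addressable. The padding lanes are masked off,
# so the wider load never reads past the end of `memory`.
memory = [10, 20, 30]
mask = [True, False, True]
passthru = [0, 0, 0]
narrow = masked_load(memory, mask, passthru)
wide = widen_masked_load(memory, mask, passthru, wide_len=8)
assert wide[:len(narrow)] == narrow
print(narrow, wide)
```

The only real difference between the two loads is the mask, which is why the widened node can reuse almost everything from the original.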
Need to handle a few more INSERT_SUBVECTOR legalization cases to avoid crashing on the testcase.
The code for computing the mask is unfortunately not very efficient; maybe we need a target-independent version of whilelo? Or a DAGCombine to form whilelo?
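For reference, SVE's `whilelo` sets lane i of the predicate when `base + i < limit` (unsigned compare). Clearing the padding lanes of a widened mask amounts to AND-ing with `whilelo(0, original_lane_count)`. A generic lowering might instead build the same mask by comparing a step vector against a splat of the limit, and a DAGCombine of the kind suggested above would recognize that pattern. The sketch models both forms with Python lists. The helper names are hypothetical, unsigned wraparound is ignored, and the "garbage" padding lanes stand in for whatever an undefined INSERT_SUBVECTOR source might contain.

```python
from typing import List, Sequence


def whilelo(base: int, limit: int, num_lanes: int) -> List[bool]:
    """Model of SVE WHILELO: lane i is active iff base + i < limit.
    Unsigned wraparound is not modeled."""
    return [base + i < limit for i in range(num_lanes)]


def step_compare_mask(limit: int, num_lanes: int) -> List[bool]:
    """Generic form: compare a step vector <0, 1, 2, ...> against a
    splat of `limit`. Equivalent to whilelo(0, limit, num_lanes)."""
    step = list(range(num_lanes))
    splat = [limit] * num_lanes
    return [s < l for s, l in zip(step, splat)]


def widen_mask(orig_mask: Sequence[bool], wide_len: int) -> List[bool]:
    """Hypothetical widening: insert the original mask into a wider vector
    whose padding lanes hold garbage (True here, the worst case), then AND
    with whilelo(0, len(orig_mask)) to clear those lanes."""
    inserted = list(orig_mask) + [True] * (wide_len - len(orig_mask))
    active = whilelo(0, len(orig_mask), wide_len)
    return [m and a for m, a in zip(inserted, active)]


assert step_compare_mask(3, 8) == whilelo(0, 3, 8)
print(widen_mask([True, False, True], 8))
```

Both forms compute the same mask. The question is only whether the target sees a single `whilelo` or a step-vector compare that it has to match.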
nit: Is/would this be the same as:
?