If we're inserting a scalar that is smaller than the element
size of the final VT, the value of the extra bits doesn't matter.
Previously we any_extended in the scalar domain before inserting.
This patch changes this to use a broadcast of the original
scalar type and then a bitcast to the final type. This might
enable the use of a broadcast load.
This recovers regressions from 07d68c24aa19483e44db4336b0935b00a5d69949
and 9fcd212e2f678fdbdf304399a1e58ca490dc54d1 without relying on
alignment of the load.
remove the comment?