I noticed that we weren't generating broadcasts as much as I thought we would with D54271, and this is part of the problem.
Widening the shuffle elements means adding bitcasts and hiding the relationship between a splatted scalar and the vector. If we can form a broadcast, do that before going through the rest of the shuffle lowering because broadcasts should be cheap and can often be load-folded.
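
To illustrate the ordering issue, here is a toy model in standalone C++. It is not the actual lowering code: `getSplatIndex`, `widenMask`, `chooseLowering`, and the `Lowering` enum are hypothetical names, and masks are plain `std::vector<int>` with -1 standing for an undef lane. The sketch assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: if every defined lane selects the same source
// element, return that element index; otherwise return nullopt.
// A lane value of -1 means "undef" and matches anything.
std::optional<int> getSplatIndex(const std::vector<int> &Mask) {
  int Splat = -1;
  for (int M : Mask) {
    if (M < 0)
      continue;
    if (Splat >= 0 && M != Splat)
      return std::nullopt;
    Splat = M;
  }
  if (Splat < 0)
    return std::nullopt; // all lanes undef, nothing to broadcast
  return Splat;
}

// Hypothetical helper: merge adjacent lane pairs into lanes of twice the
// width. A pair can be merged only if it is undef or selects an aligned
// (even, odd) pair of narrow source elements.
std::optional<std::vector<int>> widenMask(const std::vector<int> &Mask) {
  assert(Mask.size() % 2 == 0 && "expected an even number of lanes");
  std::vector<int> Wide;
  for (std::size_t I = 0; I < Mask.size(); I += 2) {
    int Lo = Mask[I], Hi = Mask[I + 1];
    if (Lo < 0 && Hi < 0)
      Wide.push_back(-1);
    else if (Lo >= 0 && Lo % 2 == 0 && (Hi < 0 || Hi == Lo + 1))
      Wide.push_back(Lo / 2);
    else if (Lo < 0 && Hi % 2 == 1)
      Wide.push_back(Hi / 2);
    else
      return std::nullopt;
  }
  return Wide;
}

enum class Lowering { Broadcast, Widened, Generic };

// Check for a splat first. If widening ran first, a mask such as
// {0, -1, 0, -1} would become {0, 0} on a bitcast vector of wider
// elements, and the link to the original narrow scalar would be lost.
Lowering chooseLowering(const std::vector<int> &Mask) {
  if (getSplatIndex(Mask))
    return Lowering::Broadcast;
  if (widenMask(Mask))
    return Lowering::Widened;
  return Lowering::Generic;
}
```

In this model the mask {0, -1, 0, -1} is both a splat of narrow element 0 and widenable to {0, 0}, so the order of the two checks decides which lowering is chosen.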