If the shl amount is at least half the bitwidth (i.e. the lower half of the bswap source is zero), then we can reduce the shift amount by bw/2, perform the bswap at half the bitwidth, and zero extend the result.
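
As a rough sanity check of the identity above, here is a small Python model of the fold on plain integers. It is only a sketch of the arithmetic, not the DAG code; the helper names (`bswap`, `bswap_of_shl`, `narrowed_bswap`) are made up for illustration, and the zero extend is implicit because Python integers are unbounded.

```python
def bswap(value: int, bits: int) -> int:
    """Reverse the byte order of an unsigned `bits`-wide integer."""
    return int.from_bytes(value.to_bytes(bits // 8, "little"), "big")


def bswap_of_shl(x: int, shift: int, bits: int = 32) -> int:
    """Original pattern: bswap(shl(x, shift)) at full width."""
    mask = (1 << bits) - 1
    return bswap((x << shift) & mask, bits)


def narrowed_bswap(x: int, shift: int, bits: int = 32) -> int:
    """Folded pattern: zext(bswap(trunc(shl(x, shift - bits/2))))."""
    half = bits // 2
    # The fold only applies when the low half of the shifted value is zero.
    assert half <= shift < bits
    half_mask = (1 << half) - 1
    # Shift by the reduced amount, then truncate to half width.
    truncated = (x << (shift - half)) & half_mask
    # Byte swap at half width; the result is already zero extended.
    return bswap(truncated, half)
```

With `shift == bits // 2` the reduced shift is zero, so only the truncate, the narrow bswap and the zero extend remain.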
I've currently allowed any shift amount >= bw/2, but we could limit this to shift amounts that are a multiple of 16 so that the shl is always folded away? I'll probably enforce that limit for the InstCombine variant of this fold for PR53867, but I was wondering whether we should be more relaxed in DAG.
Based off PR51391 + PR53867