This causes changes on a few targets that I need some target-specific advice on:
aarch64 - we're losing this because the zext is being simplified to an aext, so the canonicalization can no longer confirm that the upper bits are zero. I could try adding a zext(bswap(trunc(x))) variant as an alternative, if that'd be useful?
amdgpu - I think these are all benign.
powerpc - this one is annoying: I think simplifying the shifts is enough to cause DAGCombiner::MatchLoadCombine to fail. It might be possible to improve this by adding BSWAP support to calculateByteProvider.
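
To make the powerpc idea concrete, here is a rough standalone sketch of what "looking through" a BSWAP could mean when tracing which source byte provides each result byte. This is a toy model and not LLVM's code: Node, ByteProvider, byteProvider and kByteWidth are hypothetical names invented for illustration, values are assumed to be 4 bytes wide, and shifts are assumed to be whole-byte constants.

```cpp
#include <cassert>
#include <optional>

// Toy expression node. Hypothetical, NOT LLVM's SDNode/SDValue.
struct Node {
  enum Kind { Load, Shl, Srl, Or, BSwap };
  Kind kind;
  const Node *lhs;
  const Node *rhs;
  unsigned shiftBits; // only meaningful for Shl/Srl

  Node(Kind k, const Node *l = nullptr, const Node *r = nullptr,
       unsigned s = 0)
      : kind(k), lhs(l), rhs(r), shiftBits(s) {}
};

// Where one byte of a value comes from: a byte of a load, or constant zero.
struct ByteProvider {
  const Node *load;    // nullptr means "known zero"
  unsigned byteOffset; // byte index within the load (0 = least significant)
  bool isZero() const { return load == nullptr; }
};

constexpr unsigned kByteWidth = 4; // assumption: every value is 32 bits

// Returns the provider of byte `index` (0 = least significant) of `n`,
// or nullopt if it cannot be traced to a single load byte or to zero.
std::optional<ByteProvider> byteProvider(const Node *n, unsigned index) {
  if (index >= kByteWidth)
    return std::nullopt;
  switch (n->kind) {
  case Node::Load:
    return ByteProvider{n, index};
  case Node::Shl: {
    if (n->shiftBits % 8 != 0)
      return std::nullopt;
    unsigned bytes = n->shiftBits / 8;
    if (index < bytes) // low bytes are shifted-in zeros
      return ByteProvider{nullptr, 0};
    return byteProvider(n->lhs, index - bytes);
  }
  case Node::Srl: {
    if (n->shiftBits % 8 != 0)
      return std::nullopt;
    unsigned bytes = n->shiftBits / 8;
    if (index + bytes >= kByteWidth) // high bytes are shifted-in zeros
      return ByteProvider{nullptr, 0};
    return byteProvider(n->lhs, index + bytes);
  }
  case Node::Or: {
    auto l = byteProvider(n->lhs, index);
    auto r = byteProvider(n->rhs, index);
    if (!l || !r)
      return std::nullopt;
    if (l->isZero())
      return r;
    if (r->isZero())
      return l;
    return std::nullopt; // both sides contribute to this byte
  }
  case Node::BSwap:
    // A byte swap only mirrors the byte index, so recurse on the operand.
    return byteProvider(n->lhs, kByteWidth - 1 - index);
  }
  return std::nullopt;
}
```

For example, with `Node load(Node::Load); Node swap(Node::BSwap, &load); Node shl(Node::Shl, &swap, nullptr, 16);`, byte 2 of `shl` traces back to byte 3 of `load`, and bytes 0 and 1 are known zero. Whether the real calculateByteProvider can be extended in this way, and whether that is enough for the load combine to fire again, is something I haven't verified.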