This causes a couple of changes that I need some target-specific advice on:
aarch64 - we're losing this because the zext is being simplified to an aext (any-extend), so the canonicalization can no longer confirm that the upper bits are zero. I can try adding a zext(bswap(trunc(x))) variant as an alternative, if that'd be useful?
amdgpu - I think these are all benign.
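
For reference, here is a rough C sketch of the equivalence I have in mind for the aarch64 case, assuming a 32-bit value with a 16-bit swap; the function names are hypothetical and this is only an illustration, not the actual DAG combine. The two forms agree only because the zero-extend guarantees the upper 16 bits are zero, which is exactly what an any-extend no longer promises.

```c
#include <stdint.h>

/* Hypothetical helper: portable 32-bit byte swap. */
static uint32_t bswap32(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* zext(bswap(trunc(x))): swap the low 16 bits, then zero-extend. */
static uint32_t zext_bswap_trunc(uint32_t x) {
    uint16_t lo = (uint16_t)x;                            /* trunc */
    uint16_t swapped = (uint16_t)((lo << 8) | (lo >> 8)); /* bswap */
    return (uint32_t)swapped;                             /* zext: upper bits are zero */
}

/* Equivalent wide form: full 32-bit swap, then logical shift right by 16. */
static uint32_t bswap32_shift(uint32_t x) {
    return bswap32(x) >> 16;
}
```

With an any-extend in place of the final cast, the upper 16 bits would be unspecified, so the match against the shifted wide swap could not be proven.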