LGTM. See inline for some very minor possible improvements.
llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll | ||
---|---|---|
18 | Just curious: why is this v_mov needed? Can't v_perm read this value directly from s0? | |
384 | This would work out slightly better using a non-AMDGPU-specific lowering to something like `x >> 8 \| (x & 0xff) << 8`. | |
393 | Could do a single v_perm with mask 03020001 to avoid the shift. (Or mask 0C0C0001 if you really want to guarantee the upper bits get zeroed.) | |
497 | If you care about v2i16 this whole sequence could be done with a single v_perm with mask 02030001. |
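To sanity-check the masks suggested above, here is a minimal Python sketch of the byte-select behaviour of v_perm. It is a simplified model, not the ISA definition: it assumes selector bytes 0-3 pick from the second source operand, 4-7 from the first, and 0x0C yields a zero byte, and it rejects every other selector. The names `v_perm_b32` and `bswap16_generic` are illustrative only.

```python
def v_perm_b32(src0: int, src1: int, selector: int) -> int:
    """Simplified model of a byte permute over the 64-bit value {src0, src1}.

    Assumption: selector bytes 0-3 read src1, 4-7 read src0, 0x0C gives 0x00.
    Other selector values are not modelled here.
    """
    combined = ((src0 & 0xFFFFFFFF) << 32) | (src1 & 0xFFFFFFFF)
    result = 0
    for i in range(4):
        sel = (selector >> (8 * i)) & 0xFF
        if sel <= 7:
            byte = (combined >> (8 * sel)) & 0xFF
        elif sel == 0x0C:
            byte = 0x00
        else:
            raise ValueError(f"selector 0x{sel:02X} is not modelled in this sketch")
        result |= byte << (8 * i)
    return result


def bswap16_generic(x: int) -> int:
    """Target-independent 16-bit byte swap: x >> 8 | (x & 0xff) << 8."""
    return ((x >> 8) | ((x & 0xFF) << 8)) & 0xFFFF


x = 0xAABBCCDD
# Swap the low two bytes, keep the upper half unchanged.
print(hex(v_perm_b32(0, x, 0x03020001)))
# Swap the low two bytes, force the upper half to zero.
print(hex(v_perm_b32(0, x, 0x0C0C0001)))
# Swap the bytes inside each 16-bit half (the v2i16 case).
print(hex(v_perm_b32(0, x, 0x02030001)))
```

Under this model, the low 16 bits of the first two results agree with `bswap16_generic(x & 0xFFFF)`, and the third result is the per-half swap a v2i16 bswap needs.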
llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll | ||
---|---|---|
18 | This would violate the constant bus restriction. It could be folded on gfx10, where the limit is 2. However, this is only a problem because the mask constant is in an SGPR in the first place. If we materialized the mask in a VGPR, we could fold the s0 read. We don't try to optimize this case yet. |
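The constant bus argument can be illustrated with a rough Python sketch. It is a deliberately simplified model: it only counts distinct SGPR operands by name and ignores literals and other special registers. The helper `fits_constant_bus` and the register name `s4` for the mask are hypothetical; only `s0` and the limit of 2 on gfx10 come from the discussion, and the limit of 1 for older targets is inferred from it.

```python
def fits_constant_bus(operands: list[str], limit: int) -> bool:
    """Return True if the number of distinct SGPR operands is within the limit.

    Simplified model: an operand counts as an SGPR if its name starts with "s".
    """
    sgprs = {op for op in operands if op.startswith("s")}
    return len(sgprs) <= limit


# Value read directly from s0 while the mask also lives in an SGPR (s4, hypothetical).
direct = ["s0", "s4"]
# Value copied to a VGPR first (the v_mov in the test), mask still in an SGPR.
with_copy = ["v0", "s4"]
# Mask materialized in a VGPR instead, value read directly from s0.
mask_in_vgpr = ["s0", "v1"]

print(fits_constant_bus(direct, limit=1))        # assumed pre-gfx10 limit
print(fits_constant_bus(direct, limit=2))        # gfx10 limit
print(fits_constant_bus(with_copy, limit=1))
print(fits_constant_bus(mask_in_vgpr, limit=1))
```

In this model, only the first call fails, which matches the explanation: the copy is needed solely because both the value and the mask would otherwise come from SGPRs.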