
AMDGPU/GlobalISel: Handle G_BSWAP

Authored by arsenm on Thu, Feb 13, 9:57 AM.



Event Timeline

arsenm created this revision. Thu, Feb 13, 9:57 AM
Herald added a project: Restricted Project. Thu, Feb 13, 9:57 AM
foad accepted this revision. Fri, Feb 14, 1:41 AM

LGTM. See inline for some very minor possible improvements.


Just curious: why is this v_mov needed? Can't v_perm read this value directly from s0?


This would work out slightly better using a non-AMDGPU-specific lowering to something like x >> 8 | (x & 0xff) << 8.
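For reference, the generic 16-bit lowering suggested above can be sketched in plain Python. This only models the arithmetic on an unsigned 16-bit value; the function name `bswap16` is made up for illustration and is not part of the patch.

```python
def bswap16(x: int) -> int:
    """Swap the two bytes of a 16-bit value using shifts and a mask."""
    x &= 0xFFFF  # treat the input as an unsigned 16-bit value
    # High byte moves down, low byte moves up; the final mask keeps 16 bits.
    return ((x >> 8) | ((x & 0xFF) << 8)) & 0xFFFF
```

For example, `bswap16(0x1234)` evaluates to `0x3412`.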


Could do a single v_perm with mask 03020001 to avoid the shift. (Or mask 0C0C0001 if you really want to guarantee the upper bits get zeroed.)


If you care about v2i16 this whole sequence could be done with a single v_perm with mask 02030001.
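To make the masks in the two comments above easier to check, here is a rough Python model of the v_perm byte selection. It assumes that selector values 0-3 pick bytes from the second source, 4-7 pick bytes from the first source, 0x0C yields 0x00, and values of 0x0D and above yield 0xFF; the sign-replicating selectors 8-11 are deliberately not modeled. The operand order and the function name `v_perm_b32` as a Python helper are assumptions for illustration, so treat this as a sketch rather than a reference for the instruction.

```python
def v_perm_b32(src0: int, src1: int, sel: int) -> int:
    """Toy model of a byte permute: each selector byte picks one result byte."""
    # Assumed layout: pool bytes 0-3 come from src1, bytes 4-7 from src0.
    pool = (src1 & 0xFFFFFFFF).to_bytes(4, "little") + \
           (src0 & 0xFFFFFFFF).to_bytes(4, "little")
    result = 0
    for i in range(4):
        s = (sel >> (8 * i)) & 0xFF  # selector for result byte i
        if s >= 13:
            byte = 0xFF
        elif s == 12:
            byte = 0x00
        elif s >= 8:
            # Selectors 8-11 (sign replication) are left out of this sketch.
            raise NotImplementedError("selector not modeled")
        else:
            byte = pool[s]
        result |= byte << (8 * i)
    return result


x = 0xAABBCCDD
swap_low_keep_high = v_perm_b32(0, x, 0x03020001)  # 0xAABBDDCC
swap_low_zero_high = v_perm_b32(0, x, 0x0C0C0001)  # 0x0000DDCC
swap_both_halves = v_perm_b32(0, x, 0x02030001)    # 0xBBAADDCC
```

Under these assumptions, mask 03020001 swaps the low two bytes and leaves the high half alone, 0C0C0001 does the same swap with the high half zeroed, and 02030001 byte-swaps each 16-bit half independently.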

This revision is now accepted and ready to land. Fri, Feb 14, 1:41 AM
arsenm marked an inline comment as done. Fri, Feb 14, 8:34 AM
arsenm added inline comments.

This would violate the constant bus restriction. It could be folded on gfx10, where the limit is 2. However, this is only a problem because the mask constant is in an SGPR in the first place. If we materialized the mask in a VGPR, we could fold it. We don't try to optimize this case yet.
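The counting argument here can be illustrated with a toy check. It assumes the restriction amounts to a cap on the number of distinct SGPR operands one instruction may read, with a limit of 1 before gfx10 and 2 on gfx10 as stated above. The operand strings and the helper `violates_constant_bus` are hypothetical and do not correspond to anything in the patch or in LLVM.

```python
def violates_constant_bus(operands, limit):
    """Return True if the instruction reads more distinct SGPRs than allowed."""
    # Toy convention: names starting with "s" are SGPRs, "v" are VGPRs,
    # and anything else (such as "0") is an inline constant.
    sgprs = {op for op in operands if op.startswith("s")}
    return len(sgprs) > limit


# Source value in s0 and mask in s1: two SGPR reads.
both_sgpr = ["0", "s0", "s1"]
# Current output: the value is first copied to a VGPR with v_mov.
value_in_vgpr = ["0", "v0", "s1"]
# Alternative: materialize the mask in a VGPR and read s0 directly.
mask_in_vgpr = ["0", "s0", "v1"]

pre_gfx10 = [violates_constant_bus(ops, 1)
             for ops in (both_sgpr, value_in_vgpr, mask_in_vgpr)]
gfx10 = violates_constant_bus(both_sgpr, 2)
```

With a limit of 1, only the form with both values in SGPRs is rejected; with a limit of 2, that form is accepted as well.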