This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Improve 16-bit bswap
Closed · Public

Authored by arsenm on Feb 14 2020, 9:31 AM.

Details

Reviewers
foad
rampitec
Summary

Match the new DAG behavior and use v_perm_b32 when available. Also
does better on SI/CI by expanding 16-bit swaps. Also fix
non-power-of-2 cases.
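
For readers unfamiliar with the operation, here is a minimal sketch of what a 16-bit byte swap computes, written as two equivalent scalar expansions. The function names are hypothetical and this is not the code the patch emits. It only illustrates the arithmetic that a shift/or expansion, or a 32-bit swap followed by a shift, has to reproduce.

```cpp
#include <cstdint>

// Direct expansion: move the low byte up and the high byte down.
// The cast truncates the promoted int result back to 16 bits.
constexpr std::uint16_t bswap16_shift(std::uint16_t x) {
  return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

// Hand-written 32-bit byte reversal, used by the second variant.
constexpr std::uint32_t bswap32(std::uint32_t x) {
  return (x << 24) | ((x & 0x0000ff00u) << 8) |
         ((x & 0x00ff0000u) >> 8) | (x >> 24);
}

// Alternative: zero-extend to 32 bits, reverse all four bytes, then
// shift the two interesting bytes back down into the low half.
constexpr std::uint16_t bswap16_via32(std::uint16_t x) {
  return static_cast<std::uint16_t>(bswap32(x) >> 16);
}

// Compile-time checks that both variants agree.
static_assert(bswap16_shift(0x1234) == 0x3412, "shift/or expansion");
static_assert(bswap16_via32(0x1234) == 0x3412, "32-bit swap plus shift");
```

Both variants leave the upper 16 bits of a wider register unspecified unless the result is masked or truncated, which is relevant to the masking question raised in the inline comments.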

Event Timeline

arsenm created this revision. Feb 14 2020, 9:31 AM
This revision is now accepted and ready to land. Feb 14 2020, 11:52 AM
foad added inline comments. Feb 16 2020, 1:12 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll
348

Why do we get masking both before and after the operation (s_and and s_bfe)? It seems like only one or the other should be required, depending on whether the upper bits of the register are undefined or defined to be zero.

arsenm marked an inline comment as done. Feb 17 2020, 7:42 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/bswap.ll
348

We inserted a zext to satisfy the readfirstlane type constraint. We don't really have any optimizations that would take care of this yet, and currently the readfirstlane would still be in the way of such an optimization.