This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] SDWA peephole: enable by default
ClosedPublic

Authored by SamWot on Apr 4 2017, 9:24 AM.

Event Timeline

SamWot created this revision. Apr 4 2017, 9:24 AM
SamWot added a subscriber: Restricted Project. Apr 4 2017, 9:25 AM
arsenm accepted this revision. Apr 4 2017, 9:29 AM

LGTM.

I have a patch which starts matching v_cvt_pk_u16_u32 in many of these cases, which conflicts with most of these test changes. In some of these cases, it's probably better to use SDWA, but that won't work if the input sources don't have SDWA (like fmas). Do you think it would make sense to merge these passes to more broadly handle bit packing optimizations?

This revision is now accepted and ready to land. Apr 4 2017, 9:29 AM
This revision was automatically updated to reflect the committed changes.
SamWot added a comment. Apr 7 2017, 2:58 AM

Fixed and resubmitted in r299654.