Page MenuHomePhabricator

[AMDGPU] Improve SDWA generation for V_OR_B32_E32.
Needs ReviewPublic

Authored by ronlieb on Dec 11 2018, 12:33 PM.

Details

Reviewers
rampitec
arsenm
Group Reviewers
Restricted Project
Summary

Add missing patterns for V_OR_B32_SDWA:

WORD_1, BYTE_3, BYTE_2, BYTE_1

Previously we only recognized WORD_0 and BYTE_0.

Transform:

%13:vgpr_32 = GLOBAL_LOAD_DWORD %2, 0, 0, 0, implicit $exec ::
              (volatile load 4, addrspace 1)
%14:sreg_32_xm0 = S_MOV_B32 65280
%15:vgpr_32 = V_AND_B32_e64 %13, killed %14, implicit $exec
%16:vgpr_32 = V_OR_B32_e64 killed %15, killed %13, implicit $exec

Into

%6:vgpr_32 = GLOBAL_LOAD_DWORD %1, 0, 0, 0, implicit $exec ::
             (volatile load 4, addrspace 1)
%9:vgpr_32 = V_OR_B32_sdwa 0, %6, 0, killed %6, 0, 6, 0, 1, 6, implicit $exec

A subsequent set of patches will address the XOR and AND pattern improvements.

Diff Detail

Event Timeline

ronlieb created this revision.Dec 11 2018, 12:33 PM
rampitec added inline comments.Dec 11 2018, 1:09 PM
test/CodeGen/AMDGPU/add.v2i16.ll
130–131

Check for specific sdwa operand.

test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
4

Fiji is VI, CIVIFIJI makes no sense.
Also please use VI or GFX8, do not add FIJI checks.

219

Check for sdwa operands.

ronlieb marked 3 inline comments as done.Dec 11 2018, 6:20 PM

next patch

ronlieb updated this revision to Diff 177808.Dec 11 2018, 6:21 PM
rampitec added inline comments.Dec 11 2018, 9:38 PM
test/CodeGen/AMDGPU/sdwa-xors-ands-ors.ll
33

-mcpu=fiji run line has no effect. Remove the attribute.

arsenm added inline comments.Dec 12 2018, 6:58 PM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
707

These are the same thing

ronlieb marked an inline comment as done.Dec 13 2018, 5:06 AM
ronlieb added inline comments.
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
707

actually, these are not always the same in the LLVM IR for Immediate constants, when i dump out the Imm value one can see

This is from sdwa-ors.mir
IMM 4294901760
ffff0000

and this is from load-log16.ll
IMM -65536
ffffffffffff0000

it would probably be easier to simply preserve the low 32 bits of the Imm which would allow me to get rid of the 2 additional expressions

*Imm == -65536

and

*Imm == -16777216
arsenm added inline comments.Feb 1 2019, 9:33 AM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
694

Seems like a bad use of auto

707

This is because foldToImm returns int64_t and somebody didn't sign extend this constant properly somewhere. You can either truncate the constant, but the constant probably should have been sign extended in the first place?

arsenm added inline comments.Jun 28 2019, 5:08 PM
test/CodeGen/AMDGPU/sdwa-ors.mir
4–11

Can probably use update_mir_test_checks for this one

35

Can you condense the register values for this? Running -run-pass=none with -simplify-mir should work once you delete the registers section