Page MenuHomePhabricator

[AMDGPU] Improve SDWA generation for V_OR_B32_E32.
Needs ReviewPublic

Authored by ronlieb on Dec 11 2018, 12:33 PM.

Details

Reviewers
rampitec
arsenm
Group Reviewers
Restricted Project
Summary

Add missing patterns for V_OR_B32_SDWA:

WORD_1, BYTE_3, BYTE_2, BYTE_1

Previously we only recognized WORD_0 and BYTE_0.

Transform:

%13:vgpr_32 = GLOBAL_LOAD_DWORD %2, 0, 0, 0, implicit $exec ::
              (volatile load 4, addrspace 1)
%14:sreg_32_xm0 = S_MOV_B32 65280
%15:vgpr_32 = V_AND_B32_e64 %13, killed %14, implicit $exec
%16:vgpr_32 = V_OR_B32_e64 killed %15, killed %13, implicit $exec

Into

%6:vgpr_32 = GLOBAL_LOAD_DWORD %1, 0, 0, 0, implicit $exec ::
             (volatile load 4, addrspace 1)
%9:vgpr_32 = V_OR_B32_sdwa 0, %6, 0, killed %6, 0, 6, 0, 1, 6, implicit $exec

A subsequent set of patches will address the XOR and AND pattern improvements.

Diff Detail

Event Timeline

ronlieb created this revision.Dec 11 2018, 12:33 PM
rampitec added inline comments.Dec 11 2018, 1:09 PM
test/CodeGen/AMDGPU/add.v2i16.ll
130–131

Check for specific sdwa operand.

test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll
4

Fiji is VI, CIVIFIJI makes no sense.
Also please use VI or GFX8, do not add FIJI checks.

219

Check for sdwa operands.

ronlieb marked 3 inline comments as done.Dec 11 2018, 6:20 PM

next patch

ronlieb updated this revision to Diff 177808.Dec 11 2018, 6:21 PM
rampitec added inline comments.Dec 11 2018, 9:38 PM
test/CodeGen/AMDGPU/sdwa-xors-ands-ors.ll
33

-mcpu=fiji run line has no effect. Remove the attribute.

arsenm added inline comments.Dec 12 2018, 6:58 PM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
707

These are the same thing

ronlieb marked an inline comment as done.Dec 13 2018, 5:06 AM
ronlieb added inline comments.
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
707

actually, these are not always the same in the LLVM IR for Immediate constants, when i dump out the Imm value one can see

This is from sdwa-ors.mir
IMM 4294901760
ffff0000

and this is from load-log16.ll
IMM -65536
ffffffffffff0000

it would probably be easier to simply preserve the low 32 bits of the Imm which would allow me to get rid of the 2 additional expressions

*Imm == -65536

and

*Imm == -16777216
arsenm added inline comments.Feb 1 2019, 9:33 AM
lib/Target/AMDGPU/SIPeepholeSDWA.cpp
694

Seems like a bad use of auto

707

This is because foldToImm returns int64_t and somebody didn't sign extend this constant properly somewhere. You can either truncate the constant, but the constant probably should have been sign extended in the first place?