This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fixed encoding of v_pk_mul_f16 in fcanonicalize
ClosedPublic

Authored by rampitec on Sep 6 2017, 10:59 AM.

Diff Detail

Event Timeline

rampitec created this revision.Sep 6 2017, 10:59 AM
arsenm accepted this revision.Sep 6 2017, 11:08 AM

LGTM with constant adjusted

lib/Target/AMDGPU/SIInstructions.td
1325

Since there isn't really a V2FP16_ONE inline immediate, this should be changed to just FP16_ONE

This revision is now accepted and ready to land.Sep 6 2017, 11:08 AM
rampitec added inline comments.Sep 6 2017, 11:29 AM
lib/Target/AMDGPU/SIInstructions.td
1325

There is another bug which occurs if I use scalar constant directly, it needs to be fixed before we can change the constant:

# After Instruction Selection
# Machine code for function s_test_canonicalize_var_v2f16: IsSSA, TracksLiveness
Function Live Ins: %SGPR0_SGPR1 in %vreg1

BB#0: derived from LLVM BB %0
    Live Ins: %SGPR0_SGPR1
        %vreg1<def> = COPY %SGPR0_SGPR1; SGPR_64:%vreg1
        %vreg4<def> = S_LOAD_DWORDX2_IMM %vreg1, 36, 0; mem:LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant) SReg_64_XEXEC:%vreg4 SGPR_64:%vreg1
        %vreg5<def> = S_LOAD_DWORD_IMM %vreg1, 44, 0; mem:LD4[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant) SReg_32_XM0_XEXEC:%vreg5 SGPR_64:%vreg1
        %vreg6<def> = COPY %vreg4:sub1; SReg_32_XM0:%vreg6 SReg_64_XEXEC:%vreg4
        %vreg7<def> = COPY %vreg4:sub0; SReg_32_XM0:%vreg7 SReg_64_XEXEC:%vreg4
        %vreg8<def> = S_MOV_B32 61440; SReg_32_XM0:%vreg8
        %vreg9<def> = S_MOV_B32 -1; SReg_32_XM0:%vreg9
        %vreg10<def> = REG_SEQUENCE %vreg7<kill>, sub0, %vreg6<kill>, sub1, %vreg9<kill>, sub2, %vreg8<kill>, sub3; SReg_128:%vreg10 SReg_32_XM0:%vreg7,%vreg6,%vreg9,%vreg8
        %vreg11<def> = V_PK_MUL_F16 0, 15360, 8, %vreg5<kill>, 0, 0, 0, 0, 0, %EXEC<imp-use>; VGPR_32:%vreg11 SReg_32_XM0_XEXEC:%vreg5
        BUFFER_STORE_DWORD_OFFSET %vreg11<kill>, %vreg10<kill>, 0, 0, 0, 0, 0, %EXEC<imp-use>; mem:ST4[%out(addrspace=1)] VGPR_32:%vreg11 SReg_128:%vreg10
        S_ENDPGM

# End machine code for function s_test_canonicalize_var_v2f16.

*** Bad machine code: VOP* instruction uses the constant bus more than once ***
- function:    s_test_canonicalize_var_v2f16
- basic block: BB#0  (0x7a5f658)
- instruction: %vreg11<def> = V_PK_MUL_F16
LLVM ERROR: Found 1 machine code errors.
This revision was automatically updated to reflect the committed changes.
rampitec added inline comments.Sep 6 2017, 2:06 PM
lib/Target/AMDGPU/SIInstructions.td
1325

I am fixing the legalization now, but what is interesting, this constant is better to be v2 because then it can be inlined. That is why we had no problems with the legalization too.