Details
Details
Diff Detail
Diff Detail
Event Timeline
Comment Actions
LGTM with constant adjusted
lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
1325 | Since there isn't really a V2FP16_ONE inline immediate, this should be changed to just FP16_ONE |
lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
1325 | There is another bug which occurs if I use scalar constant directly, it needs to be fixed before we can change the constant: # After Instruction Selection # Machine code for function s_test_canonicalize_var_v2f16: IsSSA, TracksLiveness Function Live Ins: %SGPR0_SGPR1 in %vreg1 BB#0: derived from LLVM BB %0 Live Ins: %SGPR0_SGPR1 %vreg1<def> = COPY %SGPR0_SGPR1; SGPR_64:%vreg1 %vreg4<def> = S_LOAD_DWORDX2_IMM %vreg1, 36, 0; mem:LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant) SReg_64_XEXEC:%vreg4 SGPR_64:%vreg1 %vreg5<def> = S_LOAD_DWORD_IMM %vreg1, 44, 0; mem:LD4[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant) SReg_32_XM0_XEXEC:%vreg5 SGPR_64:%vreg1 %vreg6<def> = COPY %vreg4:sub1; SReg_32_XM0:%vreg6 SReg_64_XEXEC:%vreg4 %vreg7<def> = COPY %vreg4:sub0; SReg_32_XM0:%vreg7 SReg_64_XEXEC:%vreg4 %vreg8<def> = S_MOV_B32 61440; SReg_32_XM0:%vreg8 %vreg9<def> = S_MOV_B32 -1; SReg_32_XM0:%vreg9 %vreg10<def> = REG_SEQUENCE %vreg7<kill>, sub0, %vreg6<kill>, sub1, %vreg9<kill>, sub2, %vreg8<kill>, sub3; SReg_128:%vreg10 SReg_32_XM0:%vreg7,%vreg6,%vreg9,%vreg8 %vreg11<def> = V_PK_MUL_F16 0, 15360, 8, %vreg5<kill>, 0, 0, 0, 0, 0, %EXEC<imp-use>; VGPR_32:%vreg11 SReg_32_XM0_XEXEC:%vreg5 BUFFER_STORE_DWORD_OFFSET %vreg11<kill>, %vreg10<kill>, 0, 0, 0, 0, 0, %EXEC<imp-use>; mem:ST4[%out(addrspace=1)] VGPR_32:%vreg11 SReg_128:%vreg10 S_ENDPGM # End machine code for function s_test_canonicalize_var_v2f16. *** Bad machine code: VOP* instruction uses the constant bus more than once *** - function: s_test_canonicalize_var_v2f16 - basic block: BB#0 (0x7a5f658) - instruction: %vreg11<def> = V_PK_MUL_F16 LLVM ERROR: Found 1 machine code errors. |
lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
1325 | I am fixing the legalization now, but what is interesting, this constant is better to be v2 because then it can be inlined. That is why we had no problems with the legalization too. |
Since there isn't really a V2FP16_ONE inline immediate, this should be changed to just FP16_ONE