This is still added as implicit operand if used because
we are checking that phys reg is used to emit initialization.
Details
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
1391–1393 | I don't see why this would be conditional, or why this would need to be added |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
1391–1393 | It is conditional because I do not want to add "implicit $noreg" in case of flat scratch. if (!ScratchRsrcReg || !MRI.isPhysRegUsed(ScratchRsrcReg)) return Register(); I.e. if register is unused it will not be initialized. |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
1391–1393 | I think we can remove this check and replace it with a check for stack objects now. In the spill case it should have always been redundant |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
1391–1393 | Theoretically yes, we could replace it with: if (!ScratchRsrcReg || (!MRI.isPhysRegUsed(ScratchRsrcReg) && allStackObjectsAreDead(MF.getFrameInfo()))) Then isPhysRegUsed() is still needed. Example is in the amdpal.ll, function @scratch: bb.0.entry: liveins: $sgpr0_sgpr1 %1:sgpr_64(p4) = COPY $sgpr0_sgpr1 %4:sreg_64_xexec = S_LOAD_DWORDX2_IMM %1:sgpr_64(p4), 11, 0, 0 :: (dereferenceable invariant load 8 from %ir.0, align 4, addrspace 4) %5:sreg_32 = COPY %4.sub0:sreg_64_xexec %6:sreg_32 = S_MOV_B32 0 %7:sreg_64 = REG_SEQUENCE killed %5:sreg_32, %subreg.sub0, killed %6:sreg_32, %subreg.sub1 %8:sreg_32 = S_MOV_B32 2 %9:sreg_64 = S_LSHL_B64 killed %7:sreg_64, killed %8:sreg_32, implicit-def dead $scc %10:sreg_64 = S_ADD_U64_PSEUDO %1:sgpr_64(p4), killed %9:sreg_64, implicit-def dead $scc %11:sreg_32 = COPY %4.sub1:sreg_64_xexec %12:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM killed %10:sreg_64, 9, 0, 0 :: (dereferenceable invariant load 4, addrspace 4) %13:vgpr_32 = COPY %12:sreg_32_xm0_xexec %14:vgpr_32 = COPY %11:sreg_32 BUFFER_STORE_DWORD_OFFEN killed %13:vgpr_32, killed %14:vgpr_32, $private_rsrc_reg, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4 into %ir.2, addrspace 5) S_ENDPGM 0 It has $private_rsrc_reg use right after selection and no stack objects. Then allStackObjectsAreDead() is also needed. Unfortunatelly doing so worsens the codegen. For example in idot8s.ll there is test @idot8_acc32. It ends up requesting 12 bytes of scratch, but never uses it. Stack is allocated at the selection and comes dead, but not marked as dead and not eliminated: Frame Objects: fi#-1: size=4, align=16, fixed, at location [SP] fi#0: size=4, align=4, at location [SP+4] fi#1: size=4, align=4, at location [SP+8] I believe it was supposed to be eliminated by the StackSlotColoring, but it was not because LiveStacks comes out empty. I.e. until we resolve dead stack objects elimination issue it is better to rely on the scratch rsrc register usage. |
llvm/test/CodeGen/AMDGPU/idot8s.ll | ||
---|---|---|
531–534 ↗ | (On Diff #304951) | These are ugly but this won't happen in the real world with normal vector types |
I don't see why this would be conditional, or why this would need to be added