This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Remove scratch rsrc from spill pseudos
ClosedPublic

Authored by rampitec on Nov 9 2020, 4:44 PM.

Details

Summary

This is still added as implicit operand if used because
we are checking that phys reg is used to emit initialization.

Diff Detail

Event Timeline

rampitec created this revision.Nov 9 2020, 4:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 9 2020, 4:44 PM
rampitec requested review of this revision.Nov 9 2020, 4:44 PM
arsenm added inline comments.Nov 10 2020, 7:00 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1388–1390

I don't see why this would be conditional, or why this would need to be added

rampitec added inline comments.Nov 10 2020, 9:46 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1388–1390

It is conditional because I do not want to add "implicit $noreg" in case of flat scratch.
It is needed because SIFrameLowering::getEntryFunctionReservedScratchRsrcReg() does that:

if (!ScratchRsrcReg || !MRI.isPhysRegUsed(ScratchRsrcReg))
  return Register();

I.e. if register is unused it will not be initialized.

sebastian-ne accepted this revision.Nov 11 2020, 9:33 AM

Thanks, LGTM from my side. Looks quite clean and passed short Vulkan CTS testing.

This revision is now accepted and ready to land.Nov 11 2020, 9:33 AM
arsenm added inline comments.Nov 11 2020, 10:05 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1388–1390

I think we can remove this check and replace it with a check for stack objects now. In the spill case it should have always been redundant

rampitec added inline comments.Nov 11 2020, 12:14 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1388–1390

Theoretically yes, we could replace it with:

if (!ScratchRsrcReg || (!MRI.isPhysRegUsed(ScratchRsrcReg) &&
                        allStackObjectsAreDead(MF.getFrameInfo())))

Then isPhysRegUsed() is still needed. Example is in the amdpal.ll, function @scratch:

bb.0.entry:
  liveins: $sgpr0_sgpr1
  %1:sgpr_64(p4) = COPY $sgpr0_sgpr1
  %4:sreg_64_xexec = S_LOAD_DWORDX2_IMM %1:sgpr_64(p4), 11, 0, 0 :: (dereferenceable invariant load 8 from %ir.0, align 4, addrspace 4)
  %5:sreg_32 = COPY %4.sub0:sreg_64_xexec
  %6:sreg_32 = S_MOV_B32 0
  %7:sreg_64 = REG_SEQUENCE killed %5:sreg_32, %subreg.sub0, killed %6:sreg_32, %subreg.sub1
  %8:sreg_32 = S_MOV_B32 2
  %9:sreg_64 = S_LSHL_B64 killed %7:sreg_64, killed %8:sreg_32, implicit-def dead $scc
  %10:sreg_64 = S_ADD_U64_PSEUDO %1:sgpr_64(p4), killed %9:sreg_64, implicit-def dead $scc
  %11:sreg_32 = COPY %4.sub1:sreg_64_xexec
  %12:sreg_32_xm0_xexec = S_LOAD_DWORD_IMM killed %10:sreg_64, 9, 0, 0 :: (dereferenceable invariant load 4, addrspace 4)
  %13:vgpr_32 = COPY %12:sreg_32_xm0_xexec
  %14:vgpr_32 = COPY %11:sreg_32
  BUFFER_STORE_DWORD_OFFEN killed %13:vgpr_32, killed %14:vgpr_32, $private_rsrc_reg, 0, 0, 0, 0, 0, 0, 0, implicit $exec :: (store 4 into %ir.2, addrspace 5)
  S_ENDPGM 0

It has $private_rsrc_reg use right after selection and no stack objects.

Then allStackObjectsAreDead() is also needed. Unfortunatelly doing so worsens the codegen. For example in idot8s.ll there is test @idot8_acc32. It ends up requesting 12 bytes of scratch, but never uses it. Stack is allocated at the selection and comes dead, but not marked as dead and not eliminated:

Frame Objects:
  fi#-1: size=4, align=16, fixed, at location [SP]
  fi#0: size=4, align=4, at location [SP+4]
  fi#1: size=4, align=4, at location [SP+8]

I believe it was supposed to be eliminated by the StackSlotColoring, but it was not because LiveStacks comes out empty.

I.e. until we resolve dead stack objects elimination issue it is better to rely on the scratch rsrc register usage.

rampitec added inline comments.Nov 11 2020, 4:16 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
1388–1390

See D91307 for elimination of dead stack objects. That one is needed if we want to remove implicit operand here (and in fact just needed).

rampitec updated this revision to Diff 304951.Nov 12 2020, 1:23 PM

OK, let's untangle this from dead slot elimination.
Dropped rsrc operand completely.

arsenm accepted this revision.Nov 12 2020, 3:09 PM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/idot8s.ll
531–534

These are ugly but this won't happen in the real world with normal vector types

This revision was automatically updated to reflect the committed changes.