This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Avoid saving/restoring reserved m0 register
AbandonedPublic

Authored by MatzeB on Jun 13 2017, 6:09 PM.

Details

Reviewers
arsenm
mareko
Summary

This is in preparation to https://reviews.llvm.org/D23097 which uncovered some problems in the way AMDGPU uses the register scavenger in SIRegisterInfo::spillSGPR()/SIRegisterInfo::restoreSGPR().

  • The current code does not use the register scavenger correctly:
    • RS->isRegUsed() always reports true for m0 because it is a reserved register.
    • Even if m0 was not reserved RS->isRegUsed() would give you information about the position where we are scavenging a register. This is not actually the position where the spill/reload will be inserted.
    • Spilling m0 creates a new vreg. This is sketchy as that happens at a time when the register scavenger is already spilling/reloading to free registers for vreg assignment. If there is no free physreg at that place we end up in an endless loop (because m0 is always reported as reserved, see above). The existing tests are lucky to not hit the case with D23097 applied one is hit.

This changes the code to assume that m0 is available since it is
reserved so we shouldn't have any longer liveranges crossing potential
spill/reload places.

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB created this revision.Jun 13 2017, 6:09 PM

This changes the code to assume that m0 is available since it is
reserved so we shouldn't have any longer liveranges crossing potential
spill/reload places.

m0 could potentially be live for the whole program, so this assumption is not correct.

This changes the code to assume that m0 is available since it is
reserved so we shouldn't have any longer liveranges crossing potential
spill/reload places.

m0 could potentially be live for the whole program, so this assumption is not correct.

So you actually do you use the register before prolog/epilogue insertion? (Note that the register allocator will not use it at the moment as it is reserved, so it would need other AMDGPU code to explicitely use it).

So you actually do you use the register before prolog/epilogue insertion? (Note that the register allocator will not use it at the moment as it is reserved, so it would need other AMDGPU code to explicitely use it).

Yes, the SelectionDAG emits instructions that have implicit defs of the register, see SITargetLowering::copyToM0(), and several instructions have it defined as an implicit use. Take a look at the ds_read2.ll test case for an example.

MatzeB added a comment.EditedJun 14 2017, 3:35 PM

For the record: I just checked why that testcase did not fail before my patch. It's simply hitting a bug in the register scavenger:

$ llc -mtriple=amdgcn--amdhsa -mcpu=fiji -amdgpu-spill-sgpr-to-smem=1 -print-machineinstrs test/CodeGen/AMDGPU/attr-amdgpu-num-sgpr.ll
...
# After Shrink Wrapping analysis:
# Machine code for function max_12_sgprs_14_input_sgprs: NoPHIs, TracksLiveness, NoVRegs
...
BB#1:
  ...%SGPR4_SGPR5<def> = SI_SPILL_S64_RESTORE <fi#3>, %EXEC<imp-use>, %SGPR8_SGPR9_SGPR10_SGPR11<imp-use>, %SGPR7<imp-use>, %M0<imp-def>; mem:LD8[FixedStack3](align=4)

after frame elimination/scavenging:

...
# After Prologue/Epilogue Insertion & Frame Finalization:
# Machine code for function max_12_sgprs_14_input_sgprs: NoPHIs, TracksLiveness, NoVRegs
...
#B1:
  ...
    %M0<def> = S_ADD_U32 %SGPR7, 1024, %SCC<imp-def>
    %SGPR4_SGPR5<def> = S_BUFFER_LOAD_DWORDX2_SGPR %SGPR8_SGPR9_SGPR10_SGPR11, %M0<kill>, 0; mem:LD8[FixedStack3](align=4)
    %M0<def> = COPY %SGPR4<kill>

=> This restores the wrong value into M0. (There is not free register at that point in the program so your recovery strategy of saving M0 in a register is impossible to solve at that point, but you happen to hit a bug in the old scavenging code that reports SGPR4 :-/ )

Would it be okay to XFAIL that part of attr-amdgpu-num-sgpr.ll? It is currently blocking me from moving forward with D23097

but you happen to hit a bug in the old scavenging code that reports SGPR4 :-/

Turns out this is not directly a bug in the register scavenger, but comes from the fact that AMDGPU creates new vregs while the scavenger is running. When the vreg to save m0 is created in the example, we have already moved past the definition of SGPR4_SGPR5 and it will miss that the register is occupied further below; It doesn't really work in this "reentrant" way.

Would it be okay to XFAIL that part of attr-amdgpu-num-sgpr.ll? It is currently blocking me from moving forward with D23097

I think that's fine since it's currently passing by accident.

MatzeB abandoned this revision.Jun 15 2017, 5:37 PM

Abandoning.

I commented out the part of the test that failed when pushing D21885 / r305516