
[AMDGPU] Make SGPR spills exec mask agnostic

Authored by critson on May 20 2020, 2:14 AM.



Explicitly set the exec mask for SGPR spills and reloads.
This fixes a bug where SGPR spills to memory could be incorrect
if the exec mask was 0 (or differed between spill and reload).

Additionally pack scalar subregisters (up to 16/32 per VGPR),
so that the majority of scalar types can be spilled or reloaded
with a simple memory access. This should amortize some of the
additional overhead of manipulating the exec mask.

Diff Detail

Event Timeline

critson created this revision.May 20 2020, 2:14 AM
Herald added a project: Restricted Project. May 20 2020, 2:14 AM

Missing new tests / checks. This could probably use a MIR test


Needs a comment explaining the point of this function


I don't understand these lane numbers; it picks the one past the midpoint lane?


I'm not sure I understand only sometimes saving exec_hi


We're trying to eliminate use of NoRegister in favor of using the default Register constructor, so you don't need the initializer here (or explicit checks for it later in the function)




Grammar: "a odd size" -> "an odd size"


What other places do is have ExecMovOpc and ExecReg set from isWave32 and unify the BuildMI calls


Could this just turn on all lanes for the memory spill? mov -1 is always free for code size, but some other bitmask may not be


Should assert this isn't an SGPRSpill stack ID. You can't reuse the SGPR spill frame index for the real stack spill index


I think the correct thing to do is use the base alignment here and add the offset to PtrInfo.getWithOffset


I know this is what this was doing before, but do we really need to use the intermediate spill pseudo here? As a follow on change could we directly emit the stack operation here?

critson marked 14 inline comments as done.May 21 2020, 4:22 AM
critson added inline comments.

Correct, because lanes before the midpoint may be used to store the SGPRs.
I have added a comment.


Made the comment clearer.
The idea is that we try to turn off all unused lanes for the load/store on the assumption that this /may/ have some benefit (performance or power). However, if we do not have space for saving a copy of EXEC_HI, then it is safe not to adjust the lanes.


I can get rid of the initializer, but I definitely need the explicit checks, as getMatchingSuperReg returns NoRegister, which must be handled correctly.


I have applied a simplification to the code based on this.


As mentioned above, the idea is that we try to only enable lanes that are used, in case there is some potential benefit.
The only lane combinations which require additional encoding space are S256, S512 and S1024.
Note that S1024 will not occur on Wave32, as we fill at most half the lanes of each VGPR before calling this function.
I would argue that the encoding space is probably not an issue as most common spills are S32, S64, S128.
Of course I am working without any tangible assessment of the benefit of disabling lanes, it might be that for >=S256 we should just set the mask to -1 anyway.


Sorry, I am not totally clear on what you are saying. This will only get here if it is not an SGPR to VGPR spill slot (tested in spillSGPR / restoreSGPR).


Again, I am not clear on this. Can you explain further?


Agreed, I will work on a follow up change to directly emit VMEM.

critson updated this revision to Diff 265474.May 21 2020, 4:22 AM
critson marked 5 inline comments as done.
  • Address comments.
  • Add MIR test.
critson updated this revision to Diff 266173.May 26 2020, 5:10 AM
  • Tidy test whitespace
  • Directly output buffer_store/load from SGPR spill.
  • Add assertion for type of spill slot.
arsenm added inline comments.May 27 2020, 3:59 PM

These are the same thing. getMatchingSuperReg should return MCRegister. All of the unsigneds referring to registers should use Register


But is there an actual benefit? I don't think the hardware saves anything by having fewer lanes. I would rather not add complexity to handle a hypothetical case that doesn't provide a real benefit


It's hard to follow how all of the SGPR->VGPR spills work, and it doesn't hurt to be sure this is definitely not an SGPR frame index


I mean the usage of the MachineMemOperand isn't correct with respect to alignment. It's supposed to be expressed as a base MMO alignment with an offset applied. You're trying to figure out the total alignment. There are helpers for getting the memory operand with an offset

critson marked 12 inline comments as done.May 27 2020, 8:26 PM
critson added inline comments.

I have simplified the code.


Assertion added.


As this now directly calls buildSpillLoadStore it only needs to generate an MMO for the base pointer (with base alignment), and the offset is passed directly to buildSpillLoadStore which handles any further alignment requirements (in the way suggested).

critson updated this revision to Diff 266726.May 27 2020, 8:27 PM
critson marked 3 inline comments as done.
  • Address outstanding comments.
  • Add stack offset checks to MIR test.
critson updated this revision to Diff 267426.May 29 2020, 8:13 PM

Update spill-wide-sgpr.ll for new tests in master.

arsenm accepted this revision.Jun 2 2020, 12:43 PM

LGTM with some comment fixes


The "In which case use VGPR." reads weirdly to me





This revision is now accepted and ready to land.Jun 2 2020, 12:43 PM
This revision was automatically updated to reflect the committed changes.