This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Make SGPR spills exec mask agnostic
ClosedPublic

Authored by critson on May 20 2020, 2:14 AM.

Details

Summary

Explicitly set the exec mask for SGPR spills and reloads.
This fixes a bug where SGPR spills to memory could be incorrect
if the exec mask was 0 (or differed between spill and reload).

Additionally, pack scalar subregisters (up to 16/32 per VGPR) so that
the majority of scalar types can be spilled or reloaded with a single
memory access. This should amortize some of the additional overhead of
manipulating the exec mask.

Diff Detail

Event Timeline

critson created this revision.May 20 2020, 2:14 AM
Herald added a project: Restricted Project.May 20 2020, 2:14 AM

Missing new tests / checks. This could probably use a MIR test

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
838

Needs a comment explaining the point of this function

858–859

I don't understand these lane numbers; it picks the one past the midpoint lane?

867

I'm not sure I understand only sometimes saving exec_hi

870

We're trying to eliminate use of NoRegister in favor of using the default Register constructor, so you don't need the initializer here (or explicit checks for it later in the function)
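
For example (sketch):

    Register SavedExecReg;   // default-constructed: no register
    // ...
    if (SavedExecReg) {      // validity check, no NoRegister compare
      // ...
    }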

880

getUndefRegState
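
(i.e. the helper from MachineInstrBuilder.h; a sketch of its use, with SpillVGPR and IsFirstSubreg as illustrative names:)

    // getUndefRegState(bool) yields RegState::Undef when the flag is
    // set and 0 otherwise, replacing a hand-written conditional.
    MIB.addReg(SpillVGPR, getUndefRegState(IsFirstSubreg));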

896

Grammar, a odd size->an odd size

906–912

What other places do is have ExecMovOpc and ExecReg set from isWave32 and unify the BuildMI calls
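
Something like (sketch; ST is the assumed subtarget handle):

    // Pick the opcode and register once, then share the BuildMI calls.
    unsigned ExecMovOpc = ST.isWave32() ? AMDGPU::S_MOV_B32
                                        : AMDGPU::S_MOV_B64;
    Register ExecReg = ST.isWave32() ? Register(AMDGPU::EXEC_LO)
                                     : Register(AMDGPU::EXEC);
    BuildMI(MBB, MI, DL, TII->get(ExecMovOpc), ExecReg).addImm(VGPRLanes);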

907–908

Could this just turn on all lanes for the memory spill? mov -1 is always free for code size, but some other bitmask may not be

916

Should assert this isn't an SGPRSpill stack ID. You can't reuse the SGPR spill frame index for the real stack spill index
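
That is, something along the lines of (sketch; MFI and Index as in the surrounding code):

    // An SGPRSpill stack ID refers to VGPR lanes, not a real stack
    // slot, so it must never reach the memory spill path.
    assert(MFI.getStackID(Index) != TargetStackID::SGPRSpill &&
           "attempting a memory spill with an SGPR-spill frame index");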

920–921

I think the correct thing to do is use the base alignment here and add the offset to PtrInfo.getWithOffset

924–930

I know this is what this was doing before, but do we really need to use the intermediate spill pseudo here? As a follow on change could we directly emit the stack operation here?

critson marked 14 inline comments as done.May 21 2020, 4:22 AM
critson added inline comments.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
858–859

Correct, because lanes before the midpoint may be used to store the SGPRs.
I have added a comment.

867

Made the comment clearer.
The idea is that we try to turn off all unused lanes for the load/store on the assumption that this /may/ have some benefit (performance or power). However, if we do not have space for saving a copy of EXEC_HI then it is safe not to adjust the lanes.

870

I can get rid of the initializer, but I definitely need the explicit tests, as getMatchingSuperReg returns NoRegister, which must be handled correctly.

906–912

I have applied a simplification to the code based on this.

907–908

As mentioned above, the idea is that we try to only enable lanes that are used, in case there is some potential benefit.
The only lane combinations which require additional encoding space are S256, S512 and S1024.
Note that S1024 will not occur on Wave32, as we never fill more than half the lanes of a VGPR before calling this function.
I would argue that the encoding space is probably not an issue, as the most common spills are S32, S64 and S128.
Of course, I am working without any tangible assessment of the benefit of disabling lanes; it might be that for >=S256 we should just set the mask to -1 anyway.
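
(For reference, the mask itself is just one bit per packed 32-bit subregister; a sketch, with NumSubRegs being the number of subregisters packed into the VGPR:)

    // One enabled lane per packed subregister, e.g. S32 -> 0x1,
    // S128 -> 0xF, S256 -> 0xFF, S1024 -> 0xFFFFFFFF. NumSubRegs is
    // at most 32, so the shift cannot overflow.
    uint64_t VGPRLanes = (1LL << NumSubRegs) - 1;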

916

Sorry, I am not totally clear on what you are saying. This code is only reached if it is not an SGPR-to-VGPR spill slot (tested in spillSGPR / restoreSGPR).

920–921

Again, I am not clear on this. Can you explain further?

924–930

Agreed, I will work on a follow up change to directly emit VMEM.

critson updated this revision to Diff 265474.May 21 2020, 4:22 AM
critson marked 5 inline comments as done.
  • Address comments.
  • Add MIR test.
critson updated this revision to Diff 266173.May 26 2020, 5:10 AM
  • Tidy test whitespace
  • Directly output buffer_store/load from SGPR spill.
  • Add assertion for type of spill slot.
arsenm added inline comments.May 27 2020, 3:59 PM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
870

These are the same thing. getMatchingSuperReg should return MCRegister. All of the unsigneds referring to registers should use Register

907–908

But is there an actual benefit? I don't think the hardware saves anything by having fewer lanes. I would rather not add complexity to handle a hypothetical case that doesn't provide a real benefit

916

It's hard to follow how all of the SGPR->VGPR spills work, and it doesn't hurt to be sure this is definitely not an SGPR frame index

920–921

I mean the usage of the MachineMemOperand isn't correct with respect to alignment. It's supposed to be expressed as a base MMO alignment with an offset applied. You're trying to figure out the total alignment. There are helpers for getting the memory operand with an offset
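
That is, roughly (a sketch with illustrative names; the real code supplies the concrete sizes and flags):

    MachinePointerInfo PtrInfo =
        MachinePointerInfo::getFixedStack(MF, Index);
    // The base MMO carries the base alignment of the whole frame object.
    MachineMemOperand *BaseMMO = MF.getMachineMemOperand(
        PtrInfo, MachineMemOperand::MOStore, FrameSize, BaseAlign);
    // Derive the per-access operand; the helper applies the offset and
    // recomputes the alignment from the base.
    MachineMemOperand *MMO =
        MF.getMachineMemOperand(BaseMMO, Offset, EltSize);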

critson marked 12 inline comments as done.May 27 2020, 8:26 PM
critson added inline comments.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
907–908

I have simplified the code.

916

Assertion added.

920–921

As this now directly calls buildSpillLoadStore it only needs to generate an MMO for the base pointer (with base alignment), and the offset is passed directly to buildSpillLoadStore which handles any further alignment requirements (in the way suggested).

critson updated this revision to Diff 266726.May 27 2020, 8:27 PM
critson marked 3 inline comments as done.
  • Address outstanding comments.
  • Add stack offset checks to MIR test.
critson updated this revision to Diff 267426.May 29 2020, 8:13 PM

Update spill-wide-sgpr.ll for new tests in master.

arsenm accepted this revision.Jun 2 2020, 12:43 PM

LGTM with some comment fixes

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
872

The "In which case use VGPR." reads weirdly to me

885

Capitalize

958

Capitalize

This revision is now accepted and ready to land.Jun 2 2020, 12:43 PM
This revision was automatically updated to reflect the committed changes.