This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix VGPR spills where offset doesn't fit in 12 bits
ClosedPublic

Authored by scott.linder on Jul 17 2018, 1:23 PM.

Details

Summary

Scale the offset of VGPR spills by the wave size when it cannot fit in the 12-bit offset immediate field and so is added to the soffset SGPR. This accounts for hardware swizzling of scratch memory.

Diff Detail

Event Timeline

scott.linder created this revision.Jul 17 2018, 1:23 PM

Needs a test

t-tye added inline comments.Jul 17 2018, 5:36 PM
lib/Target/AMDGPU/SIRegisterInfo.cpp
543

Setting this to Offset does not seem correct as the real value needs to be the scaled offset which is computed below. Here we just need the variable declared and it only has a valid value when RanOutOfSGPRs is true. So perhaps initialize to 0?

Probably a better approach is to eliminate RanOutOfSGPRs and instead just have ScratchOffsetRegDelta which is initialized to 0. Then set it when it is actually used (as already done). Then change the "if (RanOutOfSGPRs)" below to "if (ScratchOffsetRegDelta != 0)".

548

If this is introduced then seems it should be used as well in the NumSubRegs and Size calculations above which are also in terms of the EltSize.

548–550

Should this be Offset + Size - EltSize? Note that the loop is 0 based and increments by EltSize.

arsenm added inline comments.Jul 18 2018, 12:10 AM
lib/Target/AMDGPU/SIRegisterInfo.cpp
548–550

Yes, and this needs a very specific test to stress it

Addressed feedback, and added at least one test to exercise the fix and the condition for putting the offset in an SGPR.

I would like to add more tests for Offset + Size - EltSize where Size != EltSize but I have had some trouble getting a ValueReg with subregisters to survive until the spill occurs. E.g. if I load and store a <2 x i32> it is spilled as two distinct i32.

Addressed feedback, and added at least one test to exercise the fix and the condition for putting the offset in an SGPR.

I would like to add more tests for Offset + Size - EltSize where Size != EltSize but I have had some trouble getting a ValueReg with subregisters to survive until the spill occurs. E.g. if I load and store a <2 x i32> it is spilled as two distinct i32.

You may have better success making all of these register clobbers in a single asm statement. Also -stress-regalloc will probably help make this simpler.

This is also a candidate for just making it a MIR test, but in this case I would probably prefer if the IR test worked

test/CodeGen/AMDGPU/spill-offset-calculation.ll
90

You can check the literal SGPR used for this

153

Should also have a test where the SGPR needs to be increment and restored (though I thought we already had one)

Also could use versions for non-kernels

scott.linder added inline comments.Jul 19 2018, 9:57 AM
test/CodeGen/AMDGPU/spill-offset-calculation.ll
153

This case is actually not scavenging another SGPR, it is just incrementing/decrementing the scratch offset SGPR. I don't know how to coerce CodeGen to do the spilling when the register scavenger is available (i.e. not in scavengeFrameVirtualRegs).

arsenm added inline comments.Jul 19 2018, 10:10 AM
test/CodeGen/AMDGPU/spill-offset-calculation.ll
153

OK, I assumed this was scavenging because you used a variable for it, as well as there being no check line for the restore. You should check the literal chosen register here because it will be fixed for the function, and need to check the restore.

Address some more feedback; use -stress-regalloc to cut down on the clobbers needed, add non-kernel tests, explicitly test the increment/decrement case, including the scratch offset SGPR.

I will try adding a MIR test to exercise the subregister condition and loop; with an IR test I don't know how to get a VGPR with subregs to survive "AMDGPU DAG->DAG Pattern Instruction Selection"

Address some more feedback; use -stress-regalloc to cut down on the clobbers needed, add non-kernel tests, explicitly test the increment/decrement case, including the scratch offset SGPR.

I will try adding a MIR test to exercise the subregister condition and loop; with an IR test I don't know how to get a VGPR with subregs to survive "AMDGPU DAG->DAG Pattern Instruction Selection"

A physical register tuple inline asm use should work

Add tests for the the boundary condition for subregs.

Thank you for the help @arsenm; are there more cases I should be testing?

arsenm added inline comments.Jul 26 2018, 3:20 AM
test/CodeGen/AMDGPU/spill-offset-calculation.ll
180

This inline assembly is invalid, since it doesn't list the memory or physical register uses.

You don't need something meaningful here anyway, you should just need a comment with the register name to model the constraint

211

Ditto

Address feedback

arsenm accepted this revision.Jul 26 2018, 11:01 AM

LGTM with the 2 remaining asms fixed

test/CodeGen/AMDGPU/spill-offset-calculation.ll
75

Missed one

106

Missed one

This revision is now accepted and ready to land.Jul 26 2018, 11:01 AM
This revision was automatically updated to reflect the committed changes.