Scale the offset of VGPR spills by the wave size when it cannot fit in the 12-bit offset immediate field and so is added to the soffset SGPR. This accounts for hardware swizzling of scratch memory.
Details
Diff Detail
Event Timeline
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
---|---|---|
543 | Setting this to Offset does not seem correct as the real value needs to be the scaled offset which is computed below. Here we just need the variable declared and it only has a valid value when RanOutOfSGPRs is true. So perhaps initialize to 0? Probably a better approach is to eliminate RanOutOfSGPRs and instead just have ScratchOffsetRegDelta which is initialized to 0. Then set it when it is actually used (as already done). Then change the "if (RanOutOfSGPRs)" below to "if (ScratchOffsetRegDelta != 0)". | |
548 | If this is introduced then seems it should be used as well in the NumSubRegs and Size calculations above which are also in terms of the EltSize. | |
548–550 | Should this be Offset + Size - EltSize? Note that the loop is 0 based and increments by EltSize. |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
---|---|---|
548–550 | Yes, and this needs a very specific test to stress it |
Addressed feedback, and added at least one test to exercise the fix and the condition for putting the offset in an SGPR.
I would like to add more tests for Offset + Size - EltSize where Size != EltSize but I have had some trouble getting a ValueReg with subregisters to survive until the spill occurs. E.g. if I load and store a <2 x i32> it is spilled as two distinct i32.
You may have better success making all of these register clobbers in a single asm statement. Also -stress-regalloc will probably help make this simpler.
This is also a candidate for just making it a MIR test, but in this case I would probably prefer if the IR test worked
test/CodeGen/AMDGPU/spill-offset-calculation.ll | ||
---|---|---|
90 | You can check the literal SGPR used for this | |
153 | Should also have a test where the SGPR needs to be increment and restored (though I thought we already had one) Also could use versions for non-kernels |
test/CodeGen/AMDGPU/spill-offset-calculation.ll | ||
---|---|---|
153 | This case is actually not scavenging another SGPR, it is just incrementing/decrementing the scratch offset SGPR. I don't know how to coerce CodeGen to do the spilling when the register scavenger is available (i.e. not in scavengeFrameVirtualRegs). |
test/CodeGen/AMDGPU/spill-offset-calculation.ll | ||
---|---|---|
153 | OK, I assumed this was scavenging because you used a variable for it, as well as there being no check line for the restore. You should check the literal chosen register here because it will be fixed for the function, and need to check the restore. |
Address some more feedback; use -stress-regalloc to cut down on the clobbers needed, add non-kernel tests, explicitly test the increment/decrement case, including the scratch offset SGPR.
I will try adding a MIR test to exercise the subregister condition and loop; with an IR test I don't know how to get a VGPR with subregs to survive "AMDGPU DAG->DAG Pattern Instruction Selection"
Add tests for the the boundary condition for subregs.
Thank you for the help @arsenm; are there more cases I should be testing?
Setting this to Offset does not seem correct as the real value needs to be the scaled offset which is computed below. Here we just need the variable declared and it only has a valid value when RanOutOfSGPRs is true. So perhaps initialize to 0?
Probably a better approach is to eliminate RanOutOfSGPRs and instead just have ScratchOffsetRegDelta which is initialized to 0. Then set it when it is actually used (as already done). Then change the "if (RanOutOfSGPRs)" below to "if (ScratchOffsetRegDelta != 0)".