This lets you select which sort of spilling you want, either s[0:1] or 64-bit loads from s[0:1].
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
FYI: The current mesa side of this is here:
https://github.com/airlied/mesa/tree/radv-wip-spilling
I think ABI changes should be encoded as part of the triple (a radv environment type?) rather than a subtarget feature. This is supposed to be just the pointer and not the full scratch resource descriptor?
Alternatively we could just switch mesa to always doing this and remove the relocations
Hmm I'll try and investigate adding a radv env triple tomorrow.
Just a pointer, I don't have space for 4 user gprs here. An alternative plan of action is to make the first descriptor on the first pointer in the user sgprs to point to the spill descriptor, I'm not sure if that causes problems though.
I'm not sure the GL driver has enough spare user sgprs to allow this to happen.
As mentioned on mesa-dev, I like the general approach, but instead of a machine feature to enable this I'd use a function attribute which designates one of the function arguments as the source for the spill pointer, e.g. "amdgpu-spill-ptr=4" to take the pointer from the fifth function argument (which would typically be SGPR 8/9).
True, GL will actually need to do some indirect loading at least for compute shaders. Also, we should really set the size part of the descriptor correctly anyway, so there's not much benefit to loading a full descriptor instead of just a 64-bit pointer.
Updated diff to spill to a 64-bit buffer, pointed to by 64-bits in the buffer pointed to by SGPR0/1.
lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
245–270 | Formatting for this all looks wrong | |
247 | This is confusingly named because it's not S_MOV_B64 | |
248 | You can just use the literal AMDGPU::SGPR0_SGPR1 if you're just using a known constant register. However, I would expect this to be getting the register from the MachineFunctionInfo rather than hardcoding it again inside FrameLowering | |
253 | This shouldn't be undef, the read value does matter | |
256 | Because this is using physical registers here, the sub register index isn't used (also sub0_sub1 of s[0:1] doesn't make sense) | |
256 | Because you are creating a load you should add a MachineMemOperand or else it will unduly constrain the post-RA scheduler. There are a few other dummy places that create read-only memory operands for similar purposes. Also because it's a load I would hope we could do this much earlier than frame lowering. I'm thinking about how to properly initialize m0 in the HSA ABI which also requires a load so I will probably end up taking care of that eventually |
This is missing the part touching the argument lowering. When the input sgpr0/1 is going to be used it needs to be marked in the initial argument lowering in LowerFormalArguments, or else the other argument usage might end up thinking it can use these
lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
316 | I think this needs a better name, that definitely doesn't involve the word spill. We currently mix the terms scratch and private memory. How about hasPrivateMemoryPointerInput()? | |
326 | hasIndirectPrivaetMemoryPointerInput? I assume you are putting other things in this buffer besides just this one pointer, so maybe the name should be whatever you want to call that buffer. | |
333–335 | These go over 80 lines | |
338–340 | Usually we put a comment with the name of the operand after each one for the more complicated instructions, e.g. // offset | |
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
132–135 | I don't really like using the attributes this way naming specific registers. This needs to always be available, so I don't see why you need to explicitly enable this particularly in the indirect case. |
lib/Target/AMDGPU/SIFrameLowering.cpp | ||
---|---|---|
326 | There are no other things in the scratch buffer. Only LLVM uses it. It doesn't matter what it's called from the Mesa's point of view. | |
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
132–135 | I don't understand the comment. Of course it's always available. The driver just chooses one of the methods: 1) scratch relocation; 2) the pointer is in SGPR01; 3) the pointer is at the address pointed to by SGPR01 |
This rewrites the patch, using a new triple OsType (needs renaming to something better), and adds an intrinsic to get access to the private buffer.
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
103–105 ↗ | (On Diff #82643) | The name shouldn't include private because this is the pointer to the buffer with more than just the private resource descriptor. |
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
156 | Did you try adding the version number to the end and checking that instead of adding a new OS? |
LGTM except for some pedantry
lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
---|---|---|
300 | Looks like it's over 80 columns | |
lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
318 | C++ style comment, capitalized | |
327 | Extra line | |
lib/Target/AMDGPU/SIRegisterInfo.cpp | ||
1114–1117 ↗ | (On Diff #85525) | No return after else |
fixed a regression in the when to emit spill setup path, should have just added isMesaGfx option not removed the other one.
Did you try adding the version number to the end and checking that instead of adding a new OS?