In general, we need queue_ptr for aperture bases and trap handling, and user SGPRs have to be set up to hold queue_ptr. In current implementation, user SGPRs are set up unnecessarily for some cases. If the target has aperture registers, queue_ptr is not needed to reference aperture bases. For trap handling, if target suppots getDoorbellID, queue_ptr is also not necessary. Futher, code object version 5 introduces new kernel ABI which passes queue_ptr as an implicit kernel argument, so user SGPRs are no longer necessary for queue_ptr. Based on the trap handling document: https://llvm.org/docs/AMDGPUUsage.html#amdgpu-trap-handler-for-amdhsa-os-v4-onwards-table, llvm.debugtrap does not need queue_ptr, we remove queue_ptr suport for llvm.debugtrap in the backend.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
55 | This really ought to be something read from the IR | |
429–430 | This should recognize both the intrinsic and load from the specific offset from the implicitarg ptr, similar to the new hostcall handling. We still should be able to infer no queue ptr with it in memory | |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
653 ↗ | (On Diff #408540) | I don't see why this query needs to change, the code object version can be considered when setting QueuePtr initially |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
429–430 | This is different from the case of hostcall handling. We are handling aperture bases in the backend. We do not have explicit intrinsic call for implicitarf ptr. | |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
653 ↗ | (On Diff #408540) | I intend to factor out same thing like QuterPre && (CodeObjectVersion < 5). |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
429–430 | It's not different because some subtargets still use the queue pointer from here (pre gfx9) | |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.h | ||
653 ↗ | (On Diff #408540) | No, I mean it's inappropriate to check the version here. The version+subtarget should change whether QueuePtr is set, not add an additional check here |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
429–430 | I know some subtargets still use the queue pointer. However, |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
429–430 | The logical queue pointer value still exists and we can infer that it's not needed, just like hostcall in this case |
Remove the introduced SIMachineFunctionInfo::needsQueuePtrUserSGPRs function.
Also correct the formats.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
429–430 | We should still be tracking the logical queue pointer |
Please fix the commit description so that the first line is self-contained and separated from the rest by a blank line. This matters a lot when looking at the output of "git log ---oneline".
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
429–430 | Can you be explicit what is "logical queue pointer" here> And why do we need to trace it? |
Add the check for queue_ptr through implicitarg_ptr + offset (200) under V5.
Add corresponding LIT test.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
55 | As discussed in another review (https://reviews.llvm.org/D119027 |
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | ||
---|---|---|
166–171 | Will do this in a following patch because the enum of the implicit kernel arguments will also be used there. Still thinking of the appropriate definition. |
This really ought to be something read from the IR