Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.
Details
Diff Detail
Unit Tests
Time | Test | |
---|---|---|
60,050 ms | x64 debian > libFuzzer.libFuzzer::large.test |
Event Timeline
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
16253–16254 | Is the logic reversed? |
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
16253–16254 | You are right. Will correct it. Thanks |
Correct the logic ordering error in selecting the intrinsic based on code object version.
Also update the code based on clang-format check and Lint suggestions.
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
16252–16254 | Given that it's an offset from a different base, I think it would be cleaner to just branch around the two cases | |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
410–412 ↗ | (On Diff #410415) | This isn't covered by any test changes |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h | ||
326–328 | You shouldn't merge these into the same enum. This enum should be renamed, this is for a different clover ABI | |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
4876 | You're repeating this long sequence to get the queue pointer in two places, should common these into a function to get the queue pointer. Alternatively, emit the intrinsic and move this expansion into a lowering of the queue pointer intrinsic | |
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | ||
155 ↗ | (On Diff #410415) | The code object version should probably come from the IR, not a global opt |
Please fix the commit description so that the first line is self-contained and separated from the rest by a blank line. This matters a lot when looking at the output of "git log ---oneline". In particular, the start with "in this work we implement" is pretty much unnecessary.
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | ||
---|---|---|
155 ↗ | (On Diff #410415) | Can you be explicit how to get code object version from the IR? |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
4876 | We are loading different implicit kernel arguments in these two place, one is for queue_ptr, and another is for private_base/shared_base. I can try to figure out whether we can factor out some common part. |
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
16255 | I agree with Matt above; this comment seems like it should also be updated for the v5 case? | |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
4388 | typo | |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
1817 | These parens are redundant | |
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | ||
155 ↗ | (On Diff #410415) | IIUC the global opt var is the best we have right now, and any improvement to that situation is orthogonal to this change. I would vote that this not block the patch under review |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
410–412 ↗ | (On Diff #410415) | Are the tests of @llvm_amdgcn_is_private, @llvm_amdgcn_is_shared and @addrspacecast under v5 covering this? For example, for @llvm_amdgcn_is_shared, "s_load_dword s0, s[6:7], 0xcc" is generated to load |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h | ||
---|---|---|
326–328 | I am not clear what is the clover ABI. But GRID_DIM and GRID_OFFSET have never been used. Can we simply remove them? |
Update based on review feedback
- Update EmitAMDGPUImplicitArgPtr: DereferenceableBytes=256 and Align=8
- Branch around v5 to emit ImplicitArgPtr or DispatchPtr,
- Remove unused entries (GROD_DIM and GRID_OFFSET) in ImplicitParameter enum
- Fix typos and remove unnecessary "(" and ")".
Please remove the JIRA ticket number from the commit description.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
4876 | There is still a lot common between this hunk and the previous hunk. It should be refactored into a common function. |
- Introduce a common function, SITargetLowering::loadImplicitKernelArgument, which is used
in both getSegmentAperture and lowerTrapHsaQueuePtr.
- Define enum ImplicitKernargOffset to get implicit kernel argument offset directly.
- This eliminates the functions to get the offset.
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
16259 | Extra space after ( | |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
1839 | Use buildPtrAdd (not sure why we still have materializePtrAdd) | |
4850 | ST is available in the class | |
llvm/lib/Target/AMDGPU/SIDefines.h | ||
784 ↗ | (On Diff #415256) | Add a COV5 suffix? Probably should also wrap in a namespace |
Update based on Matt's comments:
- Use buildPtrAdd
- Remove a space
- Add suffix for the enum definition and also wrap with a namespace
- Remove the redundant def of ST (SubTarget)
- Updated according to clang-format
llvm/lib/Target/AMDGPU/SIDefines.h | ||
---|---|---|
784 ↗ | (On Diff #415256) | I meant suffix on the enum itself, not on each individual field |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
546 ↗ | (On Diff #416017) | I think we should keep using the original getHeapPtrImplicitArgPosition(). Hardcoding the enum here doesn't necessarily make the code more readable. And later if we have a different value in COV6, we will end up reintroducing a check for the code-object-version anyway. That check can be encapsulated within the get...ArgPosition() family of functions. |
llvm/lib/Target/AMDGPU/SIDefines.h | ||
786 ↗ | (On Diff #416017) | This should be "UPTO_COV4". Or if we really want to say COV5, then "BEFORE_COV5" or "PRE_COV5". But to me, "UPTO_COV4" is the clearest. |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
546 ↗ | (On Diff #416017) | I personally do not have any preference here to use the offset enum or a function. Similarly I also could not understand why the offset of an existing argument changes its value across code object versions. However, I do think this should not block the current work of code object 5. |
llvm/lib/Target/AMDGPU/SIDefines.h | ||
786 ↗ | (On Diff #416017) | This issue does not exist if the _COV5 suffix is for the type of this enum because we only consider COV5 in this definition. |
784 ↗ | (On Diff #415256) | OK, will change to the suffix of the enum itself. |
A minor change: add suffix to the enum itself instead of the individual field.
Also remove the "Fixes" field in the summary (commit message).
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h | ||
---|---|---|
323–325 | This isn't FIRST_IMPLICIT unless you reuse the same value. There's also no use of FIRST_IMPLICIT so you might as well remove it |
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h | ||
---|---|---|
323–325 | FIRST_IMPLICIT is used in multiple places, for example, getImplicitArgPtr, which is used to lower Intrinsic::amdgcn_implicitarg_ptr. WE need amdgcn_implicitarg_ptr + offset to reference the implicit kernel argument. |
Isn't the total size 256?