This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Use the implicit kernargs for code object version 5
ClosedPublic

Authored by cfang on Feb 21 2022, 11:23 AM.

Details

Summary

Specifically, for trap handling, for targets that do not support getDoorbellID,
we load the queue_ptr from the implicit kernarg, and move queue_ptr to s[0:1].
To get aperture bases when targets do not have aperture registers, we load
private_base or shared_base directly from the implicit kernarg. In clang, we use
implicitarg_ptr + offsets to implement __builtin_amdgcn_workgroup_size_{xyz}.

Diff Detail

Unit TestsFailed

Event Timeline

cfang created this revision.Feb 21 2022, 11:23 AM
cfang requested review of this revision.Feb 21 2022, 11:23 AM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: wdng. · View Herald Transcript
yaxunl added inline comments.Feb 21 2022, 2:59 PM
clang/lib/CodeGen/CGBuiltin.cpp
16253–16254

Is the logic reversed?

cfang added inline comments.Feb 21 2022, 3:17 PM
clang/lib/CodeGen/CGBuiltin.cpp
16253–16254

You are right. Will correct it. Thanks

cfang updated this revision to Diff 410415.Feb 21 2022, 4:01 PM
cfang marked an inline comment as done.

Correct the logic ordering error in selecting the intrinsic based on code object version.
Also update the code based on clang-format check and Lint suggestions.

arsenm added inline comments.Feb 21 2022, 5:40 PM
clang/lib/CodeGen/CGBuiltin.cpp
16252–16254

Given that it's an offset from a different base, I think it would be cleaner to just branch around the two cases

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
410–412 ↗(On Diff #410415)

This isn't covered by any test changes

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
326–328

You shouldn't merge these into the same enum. This enum should be renamed, this is for a different clover ABI

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4876

You're repeating this long sequence to get the queue pointer in two places, should common these into a function to get the queue pointer. Alternatively, emit the intrinsic and move this expansion into a lowering of the queue pointer intrinsic

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
155 ↗(On Diff #410415)

The code object version should probably come from the IR, not a global opt

arsenm added inline comments.Feb 21 2022, 5:46 PM
clang/lib/CodeGen/CGBuiltin.cpp
16243

Isn't the total size 256?

16244

If I remember correctly we require 8 byte alignment (not that it makes much difference)

Please fix the commit description so that the first line is self-contained and separated from the rest by a blank line. This matters a lot when looking at the output of "git log ---oneline". In particular, the start with "in this work we implement" is pretty much unnecessary.

cfang added inline comments.Feb 21 2022, 9:32 PM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
155 ↗(On Diff #410415)

Can you be explicit how to get code object version from the IR?

cfang added inline comments.Feb 21 2022, 9:53 PM
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4876

We are loading different implicit kernel arguments in these two place, one is for queue_ptr, and another is for private_base/shared_base. I can try to figure out whether we can factor out some common part.

scott.linder added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
16255

I agree with Matt above; this comment seems like it should also be updated for the v5 case?

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
4388

typo

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1817

These parens are redundant

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
155 ↗(On Diff #410415)

IIUC the global opt var is the best we have right now, and any improvement to that situation is orthogonal to this change. I would vote that this not block the patch under review

cfang added inline comments.Feb 22 2022, 2:26 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
410–412 ↗(On Diff #410415)

Are the tests of @llvm_amdgcn_is_private, @llvm_amdgcn_is_shared and @addrspacecast under v5 covering this? For example, for @llvm_amdgcn_is_shared, "s_load_dword s0, s[6:7], 0xcc" is generated to load
the shared_base and s[6:7] holds the argument ptr.

cfang added inline comments.Feb 22 2022, 2:30 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
326–328

I am not clear what is the clover ABI. But GRID_DIM and GRID_OFFSET have never been used. Can we simply remove them?

cfang edited the summary of this revision. (Show Details)Feb 22 2022, 2:36 PM
cfang updated this revision to Diff 410933.Feb 23 2022, 2:02 PM

Update based on review feedback

  1. Update EmitAMDGPUImplicitArgPtr: DereferenceableBytes=256 and Align=8
  2. Branch around v5 to emit ImplicitArgPtr or DispatchPtr,
  3. Remove unused entries (GROD_DIM and GRID_OFFSET) in ImplicitParameter enum
  4. Fix typos and remove unnecessary "(" and ")".
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 11:57 AM
cfang updated this revision to Diff 414711.Mar 11 2022, 11:24 AM

Rebase and update LIT tests.

Please remove the JIRA ticket number from the commit description.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
4876

There is still a lot common between this hunk and the previous hunk. It should be refactored into a common function.

cfang marked an inline comment as done.Mar 14 2022, 3:50 PM
cfang updated this revision to Diff 415256.Mar 14 2022, 3:57 PM
  1. Introduce a common function, SITargetLowering::loadImplicitKernelArgument, which is used

in both getSegmentAperture and lowerTrapHsaQueuePtr.

  1. Define enum ImplicitKernargOffset to get implicit kernel argument offset directly.
    • This eliminates the functions to get the offset.
arsenm added inline comments.Mar 14 2022, 4:08 PM
clang/lib/CodeGen/CGBuiltin.cpp
16259

Extra space after (

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1839

Use buildPtrAdd (not sure why we still have materializePtrAdd)

4850

ST is available in the class

llvm/lib/Target/AMDGPU/SIDefines.h
784 ↗(On Diff #415256)

Add a COV5 suffix? Probably should also wrap in a namespace

cfang marked 4 inline comments as done.Mar 16 2022, 3:06 PM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
1839

Use buildPtrAdd ! Thanks.

4850

Remove it. Thanks.

llvm/lib/Target/AMDGPU/SIDefines.h
784 ↗(On Diff #415256)

Add COV5 suffix, and wrap in a namespace of ImplicitArg. Rename the type to
Offset. So it is of AMDGPU::ImplicitArg::Offset type.

cfang updated this revision to Diff 416017.Mar 16 2022, 4:15 PM
cfang marked 3 inline comments as done.

Update based on Matt's comments:

  1. Use buildPtrAdd
  2. Remove a space
  3. Add suffix for the enum definition and also wrap with a namespace
  4. Remove the redundant def of ST (SubTarget)
  5. Updated according to clang-format
arsenm added inline comments.Mar 16 2022, 4:46 PM
llvm/lib/Target/AMDGPU/SIDefines.h
784 ↗(On Diff #415256)

I meant suffix on the enum itself, not on each individual field

sameerds added inline comments.Mar 16 2022, 9:34 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
546 ↗(On Diff #416017)

I think we should keep using the original getHeapPtrImplicitArgPosition(). Hardcoding the enum here doesn't necessarily make the code more readable. And later if we have a different value in COV6, we will end up reintroducing a check for the code-object-version anyway. That check can be encapsulated within the get...ArgPosition() family of functions.

llvm/lib/Target/AMDGPU/SIDefines.h
786 ↗(On Diff #416017)

This should be "UPTO_COV4". Or if we really want to say COV5, then "BEFORE_COV5" or "PRE_COV5". But to me, "UPTO_COV4" is the clearest.

cfang marked 2 inline comments as done.Mar 17 2022, 12:07 AM
cfang added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
546 ↗(On Diff #416017)

I personally do not have any preference here to use the offset enum or a function. Similarly I also could not understand why the offset of an existing argument changes its value across code object versions. However, I do think this should not block the current work of code object 5.

llvm/lib/Target/AMDGPU/SIDefines.h
786 ↗(On Diff #416017)

This issue does not exist if the _COV5 suffix is for the type of this enum because we only consider COV5 in this definition.

784 ↗(On Diff #415256)

OK, will change to the suffix of the enum itself.

cfang updated this revision to Diff 416080.Mar 17 2022, 12:11 AM
cfang marked 2 inline comments as done.
cfang edited the summary of this revision. (Show Details)

A minor change: add suffix to the enum itself instead of the individual field.
Also remove the "Fixes" field in the summary (commit message).

arsenm accepted this revision.Mar 17 2022, 1:33 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
323–325

This isn't FIRST_IMPLICIT unless you reuse the same value. There's also no use of FIRST_IMPLICIT so you might as well remove it

This revision is now accepted and ready to land.Mar 17 2022, 1:33 PM
cfang added inline comments.Mar 17 2022, 1:59 PM
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
323–325

FIRST_IMPLICIT is used in multiple places, for example, getImplicitArgPtr, which is used to lower Intrinsic::amdgcn_implicitarg_ptr. WE need amdgcn_implicitarg_ptr + offset to reference the implicit kernel argument.

This revision was landed with ongoing or failed builds.Mar 17 2022, 2:13 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2022, 2:13 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript