This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use inreg for hint to preload kernel arguments
ClosedPublic

Authored by kerbowa on Aug 1 2023, 11:11 PM.

Details

Summary

This patch is the first in a series that adds support for pre-loading
kernel arguments into SGPRs. The command-line argument
'amdgpu-kernarg-preload-count' is used to specify the number of
arguments sequentially from the first that we should attempt to preload,
the default is 0.

Diff Detail

Event Timeline

kerbowa created this revision.Aug 1 2023, 11:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2023, 11:11 PM
Herald added subscribers: foad, okura, kuter and 6 others. · View Herald Transcript
kerbowa requested review of this revision.Aug 1 2023, 11:11 PM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
arsenm added inline comments.Aug 2 2023, 4:16 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
922

This isn't really gaining value by being in the attributor as it stands. Are you planning on a more sophisticated user analysis to select the starting point?

924

Assuming 1 register : 1 argument which is not the case.

Also the user SGPR count went up (in gfx10 I think?) so you should query the number for that

986

Don't need the arg_empty check, the implementation handles it anyway and it's uncommon

llvm/test/CodeGen/AMDGPU/preload-kernal-args-inreg-hints.ll
3

Use -passes

236

Needs some exotic types. i8, <2 x half>, <3 x half> odd bit sized scalars, odd bit sized element vectors, odd vectors, structs, arrays and nested structs

arsenm added inline comments.Aug 2 2023, 4:31 AM
llvm/test/CodeGen/AMDGPU/preload-kernal-args-inreg-hints.ll
236

We could probably get away with making the verifier reject aggregate arguments for amdgpu_kernel

rampitec added inline comments.Aug 2 2023, 10:08 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
924

AFAIR it was increased to 32 in gfx9, then again was 16 in gfx10. I also do not think this is implemented.

arsenm added inline comments.Aug 2 2023, 10:26 AM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
924

We already have getMaxNumUserSGPRs but it's just hardcoded

kerbowa added inline comments.Aug 13 2023, 9:35 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
922

Yes, that's the plan. Any heuristics of that sort will be implemented here.

924

InReg is just a hint so being super accurate with the number of available SGPRs isn't needed for correctness here. The intent is just to have it stop at the maximum, but if it goes over it doesn't matter. I wasn't aware that the number of available user SGPRs had changed i.e. at https://llvm.org/docs/AMDGPUUsage.html I just see 16, I've only ever heard the number 16 talked about. Anyway, I will change this to getMaxNumUserSGPRs as I look into what's actually supported more.

llvm/test/CodeGen/AMDGPU/preload-kernal-args-inreg-hints.ll
3

I think it won't work for amdgpu-attributor with old pass manager?

236

Kind of unnecessary here since none of that impacts the addition of inreg at this point, it will just reject preloading the argument later.

kerbowa updated this revision to Diff 549787.Aug 13 2023, 9:36 PM

Use getMaxNumUserSGPRs.
Check incompatible attributes.

kerbowa updated this revision to Diff 552570.Aug 22 2023, 6:53 PM

Add metadata to ensure that preloading is explicitly requested.

arsenm added inline comments.Aug 23 2023, 3:27 PM
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
930

Could actually handle byref (and should, ideally all kernargs would use byref). If we consistently used byref, you could implement this by moving memory values to the arg list

939

Typo arguemnts

942

I don't understand why you need this metadata. You mark the arguments with inreg which provides the same information? You can still ignore the hint in the codegen

llvm/test/CodeGen/AMDGPU/preload-kernal-args-inreg-hints.ll
3

Apparently it hasn't been ported yet

kerbowa updated this revision to Diff 555797.Sep 4 2023, 9:08 PM
kerbowa marked an inline comment as done.

Fix typo

kerbowa marked 3 inline comments as done.Sep 4 2023, 9:09 PM
kerbowa added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
930

I'm planning on adding it in a future revision but I don't think it's needed for this iteration of the changes.

942

I'm adding the metadata early to ensure that kernarg preloading is constrained to those that are explicitly using the CL flag. In case there are some frontend that may sneak inreg on some arguments. Will probably just until the feature is more mature or is enabled by default.

kerbowa updated this revision to Diff 556388.Sep 10 2023, 8:40 PM
kerbowa marked 2 inline comments as done.

Cleanup.

arsenm requested changes to this revision.Sep 12 2023, 4:00 AM

I don't want any usage of metadata, it's confusing and unnecessary. The layout should be fully deterministic from just the argument list and inreg, you don't need a parallel encoding scheme

llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
942

This is an anti feature. Just respect inreg regardless of the source. You're just adding a huge source of fragility just in case a frontend tries to make use of an in progress feature

This revision now requires changes to proceed.Sep 12 2023, 4:00 AM
kerbowa updated this revision to Diff 556588.Sep 12 2023, 10:10 AM

Remove metadata.

I don't want any usage of metadata, it's confusing and unnecessary. The layout should be fully deterministic from just the argument list and inreg, you don't need a parallel encoding scheme

Thanks for the review Matt, I've removed the metadata system entirely.

arsenm accepted this revision.Sep 14 2023, 8:41 AM
This revision is now accepted and ready to land.Sep 14 2023, 8:41 AM
This revision was landed with ongoing or failed builds.Sep 19 2023, 3:14 PM
This revision was automatically updated to reflect the committed changes.