This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add ASM and MC updates for preloading kernargs
Closed, Public

Authored by kerbowa on Sep 5 2023, 5:59 PM.

Details

Summary

Add assembler directives for preloading kernel arguments that correspond
to new fields in the kernel descriptor for the length and offset of
arguments that will be placed in SGPRs prior to kernel launch. Alignment
of the arguments in SGPRs is equivalent to the kernarg segment when
accessed via the kernarg_segment_ptr. Kernarg SGPRs are allocated
directly after other user SGPRs.
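
As a rough illustration of the layout described above, the sketch below maps kernarg-segment offsets to SGPR indices. It assumes that the preload length and offset are counted in dwords (4 bytes), which this page does not state. The names `preload_map` and `first_free_sgpr` and the example arguments are hypothetical and are not taken from the patch.

```python
DWORD = 4  # bytes per SGPR (assumption: preloading is dword-granular)

def preload_map(args, first_free_sgpr, preload_offset, preload_length):
    """Return {arg_name: [sgpr indices]} for fully preloaded arguments.

    args: list of (name, byte_offset, byte_size) in the kernarg segment.
    first_free_sgpr: first SGPR after the other user SGPRs (hypothetical input).
    preload_offset, preload_length: in dwords (assumed unit).
    """
    lo = preload_offset                   # first preloaded dword
    hi = preload_offset + preload_length  # one past the last preloaded dword
    result = {}
    for name, off, size in args:
        first = off // DWORD
        last = (off + size - 1) // DWORD
        # Arguments that fall partly or wholly outside the window are skipped here.
        if first >= lo and last < hi:
            # The SGPR index mirrors the dword position in the segment,
            # so padding in the segment becomes an unused SGPR.
            result[name] = [first_free_sgpr + (d - lo) for d in range(first, last + 1)]
    return result

example = preload_map(
    [("a", 0, 4), ("p", 8, 8)],  # 4-byte value at byte 0, 8-byte pointer at byte 8
    first_free_sgpr=2, preload_offset=0, preload_length=4)
# example == {"a": [2], "p": [4, 5]}; SGPR 3 holds the segment padding.
```

In this sketch the register holding padding is simply left unused, which is one way to read "alignment equivalent to the kernarg segment".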

Event Timeline

kerbowa created this revision on Sep 5 2023, 5:59 PM.
Herald added a reviewer: MaskRay.
Herald added a project: Restricted Project.
kerbowa requested review of this revision on Sep 5 2023, 5:59 PM.

Posted on Phabricator because this patch depends on other patches; it is split out from other existing reviews.

kerbowa updated this revision to Diff 555961 on Sep 5 2023, 7:55 PM.

Restrict to gfx90a and gfx940.

> Alignment of the arguments in SGPRs is equivalent to the kernarg segment when accessed via the kernarg_segment_ptr.

The alignment should not be relevant to the registers. The registers should always be packed.

Also diagnose out of bounds offsets?

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp, line 1953

I think the disassembler should proceed to print whatever is there regardless of the support. It can skip printing if the value is 0.
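
A minimal sketch of that suggestion: emit a directive whenever its field is nonzero, without consulting the subtarget. The function name is hypothetical, and the directive spellings are not quoted anywhere on this page, so treat them as assumptions.

```python
def kernarg_preload_directives(length, offset):
    """Return directive lines for the nonzero preload fields.

    Deliberately takes no subtarget argument: whatever is encoded is printed.
    Directive names below are assumed spellings, not quoted from this review.
    """
    lines = []
    if length != 0:
        lines.append(f".amdhsa_user_sgpr_kernarg_preload_length {length}")
    if offset != 0:
        lines.append(f".amdhsa_user_sgpr_kernarg_preload_offset {offset}")
    return lines
```

With both fields zero the function returns an empty list, so the disassembly of a kernel that does not use the feature is unchanged.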

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp, line 2149

Probably should define a proper subtarget feature for this.

llvm/test/MC/AMDGPU/user-sgpr-count-diag.s, line 6

Why amdhsa_accum_offset? I don't recognize this one.

kerbowa updated this revision to Diff 556613 on Sep 12 2023, 3:19 PM.
kerbowa marked an inline comment as done.

Address comments: add bounds checking and a formal subtarget feature for KernargPreload.
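
The kind of bounds check mentioned here can be sketched as a range test against the width of the descriptor field that stores each value. The field widths below (7 bits for the length, 9 bits for the offset) are assumptions for illustration and are not stated on this page; `check_field` is a hypothetical helper, not a function from the patch.

```python
LENGTH_BITS = 7  # assumed width of the preload length field
OFFSET_BITS = 9  # assumed width of the preload offset field

def check_field(name, value, bits):
    """Return value if it fits in an unsigned field of `bits` bits, else raise."""
    max_value = (1 << bits) - 1
    if not 0 <= value <= max_value:
        raise ValueError(f"{name}: value {value} out of range [0, {max_value}]")
    return value
```

An assembler would report a diagnostic at the directive's source location instead of raising an exception; the range test itself is the same.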

>> Alignment of the arguments in SGPRs is equivalent to the kernarg segment when accessed via the kernarg_segment_ptr.

> The alignment should not be relevant to the registers. The registers should always be packed.

I don't think they are with the current FW/runtime. The placement of the arguments is exactly the same as the kernarg segment. In the future, the runtime could fix the alignment so that the arguments are packed but Jack said he didn't want to pursue this yet.
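
To make the difference between the two placements concrete, the sketch below counts the dwords needed under each scheme. Both functions are hypothetical illustrations: the segment layout assumes each argument is placed at its natural alignment, and the packed layout assumes each argument is rounded up to whole dwords and placed back to back. Neither rule is spelled out in this review.

```python
def dwords_segment_layout(args):
    """Dwords needed when SGPRs mirror the kernarg segment.

    args: list of (size, align) in bytes, in declaration order.
    """
    offset = 0
    for size, align in args:
        offset = (offset + align - 1) // align * align  # pad up to the alignment
        offset += size
    return (offset + 3) // 4

def dwords_packed(args):
    """Dwords needed if every argument is packed with no alignment padding."""
    return sum((size + 3) // 4 for size, _ in args)

args = [(4, 4), (8, 8)]  # a 4-byte value followed by an 8-byte pointer
# dwords_segment_layout(args) == 4, dwords_packed(args) == 3
```

For this example the segment-mirroring placement spends one extra SGPR on padding, which is the cost being discussed in the exchange above.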

llvm/test/MC/AMDGPU/user-sgpr-count-diag.s, line 6

It errors out if it's not there.

kerbowa updated this revision to Diff 556824 on Sep 14 2023, 9:49 PM.

Make the kernarg_size directive optional, as requested.

arsenm accepted this revision on Sep 15 2023, 6:14 AM.
This revision is now accepted and ready to land.
This revision was landed with ongoing or failed builds on Sep 19 2023, 3:47 PM.
This revision was automatically updated to reflect the committed changes.