This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add DAG ISel support for preloaded kernel arguments
ClosedPublic

Authored by kerbowa on Aug 22 2023, 11:08 PM.

Details

Summary

This patch adds the DAG isel changes for kernel argument preloading.
These changes are not usable with older firmware but subsequent patches
in the series will make the codegen backwards compatible. This patch
should only be submitted alongside that subsequent patch.

Preloading here begins from the start of the kernel arguments until the
amount of arguments indicated by the CL flag
amdgpu-kernarg-preload-count.

Aggregates and arguments passed by-ref are not supported.

Special care for the alignment of the kernarg segment is needed as well
as consideration of the alignment of addressable SGPR tuples when we
cannot directly use misaligned large tuples that the arguments are
loaded to.

Diff Detail

Event Timeline

kerbowa created this revision.Aug 22 2023, 11:08 PM
Herald added a reviewer: MaskRay. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
kerbowa requested review of this revision.Aug 22 2023, 11:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2023, 11:08 PM
arsenm added inline comments.Aug 24 2023, 8:07 AM
llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.cpp
63

Probably should but I don't even know if this is part of debug printing anywhere, I don't know the last time I saw this

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
449

Can you move this to hasKernargPreload helper or something? Probably should make it a full subtarget feature on its own

llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
261–263 ↗(On Diff #552593)

Can you separate all the formatting changes out?

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
5003 ↗(On Diff #552593)

Check the subtarget for the limit

5003–5004 ↗(On Diff #552593)

Is the interaction with .amdhsa_user_sgpr_count checked?

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
1949–1955 ↗(On Diff #552593)

Can you split the assembler / MC bits into a separate patch?

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
927–929

Needs some temporary variables

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
2247

I do not understand this metadata system, you know directly from the IR arguments the register layout

2257

unchecked dyn_caast

llvm/test/MC/AMDGPU/hsa-v4.s
113 ↗(On Diff #552593)

Need some tests with the limit exceeded diagnostic and interactions with amdhsa_user_sgpr_count

kerbowa updated this revision to Diff 556390.Sep 10 2023, 8:41 PM

Factor out MC changes, cleanup, add nop padding.

kerbowa updated this revision to Diff 556742.Sep 13 2023, 11:54 PM

Rebase with non-metadata based kernarg preload tracking.

kerbowa updated this revision to Diff 556743.Sep 14 2023, 12:03 AM

Fixup feature checks.

kerbowa updated this revision to Diff 556812.Sep 14 2023, 2:58 PM

Limit preloading to gfx940.

Also should do the globalisel equivalent

llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h
158

Why is it a map? Isn't this just an array?

llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
209
llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
833–839

Shouldn't need this, there are already nop emission utilities?

845

emitValueToAlignment?

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
2247
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
246

This can return ArrayRef?

267

Typo Kerarg

llvm/test/CodeGen/AMDGPU/preload-kernargs.ll
22

should be able to use a directive to avoid spamming this

kerbowa updated this revision to Diff 556999.Sep 18 2023, 10:54 PM
kerbowa marked an inline comment as done.

Rebase, address comments.

llvm/lib/Target/AMDGPU/AMDGPUArgumentUsageInfo.h
158

We would at least need an offset stored somewhere since we shouldn't assume preloads will always start at the first kernel argument.

llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp
845

I'm not sure how relevant the alignment is here actually. What matters is that we are emitting exactly 256 bytes.

bcahoon accepted this revision.Sep 22 2023, 3:29 PM

Looks good to me. I also tried some additional tests and they also worked as expected.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
2319

Always

This revision is now accepted and ready to land.Sep 22 2023, 3:29 PM
kerbowa updated this revision to Diff 557286.Sep 24 2023, 4:21 PM

Fix case where aligned register class was not a trivially lowerable copy.

bcahoon accepted this revision.Sep 24 2023, 6:33 PM

Changes LGTM. Thanks!

This revision was landed with ongoing or failed builds.Sep 25 2023, 9:33 AM
This revision was automatically updated to reflect the committed changes.