This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add IR lowering changes for preloaded kernargs
ClosedPublic

Authored by kerbowa on Aug 1 2023, 11:11 PM.

Details

Summary

Preloaded kernel arguments should not be lowered in the IR pass
AMDGPULowerKernelArguments. Therefore it's necessary to calculate the
total number of user SGPRs that are available for preloading and how
many SGPRs would be required to preload each argument to determine
whether we should skip lowering i.e. the argument will be preloaded
instead.

Diff Detail

Event Timeline

kerbowa created this revision.Aug 1 2023, 11:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2023, 11:11 PM
kerbowa requested review of this revision.Aug 1 2023, 11:11 PM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript

I will probably end up factoring out the FreeUserSGPRsCount code somewhere just to make sure the calculation between here and MFI is consistent.

I have a problem with the 'current FW'. What if an user uses not a current FW? Preloading shall be accompanied with the 1) code reading SGPRs 2) FW writing these SGPRs 3) RT undertstanding all of that. If any of that disagrees things will get broken. From this review request I do not see how is that going to happen in a lockstep.

Testing is clearly broken: "opt: Unknown command line argument '-amdgpu-kernarg-preload-count=1'"

Also 'current FW' is really vague. It differs between cards. What does a current SI FW do for example?

arsenm added inline comments.Aug 2 2023, 4:08 AM
llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
28

All this logic is basically repeated in the other 2 kernarg lowering implementations, can this be shared?

160

This AllocSize thing isn't sophisticated enough and won't handle aggregates correctly. You need to check the type and number of parts from getNumRegistersForCallingConv and getRegisterTypeForCallingConv

165–167

My first thought is I'm not a huge fan of adding metadata for communicating between passes but I'll have to look at the use context

llvm/test/CodeGen/AMDGPU/preload-kernal-args-metadata.ll
2 ↗(On Diff #546326)

Can you use -passes here?

kerbowa updated this revision to Diff 551908.Aug 20 2023, 11:50 PM

Factor out user SGPR info.
Refine conditions for preloading an argument.

Drive by. Style: Remove all the llvm:: you don't need.

llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
173–176

Swap cases. No need for llvm::

arsenm added inline comments.Aug 23 2023, 3:30 PM
llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
167–174

Don't understand why you need to read, much less create new metadata here

llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
148–155 ↗(On Diff #551908)

Can you split this refactoring into a separate NFC patch

kerbowa updated this revision to Diff 555812.Sep 4 2023, 11:45 PM
kerbowa marked 2 inline comments as done.

Factor out changes, address comments.

I have a problem with the 'current FW'. What if an user uses not a current FW? Preloading shall be accompanied with the 1) code reading SGPRs 2) FW writing these SGPRs 3) RT undertstanding all of that. If any of that disagrees things will get broken. From this review request I do not see how is that going to happen in a lockstep.

The compiler is supposed to produce backwards-compatible code. New FW is the only way to reach code that expects preloaded kern args.

llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
160

I was planning on ignoring aggregate types for now.

173–176

Thanks, this condition disappeared in child revisions since we continue here, so no need to flip the cases.

174

The metadata communicates which arguments, and how many, should be preloaded. While not strictly needed for correctness since isel should be able to lower any arguments not handled in this pass, it would be nice to have some indication that some amount of preloading has been requested and to not rely entirely on the inreg attribute. If you would rather do away with the metadata entirely I can consider that, but I am someone concerned that inreg may be added when it is not intended to indicate preloading.

kerbowa updated this revision to Diff 556601.Sep 12 2023, 11:52 AM

Remove metadata.

kerbowa retitled this revision from [AMDGPU] Add metadata to track preloaded arguments to [AMDGPU] Add IR lowering changes for preloaded kernargs.Sep 12 2023, 12:00 PM
kerbowa edited the summary of this revision. (Show Details)
kerbowa updated this revision to Diff 556811.Sep 14 2023, 2:30 PM

Limit preloading to gfx940.

mostly LGTM with some nits

llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp
33

Remove blank lines?

46

Typo SGRPs

54
56

isAligned has a weird argument order

157

don't see why no uses would call for skipping it. You may just account for this when picking a starting position

llvm/lib/Target/AMDGPU/GCNSubtarget.h
1188 ↗(On Diff #556811)

BC isn't obvious to me. Spell out compatibility?

kerbowa updated this revision to Diff 556875.Sep 15 2023, 12:05 PM

Address comments.

bcahoon accepted this revision.Sep 22 2023, 7:58 AM

LGTM. The part I'm wondering about is in tryAllocPreloadSGPRs, the part about multiple arguments in the same SGPR. I think there is code in https://reviews.llvm.org/D158579 that handles this case as well.

This revision is now accepted and ready to land.Sep 22 2023, 7:58 AM
This revision was landed with ongoing or failed builds.Sep 25 2023, 8:54 AM
This revision was automatically updated to reflect the committed changes.