Preloaded kernel arguments should not be lowered in the IR pass
AMDGPULowerKernelArguments. Therefore it's necessary to calculate the
total number of user SGPRs that are available for preloading and how
many SGPRs would be required to preload each argument to determine
whether we should skip lowering i.e. the argument will be preloaded
instead.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I will probably end up factoring out the FreeUserSGPRsCount code somewhere just to make sure the calculation between here and MFI is consistent.
I have a problem with the 'current FW'. What if an user uses not a current FW? Preloading shall be accompanied with the 1) code reading SGPRs 2) FW writing these SGPRs 3) RT undertstanding all of that. If any of that disagrees things will get broken. From this review request I do not see how is that going to happen in a lockstep.
Testing is clearly broken: "opt: Unknown command line argument '-amdgpu-kernarg-preload-count=1'"
Also 'current FW' is really vague. It differs between cards. What does a current SI FW do for example?
llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp | ||
---|---|---|
28 | All this logic is basically repeated in the other 2 kernarg lowering implementations, can this be shared? | |
160 | This AllocSize thing isn't sophisticated enough and won't handle aggregates correctly. You need to check the type and number of parts from getNumRegistersForCallingConv and getRegisterTypeForCallingConv | |
165–167 | My first thought is I'm not a huge fan of adding metadata for communicating between passes but I'll have to look at the use context | |
llvm/test/CodeGen/AMDGPU/preload-kernal-args-metadata.ll | ||
2 ↗ | (On Diff #546326) | Can you use -passes here? |
Drive by. Style: Remove all the llvm:: you don't need.
llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp | ||
---|---|---|
173–176 | Swap cases. No need for llvm:: |
The compiler is supposed to produce backwards-compatible code. New FW is the only way to reach code that expects preloaded kern args.
llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp | ||
---|---|---|
160 | I was planning on ignoring aggregate types for now. | |
173–176 | Thanks, this condition disappeared in child revisions since we continue here, so no need to flip the cases. | |
174 | The metadata communicates which arguments, and how many, should be preloaded. While not strictly needed for correctness since isel should be able to lower any arguments not handled in this pass, it would be nice to have some indication that some amount of preloading has been requested and to not rely entirely on the inreg attribute. If you would rather do away with the metadata entirely I can consider that, but I am someone concerned that inreg may be added when it is not intended to indicate preloading. |
mostly LGTM with some nits
llvm/lib/Target/AMDGPU/AMDGPULowerKernelArguments.cpp | ||
---|---|---|
33 | Remove blank lines? | |
46 | Typo SGRPs | |
54 | ||
56 | isAligned has a weird argument order | |
157 | don't see why no uses would call for skipping it. You may just account for this when picking a starting position | |
llvm/lib/Target/AMDGPU/GCNSubtarget.h | ||
1188 ↗ | (On Diff #556811) | BC isn't obvious to me. Spell out compatibility? |
LGTM. The part I'm wondering about is in tryAllocPreloadSGPRs, the part about multiple arguments in the same SGPR. I think there is code in https://reviews.llvm.org/D158579 that handles this case as well.
All this logic is basically repeated in the other 2 kernarg lowering implementations, can this be shared?