This patch is the first in a series that adds support for pre-loading
kernel arguments into SGPRs. The command-line argument
'amdgpu-kernarg-preload-count' is used to specify the number of
arguments sequentially from the first that we should attempt to preload,
the default is 0.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
30 ms | x64 debian > Flang.Driver::pic-flags.f90 |
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
925 | This isn't really gaining value by being in the attributor as it stands. Are you planning on a more sophisticated user analysis to select the starting point? | |
927 | Assuming 1 register : 1 argument which is not the case. Also the user SGPR count went up (in gfx10 I think?) so you should query the number for that | |
1002 | Don't need the arg_empty check, the implementation handles it anyway and it's uncommon | |
llvm/test/CodeGen/AMDGPU/preload-kernal-args-inreg-hints.ll | ||
3 | Use -passes | |
236 | Needs some exotic types. i8, <2 x half>, <3 x half> odd bit sized scalars, odd bit sized element vectors, odd vectors, structs, arrays and nested structs |
llvm/test/CodeGen/AMDGPU/preload-kernal-args-inreg-hints.ll | ||
---|---|---|
236 | We could probably get away with making the verifier reject aggregate arguments for amdgpu_kernel |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
927 | AFAIR it was increased to 32 in gfx9, then again was 16 in gfx10. I also do not think this is implemented. |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
927 | We already have getMaxNumUserSGPRs but it's just hardcoded |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
925 | Yes, that's the plan. Any heuristics of that sort will be implemented here. | |
927 | InReg is just a hint so being super accurate with the number of available SGPRs isn't needed for correctness here. The intent is just to have it stop at the maximum, but if it goes over it doesn't matter. I wasn't aware that the number of available user SGPRs had changed i.e. at https://llvm.org/docs/AMDGPUUsage.html I just see 16, I've only ever heard the number 16 talked about. Anyway, I will change this to getMaxNumUserSGPRs as I look into what's actually supported more. | |
llvm/test/CodeGen/AMDGPU/preload-kernal-args-inreg-hints.ll | ||
3 | I think it won't work for amdgpu-attributor with old pass manager? | |
236 | Kind of unnecessary here since none of that impacts the addition of inreg at this point, it will just reject preloading the argument later. |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
933 | Could actually handle byref (and should, ideally all kernargs would use byref). If we consistently used byref, you could implement this by moving memory values to the arg list | |
942 | Typo arguemnts | |
945 | I don't understand why you need this metadata. You mark the arguments with inreg which provides the same information? You can still ignore the hint in the codegen | |
llvm/test/CodeGen/AMDGPU/preload-kernal-args-inreg-hints.ll | ||
3 | Apparently it hasn't been ported yet |
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
933 | I'm planning on adding it in a future revision but I don't think it's needed for this iteration of the changes. | |
945 | I'm adding the metadata early to ensure that kernarg preloading is constrained to those that are explicitly using the CL flag. In case there are some frontend that may sneak inreg on some arguments. Will probably just until the feature is more mature or is enabled by default. |
I don't want any usage of metadata, it's confusing and unnecessary. The layout should be fully deterministic from just the argument list and inreg, you don't need a parallel encoding scheme
llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp | ||
---|---|---|
945 | This is an anti feature. Just respect inreg regardless of the source. You're just adding a huge source of fragility just in case a frontend tries to make use of an in progress feature |
This isn't really gaining value by being in the attributor as it stands. Are you planning on a more sophisticated user analysis to select the starting point?