Add option -fgpu-inline-threshold for inline threshold for device compilation only.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I'm concerned that if we make it a top-level option, it's likely to be cargo-culted and (mis)used as a sledgehammer in cases where it's not needed. I think the option should remain hidden.
While thresholds do need to be tweaked, in my experience it happens very rarely.
When it does, most of the time it's sufficient to apply __attribute__((always_inline)) to a few functions where it matters.
If AMDGPU bumps into the limit too often, perhaps it's the default threshold value that needs to be changed.
If we do add an option to control inlining threshold, then we should also consider doing the same for other thresholds.
For instance, loop unrolling thresholds in my experience need bumping up about as often as the inlining ones.
Similarly, most of the time the issue can be dealt with at the source code level with #pragma unroll.
Perhaps we should generalize this patch to deal with wider range of threshold.
E.g. we could have something like --gpu-threshold threshold-kind=x which would expand to appropriate cc1 options for GPU sub-compilations.
It would also be nice to handle it in a way that can be used by both CUDA and HIP w/o having to copy/paste code.
Also, this patch would not be necessary if we had the generalized way to specify options for particular offload targets. Alas, we don't have it yet.
Currently ROCm builds all math libs and frameworks with an LLVM option which inline all functions for AMDGPU target. We cannot simply remove that option and use the default inline threshold since it will cause performance degradations. We cannot use -mllvm -inline-threshold=x directly either since it will affect both host and device compilation. We need an option to set the inline threshold for GPU only so that we could fine-tuning the inline threshold. I agree that this option should be hidden since it is intended for compiler development.
If we do add an option to control inlining threshold, then we should also consider doing the same for other thresholds.
For instance, loop unrolling thresholds in my experience need bumping up about as often as the inlining ones.
Similarly, most of the time the issue can be dealt with at the source code level with #pragma unroll.Perhaps we should generalize this patch to deal with wider range of threshold.
E.g. we could have something like --gpu-threshold threshold-kind=x which would expand to appropriate cc1 options for GPU sub-compilations.
My concern with --gpu-threshold threshold-kind=x is that it needs custom handling, e.g. setting default values, letting the last option win if multiple options are set. Using separate options allows standard handling of these options.
It would also be nice to handle it in a way that can be used by both CUDA and HIP w/o having to copy/paste code.
I can add a function as ToolChain so that it can be used by different ToolChains.
Also, this patch would not be necessary if we had the generalized way to specify options for particular offload targets. Alas, we don't have it yet.
The planned new option for offloading will be a more generic solution, however, I expect it will take time to develop and be adopted.
The planned new option for offloading will be a more generic solution, however, I expect it will take time to develop and be adopted.
Agreed. OK, let's use a hidden option until we have a better way of dealing with this.
clang/include/clang/Driver/Options.td | ||
---|---|---|
960 | This option is only handled at the top level, it does not need CC1Option. It does need HelpHidden. Also, the option should probably be -fgpu-inline-threshold=... as it's a parameter tweak, and not something more serious, like --offload-arch. |
clang/include/clang/Driver/Options.td | ||
---|---|---|
960 | will do |
This option is only handled at the top level, it does not need CC1Option. It does need HelpHidden.
Also, the option should probably be -fgpu-inline-threshold=... as it's a parameter tweak, and not something more serious, like --offload-arch.
Naming is hard. :-)