This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Add option -fgpu-inline-threshold
ClosedPublic

Authored by yaxunl on Mar 23 2021, 8:17 PM.

Details

Summary

Add option -fgpu-inline-threshold for inline threshold for device compilation only.

Diff Detail

Event Timeline

yaxunl created this revision.Mar 23 2021, 8:17 PM
yaxunl requested review of this revision.Mar 23 2021, 8:17 PM
tra added a comment.Mar 29 2021, 11:11 AM

I'm concerned that if we make it a top-level option, it's likely to be cargo-culted and (mis)used as a sledgehammer in cases where it's not needed. I think the option should remain hidden.

While thresholds do need to be tweaked, in my experience it happens very rarely.
When it does, most of the time it's sufficient to apply __attribute__((always_inline)) to a few functions where it matters.
If AMDGPU bumps into the limit too often, perhaps it's the default threshold value that needs to be changed.

If we do add an option to control inlining threshold, then we should also consider doing the same for other thresholds.
For instance, loop unrolling thresholds in my experience need bumping up about as often as the inlining ones.
Similarly, most of the time the issue can be dealt with at the source code level with #pragma unroll.

Perhaps we should generalize this patch to deal with wider range of threshold.
E.g. we could have something like --gpu-threshold threshold-kind=x which would expand to appropriate cc1 options for GPU sub-compilations.
It would also be nice to handle it in a way that can be used by both CUDA and HIP w/o having to copy/paste code.

Also, this patch would not be necessary if we had the generalized way to specify options for particular offload targets. Alas, we don't have it yet.

In D99233#2656446, @tra wrote:

I'm concerned that if we make it a top-level option, it's likely to be cargo-culted and (mis)used as a sledgehammer in cases where it's not needed. I think the option should remain hidden.

While thresholds do need to be tweaked, in my experience it happens very rarely.
When it does, most of the time it's sufficient to apply __attribute__((always_inline)) to a few functions where it matters.
If AMDGPU bumps into the limit too often, perhaps it's the default threshold value that needs to be changed.

Currently ROCm builds all math libs and frameworks with an LLVM option which inline all functions for AMDGPU target. We cannot simply remove that option and use the default inline threshold since it will cause performance degradations. We cannot use -mllvm -inline-threshold=x directly either since it will affect both host and device compilation. We need an option to set the inline threshold for GPU only so that we could fine-tuning the inline threshold. I agree that this option should be hidden since it is intended for compiler development.

If we do add an option to control inlining threshold, then we should also consider doing the same for other thresholds.
For instance, loop unrolling thresholds in my experience need bumping up about as often as the inlining ones.
Similarly, most of the time the issue can be dealt with at the source code level with #pragma unroll.

Perhaps we should generalize this patch to deal with wider range of threshold.
E.g. we could have something like --gpu-threshold threshold-kind=x which would expand to appropriate cc1 options for GPU sub-compilations.

My concern with --gpu-threshold threshold-kind=x is that it needs custom handling, e.g. setting default values, letting the last option win if multiple options are set. Using separate options allows standard handling of these options.

It would also be nice to handle it in a way that can be used by both CUDA and HIP w/o having to copy/paste code.

I can add a function as ToolChain so that it can be used by different ToolChains.

Also, this patch would not be necessary if we had the generalized way to specify options for particular offload targets. Alas, we don't have it yet.

The planned new option for offloading will be a more generic solution, however, I expect it will take time to develop and be adopted.

tra added a comment.Apr 21 2021, 11:04 AM

The planned new option for offloading will be a more generic solution, however, I expect it will take time to develop and be adopted.

Agreed. OK, let's use a hidden option until we have a better way of dealing with this.

clang/include/clang/Driver/Options.td
960

This option is only handled at the top level, it does not need CC1Option. It does need HelpHidden.

Also, the option should probably be -fgpu-inline-threshold=... as it's a parameter tweak, and not something more serious, like --offload-arch.
Naming is hard. :-)

yaxunl marked an inline comment as done.Apr 21 2021, 11:34 AM
yaxunl added inline comments.
clang/include/clang/Driver/Options.td
960

will do

yaxunl updated this revision to Diff 339353.Apr 21 2021, 12:45 PM
yaxunl marked an inline comment as done.
yaxunl edited the summary of this revision. (Show Details)

revised by Artem's comments.

yaxunl retitled this revision from [HIP] Add option --gpu-inline-threshold to [HIP] Add option -fgpu-inline-threshold.Apr 21 2021, 12:45 PM
tra accepted this revision.Apr 21 2021, 1:03 PM
This revision is now accepted and ready to land.Apr 21 2021, 1:03 PM
This revision was landed with ongoing or failed builds.Apr 21 2021, 2:19 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2021, 2:19 PM