This is an archive of the discontinued LLVM Phabricator instance.

[Driver] Add `-f[no-]offload-uniform-block`
ClosedPublic

Authored by yaxunl on Jul 13 2023, 9:08 AM.

Details

Summary

By default, clang assumes HIP kernels are launched with uniform block size,
which is the case for kernels launched through triple chevron or
hipLaunchKernelGGL. Clang adds uniform-work-group-size function attribute
to HIP kernels to allow the backend to do optimizations on that.

However, in some rare cases, HIP kernels can be launched
through hipExtModuleLaunchKernel where global work size is specified,
which may result in non-uniform block size.

To be able to support non-uniform block size for HIP kernels,
an option `-f[no-]offload-uniform-block is added. This option
is generic for offloading languages. Its default value is on for
CUDA/HIP and off otherwise.

Make -cl-uniform-work-group-size an alias to -foffload-uniform-block.

Diff Detail

Event Timeline

yaxunl created this revision.Jul 13 2023, 9:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 13 2023, 9:08 AM
yaxunl requested review of this revision.Jul 13 2023, 9:08 AM
MaskRay added inline comments.Jul 13 2023, 9:11 AM
clang/lib/Driver/ToolChains/Clang.cpp
7284

Why is the -Wunused-command-line-argument warning suppressed in non-IsHIP mode?

yaxunl added inline comments.Jul 13 2023, 9:18 AM
clang/lib/Driver/ToolChains/Clang.cpp
7284

Users may want to add these options to clang config file.

Is there a general rule which options should be claimed?

MaskRay added inline comments.Jul 13 2023, 9:59 AM
clang/lib/Driver/ToolChains/Clang.cpp
7284

Options in a configuration file are automatically claimed.

I don't know a general rule, but we generally don't claim newly introduced options.

yaxunl marked 2 inline comments as done.Jul 13 2023, 9:14 PM
yaxunl added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
7284

I think I should remove the claimAllArgs for this option. It should behave like the usual options when not used.

yaxunl updated this revision to Diff 540397.Jul 14 2023, 6:37 AM
yaxunl marked an inline comment as done.

revised by comments

arsenm added inline comments.Jul 14 2023, 10:22 AM
clang/include/clang/Driver/Options.td
1097

Can we avoid adding yet another language flag for something that's reusable for everything? Is there an --offload-<something> ?

scchan added inline comments.Jul 14 2023, 10:41 AM
clang/include/clang/Driver/Options.td
1097

Don't we need a different default value for some languages like OpenCL?

arsenm added inline comments.Jul 14 2023, 10:42 AM
clang/include/clang/Driver/Options.td
1097

Yes, but opencl already has a spec'd flag for this. If we're making up a new one, it could be something generic that aliases the opencl one in that case. Plus the +/- value of a new flag should work (the CL one only goes in one direction)

yaxunl marked an inline comment as done.Jul 14 2023, 10:44 AM
yaxunl added inline comments.
clang/include/clang/Driver/Options.td
1097

Currently, the naming convention for shared CUDA/HIP language options is -fgpu-* or --gpu-* . The shared CUDA/HIP/OpenMP driver options are named --offload-*`.

This option is named -fhip-uniform-block because AFAIK CUDA does not support non-uniform block size.

If we want to make it a generic option, it should be named as -fgpu-uniform-block by the current naming convention. Unless we want to change the naming convention for generic offloading language options.

@tra What do you think? Thanks.

yaxunl marked 2 inline comments as done.Jul 20 2023, 8:58 AM
yaxunl added inline comments.
clang/include/clang/Driver/Options.td
1097

I am thinking, maybe it is time to start moving towards the final direction.

How about renaming it as --offload-uniform-block ?

@MaskRay @tra

yaxunl updated this revision to Diff 544345.Jul 26 2023, 7:01 AM
yaxunl retitled this revision from [HIP] Add `-fno-hip-uniform-block` to [HIP] Add `-fno-offload-uniform-block`.
yaxunl edited the summary of this revision. (Show Details)

rename the option

ping

I renamed the option as -fno-offload-uniform-block. I switched to offload instead of gpu because I think in the long run offloading toolchains are not limited to GPUs, therefore sooner or later we will feel -fgpu- options are awkward. I did not use --no-offload-uniform-block because Options.td does not allow marshalling -- prefixed options. It needs considerable change to some basic multiclass to achieve that and that would break quite a few established conventions. Therefore I feel it is better to follow the convention.

This should be named -foffload*, it should not use HIP in the description, and it should apply to OpenMP as well.

yaxunl updated this revision to Diff 544491.Jul 26 2023, 1:26 PM
yaxunl edited the summary of this revision. (Show Details)

make the option generic for offloading languages

scchan added inline comments.Jul 27 2023, 8:17 AM
clang/lib/CodeGen/CGCall.cpp
2390

The block here needs to be aware of this new flag.

yaxunl marked an inline comment as done.Jul 27 2023, 9:02 AM
yaxunl added inline comments.
clang/lib/CodeGen/CGCall.cpp
2390

Now that -foffload-uniform-block has the same default value as -cl-uniform-work-group-size for OpenCL. we can make -cl-uniform-work-group-size an alias to -foffload-uniform-block. will update

yaxunl updated this revision to Diff 544812.Jul 27 2023, 9:15 AM
yaxunl marked an inline comment as done.
yaxunl added a reviewer: Anastasia.

revised by comments

yaxunl retitled this revision from [HIP] Add `-fno-offload-uniform-block` to [Driver] Add `-f[no-]offload-uniform-block`.Jul 27 2023, 9:18 AM
yaxunl edited the summary of this revision. (Show Details)
scchan accepted this revision.Jul 27 2023, 12:30 PM

LGTM thanks

This revision is now accepted and ready to land.Jul 27 2023, 12:30 PM
This revision was landed with ongoing or failed builds.Jul 27 2023, 1:36 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2023, 1:36 PM