Add this option to change the default launch bounds.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
What's the use case for this flag?
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
8078 | The magic value of 256 should be defined as a constant or macro somewhere -- you're using it in multiple places. | |
clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu | ||
19 | Is this the attribute that __launch_bounds__() expands to in HIP? |
If a kernel is launched with a block size greater than the default max block size, explicit launch bound is required.
Different projects have different block size usages.
If a project mostly uses block size 1024, it prefers to use 1024 as the default max block size to avoid adding explicit launch bounds to each kernel.
If a project mostly uses block size 256, it prefers to use 256 as the default max block size.
Another situation is that at the initial development stage people prefer a default max block size that works for all possible launching block sizes. Then they can just let the max block size be 1024 by using this option. Later on, they can add launch bounds and choose a different max block size.
On the other hand, we cannot simply let the default max block size be 1024 since we have large sets of existing projects assuming default max block size be 256. Changing the default max block size to 1024 will cause perf degradation in the existing projects. Adding this options provides an option for backward compatibility in case we want to change the default max block size.
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
8078 | For the default value of LangOpts.GPUMaxThreadsPerBlock, it tends to be target dependent. I am thinking probably should add TargetInfo.getDefaultMaxThreadsPerBlock() and use it to set the default value for LangOpts.GPUMaxThreadsPerBlock. | |
clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu | ||
19 | yes. |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
8078 | That could be an option. I just want to have an unambiguous source for that number. I think it would make sense for --gpu-max-threads-per-block=n to control the value for OpenCL, too. |
The magic value of 256 should be defined as a constant or macro somewhere -- you're using it in multiple places.
Alternatively, always set LangOpts.GPUMaxThreadsPerBlock to something and skip figuring out the default everywhere else.