This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Add option --gpu-max-threads-per-block=n
ClosedPublic

Authored by yaxunl on Dec 9 2019, 12:04 PM.

Details

Summary

Add this option to change the default launch bounds.

Event Timeline

yaxunl created this revision. Dec 9 2019, 12:04 PM
tra added a comment. Dec 19 2019, 2:04 PM

What's the use case for this flag?

clang/lib/CodeGen/TargetInfo.cpp
8078

The magic value of 256 should be defined as a constant or macro somewhere -- you're using it in multiple places.
Alternatively, always set LangOpts.GPUMaxThreadsPerBlock to something and skip figuring out the default everywhere else.

clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu
19

Is this the attribute that __launch_bounds__() expands to in HIP?
If launch_bounds is a separate attribute, then, I guess, it should be tested, too.

yaxunl marked 2 inline comments as done. Dec 20 2019, 5:51 AM
In D71221#1791802, @tra wrote:

What's the use case for this flag?

If a kernel is launched with a block size greater than the default max block size, an explicit launch bound is required.

Different projects have different block size usages.

If a project mostly uses block size 1024, it prefers to use 1024 as the default max block size to avoid adding explicit launch bounds to each kernel.

If a project mostly uses block size 256, it prefers to use 256 as the default max block size.

Another situation is early development, when people prefer a default max block size that works for every possible launch block size. They can use this option to set the max block size to 1024. Later on, they can add launch bounds and choose a different max block size.

On the other hand, we cannot simply make the default max block size 1024, since many existing projects assume a default max block size of 256. Changing the default to 1024 would cause performance degradation in those projects. Adding this option also provides a path to backward compatibility in case we later want to change the default max block size.
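
The rule described in this reply can be sketched as a small host-side C++ model. This is an illustration only, not code from the patch: KernelInfo, effectiveMaxThreads, and launchIsCovered are hypothetical names, and the values 256 and 1024 are the ones discussed in this review.

```cpp
#include <optional>

// Hypothetical model of one kernel: an explicit launch bound if the
// author wrote one (e.g. via __launch_bounds__), otherwise empty.
struct KernelInfo {
  std::optional<unsigned> ExplicitLaunchBound;
};

// The bound the compiler assumes for the kernel: the explicit one when
// present, otherwise the default selected by --gpu-max-threads-per-block=n.
unsigned effectiveMaxThreads(const KernelInfo &K, unsigned DefaultMax) {
  return K.ExplicitLaunchBound.value_or(DefaultMax);
}

// A launch is covered when its block size does not exceed that bound.
bool launchIsCovered(const KernelInfo &K, unsigned DefaultMax,
                     unsigned BlockSize) {
  return BlockSize <= effectiveMaxThreads(K, DefaultMax);
}
```

In this model, a kernel without an explicit bound that is launched with 1024 threads per block is not covered under a default of 256, but is covered once the default is raised to 1024 or the kernel carries its own bound of 1024.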

clang/lib/CodeGen/TargetInfo.cpp
8078

The default value of LangOpts.GPUMaxThreadsPerBlock tends to be target dependent. I think we should probably add TargetInfo.getDefaultMaxThreadsPerBlock() and use it to set the default value for LangOpts.GPUMaxThreadsPerBlock.

clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu
19

Yes.

tra added inline comments. Dec 20 2019, 10:30 AM
clang/lib/CodeGen/TargetInfo.cpp
8078

That could be an option. I just want to have an unambiguous source for that number.
BTW, does the value need to be hardcoded for OpenCL?

I think it would make sense for --gpu-max-threads-per-block=n to control the value for OpenCL, too.
Then you would always get the value from LangOpts.GPUMaxThreadsPerBlock and will have only one place where you set the default, which would be OK until we make the default target-specific.
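
The "one place where you set the default" idea might look roughly like the following self-contained C++ sketch. It does not reproduce the patch: DefaultGPUMaxThreadsPerBlock, LangOptsModel, and flatWorkGroupSizeValue are hypothetical names, and the "min,max" string is assumed to follow the form of the AMDGPU amdgpu-flat-work-group-size function attribute.

```cpp
#include <string>

// Hypothetical: the single definition of the default. No other code
// repeats the literal 256.
constexpr unsigned DefaultGPUMaxThreadsPerBlock = 256;

// Hypothetical stand-in for clang's LangOptions. The field always holds a
// value: the default, or whatever --gpu-max-threads-per-block=n supplied.
struct LangOptsModel {
  unsigned GPUMaxThreadsPerBlock = DefaultGPUMaxThreadsPerBlock;
};

// Consumers only read the field. The "1,<max>" format is an assumption
// made for illustration.
std::string flatWorkGroupSizeValue(const LangOptsModel &Opts) {
  return "1," + std::to_string(Opts.GPUMaxThreadsPerBlock);
}
```

With this layout, making the default target-specific later would mean changing only how the field is initialized, not every reader of it.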

yaxunl updated this revision to Diff 235025. Dec 21 2019, 12:20 PM

Revised per Artem's comments.

tra accepted this revision. Jan 6 2020, 8:49 AM
This revision is now accepted and ready to land. Jan 6 2020, 8:49 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Jan 7 2020, 8:21 AM