This patch adds the amdgpu-flat-work-group-size=1,1024 attribute to
OpenMP kernels targeting AMDGPU. This also lets us use
--gpu-max-threads-per-block which is loosened from being a HIP only
option.
Details
- Reviewers
JonChesterfield arsenm tra yaxunl jdoerfert
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9552 | Probably shouldn’t check the language, just it’s a kernel. Also shouldn’t emit this if it’s the default 1024. I’ve been trying to cut down on the superfluous attribute spam |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9552 | There's a section for HIP above that does the same. We could probably consolidate here for all "AMDGPU" kernels and get rid of the redundant attribute. Maybe in a separate patch? |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9552 | All the isCUDA || HIP || OpenMP checks scattered around are driving me crazy. A bunch of the out of tree divergent patches are just adding to them. We should just purge everything checking languages to the actual features and stop putting language names in things |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9552 | OpenCL is the odd one out as far as I know, HIP and OpenMP are mostly equivalent as far as attributes go. |
clang/lib/CodeGen/TargetInfo.cpp | ||
---|---|---|
9552 | OpenCL uses 256 as default max block size. This is to avoid performance regressions for existing apps. HIP uses 1024 by default. | |
clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp | ||
61 | we should keep the default value in Options.td instead of having multiple copies at different places. save as below. |
clang/lib/Driver/ToolChains/AMDGPU.cpp | ||
---|---|---|
791 | ^ |
clang/include/clang/Basic/LangOptions.def | ||
---|---|---|
271 | probably should drop the language and describe what it is |
clang/include/clang/Basic/LangOptions.def | ||
---|---|---|
271 | CUDA does not use it. Drop the language may cause confusion. |
clang/include/clang/Basic/LangOptions.def | ||
---|---|---|
271 | If CUDA doesn't respect the flag, that's just broken. The concept is common among all the languages, and all the unnecessary language qualifications are a plague. Controls should be expressed as a generic concept that languages selectively enable |
probably should drop the language and describe what it is