The main purpose of these builtins is to attach range metadata [1, 1025) to the work group size loaded from the dispatch pointer, which cannot be expressed in source code.
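As a rough host-side illustration of what the builtins read, the sketch below computes the byte offset of each work group size field and checks a loaded value against the [1, 1025) range. It is not the CGBuiltin.cpp code. `DispatchPacketPrefix`, `workgroupSizeOffset`, `loadWorkgroupSize`, and `inAssumedRange` are hypothetical names, and the field layout is an assumption based on the leading fields of `hsa_kernel_dispatch_packet_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the leading fields of hsa_kernel_dispatch_packet_t
// (assumed layout; only the fields up to the work group sizes are shown).
struct DispatchPacketPrefix {
  uint16_t header;
  uint16_t setup;
  uint16_t workgroup_size_x;
  uint16_t workgroup_size_y;
  uint16_t workgroup_size_z;
  uint16_t reserved0;
};

// Byte offset of the work group size for a dimension (0 = x, 1 = y, 2 = z).
constexpr std::size_t workgroupSizeOffset(unsigned index) {
  return offsetof(DispatchPacketPrefix, workgroup_size_x) +
         index * sizeof(uint16_t);
}

// Load the 16-bit work group size from a raw dispatch pointer.
inline uint16_t loadWorkgroupSize(const void *dispatchPtr, unsigned index) {
  uint16_t size;
  std::memcpy(&size,
              static_cast<const unsigned char *>(dispatchPtr) +
                  workgroupSizeOffset(index),
              sizeof(size));
  return size;
}

// The half-open range [1, max + 1) that the range metadata promises.
// The default of 1024 is taken from the [1, 1025) range in the summary.
constexpr bool inAssumedRange(uint16_t size, unsigned maxWorkGroupSize = 1024) {
  return size >= 1 && size < maxWorkGroupSize + 1;
}
```

Under the assumed layout, the x, y, and z sizes sit at byte offsets 4, 6, and 8. User code can perform the same load by hand, but it has no way to tell the optimizer that the result lies in [1, 1025), which is the gap the builtins are meant to close.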
| clang/lib/CodeGen/CGBuiltin.cpp | ||
|---|---|---|
| 13428 | Why is this necessary? The builtin always has the same return type? | |
| 13435 | Comment that this is indexing the hsa_kernel_dispatch_packet struct? | |
| 13442 | I thought I had a patch to include the maximum group size in AMDGPUTargetInfo to avoid hardcoding it, but I guess it was never committed | |
| 13443 | Also mark it as invariant | |
| clang/test/CodeGenOpenCL/builtins-amdgcn.cl | ||
| 539 | Also run in a HIP test, or some case where the addrspacecast is needed? | |
Revised per Matt's comments
| clang/lib/CodeGen/CGBuiltin.cpp | ||
|---|---|---|
| 13428 | Due to https://github.com/llvm/llvm-project/commit/c65f966d76aa5412920b3f14d199e764135bd5ec, pointers returned by builtin functions are in the default address space for HIP. | |
| 13435 | done | |
| 13442 | Added getMaxOpenCLWorkGroupSize() to TargetInfo | |
| 13443 | done | |
| clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu | ||
|---|---|---|
| 2 | I assume the addrspacecast got optimized out? Should this disable llvm passes? | |
| clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu | ||
|---|---|---|
| 2 | We did not emit an addrspacecast here since we only need to return the loaded value. HIP uses -O0 by default, so there is no need to disable LLVM passes. | |