The main purpose of introducing these builtins is to attach range metadata [1, 1025) to the work group size loaded from the dispatch pointer, which cannot be expressed in source code.
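As a rough illustration of what such a builtin reads, the following host-side C++ sketch models the load. It assumes the leading fields of the HSA `hsa_kernel_dispatch_packet_t` layout (two 16-bit fields followed by the three 16-bit work group sizes, so dimension `index` sits at byte offset `4 + 2 * index`). The struct `DispatchPacketPrefix` and the helper `loadWorkGroupSize` are hypothetical names invented for this example, not part of the patch. The `assert` only stands in for the range metadata: in LLVM IR the range is a promise to the optimizer, not a runtime check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical mirror of the first fields of hsa_kernel_dispatch_packet_t.
// Only the prefix needed for the work group size is modeled here.
struct DispatchPacketPrefix {
  uint16_t header;
  uint16_t setup;
  uint16_t workgroup_size_x;  // byte offset 4
  uint16_t workgroup_size_y;  // byte offset 6
  uint16_t workgroup_size_z;  // byte offset 8
};

static_assert(offsetof(DispatchPacketPrefix, workgroup_size_x) == 4,
              "work group sizes are assumed to start at byte offset 4");

// Illustrative helper: load the 16-bit work group size for dimension
// `index` (0 = x, 1 = y, 2 = z) from a dispatch-packet pointer.
inline uint16_t loadWorkGroupSize(const void *dispatch_ptr, unsigned index) {
  assert(index < 3 && "dimension index must be 0, 1, or 2");
  const unsigned char *base = static_cast<const unsigned char *>(dispatch_ptr);
  uint16_t size;
  std::memcpy(&size, base + 4 + index * 2, sizeof(size));
  // Stand-in for the range metadata [1, 1025): the value is assumed to be
  // at least 1 and at most 1024.
  assert(size >= 1 && size < 1025);
  return size;
}
```

With a packet whose sizes are 256, 4, and 1, the helper returns those values for indices 0, 1, and 2 respectively. Source code can perform the load itself, but it has no way to attach the range to the loaded value, which is the gap the builtins are meant to close.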
Repository: rG LLVM Github Monorepo
Event Timeline
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
13428 | Why is this necessary? The builtin always has the same return type? | |
13435 | Comment that this is indexing the hsa_kernel_dispatch_packet struct? | |
13442 | I thought I had a patch to include the maximum group size in AMDGPUTargetInfo to avoid hardcoding it, but I guess it was never committed | |
13443 | Also mark the load as invariant | |
clang/test/CodeGenOpenCL/builtins-amdgcn.cl | ||
---|---|---|
539 | Also run in a HIP test, or some case where the addrspacecast is needed? | |
Revised per Matt's comments.
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
13428 | Due to https://github.com/llvm/llvm-project/commit/c65f966d76aa5412920b3f14d199e764135bd5ec, pointers returned by builtin functions are in the default address space for HIP. | |
13435 | done | |
13442 | Added getMaxOpenCLWorkGroupSize() to TargetInfo | |
13443 | done |
clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu | ||
---|---|---|
3 | I assume the addrspacecast got optimized out? Should this disable llvm passes? |
clang/test/CodeGenCUDA/amdgpu-workgroup-size.cu | ||
---|---|---|
3 | We did not emit an addrspacecast here since we only need to return the loaded value. HIP uses -O0 by default, so there is no need to disable LLVM passes. |