Clang provides __builtin_amdgcn_read_exec, __builtin_amdgcn_read_exec_lo and __builtin_amdgcn_read_exec_hi. These currently emit llvm.read_register on the corresponding registers.
This patch treats llvm.amdgcn.ballot(i1 true) as the canonical way to read a raw exec value.
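To illustrate why a ballot over a constant-true predicate yields the raw exec value, here is a minimal host-side model in plain C++. It is a sketch, not the Clang lowering: `Wave`, `ballot`, `read_exec`, `read_exec_lo` and `read_exec_hi` are hypothetical names, and deriving the lo/hi halves by truncating and shifting the 64-bit result is an assumption about how the builtins map onto the ballot.

```cpp
#include <cstdint>

// Hypothetical software model of a wavefront.
// Bit i of `exec` is set when lane i is active.
struct Wave {
    std::uint64_t exec;  // active-lane mask
    unsigned size;       // number of lanes: 32 or 64
};

// Model of a ballot: bit i of the result is set when lane i is
// active AND the predicate holds for that lane.
template <typename Pred>
std::uint64_t ballot(const Wave &w, Pred pred) {
    std::uint64_t mask = 0;
    for (unsigned lane = 0; lane < w.size; ++lane) {
        const bool active = ((w.exec >> lane) & 1u) != 0;
        if (active && pred(lane))
            mask |= std::uint64_t{1} << lane;
    }
    return mask;
}

// With a constant-true predicate, only the "active" test remains,
// so the ballot result is exactly the exec mask.
std::uint64_t read_exec(const Wave &w) {
    return ballot(w, [](unsigned) { return true; });
}

// Assumed derivation of the 32-bit halves from the 64-bit value.
std::uint32_t read_exec_lo(const Wave &w) {
    return static_cast<std::uint32_t>(read_exec(w));
}

std::uint32_t read_exec_hi(const Wave &w) {
    return static_cast<std::uint32_t>(read_exec(w) >> 32);
}
```

In this model, inactive lanes never contribute a bit, which is what makes the constant-true ballot a faithful stand-in for reading the register directly.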
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
Introduced EmitAMDGCNBallotForExec to create a ballot intrinsic for __builtin_amdgcn_read_exec*.
| clang/lib/CodeGen/CGBuiltin.cpp | | |
|---|---|---|
| 17494–17495 | Don't need the cast or setConvergent anymore | |
| clang/lib/CodeGen/CGBuiltin.cpp | | |
|---|---|---|
| 7804–7805 | Right, this should be true | |
Comment Actions
As a follow-up, can you prepare an AMDGPUInstCombineIntrinsic patch to reduce ballot.i64 to ballot.i32 on wave32?
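The reasoning behind the requested fold can be sketched with a small host-side model: on a 32-lane wave, a 64-bit ballot can never set bits 32 to 63, so it equals the zero-extended 32-bit ballot. This is only an illustration of the equivalence, not InstCombine code; `ballot_i64`, `ballot_i32` and `ballot_i64_narrowed` are hypothetical names, and per-lane predicates are passed as a bitmask for simplicity.

```cpp
#include <cstdint>

constexpr unsigned kWave32Lanes = 32;  // lane count assumed for wave32

// Reference: a 64-bit ballot evaluated lane by lane on a 32-lane wave.
// Bit i of `exec` marks lane i active; bit i of `pred` is lane i's vote.
std::uint64_t ballot_i64(std::uint32_t exec, std::uint32_t pred) {
    std::uint64_t mask = 0;
    for (unsigned lane = 0; lane < kWave32Lanes; ++lane) {
        const bool active = ((exec >> lane) & 1u) != 0;
        const bool vote = ((pred >> lane) & 1u) != 0;
        if (active && vote)
            mask |= std::uint64_t{1} << lane;
    }
    return mask;  // bits 32..63 are always zero here
}

// 32-bit ballot over the same wave: active lanes that voted true.
std::uint32_t ballot_i32(std::uint32_t exec, std::uint32_t pred) {
    return exec & pred;
}

// Narrowed form: do the ballot at 32 bits, then zero-extend.
std::uint64_t ballot_i64_narrowed(std::uint32_t exec, std::uint32_t pred) {
    return static_cast<std::uint64_t>(ballot_i32(exec, pred));
}
```

Both forms agree for every exec mask and predicate in this model, which is the property such a fold would rely on.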
| clang/lib/CodeGen/CGBuiltin.cpp | | |
|---|---|---|
| 17494–17495 | removed cast & setConvergent. | |
You shouldn't be special-casing EmitSpecialRegisterBuiltin; handle this specifically in the __builtin_amdgcn_read_exec* handling instead.