Clang provides __builtin_amdgcn_read_exec, __builtin_amdgcn_read_exec_lo and __builtin_amdgcn_read_exec_hi. These currently emit llvm.read_register for the corresponding registers.
This patch treats llvm.amdgcn.ballot(i1 true) as the canonical way to read a raw exec value.
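To illustrate why ballot(i1 true) yields the raw exec value, here is a minimal host-side sketch in plain C++. It is a toy model, not the code from this patch: the Wave struct, the ballot function and the read_exec* helpers are hypothetical names, and the per-lane predicate is assumed to be packed one bit per lane. Deriving the lo/hi halves from the 64-bit result is also an assumption of the sketch; the actual lowering may differ.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one wavefront (hypothetical type, not an LLVM or Clang API).
// Bit i of `exec` is set when lane i is active.
struct Wave {
  std::uint64_t exec;
};

// Model of the ballot semantics: bit i of the result is set only when
// lane i is active AND that lane's predicate is true. Inactive lanes
// always contribute zero. `pred` packs each lane's i1 predicate in bit i.
inline std::uint64_t ballot(Wave w, std::uint64_t pred) {
  return w.exec & pred;
}

// ballot(i1 true): every lane votes true, so only the exec mask survives.
inline std::uint64_t read_exec(Wave w) {
  return ballot(w, ~std::uint64_t{0});
}

// Assumption of this sketch: the halves are taken from the 64-bit value.
inline std::uint32_t read_exec_lo(Wave w) {
  return static_cast<std::uint32_t>(read_exec(w));
}

inline std::uint32_t read_exec_hi(Wave w) {
  return static_cast<std::uint32_t>(read_exec(w) >> 32);
}

// Sanity check for the model: the all-true ballot equals the exec mask.
inline void check_read_exec(Wave w) {
  assert(read_exec(w) == w.exec);
}
```

Calling check_read_exec on any Wave value passes in this model, because an all-true predicate leaves the exec mask unchanged.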
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
Introduced EmitAMDGCNBallotForExec to create a ballot intrinsic for __builtin_amdgcn_read_exec*.
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
17494–17495 | Don't need the cast or setConvergent anymore |
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
7804–7805 | Right, this should be true |
Comment Actions
As a follow-up, can you prepare an AMDGPUInstCombineIntrinsic patch to reduce ballot.i64 to ballot.i32 if wave32?
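The reasoning behind that request can be sketched on the host in plain C++. This is a toy model under stated assumptions, not an InstCombine implementation: ballot_wave32, ballot_i64_folded and kWave32Lanes are hypothetical names, and predicates are assumed to be packed one bit per lane. It only shows that on a 32-lane wavefront a 64-bit ballot carries no information beyond the zero-extended 32-bit ballot.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a wave32 wavefront has exactly 32 lanes.
constexpr unsigned kWave32Lanes = 32;

// Lane-by-lane model of a ballot on a wave32 wavefront, producing a mask
// of type MaskT (uint32_t or uint64_t). Only lanes 0..31 exist, so no bit
// at position 32 or above can ever be set.
template <typename MaskT>
MaskT ballot_wave32(std::uint32_t exec, std::uint32_t pred) {
  MaskT result = 0;
  for (unsigned lane = 0; lane < kWave32Lanes; ++lane) {
    const bool active = (exec >> lane) & 1u;
    const bool vote = (pred >> lane) & 1u;
    if (active && vote)
      result |= static_cast<MaskT>(MaskT{1} << lane);
  }
  return result;
}

// The suggested rewrite in model form: compute the 32-bit ballot and
// zero-extend it to 64 bits.
inline std::uint64_t ballot_i64_folded(std::uint32_t exec, std::uint32_t pred) {
  return static_cast<std::uint64_t>(ballot_wave32<std::uint32_t>(exec, pred));
}

// Sanity check for the model: the direct 64-bit ballot and the folded form agree.
inline void check_fold(std::uint32_t exec, std::uint32_t pred) {
  assert(ballot_wave32<std::uint64_t>(exec, pred) == ballot_i64_folded(exec, pred));
}
```

In this model the two forms agree for every exec and predicate value, which is what would make such a fold safe when the wavefront size is known to be 32.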
clang/lib/CodeGen/CGBuiltin.cpp | ||
---|---|---|
17494–17495 | Removed the cast and setConvergent. |
You shouldn't be special-casing EmitSpecialRegisterBuiltin; this should be handled specifically in the __builtin_amdgcn_read_exec* handling.