Clang provides __builtin_amdgcn_read_exec, __builtin_amdgcn_read_exec_lo, and __builtin_amdgcn_read_exec_hi. These currently emit llvm.read_register calls for the corresponding registers.
This patch treats llvm.amdgcn.ballot(i1 true) as the canonical way to read a raw exec value.
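The reason ballot(i1 true) can stand in for a register read fits in a few lines. The following is a hypothetical value-level model, not LLVM or Clang code. The names ballot, all_lanes_true and read_exec_via_ballot are illustrative, and for brevity each lane's i1 predicate is packed into one bit of a 64-bit mask. A ballot keeps a bit only for lanes that are both active and true, so an all-true predicate returns the exec mask itself.

```cpp
#include <cstdint>

// Hypothetical model of one wavefront, not LLVM or Clang code.
// Bit i of `exec` is set when lane i is active; bit i of `pred` is the i1
// condition evaluated in lane i. A ballot keeps a bit only for lanes that
// are active and whose condition is true.
constexpr std::uint64_t ballot(std::uint64_t exec, std::uint64_t pred) {
    return exec & pred;
}

// i1 true in every lane corresponds to an all-ones predicate mask.
constexpr std::uint64_t all_lanes_true = ~std::uint64_t{0};

// Because every active lane contributes its bit, ballot(true) returns the
// exec mask unchanged, which is why it can replace the register read.
constexpr std::uint64_t read_exec_via_ballot(std::uint64_t exec) {
    return ballot(exec, all_lanes_true);
}
```

Inactive lanes never contribute, so the same model also shows why an ordinary predicate yields a subset of exec.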
Event Timeline
Introduced EmitAMDGCNBallotForExec, which emits a ballot intrinsic call for the __builtin_amdgcn_read_exec* builtins.
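The following value-level sketch shows what the three builtins return once exec comes from a single 64-bit ballot(i1 true). It assumes each result is derived from that one value: _lo by truncation to 32 bits, and _hi by a 32-bit right shift followed by truncation. It does not show the IR that EmitAMDGCNBallotForExec actually builds, and the function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical value-level model, not the Clang helper. `exec` stands for the
// 64-bit result of ballot(i1 true); exec_lo and exec_hi are its two halves.
constexpr std::uint64_t read_exec(std::uint64_t exec) { return exec; }

// Low half: plain truncation to 32 bits.
constexpr std::uint32_t read_exec_lo(std::uint64_t exec) {
    return static_cast<std::uint32_t>(exec);
}

// High half: shift the upper 32 bits down, then truncate.
constexpr std::uint32_t read_exec_hi(std::uint64_t exec) {
    return static_cast<std::uint32_t>(exec >> 32);
}

// Reassembling hi:lo must recover the full mask.
static_assert(((std::uint64_t{read_exec_hi(0xDEADBEEF00C0FFEEull)} << 32) |
               read_exec_lo(0xDEADBEEF00C0FFEEull)) == read_exec(0xDEADBEEF00C0FFEEull),
              "hi:lo must round-trip to the full exec value");
```

Deriving all three from one ballot value keeps the split halves consistent with the full read by construction.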
clang/lib/CodeGen/CGBuiltin.cpp
| Lines | Comment |
|---|---|
| 17487–17488 | Don't need the cast or setConvergent anymore. |
clang/lib/CodeGen/CGBuiltin.cpp
| Lines | Comment |
|---|---|
| 7788–7789 | Right, this should be true. |
As a follow-up, can you prepare an AMDGPUInstCombineIntrinsic patch that reduces ballot.i64 to ballot.i32 on wave32?
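The suggested reduction is sound because a wave32 wavefront has no lanes 32 to 63, so a 64-bit ballot can never set its upper half. The hypothetical lane-by-lane model below makes that explicit. Its names are illustrative and are not the AMDGPUInstCombineIntrinsic API, and it assumes the rewrite takes the form ballot.i64 to zext(ballot.i32).

```cpp
#include <cstdint>

// Hypothetical lane-by-lane model, not LLVM code. Bit i of `pred_mask` holds
// the i1 predicate for lane i; lanes at or above `wave_size` do not exist.
// `wave_size` must be 32 or 64.
constexpr std::uint64_t ballot_i64(std::uint64_t exec, std::uint64_t pred_mask,
                                   unsigned wave_size) {
    std::uint64_t result = 0;
    for (unsigned lane = 0; lane < wave_size; ++lane)
        if ((exec >> lane) & (pred_mask >> lane) & 1u)
            result |= std::uint64_t{1} << lane;
    return result;
}

// A 32-bit ballot on wave32: only lanes 0..31 can contribute.
constexpr std::uint32_t ballot_i32_wave32(std::uint64_t exec, std::uint64_t pred_mask) {
    return static_cast<std::uint32_t>(ballot_i64(exec, pred_mask, 32));
}

// The assumed combine replaces ballot.i64 with zext(ballot.i32) on wave32.
// It is value-preserving because the upper 32 bits of the i64 ballot are
// always zero when only 32 lanes exist.
constexpr bool wave32_reduction_holds(std::uint64_t exec, std::uint64_t pred_mask) {
    return ballot_i64(exec, pred_mask, 32) ==
           static_cast<std::uint64_t>(ballot_i32_wave32(exec, pred_mask));
}
```

The same rewrite would be wrong on wave64, where ballot_i64 with 64 lanes can set bits 32 to 63, which is why the combine is limited to wave32.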
clang/lib/CodeGen/CGBuiltin.cpp
| Lines | Comment |
|---|---|
| 17487–17488 | Removed the cast and setConvergent. |
You shouldn't be special-casing EmitSpecialRegisterBuiltin; handle this specifically in the __builtin_amdgcn_read_exec* cases instead.