This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] __builtin_amdgcn_read_exec_* should be implemented with llvm.amdgcn.ballot
ClosedPublic

Authored by ranapratap55 on Jul 25 2023, 2:16 AM.

Details

Summary

Clang provides __builtin_amdgcn_read_exec, __builtin_amdgcn_read_exec_lo and __builtin_amdgcn_read_exec_hi. These currently emit llvm.read_register with the corresponding registers.
This patch treats llvm.amdgcn.ballot(i1 true) as the canonical way to read a raw exec value.
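
To illustrate why ballot(i1 true) can stand in for a raw exec read, here is a minimal host-side sketch in C++. It only models the bit semantics (a ballot result bit is set when the lane is active and its predicate holds); it is not the patch's code, and the names Wave64, ballot, readExec, readExecLo and readExecHi are hypothetical. Deriving the high half by shifting and truncating the 64-bit result is one plausible lowering, assumed here for illustration.

```cpp
#include <cstdint>

// Host-side model only. On real hardware EXEC is a register and the ballot
// is computed by llvm.amdgcn.ballot; every name below is hypothetical.
struct Wave64 {
  std::uint64_t exec;  // one bit per lane; 1 means the lane is active
};

// Model of ballot: bit i is set iff lane i is active and its predicate is true.
constexpr std::uint64_t ballot(Wave64 w, std::uint64_t lanePredicates) {
  return w.exec & lanePredicates;
}

// A uniform `i1 true` predicate corresponds to all predicate bits being set,
// so the ballot result is exactly the exec mask.
constexpr std::uint64_t readExec(Wave64 w) {
  return ballot(w, ~std::uint64_t{0});
}

// Low and high 32-bit halves of the modeled exec value.
constexpr std::uint32_t readExecLo(Wave64 w) {
  return static_cast<std::uint32_t>(readExec(w));
}
constexpr std::uint32_t readExecHi(Wave64 w) {
  return static_cast<std::uint32_t>(readExec(w) >> 32);
}

constexpr Wave64 kWave{0xF0F00000000000FFull};
static_assert(readExec(kWave) == kWave.exec, "ballot(true) yields exec");
static_assert(ballot(kWave, 0) == 0, "ballot(false) yields zero");
static_assert(readExecLo(kWave) == 0x000000FFu, "low half");
static_assert(readExecHi(kWave) == 0xF0F00000u, "high half");
```

The static_assert lines double as a usage example: a uniformly false predicate always yields zero, which is why the constant argument has to be true.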

Event Timeline

ranapratap55 created this revision. Jul 25 2023, 2:16 AM
Herald added a project: Restricted Project. Jul 25 2023, 2:16 AM
Herald added a subscriber: jvesely.
ranapratap55 requested review of this revision. Jul 25 2023, 2:16 AM
ranapratap55 removed a project: Restricted Project.
ranapratap55 removed a subscriber: jvesely.
Herald added a project: Restricted Project. Jul 25 2023, 3:00 AM
arsenm added a subscriber: arsenm. Jul 25 2023, 4:22 AM
arsenm added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
7802

You shouldn't be special-casing EmitSpecialRegisterBuiltin; you should handle this specifically in the __builtin_amdgcn_read_exec* handling.

7804–7805

This should just always be getFalse

ranapratap55 removed a project: Restricted Project.

Introduced EmitAMDGCNBallotForExec to create a ballot intrinsic for __builtin_amdgcn_read_exec*.

Herald added a project: Restricted Project. Jul 25 2023, 10:38 AM
arsenm added inline comments. Jul 25 2023, 10:40 AM
clang/lib/CodeGen/CGBuiltin.cpp
17494–17495

Don't need the cast or setConvergent anymore

ranapratap55 added inline comments. Jul 25 2023, 10:43 AM
clang/lib/CodeGen/CGBuiltin.cpp
7804–7805

ballot(i1 false) always results in zero due to Line. So I am emitting ballot(i1 true), which results in exec AFAIK.

arsenm added inline comments. Jul 25 2023, 10:44 AM
clang/lib/CodeGen/CGBuiltin.cpp
7804–7805

Right, this should be true.

ranapratap55 removed a project: Restricted Project.
Herald added a project: Restricted Project. Jul 25 2023, 11:11 AM
arsenm accepted this revision. Jul 25 2023, 11:13 AM

As a follow-up, can you prepare an AMDGPUInstCombineIntrinsic patch to reduce ballot.i64 to ballot.i32 if wave32?
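
The reasoning behind that follow-up can be sketched with a small host-side C++ model: on wave32 only 32 lanes exist, so a 64-bit ballot should carry no information in its upper half and should equal the zero-extended 32-bit ballot. This is an illustrative model under that assumption, not the InstCombine change itself; ballot32, ballot64OnWave32 and viaBallot32 are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical wave32 model: the exec mask has only 32 lane bits.
constexpr std::uint32_t ballot32(std::uint32_t exec, std::uint32_t pred) {
  return exec & pred;
}

// A 64-bit ballot on wave32: lanes 32..63 do not exist, so the upper 32
// result bits are zero regardless of the predicate bits supplied for them.
constexpr std::uint64_t ballot64OnWave32(std::uint32_t exec,
                                         std::uint64_t pred) {
  return std::uint64_t{exec} & pred;
}

// The narrowed form: compute the 32-bit ballot, then zero-extend it.
constexpr std::uint64_t viaBallot32(std::uint32_t exec, std::uint64_t pred) {
  return std::uint64_t{ballot32(exec, static_cast<std::uint32_t>(pred))};
}

static_assert(ballot64OnWave32(0x0000FF00u, 0xFFFFFFFF0000F0F0ull) ==
                  viaBallot32(0x0000FF00u, 0xFFFFFFFF0000F0F0ull),
              "wide ballot equals zero-extended narrow ballot on wave32");
static_assert(ballot64OnWave32(0xFFFFFFFFu, ~0ull) == 0xFFFFFFFFull,
              "upper half is always zero on wave32");
```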

This revision is now accepted and ready to land. Jul 25 2023, 11:13 AM
ranapratap55 added inline comments. Jul 25 2023, 11:14 AM
clang/lib/CodeGen/CGBuiltin.cpp
17494–17495

Removed the cast and setConvergent.

ranapratap55 retitled this revision from [WIP] __builtin_amdgcn_read_exec_* should be implemented with llvm.amdgcn.ballot to [AMDGPU] __builtin_amdgcn_read_exec_* should be implemented with llvm.amdgcn.ballot. Jul 25 2023, 10:20 PM
ranapratap55 set the repository for this revision to rG LLVM Github Monorepo.
Herald added a project: Restricted Project. Jul 25 2023, 10:20 PM