Page MenuHomePhabricator

[AMDGPU][RFC] New llvm.amdgcn.ballot intrinsic
Needs ReviewPublic

Authored by foad on Mon, Jul 22, 6:34 AM.



Add a new llvm.amdgcn.ballot intrinsic modeled on the ballot function
in GLSL and other shader languages. It returns a bitfield containing the
result of its boolean argument in all active lanes, and zero in all
inactive lanes.

This is intended to replace the existing llvm.amdgcn.icmp and
llvm.amdgcn.fcmp intrinsics after a suitable transition period.

Use the new intrinsic in the atomic optimizer pass.

I'm not going to commit this as-is because tests are failing due to
poor code generation, e.g. test2 in ballot.ll generates:

v_cmp_eq_u32_e32 vcc, v0, v1
v_cndmask_b32_e64 v0, 0, 1, vcc
v_cmp_ne_u32_e64 s[4:5], 0, v0
v_mov_b32_e32 v0, s4
v_mov_b32_e32 v1, s5

instead of:

v_cmp_eq_u32_e32 s[4:5], v0, v1
v_mov_b32_e32 v0, s4
v_mov_b32_e32 v1, s5

I'd appreciate feedback on (a) the idea, (b) the implementation and
(c) how best to improve the code generation.

Event Timeline

foad created this revision.Mon, Jul 22, 6:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptMon, Jul 22, 6:34 AM
arsenm added inline comments.Mon, Jul 22, 6:44 AM

Should also do this in instcombine, as icmp/fcmp already do


This is already the right type, so this should be unnecessary

4 ↗(On Diff #211078)

This needs the .i64 for the mangling, I'm surprised this works

49 ↗(On Diff #211078)

Can you add some different imp and fcmp sources?

Also need a separate wave32 test

foad updated this revision to Diff 211100.Mon, Jul 22, 7:42 AM
foad marked 2 inline comments as done.

Address review comments.

foad marked 2 inline comments as done.Mon, Jul 22, 7:46 AM
foad added inline comments.

I've had a go, but is it really necessary to implement the same optimization cases in two places (especially the one that just replaces a ballot intrinsic with a read_register intrinsic), and how do I test them independently?


Does anything guarantee that it's the right type? Is it OK for the compiler to crash if someone writes a .ll file that uses it with the wrong type?

Thanks for doing this. For the codegen quality question, I wonder if something like the following could be done:

  1. Remove what's currently in SITargetLowering.
  2. Add a custom DAG combine which combines (ballot (ISD::SETCC ...)) to (AMDGPUISD::SETCC ...)
  3. Do code generation for the remaining ballot cases either in AMDGPUISelDAGToDAG or via a TableGen pattern (the most generic case would have to map to some combination of COPY_TO_REGCLASS and S_AND_B32/B64).

Independent tests as opt tests should go into test/Transforms/InstCombine. You can find existing tests for the icmp/fcmp intrinsics there.

Nitpick: please add braces for the if-body (multiple lines guarded by if).

arsenm added inline comments.Wed, Jul 24, 5:54 AM

Yes. If I were to pick one, it would probably be instcombine. I think it’s unlikely this would come up in a context only visible after lowering. Reducing intrinsics is useful when reducing testcases, and any code in the DAG is just going to need to be redone in globalisel eventually


Ideally we would verify wave32 IR somewhere and error, but I don’t think that exists today