This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use `S_BFE_U64` for uniform i1-i64 ext
Changes PlannedPublic

Authored by Pierre-vh on Feb 24 2023, 3:32 AM.

Details

Diff Detail

Event Timeline

Pierre-vh created this revision.Feb 24 2023, 3:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2023, 3:32 AM
Pierre-vh requested review of this revision.Feb 24 2023, 3:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2023, 3:32 AM
Pierre-vh retitled this revision from [AMDGPU] Use `S_BFE_U64` for unary i1-i64 ext to [AMDGPU] Use `S_BFE_U64` for uniform i1-i64 ext.Feb 24 2023, 3:32 AM
Pierre-vh added inline comments.Feb 24 2023, 3:34 AM
llvm/test/CodeGen/AMDGPU/saddo.ll
36

Not sure if it's a regression here. Yes there's one more instruction, but we're using more scalar instructions so isn't it beneficial in the end?

llvm/test/CodeGen/AMDGPU/usubo.ll
19

This looks a bit like a regression but I'm not sure how to address it. The pattern comes from zext (setcc).
I thought about adding a PatFrag that doesn't accept setcc operands to zext but it feels hacky.
Thoughts?

foad added inline comments.Feb 24 2023, 3:55 AM
llvm/test/CodeGen/AMDGPU/saddo.ll
32

I'm not sure this is correct. The old code treated s[4:5] like a divergent boolean, with a bit for each *active* lane. The new code assumes the boolean value is in bit 0 - but will that work if lane 0 is not active?

foad added inline comments.Feb 24 2023, 4:00 AM
llvm/test/CodeGen/AMDGPU/saddo.ll
34

This looks like we could avoid the v_cndmask_b32_e64 if we used v_addc with the carry input coming from s[4:5]

arsenm added inline comments.Feb 24 2023, 4:03 AM
llvm/test/CodeGen/AMDGPU/saddo.ll
36

It's one more instruction because of the copy to VGPR for the store. We probably should have a rematerialize-as-VALU optimization to handle this kind of case.

llvm/test/CodeGen/AMDGPU/usubo.ll
19

Before this was a 32-bit select, so I assume this was a zext to i32 so I don't see why this zext to i64 change matters. What was the DAG here?

Pierre-vh added inline comments.Feb 24 2023, 4:23 AM
llvm/test/CodeGen/AMDGPU/saddo.ll
32

Very good question; I have a bit of trouble following what V_CNDMASK does exactly in this case.
v0 is the destination, but what do the 0/1/s[4:5] correspond to?

This function doesn't seem to select with global isel so I can't compare with that

llvm/test/CodeGen/AMDGPU/usubo.ll
19

It was a zext to i64

      t41: i1 = setcc t39, t51, setugt:ch
    t30: i64 = zero_extend t41
  t31: i64 = add t39, t30
t37: v2i32 = bitcast t31
foad added inline comments.Feb 24 2023, 5:35 AM
llvm/test/CodeGen/AMDGPU/saddo.ll
32

For each active lane, v0 get 0 if the corresponding bit of s[4:5] if 0, 1 if the corresponding bit is 1.

v_cndmask v0, v1, v2, s0 is a bit like v0 = s0 ? v2 : v1 (NB v1/v2 are swapped!) where s0 holds the condition as a divergent boolean, 1 bit per lane.

Pierre-vh planned changes to this revision.Apr 17 2023, 6:39 AM