This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Fold or of shifts with constant amount to funnel shift.
ClosedPublic

Authored by abinavpp on Jan 3 2022, 2:34 AM.

Details

Summary

This change folds (or (shl x, C0), (lshr y, C1)) to a funnel shift iff C0
and C1 are constants whose sum equals the bit-width of the shift
instructions.
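
As a minimal illustration (the function and value names are hypothetical, not
taken from the patch or its tests): with C0 = 24 and C1 = 8 on i32, C0 + C1 is
32, so the or of the two shifts is a funnel shift by a constant amount.

define i32 @or_of_shifts(i32 %x, i32 %y) {
  %shl = shl i32 %x, 24      ; C0 = 24
  %lshr = lshr i32 %y, 8     ; C1 = 8, so C0 + C1 = 32
  %or = or i32 %shl, %lshr   ; equivalent to fshl(%x, %y, 24), i.e. fshr(%x, %y, 8)
  ret i32 %or
}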

Diff Detail

Event Timeline

abinavpp created this revision.Jan 3 2022, 2:34 AM
abinavpp requested review of this revision.Jan 3 2022, 2:34 AM
Herald added a project: Restricted Project.Jan 3 2022, 2:34 AM
foad added inline comments.Jan 4 2022, 2:16 AM
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp
3901

I think this comment belongs just above the FshOpc = line.

3907–3910

Likewise.

llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
2219

Maybe not your fault, but it's a bad idea to use a VALU instruction for uniform values, especially if it means we need to insert readfirstlanes.

arsenm added inline comments.Jan 4 2022, 10:00 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
2219

Should probably do this in the post-regbank combiner.

abinavpp updated this revision to Diff 401594.Jan 20 2022, 4:23 AM

Repositioned comments.

abinavpp marked 2 inline comments as done.Jan 20 2022, 4:24 AM
abinavpp added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
2219

We could maintain this generic combine alongside an AMDGPU-specific
post-regbank-select version that reuses the match code and bails out on an
SGPR destination. We'd also need to exclude the generic combine from the
combiners that run before regbank-select in AMDGPUCombine.td.

More importantly, is this worth the effort? The constant-shift-amount pattern
looks bad for uniform values, but the original pattern:

define amdgpu_kernel void @fshr_v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %amt, <4 x i32> addrspace(1)* %m) {
  %sub = sub <4 x i32> <i32 32, i32 32, i32 32, i32 32>, %amt
  %shl = shl <4 x i32> %a, %sub
  %lshr = lshr <4 x i32> %b, %amt
  %ret = or <4 x i32> %shl, %lshr
  store <4 x i32> %ret, <4 x i32> addrspace(1)* %m
  ret void
}

has fewer instructions with the combine.
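
For reference, and assuming the usual llvm.fshr semantics, the body above is
the standard expansion of a variable-amount funnel shift, modulo the amt == 0
edge case where the shl by 32 is poison; with the combine it collapses to a
single call along the lines of:

  %ret = call <4 x i32> @llvm.fshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i32> %amt)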

How should we move forward?

foad added inline comments.Jan 20 2022, 4:54 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll
2219

Can't we just do what this comment in RegBankSelect says:

case AMDGPU::G_FSHR: // TODO: Expand for scalar

maybe expanding it to S_LSHR_B64 and just taking the low part of the result?
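
A rough sketch of that suggestion, written as IR for readability (the actual
lowering would be MIR emitted in RegBankSelect, the value names are
hypothetical, and the shift amount is assumed to already be masked to [0, 31]):

  %hi    = zext i32 %a to i64
  %lo    = zext i32 %b to i64
  %hihi  = shl i64 %hi, 32
  %cat   = or i64 %hihi, %lo       ; concat(a, b) as one 64-bit value
  %amt64 = zext i32 %amt to i64
  %srl   = lshr i64 %cat, %amt64   ; the S_LSHR_B64 for uniform operands
  %res   = trunc i64 %srl to i32   ; low half of the result == fshr(a, b, amt)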

In any case I don't think this needs to block the current patch.

paquette accepted this revision.Jan 21 2022, 2:39 PM

On the AArch64 side this looks good to me.

This revision is now accepted and ready to land.Jan 21 2022, 2:39 PM