This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Select v_sat_pk_u8_i16
ClosedPublic

Authored by Pierre-vh on Feb 24 2023, 5:48 AM.

Details

Summary

The backend knew about v_sat_pk_u8_i16 but never made use of it.
This patch adds selection patterns (DAG/GISel) for that instruction.
I think it'll be very rarely used, but at least it's possible to use it.

Solves #58266 (https://github.com/llvm/llvm-project/issues/58266)

Diff Detail

Event Timeline

Pierre-vh created this revision.Feb 24 2023, 5:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2023, 5:48 AM
Pierre-vh requested review of this revision.Feb 24 2023, 5:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2023, 5:48 AM
Pierre-vh updated this revision to Diff 500164.Feb 24 2023, 5:49 AM

Remove leftover comment

Pierre-vh updated this revision to Diff 500170.Feb 24 2023, 5:53 AM

Rename test

foad added inline comments.Feb 24 2023, 5:57 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
293

Do you also need to match them the other way round: (smin (smax $src, (i16 0)), (i16 255))?

llvm/lib/Target/AMDGPU/SIInstructions.td
2929

Probably should not generate any VALU instructions without a Divergent*Frag check.

Pierre-vh added inline comments.Feb 24 2023, 5:58 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2929

Should this only be for vector operations? (use DivergentBinFrag?)

2933–2934

Is it worth adding a GFX11/UniformBinFrag variant with S_PACK_LL?

Pierre-vh updated this revision to Diff 500177.Feb 24 2023, 6:12 AM
Pierre-vh marked 2 inline comments as done.

Use DivergentBinFrag + Add SGPR test

llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
293

I thought so too, but the other way around is always folded to smed3 it seems

foad added inline comments.Feb 24 2023, 7:36 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
293

That raises the question, why aren't both ways folded to smed3?

Pierre-vh added inline comments.Feb 28 2023, 1:32 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
293

I am not sure if this is intentional or if it's a missed opportunity
@arsenm is there any reason why we can't fold smax/smin into med3 like we do for smin/smax?

arsenm added inline comments.Feb 28 2023, 5:24 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
293

We have the two cases handled in IntMed3Pat already. I guess that just wasn't applied to the 16-bit case? IIRC the 16-bit med3 was introduced after 16-bit min/max so it likely got missed whenever that happened

293

Actually we have a separate Int16Med3Pat which handles both cases?

Pierre-vh marked 2 inline comments as done.Mar 2 2023, 12:41 AM
Pierre-vh added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
293

That's separate, I meant the smed3 DAG op not the instruction.
We only seem to match min(max(x, K0), K1), K0 < K1 -> med3(x, K0, K1) currently. Should we also be matching the commuted version?

Then the pattern can just use smed3 directly and doesn't need this PatFrag.

arsenm added inline comments.Mar 2 2023, 5:01 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
293

Yes, the combine should match both versions

Pierre-vh added inline comments.Mar 2 2023, 5:06 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
293

Is this correct for the commuted version?

max(min(x, K0), K1), K1 < K0 ->  med3(x, K0, K1)

(= keep the max rhs < min rhs and don't just swap opcodes)

Pierre-vh updated this revision to Diff 501851.Mar 2 2023, 6:45 AM
Pierre-vh marked an inline comment as done.

Rebase on D145159

llvm/lib/Target/AMDGPU/AMDGPUInstructions.td
293

Gentle ping (low priority review but it'd still be nice to land it and get it out of the review stack)

foad accepted this revision.Mar 14 2023, 6:38 AM

LGTM.

llvm/lib/Target/AMDGPU/SIInstructions.td
2929

You have added trailing whitespace on a few lines in this file.

This revision is now accepted and ready to land.Mar 14 2023, 6:38 AM
This revision was landed with ongoing or failed builds.Mar 15 2023, 1:36 AM
This revision was automatically updated to reflect the committed changes.
Pierre-vh marked an inline comment as done.
foad added inline comments.Mar 30 2023, 1:47 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2931

Looking at this again, I don't think these patterns match what the instruction does. The instruction puts the two 8-bit results in bits [15..8] and [7..0], not in bits [23..16] and [7..0].

Pierre-vh added inline comments.Mar 30 2023, 1:52 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2931

Oh I see, right. Not sure what the right pattern is then. All of the patterns are wrong in that case.
Maybe it needs to match an additional trunc to v2i8 after the build_vector?

Pierre-vh added inline comments.Mar 30 2023, 5:07 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
2931

I took a look and adding a trunc <2xi16> to <2xi8> causes the following:

  • DAG uses bitwise operations instead in the first testcase, I think that can still be matched easily.
  • GISel on the other hand doesn't seem to pack the values in 16 bits and returns 2 vgprs each containing 8 bits.
    • Without the trunc though it still packs both 16 bit values in one register.

I think some GISel work may be needed, or we shouldn't use a trunc.