This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Enable `s_sendmsg_rtn` selection with `+gfx11-insts`
AbandonedPublic

Authored by Pierre-vh on Oct 28 2022, 5:40 AM.

Details

Summary

ROCm device libs can emit a s_sendmsg_rtn w/ the +gfx11-insts attribute, and it counts on the optimizer to remove the call if the GPU is <GFX11.
When built at O0 it caused codegen issues as Clang allowed this intrinsic to go through with just +gfx11-insts, but the backend wanted the GPU to be >=GFX11 as well.

This patch allows selecting that intrinsic when just the +gfx11-insts attribute is present.

Diff Detail

Event Timeline

Pierre-vh created this revision.Oct 28 2022, 5:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 5:40 AM
Pierre-vh requested review of this revision.Oct 28 2022, 5:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 5:40 AM
foad added a comment.Oct 28 2022, 5:53 AM

I don't really like the fact that we are left with both isGFX11Plus and HasGFX11Insts, and no clear way to know which one to use, unless you happen to know what instructions some ROCm libraries are relying on.

Seems reasonable apart from that.

foad added a comment.Oct 28 2022, 5:59 AM

I do think this is much nicer than the previous approach of generating V_ILLEGAL instead of diagnosing illegal intrinsics (D123693).

The goal eventually is to move all instructions to just relying on attribute(s) instead of GPU generations I believe. However, that's a lot of work with a lot of risks of breaking things so we agreed with @arsen to just do the necessary work to fix Device Libs for now

I do think this is much nicer than the previous approach of generating V_ILLEGAL instead of diagnosing illegal intrinsics (D123693).

We can't really avoid the V_ILLEGAL case if the instruction isn't encodable for the end subtarget. There's 3 different ways instructions can be illegal

I don't really like the fact that we are left with both isGFX11Plus and HasGFX11Insts, and no clear way to know which one to use, unless you happen to know what instructions some ROCm libraries are relying on.

Seems reasonable apart from that.

Generation checks should always be "wrong". Ideally the hardware would move in nice, modular feature based changes but that's not what has happened

arsenm added inline comments.Oct 31 2022, 7:34 PM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8002–8004

I'm surprised this is necessary. How was this missing before? I thought the encoding didn't change in 11?

foad added a comment.Nov 1 2022, 12:10 PM

I do think this is much nicer than the previous approach of generating V_ILLEGAL instead of diagnosing illegal intrinsics (D123693).

We can't really avoid the V_ILLEGAL case if the instruction isn't encodable for the end subtarget.

Doesn't this patch solve that by adding stuff to pseudoToMCOpcode to encode it for GFX11 if it can't find an encoding for the real subtarget?

There's 3 different ways instructions can be illegal

What are they?

There's 3 different ways instructions can be illegal

What are they?

  1. Instruction doesn't exist for the subtarget, but is encodable. For example, all instructions added in CI aren't available for SI, but are encodable since nothing changed there.
  2. Instruction does not exist for the subtarget, but the opcode was recycled into something else. This is the case for when v_mad_mix* was replaced with v_fma_mix* between gfx900 and gfx906
  3. The instruction doesn't exist on the subtarget, and also isn't encodable. This is what happened with the image instructions for MI-200
Pierre-vh planned changes to this revision.Nov 2 2022, 2:02 AM
Pierre-vh marked an inline comment as done.Nov 3 2022, 6:15 AM
Pierre-vh added inline comments.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
8002–8004

Without it, I run into `Opcode < NumOpcodes && "Invalid opcode!" without it because that function ends up returning -1

Pierre-vh abandoned this revision.Nov 15 2022, 12:26 AM
Pierre-vh marked an inline comment as done.