This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][GISel] Add llvm.amdgcn.icmp selection
ClosedPublic

Authored by Pierre-vh on Oct 21 2022, 6:30 AM.

Details

Summary

Add missing logic to select i16 variants and enable GISel testing.

Diff Detail

Event Timeline

Pierre-vh created this revision.Oct 21 2022, 6:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2022, 6:30 AM
Pierre-vh requested review of this revision.Oct 21 2022, 6:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2022, 6:30 AM
Pierre-vh added inline comments.Oct 21 2022, 6:33 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1133–1143

The test won't pass on, say, hawaii because of lack of 16 bit insts. It'll fail to select.
I guess it's a separate issue where we should probably legalize the intrinsic (= widen its input to i32) for those archs?

1138

Note: not sure if "true" or "false" 16 bit insts matter.
I did both to be safe but I'd be curious to know if it's actually needed.

llvm/lib/Target/AMDGPU/SIInstructions.td
893 ↗(On Diff #469577)

Help needed here, not sure how to get this one to work. I tried a lot of things, including COPY_TO_REGCLASS and nothing seems to do the trick.
Due to the simplicity of the pattern I'm wondering if it isn't better to just do it manually in the InstructionSelector rather than fight the GISel TableGen emitter?

foad added inline comments.Oct 21 2022, 7:24 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1133–1143

I think failing to select is the correct behaviour - or diagnose it as "unsupported" in the legalizer (we do this in some cases but not consistently). Generally intrinsics like this map one-to-one onto a machine instruction, and are not designed to be supported on subtargets that do not have the instruction.

foad added inline comments.Oct 21 2022, 7:25 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1138

I think both are needed here. The "true" ones will be used on GFX11 and the "fake"(!) ones on previous subtargets.

arsenm added inline comments.Oct 21 2022, 7:51 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
893 ↗(On Diff #469577)

What's the error with COPY_TO_REGCLASS?

Pierre-vh added inline comments.Oct 24 2022, 2:10 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
893 ↗(On Diff #469577)

IIRC with COPY_TO_REGCLASS it's a type issue, or if I put some regclass that works then it just doesn't match in the instruction selector.
I think my question is more: What regclass do I need to use there?

Pierre-vh updated this revision to Diff 470105.Oct 24 2022, 3:22 AM

Rebase, fix test

Joe_Nash added inline comments.
llvm/lib/Target/AMDGPU/SIInstructions.td
893 ↗(On Diff #469577)

Just a guess, but maybe the patterns need to be copied, one for wave32 and one for wave64? So on wave32 you do COPY_TO_REGCLASS to SReg_32_XM0_XEXEC, and for wave64 SReg_64_XM0_XEXEC

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll
0

I would prefer a check-prefix other than DAG. I believe this could be confused with https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-dag-directive.

Can you also add a runline for gfx1100?

foad added inline comments.Oct 24 2022, 7:08 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll
0

"SDAG" is quite common

Pierre-vh added inline comments.Oct 24 2022, 7:26 AM
llvm/lib/Target/AMDGPU/SIInstructions.td
893 ↗(On Diff #469577)

The input pattern also needs a registerclass, that's the one I'm stuck on now
I tried (COPY_TO_REGCLASS SReg_1_XEXEC:$src, SReg_64_XEXEC) for Wave64 and it causes a "noreg" to be selected which crashes the backend

$vgpr1 = V_MOV_B32_e32 $noreg, implicit $exec, implicit killed $sgpr2, implicit $exec

VReg_1 cannot be used either, it says it's not a recognized class. I think it's not available in TableGen?

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll
0

For GFX11, is it fine if I just use that for the GFX run line (instead of no cpu) or is a third line needed?

Joe_Nash added inline comments.Oct 24 2022, 11:28 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll
0

SDAG sounds good.

0

I don't think there is a need to be stingy with which tests are run. It seems good to test both no cpu (GFX6?) and gfx1100.

Pierre-vh updated this revision to Diff 470434.Oct 25 2022, 4:01 AM
Pierre-vh marked 4 inline comments as done.

Comments

I think that for the icmp i1 case there's some problem in the RegBankSelect or the legalizer. I can't seem to find the right pattern.
My current best attempt, something like this:

def : Pat <
  (i64 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))),
  (COPY_TO_REGCLASS SCSrc_i1:$src, SReg_1_XEXEC) // Return the SGPRs representing i1 src
>;

def : Pat <
  (i32 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))),
  (COPY_TO_REGCLASS SCSrc_i1:$src, SReg_1_XEXEC) // Return the SGPRs representing i1 src
>;

Selects:

Selecting: 
  %23:sreg_64(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.icmp), %32:vgpr(s1), %33:vgpr(s1), 33
 Into:
  %23:sreg_64_xexec(s64) = COPY %32:vgpr(s1)

But later, even in wave64 mode, %32 gets constrained to a vgpr_32 and we get an impossible copy:

Selecting: 
  %32:vgpr(s1) = COPY %22:sgpr(s1)
Into:
  %32:vgpr_32(s1) = COPY %22:sreg_32(s1)

; later
  %31:sreg_32 = S_AND_B32 %27:sreg_32, %28:sreg_32, implicit-def $scc
  %32:vgpr_32 = COPY %31:sreg_32
  %23:sreg_64_xexec = COPY %32:vgpr_32

I'm really not sure how to deal with this. Thoughts?

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll
0

For GISel, with no CPU, I get

LLVM ERROR: cannot select: G_STORE %17:vgpr(s64), %9:sgpr(p1) :: (store (s64) into %ir.out.load, addrspace 1) (in function: v_icmp_i32_eq)

So I use GFX900 for both instead

Pierre-vh added inline comments.Oct 25 2022, 4:02 AM
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll
0

Also, for gfx 11 I had to use wave64 mode. All intrinsics in this test are the w64 variants (return i64 mask).
Should I add a w32 duplicate of the test (in a pre-patch)? Or maybe just duplicating a couple of functions in this test for w32 would be enough?

Pierre-vh updated this revision to Diff 472610.Nov 2 2022, 7:49 AM

Add manual selection for i1 (I think it's currently incorrect, need feedback)
Add wave32 test

Pierre-vh added inline comments.Nov 2 2022, 8:04 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1295

This is obviously wrong as we only want to compare the first bit in each VGPR, but I don't know what instruction to use here yet.

Which instruction do I need to use here? V_CNDMASK ?

Pierre-vh added inline comments.Nov 2 2022, 8:13 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1295

CNDMASK doesn't work because it constrains the operands to 64 bits, but the i1 are widened to 32.
We need something that would do dst[threadI] = v0.i1, if I understand correctly?

Pierre-vh updated this revision to Diff 472619.Nov 2 2022, 8:15 AM

Rename tests

arsenm added inline comments.Nov 2 2022, 3:44 PM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1295

Where are you seeing i1 VGPRs? Those should just not happen. At one point I might have been trying to handle that, but I think it's basically wrong for it to see the selector

Pierre-vh added inline comments.Nov 3 2022, 1:12 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1295

I don't see i1 VGPRs, but vgpr(s1) as the arguments of the iccmp

llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-icmp.s16.mir
28

Why has the dst class been changed to sreg_32? These instructions should not write to m0 or exec

Pierre-vh added inline comments.Nov 4 2022, 12:29 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-icmp.s16.mir
28

Dst is constrained to getBoolRC. If we want xexec we need to use getWaveMaskRegClass instead
Not sure which one is technically correct? Isn't sreg_32 more correct because the dst is always a sgpr, and never exec/m0?

LGTM except for trying to handle i1 intrinsics

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
1295

That shouldn't happen. The intrinsic shouldn't accept incoming i1 values. If you try to use i1 with these intrinsics, it should just fail selection

Pierre-vh marked an inline comment as done.

Don't support i1 inputs

Not sure how we want to handle this in a merged test given that the DAG supports them. I used global-isel-abort=2

arsenm accepted this revision.Nov 17 2022, 8:15 AM

I'm surprised i1 works for DAG

This revision is now accepted and ready to land.Nov 17 2022, 8:15 AM
This revision was landed with ongoing or failed builds.Nov 22 2022, 12:26 AM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-icmp.s16.mir