Add missing logic to select i16 variants and enable GISel testing.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1133–1143 | The test won't pass on, say, hawaii because of lack of 16 bit insts. It'll fail to select. | |
1138 | Note: not sure if "true" or "false" 16 bit insts matter. | |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
893 ↗ | (On Diff #469577) | Help needed here, not sure how to get this one to work. I tried a lot of things, including COPY_TO_REGCLASS and nothing seems to do the trick. |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1133–1143 | I think failing to select is the correct behaviour - or diagnose it as "unsupported" in the legalizer (we do this in some cases but not consistently). Generally intrinsics like this map one-to-one onto a machine instruction, and are not designed to be supported on subtargets that do not have the instruction. |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1138 | I think both are needed here. The "true" ones will be used on GFX11 and the "fake"(!) ones on previous subtargets. |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
893 ↗ | (On Diff #469577) | What's the error with COPY_TO_REGCLASS? |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
893 ↗ | (On Diff #469577) | IIRC with COPY_TO_REGCLASS it's a type issue, or if I put some regclass that works then it just doesn't match in the instruction selector. |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
893 ↗ | (On Diff #469577) | Just a guess, but maybe the patterns need to be copied, one for wave32 and one for wave64? So on wave32 you do COPY_TO_REGCLASS to SReg_32_XM0_XEXEC, and for wave64 SReg_64_XM0_XEXEC |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll | ||
0 | I would prefer a check-prefix other than DAG. I believe this could be confused with https://llvm.org/docs/CommandGuide/FileCheck.html#the-check-dag-directive. Can you also add a runline for gfx1100? |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll | ||
---|---|---|
0 | "SDAG" is quite common |
llvm/lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
893 ↗ | (On Diff #469577) | The input pattern also needs a registerclass, that's the one I'm stuck on now $vgpr1 = V_MOV_B32_e32 $noreg, implicit $exec, implicit killed $sgpr2, implicit $exec VReg_1 cannot be used either, it says it's not a recognized class. I think it's not available in TableGen? |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll | ||
0 | For GFX11, is it fine if I just use that for the GFX run line (instead of no cpu) or is a third line needed? |
I think that for the icmp i1 case there's some problem in the RegBankSelect or the legalizer. I can't seem to find the right pattern.
My current best attempt, something like this:
def : Pat < (i64 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))), (COPY_TO_REGCLASS SCSrc_i1:$src, SReg_1_XEXEC) // Return the SGPRs representing i1 src >; def : Pat < (i32 (int_amdgcn_icmp i1:$src, (i1 0), (i32 33))), (COPY_TO_REGCLASS SCSrc_i1:$src, SReg_1_XEXEC) // Return the SGPRs representing i1 src >;
Selects:
Selecting: %23:sreg_64(s64) = G_INTRINSIC intrinsic(@llvm.amdgcn.icmp), %32:vgpr(s1), %33:vgpr(s1), 33 Into: %23:sreg_64_xexec(s64) = COPY %32:vgpr(s1)
But later, even in wave64 mode, %32 gets constrained to a vgpr_32 and we get an impossible copy:
Selecting: %32:vgpr(s1) = COPY %22:sgpr(s1) Into: %32:vgpr_32(s1) = COPY %22:sreg_32(s1) ; later %31:sreg_32 = S_AND_B32 %27:sreg_32, %28:sreg_32, implicit-def $scc %32:vgpr_32 = COPY %31:sreg_32 %23:sreg_64_xexec = COPY %32:vgpr_32
I'm really not sure how to deal with this. Thoughts?
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll | ||
---|---|---|
0 | For GISel, with no CPU, I get LLVM ERROR: cannot select: G_STORE %17:vgpr(s64), %9:sgpr(p1) :: (store (s64) into %ir.out.load, addrspace 1) (in function: v_icmp_i32_eq) So I use GFX900 for both instead |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.icmp.ll | ||
---|---|---|
0 | Also, for gfx 11 I had to use wave64 mode. All intrinsics in this test are the w64 variants (return i64 mask). |
Add manual selection for i1 (I think it's currently incorrect, need feedback)
Add wave32 test
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1295 | This is obviously wrong as we only want to compare the first bit in each VGPR, but I don't know what instruction to use here yet. Which instruction do I need to use here? V_CNDMASK ? |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1295 | CNDMASK doesn't work because it constrains the operands to 64 bits, but the i1 are widened to 32. |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1295 | Where are you seeing i1 VGPRs? Those should just not happen. At one point I might have been trying to handle that, but I think it's basically wrong for it to see the selector |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1295 | I don't see i1 VGPRs, but vgpr(s1) as the arguments of the iccmp |
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-icmp.s16.mir | ||
---|---|---|
28 | Why has the dst class been changed to sreg_32? These instructions should not write to m0 or exec |
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-icmp.s16.mir | ||
---|---|---|
28 | Dst is constrained to getBoolRC. If we want xexec we need to use getWaveMaskRegClass instead |
LGTM except for trying to handle i1 intrinsics
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
1295 | That shouldn't happen. The intrinsic shouldn't accept incoming i1 values. If you try to use i1 with these intrinsics, it should just fail selection |
Don't support i1 inputs
Not sure how we want to handle this in a merged test given that the DAG supports them. I used global-isel-abort=2
Note: not sure if "true" or "false" 16 bit insts matter.
I did both to be safe but I'd be curious to know if it's actually needed.