This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] gfx11 Fix VOP3 dot instructions
ClosedPublic

Authored by Petar.Avramovic on Jul 4 2022, 7:49 AM.

Details

Summary

Matches sp3 behavior. op_sel[0:1] must be 0. abs and neg src modifiers for bf16

Diff Detail

Event Timeline

Petar.Avramovic created this revision.Jul 4 2022, 7:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 4 2022, 7:49 AM
Petar.Avramovic requested review of this revision.Jul 4 2022, 7:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 4 2022, 7:49 AM
Joe_Nash requested changes to this revision.Jul 5 2022, 8:31 AM

I agree with the problem description in the commit message, but I'm not sure this patch fully fixes the problem. There are no tests actually using the op_sel operand. I think those tests would reveal that the op_sel value is not being propagated from src_modifiers to the op_sel operand. We will need something similar to convertVOP3PDPPInst
In addition, OPSEL[1:0] are ignored by the hardware for these instructions. That should be reflected in the instruction definition. If it doesn't impact the dpp implementation, it could be a separate patch.

This revision now requires changes to proceed.Jul 5 2022, 8:31 AM
dp added a comment.Jul 5 2022, 9:23 AM

This patch does enable assembly and disassembly of v_dot2_f16_f16_e64_dpp and v_dot2_bf16_bf16_e64_dpp (with op_sel=0). Currently, these instructions are simply rejected as if they are not defined.
I agree that this patch does not fix the op_sel issue, but that is another problem that may be addressed separately.

The problem with doing the work separately is that codegen will be affected by this patch as well. Currently, dot2_bf16_bf16_e64_dpp does not exist, so GCNDPPCombine will simply fail to generate that instruction. But after this patch we will generate it. Now this test I tried appears to code generate correctly, but the problem is we cannot assemble or disassemble the resulting instructions. So I would recommend the full asm/disasm implementation and including this codegen test too.

RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck %s -check-prefix=GCN


name: dot2_bf16_bf16
tracksRegLiveness: true
body: |

bb.0:
  liveins: $vgpr0, $vgpr1, $vgpr2

  %0:vgpr_32 = COPY $vgpr0
  %1:vgpr_32 = COPY $vgpr1
  %2:vgpr_32 = COPY $vgpr2
  %3:vgpr_32 = IMPLICIT_DEF
  %4:vgpr_32 = V_MOV_B32_dpp %3, %1, 1, 15, 15, 1, implicit $exec
  %5:vgpr_32 = V_DOT2_BF16_BF16_e64 8, %4, 4, %3, 4, %2, 0, implicit $exec
  S_ENDPGM 0, implicit %5

...

Yields
0xd6677000 0x040a00fa 0xff080101; v_dot2_bf16_bf16 v0.h, v1, v0, v2.h quad_perm:[1,0,0,0] bound_ctrl:1

dp added a comment.Jul 5 2022, 12:54 PM

I see, thanks!

op_sel value is not being propagated from src_modifiers to the op_sel operand

I was convinced that op_sel value is not important for instruction printing. Non dpp version does not even have op_sel operand and still gets printed correctly, helper prints op_sel[a,b,c] from src_modifiers.
I missed to test assembler, missing part is setting op_sel bit in src_modifiers, done via cvtVOP3P.
I will double check both.

Joe_Nash added inline comments.Jul 11 2022, 11:14 AM
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
8805 ↗(On Diff #442382)

The conditional check here should really be checking for is DOT2_BF16_BF16_e64_dpp_gfx11 || DOT2_BF16_BF16_e64_dpp8_gfx11 || DOT2_F16_F16 ....
There is a similar conditional check in the Disassembler where we need to call convertVOP3PDPP.
Options: 1) check if Opc = the specific dot instructions 2) add a tablegen mapping helper table to check if something is one of the relevant instructions 3)
Since we are relying on checking the presence of operands in cvtVOP3P anyway, maybe we can unconditionally call cvtVOP3P here, and rename the function to cvtVOPModsHelper or something.

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
763

This will set op_sel to the wrong value if it was input as 1. We need to call convertVOP3PDPPInst to copy the operand correctly, or something equivalent.
However, that defect does not appear to effect the output, because the op_sel operand does not set any bits in the output machine code.

llvm/lib/Target/AMDGPU/VOP3Instructions.td
705

DOT2_BF16_BF16_e64_dpp should disallow modifiers on src0 and src1, though DOT2_BF16_BF16_e64 allows them. They should allow op_sel on dst, but this may be an issue since the bit for that is in src0 modifiers.
DOT2_F16_F16_e64_dpp should allow abs and neg modifiers on all operands, but op_sel only on dst and src2.

Setting HasSrcMods effects codegen, and dpp16. Can you please include the codegen test I proposed or some variation on it? Can you also include asm and disasm tests for dpp16?

Petar.Avramovic retitled this revision from [AMDGPU] gfx11 Fix disassembler for VOP3 dpp8 to [AMDGPU] gfx11 Fix VOP3 dot instructions.
Petar.Avramovic edited the summary of this revision. (Show Details)

Matches sp3 behavior. op_sel[0:1] must be 0. abs and neg src modifiers for bf16
FixMe(wip, requires some td file changes) clang uses i16 for bf16, I used f16 to get abs and neg parsing support.

dp added inline comments.Jul 20 2022, 3:46 AM
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4199 ↗(On Diff #445930)

Is it possible for an opcode to be both VOP3 and VOP3P?

llvm/lib/Target/AMDGPU/VOPInstructions.td
273–274 ↗(On Diff #445930)

Bits op_sel[1:0] are ignored, so opcodes with these bits set to 1 are legal. Using ? instead of 0 would allow decoding of such opcodes.

1342 ↗(On Diff #445930)

Looks like this is not necessary.

1380–1381 ↗(On Diff #445930)

Ditto.

1403–1404 ↗(On Diff #445930)

Ditto.

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4199 ↗(On Diff #445930)

afaik no, I was surprised that VOP3P instructions also have VOP3 flag, VOP3 instructions don't have VOP3P flag (maybe a bug?). I could check opcodes (I think that there should be 6 opcodes).

llvm/lib/Target/AMDGPU/VOPInstructions.td
273–274 ↗(On Diff #445930)

What is desired behavior for 'ignored bits' then?
assembler: report error if op_sel[1:0] are used (sp3 reports error) or parse the 1 and use 0 when printing/encoding instruction
disassembler: read 1 but encode as 0 anyway (sp3 does this) or fail to disassemble

dp added inline comments.Jul 20 2022, 5:29 AM
llvm/lib/Target/AMDGPU/VOPInstructions.td
273–274 ↗(On Diff #445930)

I think that assembler should be strict and report an error if op_sel[1:0] bits are not 0. Disassembler should be able to decode instructions with ignored bits to aid in binary code analysis (ignored bits may be displayed as 0 in op_sel, this is fine).

dp added a comment.Jul 20 2022, 6:03 AM

Could you add a disassembler test with ignored bits set to 1?

llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_dpp8.txt
5274 ↗(On Diff #446130)

Like this? Ignored bits set to zero (0x18), check diff with previous in History tab there are a few more.

dp added a comment.Jul 20 2022, 6:07 AM

Oh, sorry. I see that you have added them. Probably these tests need a comment to make clear what is being tested.

Overall it looks pretty good. The main thing I don't know is if making the bf16 operands float16 will work correctly with codegen. Hopefully @rampitec can answer.

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
2072 ↗(On Diff #446138)

@rampitec should look at these intrinsic changes. I am not familiar enough with other BF16 support to know if this is reasonable.

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4199 ↗(On Diff #445930)

Is it possible for an opcode to be both VOP3 and VOP3P?

It should not be, IMO. Those fields designate the encoding, and the instruction should only have one encoding. It could be considered a bug but a long standing one. I have been having similar discussions about an instruction being both VOPC and VOP3. @foad
That said, this check looks fine for the way things work now. We can come back and change it in a separate patch if desired, because I'm pretty sure there will be some other issues arising if that change is made.

llvm/lib/Target/AMDGPU/VOPInstructions.td
272 ↗(On Diff #446138)

Make the parent class of this VOP3OpSel_gfx11. No functional change, but it seems better.

llvm/test/MC/AMDGPU/gfx11_asm_dpp16.s
1 ↗(On Diff #446138)

Can you add --implicit-check-not=error to the runlines?

llvm/test/MC/AMDGPU/gfx11_asm_dpp8.s
1 ↗(On Diff #446138)

Can you add --implicit-check-not=error to the runlines?

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
2072 ↗(On Diff #446138)

This only gets desired behavior in testing. The problem is that there is no bf16 in clang, i16 is used instead. Our td files rely on types. I am looking if can bypass type checks. It would be much cleaner if we had the bf16 type.

Overall it looks pretty good. The main thing I don't know is if making the bf16 operands float16 will work correctly with codegen. Hopefully @rampitec can answer.

No, this is incorrect. bf16 is not the same as f16. One need a cast to/from integer type to use bf16.

rampitec added inline comments.Jul 20 2022, 10:18 AM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
2072 ↗(On Diff #446138)

We cannot have bf16 type. This is not a question of compiler support, this is the support which does not exist in HW. If you can only do dots on the type you cannot generally support it.

rampitec requested changes to this revision.Jul 20 2022, 10:20 AM

Please revert the part about using half instead of integers for bf16.

This revision now requires changes to proceed.Jul 20 2022, 10:20 AM

Fix modifiers for bf16

Can you add disassembler tests with op_sel:[1,1,1,1] and show the bit is ignored? Otherwise LGTM

Joe_Nash accepted this revision.Jul 21 2022, 8:53 AM
Joe_Nash added inline comments.
llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_dpp8.txt
5287 ↗(On Diff #446523)

Ok, I was confused because the AsmPrinter prints what the hardware will do with that instruction, not what the bits actually disassemble to. This matches sp3 and there are plenty of warnings how those bits are treated. I think this is good.

rampitec accepted this revision.Jul 21 2022, 12:19 PM

LGTM with a nit.

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4198 ↗(On Diff #446523)

Please add parenthesis around binary operations.

This revision is now accepted and ready to land.Jul 21 2022, 12:19 PM
This revision was landed with ongoing or failed builds.Jul 22 2022, 2:48 AM
This revision was automatically updated to reflect the committed changes.