This is an archive of the discontinued LLVM Phabricator instance.

I agree with the problem description in the commit message, but I'm not sure this patch fully fixes the problem. There are no tests actually using the op_sel operand. I think those tests would reveal that the op_sel value is not being propagated from src_modifiers to the op_sel operand. We will need something similar to convertVOP3PDPPInst.
In addition, OPSEL[1:0] are ignored by the hardware for these instructions. That should be reflected in the instruction definition. If it doesn't impact the dpp implementation, it could be a separate patch.

This revision now requires changes to proceed. Jul 5 2022, 8:31 AM

This patch does enable assembly and disassembly of v_dot2_f16_f16_e64_dpp and v_dot2_bf16_bf16_e64_dpp (with op_sel=0). Currently, these instructions are simply rejected as if they are not defined.
I agree that this patch does not fix the op_sel issue, but that is another problem that may be addressed separately.

The problem with doing the work separately is that codegen will be affected by this patch as well. Currently, dot2_bf16_bf16_e64_dpp does not exist, so GCNDPPCombine will simply fail to generate that instruction. But after this patch we will generate it. Now this test I tried appears to code generate correctly, but the problem is we cannot assemble or disassemble the resulting instructions. So I would recommend the full asm/disasm implementation and including this codegen test too.

RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs -o - %s | FileCheck %s -check-prefix=GCN

name: dot2_bf16_bf16
tracksRegLiveness: true
body: |

bb.0:
  liveins: $vgpr0, $vgpr1, $vgpr2

  %0:vgpr_32 = COPY $vgpr0
  %1:vgpr_32 = COPY $vgpr1
  %2:vgpr_32 = COPY $vgpr2
  %3:vgpr_32 = IMPLICIT_DEF
  %4:vgpr_32 = V_MOV_B32_dpp %3, %1, 1, 15, 15, 1, implicit $exec
  %5:vgpr_32 = V_DOT2_BF16_BF16_e64 8, %4, 4, %3, 4, %2, 0, implicit $exec
  S_ENDPGM 0, implicit %5

...

Yields
0xd6677000 0x040a00fa 0xff080101; v_dot2_bf16_bf16 v0.h, v1, v0, v2.h quad_perm:[1,0,0,0] bound_ctrl:1
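The op_sel bits in this encoding can be cross-checked by hand. The following is a minimal sketch, not LLVM code: it assumes the MIR operands above are ordered as src0_modifiers, src0, src1_modifiers, src1, src2_modifiers, src2 (so the modifier immediates are 8, 4, 4), and it uses the bit layout from the VOP3OpSel_gfx10 class in this diff, where Inst{11..13} come from src*_modifiers{2} and Inst{14} comes from src0_modifiers{3}. All names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions inside a src_modifiers immediate (illustrative names).
constexpr uint32_t kSrcOpSel = 1u << 2; // op_sel bit of a source operand
constexpr uint32_t kDstOpSel = 1u << 3; // dst op_sel, carried in src0_modifiers

// Gather the four op_sel bits in the order [src0, src1, src2, dst].
constexpr uint32_t opSelFromModifiers(uint32_t Src0Mods, uint32_t Src1Mods,
                                      uint32_t Src2Mods) {
  return ((Src0Mods & kSrcOpSel) ? 1u : 0u) |
         ((Src1Mods & kSrcOpSel) ? 2u : 0u) |
         ((Src2Mods & kSrcOpSel) ? 4u : 0u) |
         ((Src0Mods & kDstOpSel) ? 8u : 0u);
}

// Extract Inst{14..11} from the first encoded dword.
constexpr uint32_t opSelField(uint32_t Dword0) { return (Dword0 >> 11) & 0xFu; }

// Worked example: the modifiers from the MIR test and the emitted dword agree.
inline void checkWorkedExample() {
  assert(opSelFromModifiers(8, 4, 4) == opSelField(0xd6677000u));
}
```

With these assumptions both sides evaluate to 0b1110, so the modifier bits do reach the encoding; what the thread questions is whether the separate op_sel operand is kept in sync with them.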

I see, thanks!

Petar.Avramovic updated this revision to Diff 442382. Jul 5 2022, 1:13 PM

> op_sel value is not being propagated from src_modifiers to the op_sel operand

I was convinced that the op_sel value is not important for instruction printing. The non-DPP version does not even have an op_sel operand and still gets printed correctly; the helper prints op_sel[a,b,c] from src_modifiers.
I neglected to test the assembler. The missing part is setting the op_sel bit in src_modifiers, which is done via cvtVOP3P.
I will double check both.

Harbormaster completed remote builds in B173759: Diff 442382. Jul 5 2022, 2:01 PM

Joe_Nash added inline comments. Jul 11 2022, 11:14 AM

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
8820	The conditional check here should really be checking for DOT2_BF16_BF16_e64_dpp_gfx11 || DOT2_BF16_BF16_e64_dpp8_gfx11 || DOT2_F16_F16 .... There is a similar conditional check in the Disassembler where we need to call convertVOP3PDPP. Options: 1) check if Opc is one of the specific dot instructions; 2) add a tablegen mapping helper table to check if something is one of the relevant instructions; 3) since we are relying on checking the presence of operands in cvtVOP3P anyway, maybe we can unconditionally call cvtVOP3P here, and rename the function to cvtVOPModsHelper or something.
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
763 ↗	(On Diff #442382)	This will set op_sel to the wrong value if it was input as 1. We need to call convertVOP3PDPPInst to copy the operand correctly, or something equivalent. However, that defect does not appear to affect the output, because the op_sel operand does not set any bits in the output machine code.
llvm/lib/Target/AMDGPU/VOP3Instructions.td
766	DOT2_BF16_BF16_e64_dpp should disallow modifiers on src0 and src1, though DOT2_BF16_BF16_e64 allows them. They should allow op_sel on dst, but this may be an issue since the bit for that is in src0 modifiers. DOT2_F16_F16_e64_dpp should allow abs and neg modifiers on all operands, but op_sel only on dst and src2. Setting HasSrcMods affects codegen and dpp16. Can you please include the codegen test I proposed or some variation on it? Can you also include asm and disasm tests for dpp16?

dp mentioned this in D129637: [AMDGPU][MC][GFX11] Correct disassembly of *_e64_dpp opcodes which support op_sel. Jul 13 2022, 6:03 AM

Matches sp3 behavior. op_sel[0:1] must be 0. abs and neg src modifiers for bf16.
FIXME (WIP, requires some td file changes): clang uses i16 for bf16; I used f16 to get abs and neg parsing support.

Harbormaster completed remote builds in B176343: Diff 445930. Jul 19 2022, 3:17 PM

dp added inline comments. Jul 20 2022, 3:46 AM

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4189	Is it possible for an opcode to be both `VOP3` and `VOP3P`?
llvm/lib/Target/AMDGPU/VOPInstructions.td
273–274	Bits `op_sel[1:0]` are ignored, so opcodes with these bits set to 1 are legal. Using `?` instead of `0` would allow decoding of such opcodes.
1341	Looks like this is not necessary.
1378–1379	Ditto.
1401–1402	Ditto.
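The effect of using `?` instead of `0` for the op_sel[1:0] bits can be pictured as a change to the decoder's match mask. This is a simplified sketch of mask/value matching, not the generated LLVM decoder tables; the struct, the helper names, and the assumption that the opcode sits in bits 25..16 of the first dword are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// A decoder entry matches an encoded word when all fixed bits agree.
// Bits declared as `?` are left out of the mask, so any value is accepted.
struct DecodePattern {
  uint32_t Mask;  // 1 = bit is fixed by the instruction definition
  uint32_t Value; // required values of the fixed bits
  constexpr bool matches(uint32_t Word) const { return (Word & Mask) == Value; }
};

constexpr uint32_t kOpSelLowBits = (1u << 11) | (1u << 12); // Inst{11}, Inst{12}
constexpr uint32_t kOpcodeBits = 0x3FFu << 16;              // assumed opcode field

// `let Inst{11} = 0; let Inst{12} = 0;` pins both bits to zero.
constexpr DecodePattern strictPattern(uint32_t Opcode) {
  return {kOpcodeBits | kOpSelLowBits, Opcode << 16};
}

// `let Inst{11} = ?; let Inst{12} = ?;` leaves both bits as don't-care.
constexpr DecodePattern relaxedPattern(uint32_t Opcode) {
  return {kOpcodeBits, Opcode << 16};
}

// Worked example: a word with Inst{12} set only decodes with the relaxed pattern.
inline void checkWorkedExample() {
  assert(!strictPattern(0x267).matches(0xd6677000u));
  assert(relaxedPattern(0x267).matches(0xd6677000u));
}
```

Under this model the relaxed pattern still identifies the opcode, which is why the disassembler can accept encodings whose ignored bits are set.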

Petar.Avramovic added inline comments. Jul 20 2022, 4:40 AM

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4189	afaik no, I was surprised that VOP3P instructions also have VOP3 flag, VOP3 instructions don't have VOP3P flag (maybe a bug?). I could check opcodes (I think that there should be 6 opcodes).
llvm/lib/Target/AMDGPU/VOPInstructions.td
273–274	What is the desired behavior for 'ignored bits' then? Assembler: report an error if op_sel[1:0] are used (sp3 reports an error), or parse the 1 and use 0 when printing/encoding the instruction. Disassembler: read 1 but encode as 0 anyway (sp3 does this), or fail to disassemble.

dp added inline comments. Jul 20 2022, 5:29 AM

llvm/lib/Target/AMDGPU/VOPInstructions.td
273–274	I think that assembler should be strict and report an error if `op_sel[1:0]` bits are not 0. Disassembler should be able to decode instructions with ignored bits to aid in binary code analysis (ignored bits may be displayed as 0 in `op_sel`, this is fine).
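The policy described here (strict assembler, lenient disassembler) can be summarized in a few lines. This is a hedged sketch with hypothetical helper names, not the code that landed; it only models the 4-bit op_sel value of the VOP3 dot instructions.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kIgnoredOpSelBits = 0x3u; // op_sel[1:0], ignored by hardware

// Assembler side: reject source text that sets bits the hardware ignores.
constexpr bool isValidDotOpSel(uint32_t OpSel) {
  return (OpSel & kIgnoredOpSelBits) == 0;
}

// Disassembler side: accept any encoding, but display ignored bits as 0.
constexpr uint32_t displayedDotOpSel(uint32_t EncodedOpSel) {
  return EncodedOpSel & 0xFu & ~kIgnoredOpSelBits;
}

// Worked example: op_sel:[1,1,1,1] is an assembler error but still decodes,
// and is shown as op_sel:[0,0,1,1].
inline void checkWorkedExample() {
  assert(!isValidDotOpSel(0xFu));
  assert(displayedDotOpSel(0xFu) == 0xCu);
}
```

The asymmetry is deliberate in this sketch: the assembler never produces the ambiguous encodings, while binaries that already contain them remain analyzable.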

Petar.Avramovic updated this revision to Diff 446130. Jul 20 2022, 5:53 AM

Could you add a disassembler test with ignored bits set to 1?

Petar.Avramovic added inline comments. Jul 20 2022, 6:06 AM

llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_dpp8.txt
5274	Like this? Ignored bits set to zero (0x18). Check the diff against the previous version in the History tab; there are a few more.

Oh, sorry. I see that you have added them. Probably these tests need a comment to make clear what is being tested.

Petar.Avramovic updated this revision to Diff 446138. Jul 20 2022, 6:24 AM

Harbormaster completed remote builds in B176494: Diff 446138. Jul 20 2022, 7:02 AM

Overall it looks pretty good. The main thing I don't know is if making the bf16 operands float16 will work correctly with codegen. Hopefully @rampitec can answer.

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
2072 ↗	(On Diff #446138)	@rampitec should look at these intrinsic changes. I am not familiar enough with other BF16 support to know if this is reasonable.
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4189	> Is it possible for an opcode to be both `VOP3` and `VOP3P`?
	It should not be, IMO. Those fields designate the encoding, and an instruction should only have one encoding. It could be considered a bug, but a long-standing one. I have been having similar discussions about an instruction being both VOPC and VOP3. @foad That said, this check looks fine for the way things work now. We can come back and change it in a separate patch if desired, because I'm pretty sure some other issues will arise if that change is made.
llvm/lib/Target/AMDGPU/VOPInstructions.td
272	Make the parent class of this VOP3OpSel_gfx11. No functional change, but it seems better.
llvm/test/MC/AMDGPU/gfx11_asm_dpp16.s
1–2	Can you add --implicit-check-not=error to the runlines?
llvm/test/MC/AMDGPU/gfx11_asm_dpp8.s
1–2	Can you add --implicit-check-not=error to the runlines?
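The flag check discussed for AMDGPUAsmParser.cpp line 4189 relies on the observation in this thread that VOP3P instructions also carry the VOP3 flag, so "VOP3 only" has to be spelled as VOP3 set and VOP3P clear. A minimal sketch of that predicate follows; the flag values are made up for illustration and are not the real SIInstrFlags bits.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag bits only; the real SIInstrFlags values differ.
enum : uint64_t {
  FlagVOP3 = 1u << 0,
  FlagVOP3P = 1u << 1,
  FlagIsDOT = 1u << 2,
};

// True for dot instructions encoded as plain VOP3 (e.g. v_dot2_f16_f16),
// false for VOP3P dot instructions, which also have the VOP3 flag set.
constexpr bool isVOP3OnlyDot(uint64_t TSFlags) {
  return (TSFlags & FlagIsDOT) && (TSFlags & FlagVOP3) &&
         !(TSFlags & FlagVOP3P);
}

// Worked example: the extra VOP3P flag must exclude an instruction.
inline void checkWorkedExample() {
  assert(isVOP3OnlyDot(FlagIsDOT | FlagVOP3));
  assert(!isVOP3OnlyDot(FlagIsDOT | FlagVOP3 | FlagVOP3P));
}
```

Each binary `&` is parenthesized before being combined with `&&`, which keeps the precedence explicit.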

Petar.Avramovic added inline comments. Jul 20 2022, 8:51 AM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
2072 ↗	(On Diff #446138)	This only gets the desired behavior in testing. The problem is that there is no bf16 in clang; i16 is used instead. Our td files rely on types. I am looking into whether I can bypass the type checks. It would be much cleaner if we had the bf16 type.

In D129084#3665914, @Joe_Nash wrote:

> Overall it looks pretty good. The main thing I don't know is if making the bf16 operands float16 will work correctly with codegen. Hopefully @rampitec can answer.

No, this is incorrect. bf16 is not the same as f16. One needs a cast to/from integer type to use bf16.
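To make the distinction concrete: a bf16 value is the upper 16 bits of an IEEE-754 binary32 value, so it can be carried in a 16-bit integer and moved to and from float with plain bit casts. The sketch below is illustrative only (helper names are hypothetical, and the float-to-bf16 direction truncates instead of rounding).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Widen bf16 bits (held in an integer) to float by placing them in the
// upper half of a 32-bit word.
inline float bf16BitsToFloat(uint16_t Bits) {
  uint32_t Wide = static_cast<uint32_t>(Bits) << 16;
  float Result;
  std::memcpy(&Result, &Wide, sizeof(Result));
  return Result;
}

// Narrow float to bf16 bits by dropping the low 16 mantissa bits
// (truncation, no rounding; enough for illustration).
inline uint16_t floatToBF16BitsTruncate(float Value) {
  uint32_t Wide;
  std::memcpy(&Wide, &Value, sizeof(Wide));
  return static_cast<uint16_t>(Wide >> 16);
}

// Worked example: 0x3F80 is 1.0 when read as bf16.
inline void checkWorkedExample() {
  assert(bf16BitsToFloat(0x3F80) == 1.0f);
}
```

The same bit pattern 0x3F80 read as an IEEE half (f16) would be 1.875, which is why typing the bf16 operands as f16 gives the wrong semantics.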

rampitec added inline comments. Jul 20 2022, 10:18 AM

llvm/include/llvm/IR/IntrinsicsAMDGPU.td
2072 ↗	(On Diff #446138)	We cannot have a bf16 type. This is not a question of compiler support; the support does not exist in HW. If you can only do dots on the type, you cannot generally support it.

Please revert the part about using half instead of integers for bf16.

This revision now requires changes to proceed. Jul 20 2022, 10:20 AM

Fix modifiers for bf16

Can you add disassembler tests with op_sel:[1,1,1,1] and show the bit is ignored? Otherwise LGTM

Petar.Avramovic updated this revision to Diff 446523. Jul 21 2022, 8:41 AM

Joe_Nash accepted this revision. Jul 21 2022, 8:53 AM

Joe_Nash added inline comments.

llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_dpp8.txt
5287	Ok, I was confused because the AsmPrinter prints what the hardware will do with that instruction, not what the bits actually disassemble to. This matches sp3 and there are plenty of warnings about how those bits are treated. I think this is good.

Harbormaster completed remote builds in B176788: Diff 446523. Jul 21 2022, 9:43 AM

LGTM with a nit.

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4188	Please add parentheses around binary operations.

This revision is now accepted and ready to land. Jul 21 2022, 12:19 PM

This revision was landed with ongoing or failed builds. Jul 22 2022, 2:48 AM

Closed by commit rG8de1f04c77af: [AMDGPU] gfx11 Fix VOP3 dot instructions (authored by Petar.Avramovic).

This revision was automatically updated to reflect the committed changes.

Petar.Avramovic added a commit: rG8de1f04c77af: [AMDGPU] gfx11 Fix VOP3 dot instructions.

foad mentioned this in D130989: [AMDGPU][MC][GFX11] Correct v_dot2_f16_f16 and v_dot2_bf16_bf16. Aug 2 2022, 8:19 AM

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp	13 lines
llvm/lib/Target/AMDGPU/VOP3Instructions.td	25 lines
llvm/lib/Target/AMDGPU/VOPInstructions.td	40 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.bf16.bf16.ll	52 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f16.f16.ll	48 lines
llvm/test/MC/AMDGPU/gfx11_asm_dpp16.s	27 lines
llvm/test/MC/AMDGPU/gfx11_asm_dpp8.s	27 lines
llvm/test/MC/AMDGPU/gfx11_vop123.s	14 lines
llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_all.txt	14 lines
llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_dpp16.txt	26 lines
llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_dpp8.txt	26 lines

Diff 446758

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp


@@ ... @@ if (Opc == AMDGPU::V_PERMLANE16_B32_gfx10 ||
       Opc == AMDGPU::V_PERMLANEX16_B32_gfx10) {
     int OpSelIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::op_sel);
     unsigned OpSel = Inst.getOperand(OpSelIdx).getImm();

     if (OpSel & ~3)
       return false;
   }

-  if (isGFX940() && (MII.get(Opc).TSFlags & SIInstrFlags::IsDOT)) {
+  uint64_t TSFlags = MII.get(Opc).TSFlags;
+
+  if (isGFX940() && (TSFlags & SIInstrFlags::IsDOT)) {
     int OpSelIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::op_sel);
     if (OpSelIdx != -1) {
       if (Inst.getOperand(OpSelIdx).getImm() != 0)
         return false;
     }
     int OpSelHiIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::op_sel_hi);
     if (OpSelHiIdx != -1) {
       if (Inst.getOperand(OpSelHiIdx).getImm() != -1)
         return false;
     }
   }

+  // op_sel[0:1] must be 0 for v_dot2_bf16_bf16 and v_dot2_f16_f16 (VOP3 Dot).
+  if ((TSFlags & SIInstrFlags::IsDOT) && (TSFlags & SIInstrFlags::VOP3) &&
+      !(TSFlags & SIInstrFlags::VOP3P)) {
+    int OpSelIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::op_sel);
+    unsigned OpSel = Inst.getOperand(OpSelIdx).getImm();
+    if (OpSel & 3)
+      return false;
+  }
+
   return true;
 }

 bool AMDGPUAsmParser::validateDPP(const MCInst &Inst,
                                   const OperandVector &Operands) {
   const unsigned Opc = Inst.getOpcode();
   int DppCtrlIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::dpp_ctrl);
   if (DppCtrlIdx < 0)
@@ ... @@ void AMDGPUAsmParser::cvtVOP3DPP(MCInst &Inst, const OperandVector &Operands, bool IsDPP8) {
   if (AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::clamp) != -1) {
     addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyClampSI);
   }
   if (AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::omod) != -1) {
     addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyOModSI);
   }
   if (Desc.TSFlags & SIInstrFlags::VOP3P)
     cvtVOP3P(Inst, Operands, OptionalIdx);
   else if (Desc.TSFlags & SIInstrFlags::VOP3)
     cvtVOP3OpSel(Inst, Operands, OptionalIdx);
   else if (AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::op_sel) != -1) {
     addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyOpSel);
   }

   if (IsDPP8) {
     addOptionalImmOperand(Inst, Operands, OptionalIdx, AMDGPUOperand::ImmTyDPP8);
     using namespace llvm::AMDGPU::DPP;

llvm/lib/Target/AMDGPU/VOP3Instructions.td

@@ ... @@
 }

 let WaveSizePredicate = isWave32 in {
   def : DivFmasPat<f32, V_DIV_FMAS_F32_e64, VCC_LO>;
   def : DivFmasPat<f64, V_DIV_FMAS_F64_e64, VCC_LO>;
 }

 class VOP3_DOT_Profile<VOPProfile P, VOP3Features Features = VOP3_REGULAR> : VOP3_Profile<P, Features> {
-  // FIXME VOP3 DPP versions are unsupported
-  let HasExtVOP3DPP = 0;
   let HasClamp = 0;
   let HasOMod = 0;
-  let InsVOP3OpSel = getInsVOP3OpSel<Src0RC64, Src1RC64, Src2RC64,
-      NumSrcArgs, HasClamp, HasOMod,
-      !if(isFloatType<Src0VT>.ret, FPVRegInputMods, IntOpSelMods),
-      !if(isFloatType<Src1VT>.ret, FPVRegInputMods, IntOpSelMods),
-      !if(isFloatType<Src2VT>.ret, FPVRegInputMods, IntOpSelMods)>.ret;
+  // Override modifiers for bf16(i16) (same as float modifiers).
+  let HasSrc0Mods = 1;
+  let HasSrc1Mods = 1;
+  let HasSrc2Mods = 1;
+  let Src0ModDPP = FPVRegInputMods;
+  let Src1ModDPP = FPVRegInputMods;
+  let Src2ModVOP3DPP = FPVRegInputMods;
+  let InsVOP3OpSel = getInsVOP3OpSel<Src0RC64, Src1RC64, Src2RC64, NumSrcArgs,
+      HasClamp, HasOMod, FPVRegInputMods,
+      FPVRegInputMods, FPVRegInputMods>.ret;
+  let AsmVOP3OpSel = getAsmVOP3OpSel<NumSrcArgs, HasClamp, 1, 1, 1>.ret;
 }

 let SubtargetPredicate = isGFX11Plus in {
   defm V_MAXMIN_F32 : VOP3Inst<"v_maxmin_f32", VOP3_Profile<VOP_F32_F32_F32_F32>>;
   defm V_MINMAX_F32 : VOP3Inst<"v_minmax_f32", VOP3_Profile<VOP_F32_F32_F32_F32>>;
   defm V_MAXMIN_F16 : VOP3Inst<"v_maxmin_f16", VOP3_Profile<VOP_F16_F16_F16_F16>>;
   defm V_MINMAX_F16 : VOP3Inst<"v_minmax_f16", VOP3_Profile<VOP_F16_F16_F16_F16>>;
   defm V_MAXMIN_U32 : VOP3Inst<"v_maxmin_u32", VOP3_Profile<VOP_I32_I32_I32_I32>>;
   defm V_MINMAX_U32 : VOP3Inst<"v_minmax_u32", VOP3_Profile<VOP_I32_I32_I32_I32>>;
   defm V_MAXMIN_I32 : VOP3Inst<"v_maxmin_i32", VOP3_Profile<VOP_I32_I32_I32_I32>>;
   defm V_MINMAX_I32 : VOP3Inst<"v_minmax_i32", VOP3_Profile<VOP_I32_I32_I32_I32>>;
   defm V_CVT_PK_I16_F32 : VOP3Inst<"v_cvt_pk_i16_f32", VOP3_Profile<VOP_V2I16_F32_F32>>;
   defm V_CVT_PK_U16_F32 : VOP3Inst<"v_cvt_pk_u16_f32", VOP3_Profile<VOP_V2I16_F32_F32>>;
 } // End SubtargetPredicate = isGFX11Plus

-let SubtargetPredicate = HasDot8Insts in {
+let SubtargetPredicate = HasDot8Insts, IsDOT=1 in {
   defm V_DOT2_F16_F16 : VOP3Inst<"v_dot2_f16_f16", VOP3_DOT_Profile<VOP_F16_V2F16_V2F16_F16>, int_amdgcn_fdot2_f16_f16>;
   defm V_DOT2_BF16_BF16 : VOP3Inst<"v_dot2_bf16_bf16", VOP3_DOT_Profile<VOP_I16_V2I16_V2I16_I16>, int_amdgcn_fdot2_bf16_bf16>;
 }

 //===----------------------------------------------------------------------===//
 // Integer Clamp Patterns
 //===----------------------------------------------------------------------===//

@@ ... @@
 defm V_MAXMIN_F32 : VOP3_Realtriple_gfx11<0x25e>;
 defm V_MINMAX_F32 : VOP3_Realtriple_gfx11<0x25f>;
 defm V_MAXMIN_F16 : VOP3_Realtriple_gfx11<0x260>;
 defm V_MINMAX_F16 : VOP3_Realtriple_gfx11<0x261>;
 defm V_MAXMIN_U32 : VOP3_Realtriple_gfx11<0x262>;
 defm V_MINMAX_U32 : VOP3_Realtriple_gfx11<0x263>;
 defm V_MAXMIN_I32 : VOP3_Realtriple_gfx11<0x264>;
 defm V_MINMAX_I32 : VOP3_Realtriple_gfx11<0x265>;
-// FIXME VOP3 DPP Dot instructions are unsupported
-defm V_DOT2_F16_F16 : VOP3_Real_Base_gfx11<0x266>;
-defm V_DOT2_BF16_BF16 : VOP3_Real_Base_gfx11<0x267>;
+defm V_DOT2_F16_F16 : VOP3Dot_Realtriple_gfx11<0x266>;
+defm V_DOT2_BF16_BF16 : VOP3Dot_Realtriple_gfx11<0x267>;
 defm V_DIV_SCALE_F32 : VOP3be_Real_gfx11<0x2fc, "V_DIV_SCALE_F32", "v_div_scale_f32">;
 defm V_DIV_SCALE_F64 : VOP3be_Real_gfx11<0x2fd, "V_DIV_SCALE_F64", "v_div_scale_f64">;
 defm V_MAD_U64_U32_gfx11 : VOP3be_Real_gfx11<0x2fe, "V_MAD_U64_U32_gfx11", "v_mad_u64_u32">;
 defm V_MAD_I64_I32_gfx11 : VOP3be_Real_gfx11<0x2ff, "V_MAD_I64_I32_gfx11", "v_mad_i64_i32">;
 defm V_ADD_NC_U16 : VOP3Only_Realtriple_gfx11<0x303>;
 defm V_SUB_NC_U16 : VOP3Only_Realtriple_gfx11<0x304>;
 defm V_MUL_LO_U16 : VOP3Only_Realtriple_gfx11<0x305>;
 defm V_CVT_PK_I16_F32 : VOP3_Realtriple_gfx11<0x306>;

llvm/lib/Target/AMDGPU/VOPInstructions.td

Show First 20 Lines • Show All 263 Lines • ▼ Show 20 Lines	class VOP3OpSel_gfx10<bits<10> op, VOPProfile p> : VOP3e_gfx10<op, p> {
let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0);		let Inst{11} = !if(p.HasSrc0, src0_modifiers{2}, 0);
let Inst{12} = !if(p.HasSrc1, src1_modifiers{2}, 0);		let Inst{12} = !if(p.HasSrc1, src1_modifiers{2}, 0);
let Inst{13} = !if(p.HasSrc2, src2_modifiers{2}, 0);		let Inst{13} = !if(p.HasSrc2, src2_modifiers{2}, 0);
let Inst{14} = !if(p.HasDst, src0_modifiers{3}, 0);		let Inst{14} = !if(p.HasDst, src0_modifiers{3}, 0);
}		}

class VOP3OpSel_gfx11<bits<10> op, VOPProfile p> : VOP3OpSel_gfx10<op, p>;		class VOP3OpSel_gfx11<bits<10> op, VOPProfile p> : VOP3OpSel_gfx10<op, p>;

		class VOP3DotOpSel_gfx11<bits<10> op, VOPProfile p> : VOP3OpSel_gfx11<op, p>{
		Joe_NashUnsubmitted Not Done Reply Inline Actions Make the parent class of this VOP3OpSel_gfx11. No functional change, but it seems better. Joe_Nash: Make the parent class of this VOP3OpSel_gfx11. No functional change, but it seems better.
		let Inst{11} = ?;
		let Inst{12} = ?;
		dpUnsubmitted Not Done Reply Inline Actions Bits `op_sel[1:0]` are ignored, so opcodes with these bits set to 1 are legal. Using `?` instead of `0` would allow decoding of such opcodes. dp: Bits `op_sel[1:0]` are ignored, so opcodes with these bits set to 1 are legal. Using `?`…
		Petar.AvramovicAuthorUnsubmitted Done Reply Inline Actions What is desired behavior for 'ignored bits' then? assembler: report error if op_sel[1:0] are used (sp3 reports error) or parse the 1 and use 0 when printing/encoding instruction disassembler: read 1 but encode as 0 anyway (sp3 does this) or fail to disassemble Petar.Avramovic: What is desired behavior for 'ignored bits' then? assembler: report error if op_sel[1:0] are…
		dpUnsubmitted Not Done Reply Inline Actions I think that assembler should be strict and report an error if `op_sel[1:0]` bits are not 0. Disassembler should be able to decode instructions with ignored bits to aid in binary code analysis (ignored bits may be displayed as 0 in `op_sel`, this is fine). dp: I think that assembler should be strict and report an error if `op_sel[1:0]` bits are not 0.
		}

// NB: For V_INTERP* opcodes, src0 is encoded as src1 and vice versa		// NB: For V_INTERP* opcodes, src0 is encoded as src1 and vice versa
class VOP3Interp_vi <bits<10> op, VOPProfile P> : VOP3e_vi <op, P> {		class VOP3Interp_vi <bits<10> op, VOPProfile P> : VOP3e_vi <op, P> {
bits<2> attrchan;		bits<2> attrchan;
bits<6> attr;		bits<6> attr;
bits<1> high;		bits<1> high;

let Inst{8} = 0; // No modifiers for src0		let Inst{8} = 0; // No modifiers for src0
▲ Show 20 Lines • Show All 986 Lines • ▼ Show 20 Lines

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// VOP3 DPP		// VOP3 DPP
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

class Base_VOP3_DPP16<bits<10> op, VOP_DPP_Pseudo ps, string opName = ps.OpName>		class Base_VOP3_DPP16<bits<10> op, VOP_DPP_Pseudo ps, string opName = ps.OpName>
: VOP3_DPP<op, opName, ps.Pfl, 1> {		: VOP3_DPP<op, opName, ps.Pfl, 1> {
let VOP3_OPSEL = ps.Pfl.HasOpSel;		let VOP3_OPSEL = ps.Pfl.HasOpSel;
		let IsDOT = ps.IsDOT;
let hasSideEffects = ps.hasSideEffects;		let hasSideEffects = ps.hasSideEffects;
let Defs = ps.Defs;		let Defs = ps.Defs;
let SchedRW = ps.SchedRW;		let SchedRW = ps.SchedRW;
let Uses = ps.Uses;		let Uses = ps.Uses;
let AssemblerPredicate = HasDPP16;		let AssemblerPredicate = HasDPP16;
let SubtargetPredicate = HasDPP16;		let SubtargetPredicate = HasDPP16;
let OtherPredicates = ps.OtherPredicates;		let OtherPredicates = ps.OtherPredicates;
}		}

class VOP3_DPP16<bits<10> op, VOP_DPP_Pseudo ps, int subtarget,		class VOP3_DPP16<bits<10> op, VOP_DPP_Pseudo ps, int subtarget,
string opName = ps.OpName>		string opName = ps.OpName>
: Base_VOP3_DPP16<op, ps, opName>, SIMCInstr<ps.PseudoInstr, subtarget>;		: Base_VOP3_DPP16<op, ps, opName>, SIMCInstr<ps.PseudoInstr, subtarget>;

class Base_VOP3_DPP8<bits<10> op, VOP_Pseudo ps, string opName = ps.OpName>		class Base_VOP3_DPP8<bits<10> op, VOP_Pseudo ps, string opName = ps.OpName>
: VOP3_DPP8<op, opName, ps.Pfl> {		: VOP3_DPP8<op, opName, ps.Pfl> {
let VOP3_OPSEL = ps.Pfl.HasOpSel;		let VOP3_OPSEL = ps.Pfl.HasOpSel;
		let IsDOT = ps.IsDOT;
let hasSideEffects = ps.hasSideEffects;		let hasSideEffects = ps.hasSideEffects;
let Defs = ps.Defs;		let Defs = ps.Defs;
let SchedRW = ps.SchedRW;		let SchedRW = ps.SchedRW;
let Uses = ps.Uses;		let Uses = ps.Uses;

let OtherPredicates = ps.OtherPredicates;		let OtherPredicates = ps.OtherPredicates;
}		}

Show All 25 Lines	foreach _ = BoolToList<ps.Pfl.HasOpSel>.ret in
VOP3_Real<ps, SIEncodingFamily.GFX11>,		VOP3_Real<ps, SIEncodingFamily.GFX11>,
VOP3OpSel_gfx11<op, ps.Pfl>;		VOP3OpSel_gfx11<op, ps.Pfl>;
foreach _ = BoolToList<!not(ps.Pfl.HasOpSel)>.ret in		foreach _ = BoolToList<!not(ps.Pfl.HasOpSel)>.ret in
def _e64_gfx11 :		def _e64_gfx11 :
VOP3_Real<ps, SIEncodingFamily.GFX11>,		VOP3_Real<ps, SIEncodingFamily.GFX11>,
VOP3e_gfx11<op, ps.Pfl>;		VOP3e_gfx11<op, ps.Pfl>;
}		}
}		}
		multiclass VOP3Dot_Real_Base_gfx11<bits<10> op, string opName = NAME,
		bit isSingle = 0> {
		defvar ps = !cast<VOP_Pseudo>(opName#"_e64");
		let IsSingle = !or(isSingle, ps.Pfl.IsSingle) in {
		def _e64_gfx11 :
		dpUnsubmitted Not Done Reply Inline Actions Looks like this is not necessary. dp: Looks like this is not necessary.
		VOP3_Real<ps, SIEncodingFamily.GFX11>,
		VOP3DotOpSel_gfx11<op, ps.Pfl>;
		}
		}
multiclass VOP3_Real_with_name_gfx11<bits<10> op, string opName,		multiclass VOP3_Real_with_name_gfx11<bits<10> op, string opName,
string asmName, bit isSingle = 0> {		string asmName, bit isSingle = 0> {
defvar ps = !cast<VOP_Pseudo>(opName#"_e64");		defvar ps = !cast<VOP_Pseudo>(opName#"_e64");
let AsmString = asmName # ps.AsmOperands,		let AsmString = asmName # ps.AsmOperands,
IsSingle = !or(isSingle, ps.Pfl.IsSingle) in {		IsSingle = !or(isSingle, ps.Pfl.IsSingle) in {
foreach _ = BoolToList<ps.Pfl.HasOpSel>.ret in		foreach _ = BoolToList<ps.Pfl.HasOpSel>.ret in
def _e64_gfx11 :		def _e64_gfx11 :
VOP3_Real<ps, SIEncodingFamily.GFX11>,		VOP3_Real<ps, SIEncodingFamily.GFX11>,
Show All 13 Lines	defvar ps = !cast<VOP_Pseudo>(opName);
VOP3_Real<ps, SIEncodingFamily.GFX11>,		VOP3_Real<ps, SIEncodingFamily.GFX11>,
VOP3e_gfx11<op, ps.Pfl>;		VOP3e_gfx11<op, ps.Pfl>;
}		}
multiclass VOP3_Real_dpp_Base_gfx11<bits<10> op, string opName = NAME> {		multiclass VOP3_Real_dpp_Base_gfx11<bits<10> op, string opName = NAME> {
def _e64_dpp_gfx11 : VOP3_DPP16<op, !cast<VOP_DPP_Pseudo>(opName#"_e64"#"_dpp"), SIEncodingFamily.GFX11> {		def _e64_dpp_gfx11 : VOP3_DPP16<op, !cast<VOP_DPP_Pseudo>(opName#"_e64"#"_dpp"), SIEncodingFamily.GFX11> {
let DecoderNamespace = "DPPGFX11";		let DecoderNamespace = "DPPGFX11";
}		}
}		}

		multiclass VOP3Dot_Real_dpp_Base_gfx11<bits<10> op, string opName = NAME> {
		def _e64_dpp_gfx11 : VOP3_DPP16<op, !cast<VOP_DPP_Pseudo>(opName#"_e64"#"_dpp"), SIEncodingFamily.GFX11> {
		let Inst{11} = ?;
		let Inst{12} = ?;
		dpUnsubmitted Not Done Reply Inline Actions Ditto. dp: Ditto.
		let DecoderNamespace = "DPPGFX11";
		}
		}

multiclass VOP3_Real_dpp_with_name_gfx11<bits<10> op, string opName,		multiclass VOP3_Real_dpp_with_name_gfx11<bits<10> op, string opName,
string asmName> {		string asmName> {
defvar ps = !cast<VOP3_Pseudo>(opName#"_e64");		defvar ps = !cast<VOP3_Pseudo>(opName#"_e64");
let AsmString = asmName # ps.Pfl.AsmVOP3DPP16, DecoderNamespace = "DPPGFX11" in {		let AsmString = asmName # ps.Pfl.AsmVOP3DPP16, DecoderNamespace = "DPPGFX11" in {
defm NAME : VOP3_Real_dpp_Base_gfx11<op, opName>;		defm NAME : VOP3_Real_dpp_Base_gfx11<op, opName>;
}		}
}		}
multiclass VOP3_Real_dpp8_Base_gfx11<bits<10> op, string opName = NAME> {		multiclass VOP3_Real_dpp8_Base_gfx11<bits<10> op, string opName = NAME> {
defvar ps = !cast<VOP3_Pseudo>(opName#"_e64");		defvar ps = !cast<VOP3_Pseudo>(opName#"_e64");
def _e64_dpp8_gfx11 : Base_VOP3_DPP8<op, ps> {		def _e64_dpp8_gfx11 : Base_VOP3_DPP8<op, ps> {
let DecoderNamespace = "DPP8GFX11";		let DecoderNamespace = "DPP8GFX11";
}		}
}		}

		multiclass VOP3Dot_Real_dpp8_Base_gfx11<bits<10> op, string opName = NAME> {
		defvar ps = !cast<VOP3_Pseudo>(opName#"_e64");
		def _e64_dpp8_gfx11 : Base_VOP3_DPP8<op, ps> {
		let Inst{11} = ?;
		let Inst{12} = ?;
		dp: Ditto.
		let DecoderNamespace = "DPP8GFX11";
		}
		}

multiclass VOP3_Real_dpp8_with_name_gfx11<bits<10> op, string opName,		multiclass VOP3_Real_dpp8_with_name_gfx11<bits<10> op, string opName,
string asmName> {		string asmName> {
defvar ps = !cast<VOP3_Pseudo>(opName#"_e64");		defvar ps = !cast<VOP3_Pseudo>(opName#"_e64");
let AsmString = asmName # ps.Pfl.AsmVOP3DPP8, DecoderNamespace = "DPP8GFX11" in {		let AsmString = asmName # ps.Pfl.AsmVOP3DPP8, DecoderNamespace = "DPP8GFX11" in {
defm NAME : VOP3_Real_dpp8_Base_gfx11<op, opName>;		defm NAME : VOP3_Real_dpp8_Base_gfx11<op, opName>;
}		}
}		}
multiclass VOP3be_Real_gfx11<bits<10> op, string opName, string asmName,		multiclass VOP3be_Real_gfx11<bits<10> op, string opName, string asmName,
Show All 22 Lines

// VOP1 and VOP2 depend on these triple defs		// VOP1 and VOP2 depend on these triple defs
multiclass VOP3_Realtriple_gfx11<bits<10> op,		multiclass VOP3_Realtriple_gfx11<bits<10> op,
bit isSingle = 0, string opName = NAME> :		bit isSingle = 0, string opName = NAME> :
VOP3_Real_Base_gfx11<op, opName, isSingle>,		VOP3_Real_Base_gfx11<op, opName, isSingle>,
VOP3_Real_dpp_Base_gfx11<op, opName>,		VOP3_Real_dpp_Base_gfx11<op, opName>,
VOP3_Real_dpp8_Base_gfx11<op, opName>;		VOP3_Real_dpp8_Base_gfx11<op, opName>;

		multiclass VOP3Dot_Realtriple_gfx11<bits<10> op,
		bit isSingle = 0, string opName = NAME> :
		VOP3Dot_Real_Base_gfx11<op, opName, isSingle>,
		VOP3Dot_Real_dpp_Base_gfx11<op, opName>,
		VOP3Dot_Real_dpp8_Base_gfx11<op, opName>;

multiclass VOP3Only_Realtriple_gfx11<bits<10> op> :		multiclass VOP3Only_Realtriple_gfx11<bits<10> op> :
VOP3_Realtriple_gfx11<op, 1>;		VOP3_Realtriple_gfx11<op, 1>;

multiclass VOP3_Realtriple_with_name_gfx11<bits<10> op, string opName,		multiclass VOP3_Realtriple_with_name_gfx11<bits<10> op, string opName,
string asmName, bit isSingle = 0> :		string asmName, bit isSingle = 0> :
VOP3_Real_with_name_gfx11<op, opName, asmName, isSingle>,		VOP3_Real_with_name_gfx11<op, opName, asmName, isSingle>,
VOP3_Real_dpp_with_name_gfx11<op, opName, asmName>,		VOP3_Real_dpp_with_name_gfx11<op, opName, asmName>,
VOP3_Real_dpp8_with_name_gfx11<op, opName, asmName>;		VOP3_Real_dpp8_with_name_gfx11<op, opName, asmName>;
▲ Show 20 Lines • Show All 47 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.bf16.bf16.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GFX11			; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GFX11,SDAG-GFX11
	; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GFX11			; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GFX11,GISEL-GFX11

	declare i16 @llvm.amdgcn.fdot2.bf16.bf16(<2 x i16> %a, <2 x i16> %b, i16 %c)			declare i16 @llvm.amdgcn.fdot2.bf16.bf16(<2 x i16> %a, <2 x i16> %b, i16 %c)

	define amdgpu_kernel void @test_llvm_amdgcn_fdot2_f16_f16(			define amdgpu_kernel void @test_llvm_amdgcn_fdot2_bf16_bf16(
	; GFX11-LABEL: test_llvm_amdgcn_fdot2_f16_f16:			; GFX11-LABEL: test_llvm_amdgcn_fdot2_bf16_bf16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_load_b256 s[0:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b256 s[0:7], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
	; GFX11-NEXT: s_waitcnt lgkmcnt(0)			; GFX11-NEXT: s_waitcnt lgkmcnt(0)
	; GFX11-NEXT: global_load_u16 v1, v0, s[6:7]			; GFX11-NEXT: global_load_u16 v1, v0, s[6:7]
	; GFX11-NEXT: s_load_b32 s2, s[2:3], 0x0			; GFX11-NEXT: s_load_b32 s2, s[2:3], 0x0
	; GFX11-NEXT: s_load_b32 s3, s[4:5], 0x0			; GFX11-NEXT: s_load_b32 s3, s[4:5], 0x0
	; GFX11-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GFX11-NEXT: v_dot2_bf16_bf16 v1, s2, s3, v1			; GFX11-NEXT: v_dot2_bf16_bf16 v1, s2, s3, v1
	; GFX11-NEXT: global_store_b16 v0, v1, s[0:1]			; GFX11-NEXT: global_store_b16 v0, v1, s[0:1]
	; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)			; GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	; GFX11-NEXT: s_endpgm			; GFX11-NEXT: s_endpgm
	i16 addrspace(1)* %r,			i16 addrspace(1)* %r,
	<2 x i16> addrspace(1)* %a,			<2 x i16> addrspace(1)* %a,
	<2 x i16> addrspace(1)* %b,			<2 x i16> addrspace(1)* %b,
	i16 addrspace(1)* %c) {			i16 addrspace(1)* %c) {
	entry:			entry:
	%a.val = load <2 x i16>, <2 x i16> addrspace(1)* %a			%a.val = load <2 x i16>, <2 x i16> addrspace(1)* %a
	%b.val = load <2 x i16>, <2 x i16> addrspace(1)* %b			%b.val = load <2 x i16>, <2 x i16> addrspace(1)* %b
	%c.val = load i16, i16 addrspace(1)* %c			%c.val = load i16, i16 addrspace(1)* %c
	%r.val = call i16 @llvm.amdgcn.fdot2.bf16.bf16(<2 x i16> %a.val, <2 x i16> %b.val, i16 %c.val)			%r.val = call i16 @llvm.amdgcn.fdot2.bf16.bf16(<2 x i16> %a.val, <2 x i16> %b.val, i16 %c.val)
	store i16 %r.val, i16 addrspace(1)* %r			store i16 %r.val, i16 addrspace(1)* %r
	ret void			ret void
	}			}

				define amdgpu_kernel void @test_llvm_amdgcn_fdot2_bf16_bf16_dpp(
				; SDAG-GFX11-LABEL: test_llvm_amdgcn_fdot2_bf16_bf16_dpp:
				; SDAG-GFX11: ; %bb.0: ; %entry
				; SDAG-GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
				; SDAG-GFX11-NEXT: s_waitcnt lgkmcnt(0)
				; SDAG-GFX11-NEXT: scratch_load_b32 v0, off, s2
				; SDAG-GFX11-NEXT: scratch_load_u16 v1, off, s3
				; SDAG-GFX11-NEXT: scratch_load_b32 v2, off, s1
				; SDAG-GFX11-NEXT: s_waitcnt vmcnt(0)
				; SDAG-GFX11-NEXT: v_dot2_bf16_bf16_e64_dpp v0, v2, v0, v1 quad_perm:[1,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:1
				; SDAG-GFX11-NEXT: scratch_store_b16 off, v0, s0
				; SDAG-GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
				; SDAG-GFX11-NEXT: s_endpgm
				;
				; GISEL-GFX11-LABEL: test_llvm_amdgcn_fdot2_bf16_bf16_dpp:
				; GISEL-GFX11: ; %bb.0: ; %entry
				; GISEL-GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
				; GISEL-GFX11-NEXT: s_waitcnt lgkmcnt(0)
				; GISEL-GFX11-NEXT: scratch_load_b32 v0, off, s1
				; GISEL-GFX11-NEXT: scratch_load_b32 v1, off, s2
				; GISEL-GFX11-NEXT: scratch_load_u16 v2, off, s3
				; GISEL-GFX11-NEXT: s_waitcnt vmcnt(0)
				; GISEL-GFX11-NEXT: v_dot2_bf16_bf16_e64_dpp v0, v0, v1, v2 quad_perm:[1,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GISEL-GFX11-NEXT: scratch_store_b16 off, v0, s0
				; GISEL-GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
				; GISEL-GFX11-NEXT: s_endpgm
				i16 addrspace(5)* %r,
				<2 x i16> addrspace(5)* %a,
				<2 x i16> addrspace(5)* %b,
				i16 addrspace(5)* %c) {
				entry:
				%a.val = load <2 x i16>, <2 x i16> addrspace(5)* %a
				%b.val = load <2 x i16>, <2 x i16> addrspace(5)* %b
				%c.val = load i16, i16 addrspace(5)* %c
				%a.val.i32 = bitcast <2 x i16> %a.val to i32
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 %a.val.i32, i32 %a.val.i32, i32 1, i32 15, i32 15, i1 1)
				%a.val.dpp.v2i16 = bitcast i32 %dpp to <2 x i16>
				%r.val = call i16 @llvm.amdgcn.fdot2.bf16.bf16(<2 x i16> %a.val.dpp.v2i16, <2 x i16> %b.val, i16 %c.val)
				store i16 %r.val, i16 addrspace(5)* %r
				ret void
				}

				declare i32 @llvm.amdgcn.update.dpp.i32(i32, i32, i32, i32, i32, i1)

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f16.f16.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GFX11			; RUN: llc -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GFX11,SDAG-GFX11
	; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GFX11			; RUN: llc -global-isel -march=amdgcn -mcpu=gfx1100 -verify-machineinstrs < %s | FileCheck %s --check-prefixes=GFX11,GISEL-GFX11

	declare half @llvm.amdgcn.fdot2.f16.f16(<2 x half> %a, <2 x half> %b, half %c)			declare half @llvm.amdgcn.fdot2.f16.f16(<2 x half> %a, <2 x half> %b, half %c)

	define amdgpu_kernel void @test_llvm_amdgcn_fdot2_f16_f16(			define amdgpu_kernel void @test_llvm_amdgcn_fdot2_f16_f16(
	; GFX11-LABEL: test_llvm_amdgcn_fdot2_f16_f16:			; GFX11-LABEL: test_llvm_amdgcn_fdot2_f16_f16:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_load_b256 s[0:7], s[0:1], 0x24			; GFX11-NEXT: s_load_b256 s[0:7], s[0:1], 0x24
	; GFX11-NEXT: v_mov_b32_e32 v0, 0			; GFX11-NEXT: v_mov_b32_e32 v0, 0
	Show All 13 Lines
	entry:			entry:
	%a.val = load <2 x half>, <2 x half> addrspace(1)* %a			%a.val = load <2 x half>, <2 x half> addrspace(1)* %a
	%b.val = load <2 x half>, <2 x half> addrspace(1)* %b			%b.val = load <2 x half>, <2 x half> addrspace(1)* %b
	%c.val = load half, half addrspace(1)* %c			%c.val = load half, half addrspace(1)* %c
	%r.val = call half @llvm.amdgcn.fdot2.f16.f16(<2 x half> %a.val, <2 x half> %b.val, half %c.val)			%r.val = call half @llvm.amdgcn.fdot2.f16.f16(<2 x half> %a.val, <2 x half> %b.val, half %c.val)
	store half %r.val, half addrspace(1)* %r			store half %r.val, half addrspace(1)* %r
	ret void			ret void
	}			}

				define amdgpu_kernel void @test_llvm_amdgcn_fdot2_f16_f16_dpp(
				; SDAG-GFX11-LABEL: test_llvm_amdgcn_fdot2_f16_f16_dpp:
				; SDAG-GFX11: ; %bb.0: ; %entry
				; SDAG-GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
				; SDAG-GFX11-NEXT: s_waitcnt lgkmcnt(0)
				; SDAG-GFX11-NEXT: scratch_load_b32 v0, off, s2
				; SDAG-GFX11-NEXT: scratch_load_u16 v1, off, s3
				; SDAG-GFX11-NEXT: scratch_load_b32 v2, off, s1
				; SDAG-GFX11-NEXT: s_waitcnt vmcnt(0)
				; SDAG-GFX11-NEXT: v_dot2_f16_f16_e64_dpp v0, v2, v0, v1 quad_perm:[1,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:1
				; SDAG-GFX11-NEXT: scratch_store_b16 off, v0, s0
				; SDAG-GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
				; SDAG-GFX11-NEXT: s_endpgm
				;
				; GISEL-GFX11-LABEL: test_llvm_amdgcn_fdot2_f16_f16_dpp:
				; GISEL-GFX11: ; %bb.0: ; %entry
				; GISEL-GFX11-NEXT: s_load_b128 s[0:3], s[0:1], 0x24
				; GISEL-GFX11-NEXT: s_waitcnt lgkmcnt(0)
				; GISEL-GFX11-NEXT: scratch_load_b32 v0, off, s1
				; GISEL-GFX11-NEXT: scratch_load_b32 v1, off, s2
				; GISEL-GFX11-NEXT: scratch_load_u16 v2, off, s3
				; GISEL-GFX11-NEXT: s_waitcnt vmcnt(0)
				; GISEL-GFX11-NEXT: v_dot2_f16_f16_e64_dpp v0, v0, v1, v2 quad_perm:[1,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:1
				; GISEL-GFX11-NEXT: scratch_store_b16 off, v0, s0
				; GISEL-GFX11-NEXT: s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
				; GISEL-GFX11-NEXT: s_endpgm
				half addrspace(5)* %r,
				<2 x half> addrspace(5)* %a,
				<2 x half> addrspace(5)* %b,
				half addrspace(5)* %c) {
				entry:
				%a.val = load <2 x half>, <2 x half> addrspace(5)* %a
				%b.val = load <2 x half>, <2 x half> addrspace(5)* %b
				%c.val = load half, half addrspace(5)* %c
				%a.val.i32 = bitcast <2 x half> %a.val to i32
				%dpp = call i32 @llvm.amdgcn.update.dpp.i32(i32 %a.val.i32, i32 %a.val.i32, i32 1, i32 15, i32 15, i1 1)
				%a.val.dpp.v2half = bitcast i32 %dpp to <2 x half>
				%r.val = call half @llvm.amdgcn.fdot2.f16.f16(<2 x half> %a.val.dpp.v2half, <2 x half> %b.val, half %c.val)
				store half %r.val, half addrspace(5)* %r
				ret void
				}

				declare i32 @llvm.amdgcn.update.dpp.i32(i32, i32, i32, i32, i32, i1)

llvm/test/MC/AMDGPU/gfx11_asm_dpp16.s

	// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s | FileCheck --check-prefixes=GFX11 %s			// RUN: not llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s | FileCheck --check-prefixes=GFX11 %s
				// RUN: not llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s 2>&1 | FileCheck --check-prefixes=GFX11-ERR --implicit-check-not=error %s
				Joe_Nash: Can you add --implicit-check-not=error to the runlines?

	v_mov_b32 v5, v1 quad_perm:[3,2,1,0] row_mask:0x0 bank_mask:0x0			v_mov_b32 v5, v1 quad_perm:[3,2,1,0] row_mask:0x0 bank_mask:0x0
	// GFX11: encoding: [0xfa,0x02,0x0a,0x7e,0x01,0x1b,0x00,0x00]			// GFX11: encoding: [0xfa,0x02,0x0a,0x7e,0x01,0x1b,0x00,0x00]

	v_mov_b32 v5, v1 quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0			v_mov_b32 v5, v1 quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0
	// GFX11: encoding: [0xfa,0x02,0x0a,0x7e,0x01,0xe4,0x00,0x00]			// GFX11: encoding: [0xfa,0x02,0x0a,0x7e,0x01,0xe4,0x00,0x00]

	v_mov_b32 v5, v1 row_mirror row_mask:0x0 bank_mask:0x0			v_mov_b32 v5, v1 row_mirror row_mask:0x0 bank_mask:0x0
	▲ Show 20 Lines • Show All 617 Lines • ▼ Show 20 Lines
	v_movrels_b32 v1, v0 quad_perm:[3,2,1,0] row_mask:0x0 bank_mask:0x0 fi:1			v_movrels_b32 v1, v0 quad_perm:[3,2,1,0] row_mask:0x0 bank_mask:0x0 fi:1
	// GFX11: encoding: [0xfa,0x86,0x02,0x7e,0x00,0x1b,0x04,0x00]			// GFX11: encoding: [0xfa,0x86,0x02,0x7e,0x00,0x1b,0x04,0x00]

	v_movrelsd_2_b32 v0, v2 quad_perm:[3,2,1,0] row_mask:0x0 bank_mask:0x0			v_movrelsd_2_b32 v0, v2 quad_perm:[3,2,1,0] row_mask:0x0 bank_mask:0x0
	// GFX11: encoding: [0xfa,0x90,0x00,0x7e,0x02,0x1b,0x00,0x00]			// GFX11: encoding: [0xfa,0x90,0x00,0x7e,0x02,0x1b,0x00,0x00]

	v_movrelsd_b32 v0, v255 quad_perm:[3,2,1,0] row_mask:0x0 bank_mask:0x0			v_movrelsd_b32 v0, v255 quad_perm:[3,2,1,0] row_mask:0x0 bank_mask:0x0
	// GFX11: encoding: [0xfa,0x88,0x00,0x7e,0xff,0x1b,0x00,0x00]			// GFX11: encoding: [0xfa,0x88,0x00,0x7e,0xff,0x1b,0x00,0x00]

				v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1
				// GFX11: encoding: [0x00,0x00,0x66,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00]

				v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 op_sel:[1,1,0,0] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1
				// GFX11-ERR: :[[@LINE-1]]:{{[0-9]+}}: error: invalid op_sel operand

				v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1
				// GFX11: encoding: [0x00,0x60,0x66,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00]

				v_dot2_f16_f16_e64_dpp v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1
				// GFX11: encoding: [0x00,0x65,0x66,0xd6,0xfa,0x04,0x0e,0xc4,0x01,0xe4,0x04,0x00]

				v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1
				// GFX11: encoding: [0x00,0x00,0x67,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00]

				v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 op_sel:[1,1,0,0] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1
				// GFX11-ERR: :[[@LINE-1]]:{{[0-9]+}}: error: invalid op_sel operand

				v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1
				// GFX11: encoding: [0x00,0x60,0x67,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00]

				v_dot2_bf16_bf16_e64_dpp v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1
				// GFX11: encoding: [0x00,0x65,0x67,0xd6,0xfa,0x04,0x0e,0xc4,0x01,0xe4,0x04,0x00]
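
The `quad_perm` bytes in the DPP16 checks above can be cross-checked by hand. This is a minimal sketch, assuming each of the four lane selectors occupies two bits with lane 0 in the lowest bits; that layout is inferred from the test bytes, not taken from the patch, and `pack_quad_perm` is a hypothetical helper name.

```python
def pack_quad_perm(sel):
    """Pack four 2-bit lane selectors into one byte, lane 0 in the low bits.

    Assumption: this layout is inferred from the test encodings, it is not
    copied from the LLVM sources.
    """
    if len(sel) != 4 or any(not 0 <= s <= 3 for s in sel):
        raise ValueError("quad_perm needs four selectors in the range 0..3")
    value = 0
    for lane, s in enumerate(sel):
        value |= s << (2 * lane)
    return value


print(hex(pack_quad_perm([0, 1, 2, 3])))  # 0xe4
print(hex(pack_quad_perm([3, 2, 1, 0])))  # 0x1b
```

`[0,1,2,3]` packs to `0xe4` and `[3,2,1,0]` to `0x1b`, which are the bytes that follow the src0 byte in the `v_dot2_*` and `v_mov_b32` encodings in this file.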

llvm/test/MC/AMDGPU/gfx11_asm_dpp8.s

	// RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s | FileCheck --check-prefixes=GFX11 %s			// RUN: not llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s | FileCheck --check-prefixes=GFX11 %s
				// RUN: not llvm-mc -arch=amdgcn -mcpu=gfx1100 -show-encoding %s 2>&1 | FileCheck --check-prefixes=GFX11-ERR --implicit-check-not=error %s
				Joe_Nash: Can you add --implicit-check-not=error to the runlines?

	v_mov_b32 v5, v1 dpp8:[0,1,2,3,4,5,6,7]			v_mov_b32 v5, v1 dpp8:[0,1,2,3,4,5,6,7]
	// GFX11: encoding: [0xe9,0x02,0x0a,0x7e,0x01,0x88,0xc6,0xfa]			// GFX11: encoding: [0xe9,0x02,0x0a,0x7e,0x01,0x88,0xc6,0xfa]

	v_cvt_f32_i32 v5, v1 dpp8:[0,1,2,3,4,5,6,7]			v_cvt_f32_i32 v5, v1 dpp8:[0,1,2,3,4,5,6,7]
	// GFX11: encoding: [0xe9,0x0a,0x0a,0x7e,0x01,0x88,0xc6,0xfa]			// GFX11: encoding: [0xe9,0x0a,0x0a,0x7e,0x01,0x88,0xc6,0xfa]

	v_cvt_f32_u32 v5, v1 dpp8:[0,1,2,3,4,5,6,7]			v_cvt_f32_u32 v5, v1 dpp8:[0,1,2,3,4,5,6,7]
	▲ Show 20 Lines • Show All 504 Lines • ▼ Show 20 Lines
	// GFX11: encoding: [0xea,0x04,0x0a,0x4c,0x01,0x77,0x39,0x05]			// GFX11: encoding: [0xea,0x04,0x0a,0x4c,0x01,0x77,0x39,0x05]

	v_subrev_nc_u32 v5, v1, v255 dpp8:[7,6,5,4,3,2,1,0]			v_subrev_nc_u32 v5, v1, v255 dpp8:[7,6,5,4,3,2,1,0]
	// GFX11: encoding: [0xe9,0xfe,0x0b,0x4e,0x01,0x77,0x39,0x05]			// GFX11: encoding: [0xe9,0xfe,0x0b,0x4e,0x01,0x77,0x39,0x05]

	v_subrev_nc_u32 v5, v1, v2 dpp8:[7,6,5,4,3,2,1,0] fi:1			v_subrev_nc_u32 v5, v1, v2 dpp8:[7,6,5,4,3,2,1,0] fi:1
	// GFX11: encoding: [0xea,0x04,0x0a,0x4e,0x01,0x77,0x39,0x05]			// GFX11: encoding: [0xea,0x04,0x0a,0x4e,0x01,0x77,0x39,0x05]

				v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 dpp8:[0,1,2,3,4,4,4,4]
				// GFX11: encoding: [0x00,0x00,0x66,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92]

				v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 op_sel:[1,1,0,0] dpp8:[0,1,2,3,4,4,4,4]
				// GFX11-ERR: :[[@LINE-1]]:{{[0-9]+}}: error: invalid op_sel operand

				v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] dpp8:[0,1,2,3,4,4,4,4]
				// GFX11: encoding: [0x00,0x60,0x66,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92]

				v_dot2_f16_f16_e64_dpp v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1] dpp8:[0,1,2,3,4,4,4,4]
				// GFX11: encoding: [0x00,0x65,0x66,0xd6,0xe9,0x04,0x0e,0xc4,0x01,0x88,0x46,0x92]

				v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 dpp8:[0,1,2,3,4,4,4,4]
				// GFX11: encoding: [0x00,0x00,0x67,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92]

				v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 op_sel:[1,1,0,0] dpp8:[0,1,2,3,4,4,4,4]
				// GFX11-ERR: :[[@LINE-1]]:{{[0-9]+}}: error: invalid op_sel operand

				v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] dpp8:[0,1,2,3,4,4,4,4]
				// GFX11: encoding: [0x00,0x60,0x67,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92]

				v_dot2_bf16_bf16_e64_dpp v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1] dpp8:[0,1,2,3,4,4,4,4]
				// GFX11: encoding: [0x00,0x65,0x67,0xd6,0xe9,0x04,0x0e,0xc4,0x01,0x88,0x46,0x92]
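
The trailing three bytes of the DPP8 encodings above follow from the `dpp8:[...]` selector list. This is a minimal sketch, assuming eight 3-bit selectors packed into 24 bits with lane 0 in the lowest bits and emitted little-endian; the layout is inferred from the test bytes, and `pack_dpp8` is a hypothetical helper name.

```python
def pack_dpp8(sel):
    """Pack eight 3-bit lane selectors into three little-endian bytes.

    Assumption: lane 0 sits in the lowest three bits; this is inferred from
    the test encodings rather than taken from the LLVM sources.
    """
    if len(sel) != 8 or any(not 0 <= s <= 7 for s in sel):
        raise ValueError("dpp8 needs eight selectors in the range 0..7")
    value = 0
    for lane, s in enumerate(sel):
        value |= s << (3 * lane)
    return list(value.to_bytes(3, "little"))


print([hex(b) for b in pack_dpp8([0, 1, 2, 3, 4, 4, 4, 4])])
print([hex(b) for b in pack_dpp8([7, 6, 5, 4, 3, 2, 1, 0])])
```

The first call gives `0x88, 0x46, 0x92`, matching the new `v_dot2_*_e64_dpp` checks, and the second gives `0x77, 0x39, 0x05`, matching the existing `v_subrev_nc_u32` checks.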

llvm/test/MC/AMDGPU/gfx11_vop123.s

	Show First 20 Lines • Show All 4,265 Lines • ▼ Show 20 Lines
	// GFX11: encoding: [0x05,0x00,0x38,0xd6,0x01,0x05,0x0e,0x14]			// GFX11: encoding: [0x05,0x00,0x38,0xd6,0x01,0x05,0x0e,0x14]

	v_div_fmas_f64 v[5:6], v[1:2], v[2:3], v[3:4] div:2			v_div_fmas_f64 v[5:6], v[1:2], v[2:3], v[3:4] div:2
	// GFX11: encoding: [0x05,0x00,0x38,0xd6,0x01,0x05,0x0e,0x1c]			// GFX11: encoding: [0x05,0x00,0x38,0xd6,0x01,0x05,0x0e,0x1c]

	v_dot2_f16_f16 v0, v1, v2, v3			v_dot2_f16_f16 v0, v1, v2, v3
	// GFX11: encoding: [0x00,0x00,0x66,0xd6,0x01,0x05,0x0e,0x04]			// GFX11: encoding: [0x00,0x00,0x66,0xd6,0x01,0x05,0x0e,0x04]

				v_dot2_f16_f16 v0, v1, v2, v3 op_sel:[1,1,0,0]
				// W32-ERR: error: invalid op_sel operand
				// W64-ERR: error: invalid op_sel operand

	v_dot2_f16_f16 v0, v1, v2, v3 op_sel:[0,0,1,1]			v_dot2_f16_f16 v0, v1, v2, v3 op_sel:[0,0,1,1]
	// GFX11: encoding: [0x00,0x60,0x66,0xd6,0x01,0x05,0x0e,0x04]			// GFX11: encoding: [0x00,0x60,0x66,0xd6,0x01,0x05,0x0e,0x04]

				v_dot2_f16_f16 v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1]
				// GFX11: encoding: [0x00,0x65,0x66,0xd6,0x01,0x05,0x0e,0xc4]

	v_dot2_bf16_bf16 v0, v1, v2, v3			v_dot2_bf16_bf16 v0, v1, v2, v3
	// GFX11: encoding: [0x00,0x00,0x67,0xd6,0x01,0x05,0x0e,0x04]			// GFX11: encoding: [0x00,0x00,0x67,0xd6,0x01,0x05,0x0e,0x04]

				v_dot2_bf16_bf16 v0, v1, v2, v3 op_sel:[1,1,0,0]
				// W32-ERR: error: invalid op_sel operand
				// W64-ERR: error: invalid op_sel operand

	v_dot2_bf16_bf16 v0, v1, v2, v3 op_sel:[0,0,1,1]			v_dot2_bf16_bf16 v0, v1, v2, v3 op_sel:[0,0,1,1]
	// GFX11: encoding: [0x00,0x60,0x67,0xd6,0x01,0x05,0x0e,0x04]			// GFX11: encoding: [0x00,0x60,0x67,0xd6,0x01,0x05,0x0e,0x04]

				v_dot2_bf16_bf16 v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1]
				// GFX11: encoding: [0x00,0x65,0x67,0xd6,0x01,0x05,0x0e,0xc4]

	v_dot2c_f32_f16_e32 v5, v1, v2			v_dot2c_f32_f16_e32 v5, v1, v2
	// GFX11: encoding: [0x01,0x05,0x0a,0x04]			// GFX11: encoding: [0x01,0x05,0x0a,0x04]

	v_dot2acc_f32_f16_e32 v5, v1, v2			v_dot2acc_f32_f16_e32 v5, v1, v2
	// GFX11: encoding: [0x01,0x05,0x0a,0x04]			// GFX11: encoding: [0x01,0x05,0x0a,0x04]

	v_exp_f32 v5, v1			v_exp_f32 v5, v1
	// GFX11: encoding: [0x01,0x4b,0x0a,0x7e]			// GFX11: encoding: [0x01,0x4b,0x0a,0x7e]
	▲ Show 20 Lines • Show All 10,182 Lines • Show Last 20 Lines
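
The second encoding byte in the dot checks above (`0x00`, `0x60`, `0x65`) can be derived from the source modifiers. This is a rough sketch, assuming the abs bits for src0..src2 sit at Inst{8..10} and the four op_sel bits at Inst{11..14}; the op_sel[1:0] positions match the `Inst{11}` and `Inst{12}` lines in the .td change, while the rest is inferred from the test bytes. `vop3_dot_byte1` is a hypothetical helper and is not the AMDGPUAsmParser logic; clamp and neg are not modeled.

```python
ABS_SHIFT = 8      # assumed: abs bits for src0..src2 at Inst{8..10}
OP_SEL_SHIFT = 11  # assumed: op_sel bits at Inst{11..14}


def vop3_dot_byte1(abs_mods, op_sel):
    """Return encoding byte 1 (Inst{15..8}) for a VOP3 dot instruction.

    abs_mods: three flags for |src0|, |src1|, |src2|.
    op_sel:   four flags as written in op_sel:[a,b,c,d].
    """
    if len(abs_mods) != 3 or len(op_sel) != 4:
        raise ValueError("expected three abs flags and four op_sel flags")
    # The tests expect op_sel[0] and op_sel[1] to be rejected for these
    # instructions; the message mirrors the diagnostic in the checks.
    if op_sel[0] or op_sel[1]:
        raise ValueError("invalid op_sel operand")
    bits = 0
    for i, flag in enumerate(abs_mods):
        bits |= int(bool(flag)) << (ABS_SHIFT + i)
    for i, flag in enumerate(op_sel):
        bits |= int(bool(flag)) << (OP_SEL_SHIFT + i)
    return (bits >> 8) & 0xFF


print(hex(vop3_dot_byte1([0, 0, 0], [0, 0, 1, 1])))  # 0x60
print(hex(vop3_dot_byte1([1, 0, 1], [0, 0, 1, 1])))  # 0x65
```

With `|v1|` and `-|v3|` the abs flags are `[1, 0, 1]`, which adds `0x05` to the `0x60` contributed by `op_sel:[0,0,1,1]`.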

llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_all.txt

	Show First 20 Lines • Show All 14,532 Lines • ▼ Show 20 Lines
	0x05,0x00,0x38,0xd6,0x01,0xfd,0x0f,0x04			0x05,0x00,0x38,0xd6,0x01,0xfd,0x0f,0x04

	# GFX11: v_div_fmas_f64 v[5:6], v[254:255], v[2:3], v[3:4] ; encoding: [0x05,0x00,0x38,0xd6,0xfe,0x05,0x0e,0x04]			# GFX11: v_div_fmas_f64 v[5:6], v[254:255], v[2:3], v[3:4] ; encoding: [0x05,0x00,0x38,0xd6,0xfe,0x05,0x0e,0x04]
	0x05,0x00,0x38,0xd6,0xfe,0x05,0x0e,0x04			0x05,0x00,0x38,0xd6,0xfe,0x05,0x0e,0x04

	# GFX11: v_dot2_f16_f16 v0, v1, v2, v3 ; encoding: [0x00,0x00,0x66,0xd6,0x01,0x05,0x0e,0x04]			# GFX11: v_dot2_f16_f16 v0, v1, v2, v3 ; encoding: [0x00,0x00,0x66,0xd6,0x01,0x05,0x0e,0x04]
	0x00,0x00,0x66,0xd6,0x01,0x05,0x0e,0x04			0x00,0x00,0x66,0xd6,0x01,0x05,0x0e,0x04

				# op_sel[1:0] are ignored
				# GFX11: v_dot2_f16_f16 v0, v1, v2, v3 op_sel:[0,0,1,1] ; encoding: [0x00,0x60,0x66,0xd6,0x01,0x05,0x0e,0x04]
				0x00,0x78,0x66,0xd6,0x01,0x05,0x0e,0x04

	# GFX11: v_dot2_f16_f16 v0, v1, v2, v3 op_sel:[0,0,1,1] ; encoding: [0x00,0x60,0x66,0xd6,0x01,0x05,0x0e,0x04]			# GFX11: v_dot2_f16_f16 v0, v1, v2, v3 op_sel:[0,0,1,1] ; encoding: [0x00,0x60,0x66,0xd6,0x01,0x05,0x0e,0x04]
	0x00,0x60,0x66,0xd6,0x01,0x05,0x0e,0x04			0x00,0x60,0x66,0xd6,0x01,0x05,0x0e,0x04

				# GFX11: v_dot2_f16_f16 v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1] ; encoding: [0x00,0x65,0x66,0xd6,0x01,0x05,0x0e,0xc4]
				0x00,0x65,0x66,0xd6,0x01,0x05,0x0e,0xc4

	# GFX11: v_dot2_bf16_bf16 v0, v1, v2, v3 ; encoding: [0x00,0x00,0x67,0xd6,0x01,0x05,0x0e,0x04]			# GFX11: v_dot2_bf16_bf16 v0, v1, v2, v3 ; encoding: [0x00,0x00,0x67,0xd6,0x01,0x05,0x0e,0x04]
	0x00,0x00,0x67,0xd6,0x01,0x05,0x0e,0x04			0x00,0x00,0x67,0xd6,0x01,0x05,0x0e,0x04

				# op_sel[1:0] are ignored
				# GFX11: v_dot2_bf16_bf16 v0, v1, v2, v3 op_sel:[0,0,1,1] ; encoding: [0x00,0x60,0x67,0xd6,0x01,0x05,0x0e,0x04]
				0x00,0x78,0x67,0xd6,0x01,0x05,0x0e,0x04

	# GFX11: v_dot2_bf16_bf16 v0, v1, v2, v3 op_sel:[0,0,1,1] ; encoding: [0x00,0x60,0x67,0xd6,0x01,0x05,0x0e,0x04]			# GFX11: v_dot2_bf16_bf16 v0, v1, v2, v3 op_sel:[0,0,1,1] ; encoding: [0x00,0x60,0x67,0xd6,0x01,0x05,0x0e,0x04]
	0x00,0x60,0x67,0xd6,0x01,0x05,0x0e,0x04			0x00,0x60,0x67,0xd6,0x01,0x05,0x0e,0x04

				# GFX11: v_dot2_bf16_bf16 v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1] ; encoding: [0x00,0x65,0x67,0xd6,0x01,0x05,0x0e,0xc4]
				0x00,0x65,0x67,0xd6,0x01,0x05,0x0e,0xc4

	# GFX11: v_dot2acc_f32_f16 v5, v1, v2 ; encoding: [0x01,0x05,0x0a,0x04]			# GFX11: v_dot2acc_f32_f16 v5, v1, v2 ; encoding: [0x01,0x05,0x0a,0x04]
	0x01,0x05,0x0a,0x04			0x01,0x05,0x0a,0x04

	# GFX11: v_dot4_i32_iu8 v3, v4, v5, v6 ; encoding: [0x03,0x40,0x16,0xcc,0x04,0x0b,0x1a,0x1c]			# GFX11: v_dot4_i32_iu8 v3, v4, v5, v6 ; encoding: [0x03,0x40,0x16,0xcc,0x04,0x0b,0x1a,0x1c]
	0x03,0x40,0x16,0xcc,0x04,0x0b,0x1a,0x1c			0x03,0x40,0x16,0xcc,0x04,0x0b,0x1a,0x1c

	# GFX11: v_dot4_i32_iu8 v3, v4, v5, 15 neg_lo:[1,1,0] ; encoding: [0x03,0x40,0x16,0xcc,0x04,0x0b,0x3e,0x7a]			# GFX11: v_dot4_i32_iu8 v3, v4, v5, 15 neg_lo:[1,1,0] ; encoding: [0x03,0x40,0x16,0xcc,0x04,0x0b,0x3e,0x7a]
	0x03,0x40,0x16,0xcc,0x04,0x0b,0x3e,0x7a			0x03,0x40,0x16,0xcc,0x04,0x0b,0x3e,0x7a
	▲ Show 20 Lines • Show All 33,627 Lines • Show Last 20 Lines

llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_dpp16.txt

	Show First 20 Lines • Show All 14,092 Lines • ▼ Show 20 Lines
	# GFX11: v_sub_nc_u16_e64_dpp v5, v1, v2 op_sel:[1,0,0] row_share:15 row_mask:0x0 bank_mask:0x1 ; encoding: [0x05,0x08,0x04,0xd7,0xfa,0x04,0x02,0x00,0x01,0x5f,0x01,0x01]			# GFX11: v_sub_nc_u16_e64_dpp v5, v1, v2 op_sel:[1,0,0] row_share:15 row_mask:0x0 bank_mask:0x1 ; encoding: [0x05,0x08,0x04,0xd7,0xfa,0x04,0x02,0x00,0x01,0x5f,0x01,0x01]
	0x05,0x08,0x04,0xd7,0xfa,0x04,0x02,0x00,0x01,0x5f,0x01,0x01			0x05,0x08,0x04,0xd7,0xfa,0x04,0x02,0x00,0x01,0x5f,0x01,0x01

	# GFX11: v_sub_nc_u16_e64_dpp v5, v1, v2 op_sel:[0,1,0] row_xmask:0 row_mask:0x1 bank_mask:0x3 ; encoding: [0x05,0x10,0x04,0xd7,0xfa,0x04,0x02,0x00,0x01,0x60,0x01,0x13]			# GFX11: v_sub_nc_u16_e64_dpp v5, v1, v2 op_sel:[0,1,0] row_xmask:0 row_mask:0x1 bank_mask:0x3 ; encoding: [0x05,0x10,0x04,0xd7,0xfa,0x04,0x02,0x00,0x01,0x60,0x01,0x13]
	0x05,0x10,0x04,0xd7,0xfa,0x04,0x02,0x00,0x01,0x60,0x01,0x13			0x05,0x10,0x04,0xd7,0xfa,0x04,0x02,0x00,0x01,0x60,0x01,0x13

	# GFX11: v_sub_nc_u16_e64_dpp v255, v255, v255 op_sel:[0,0,1] clamp row_xmask:15 row_mask:0x3 bank_mask:0x0 bound_ctrl:1 fi:1 ; encoding: [0xff,0xc0,0x04,0xd7,0xfa,0xfe,0x03,0x00,0xff,0x6f,0x0d,0x30]			# GFX11: v_sub_nc_u16_e64_dpp v255, v255, v255 op_sel:[0,0,1] clamp row_xmask:15 row_mask:0x3 bank_mask:0x0 bound_ctrl:1 fi:1 ; encoding: [0xff,0xc0,0x04,0xd7,0xfa,0xfe,0x03,0x00,0xff,0x6f,0x0d,0x30]
	0xff,0xc0,0x04,0xd7,0xfa,0xfe,0x03,0x00,0xff,0x6f,0x0d,0x30			0xff,0xc0,0x04,0xd7,0xfa,0xfe,0x03,0x00,0xff,0x6f,0x0d,0x30

				# GFX11: v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1 ; encoding: [0x00,0x00,0x66,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00]
				0x00,0x00,0x66,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00

				# op_sel[1:0] are ignored
				# GFX11: v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1 ; encoding: [0x00,0x60,0x66,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00]
				0x00,0x78,0x66,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00

				# GFX11: v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1 ; encoding: [0x00,0x60,0x66,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00]
				0x00,0x60,0x66,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00

				# GFX11: v_dot2_f16_f16_e64_dpp v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1 ; encoding: [0x00,0x65,0x66,0xd6,0xfa,0x04,0x0e,0xc4,0x01,0xe4,0x04,0x00]
				0x00,0x65,0x66,0xd6,0xfa,0x04,0x0e,0xc4,0x01,0xe4,0x04,0x00

				# GFX11: v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1 ; encoding: [0x00,0x00,0x67,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00]
				0x00,0x00,0x67,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00

				# op_sel[1:0] are ignored
				# GFX11: v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1 ; encoding: [0x00,0x60,0x67,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00]
				0x00,0x78,0x67,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00

				# GFX11: v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1 ; encoding: [0x00,0x60,0x67,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00]
				0x00,0x60,0x67,0xd6,0xfa,0x04,0x0e,0x04,0x01,0xe4,0x04,0x00

				# GFX11: v_dot2_bf16_bf16_e64_dpp v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1] quad_perm:[0,1,2,3] row_mask:0x0 bank_mask:0x0 fi:1 ; encoding: [0x00,0x65,0x67,0xd6,0xfa,0x04,0x0e,0xc4,0x01,0xe4,0x04,0x00]
				0x00,0x65,0x67,0xd6,0xfa,0x04,0x0e,0xc4,0x01,0xe4,0x04,0x00

llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_dpp8.txt

	Show First 20 Lines • Show All 5,260 Lines • ▼ Show 20 Lines
	# GFX11: v_sub_nc_u16_e64_dpp v5, v1, v2 op_sel:[1,0,0] dpp8:[7,6,5,4,3,2,1,0] ; encoding: [0x05,0x08,0x04,0xd7,0xe9,0x04,0x02,0x00,0x01,0x77,0x39,0x05]			# GFX11: v_sub_nc_u16_e64_dpp v5, v1, v2 op_sel:[1,0,0] dpp8:[7,6,5,4,3,2,1,0] ; encoding: [0x05,0x08,0x04,0xd7,0xe9,0x04,0x02,0x00,0x01,0x77,0x39,0x05]
	0x05,0x08,0x04,0xd7,0xe9,0x04,0x02,0x00,0x01,0x77,0x39,0x05			0x05,0x08,0x04,0xd7,0xe9,0x04,0x02,0x00,0x01,0x77,0x39,0x05

	# GFX11: v_sub_nc_u16_e64_dpp v5, v1, v2 op_sel:[0,1,0] dpp8:[7,6,5,4,3,2,1,0] ; encoding: [0x05,0x10,0x04,0xd7,0xe9,0x04,0x02,0x00,0x01,0x77,0x39,0x05]			# GFX11: v_sub_nc_u16_e64_dpp v5, v1, v2 op_sel:[0,1,0] dpp8:[7,6,5,4,3,2,1,0] ; encoding: [0x05,0x10,0x04,0xd7,0xe9,0x04,0x02,0x00,0x01,0x77,0x39,0x05]
	0x05,0x10,0x04,0xd7,0xe9,0x04,0x02,0x00,0x01,0x77,0x39,0x05			0x05,0x10,0x04,0xd7,0xe9,0x04,0x02,0x00,0x01,0x77,0x39,0x05

	# GFX11: v_sub_nc_u16_e64_dpp v255, v255, v255 op_sel:[0,0,1] clamp dpp8:[0,0,0,0,0,0,0,0] fi:1 ; encoding: [0xff,0xc0,0x04,0xd7,0xea,0xfe,0x03,0x00,0xff,0x00,0x00,0x00]			# GFX11: v_sub_nc_u16_e64_dpp v255, v255, v255 op_sel:[0,0,1] clamp dpp8:[0,0,0,0,0,0,0,0] fi:1 ; encoding: [0xff,0xc0,0x04,0xd7,0xea,0xfe,0x03,0x00,0xff,0x00,0x00,0x00]
	0xff,0xc0,0x04,0xd7,0xea,0xfe,0x03,0x00,0xff,0x00,0x00,0x00			0xff,0xc0,0x04,0xd7,0xea,0xfe,0x03,0x00,0xff,0x00,0x00,0x00

				# GFX11: v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 dpp8:[0,1,2,3,4,4,4,4] ; encoding: [0x00,0x00,0x66,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92]
				0x00,0x00,0x66,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92

				# op_sel[1:0] are ignored
				# GFX11: v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] dpp8:[0,1,2,3,4,4,4,4] ; encoding: [0x00,0x60,0x66,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92]
				Petar.Avramovic (author): Like this? The ignored bits (0x18) are set to zero. Check the diff against the previous version in the History tab; there are a few more.
				0x00,0x78,0x66,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92

				# GFX11: v_dot2_f16_f16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] dpp8:[0,1,2,3,4,4,4,4] ; encoding: [0x00,0x60,0x66,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92]
				0x00,0x60,0x66,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92

				# GFX11: v_dot2_f16_f16_e64_dpp v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1] dpp8:[0,1,2,3,4,4,4,4] ; encoding: [0x00,0x65,0x66,0xd6,0xe9,0x04,0x0e,0xc4,0x01,0x88,0x46,0x92]
				0x00,0x65,0x66,0xd6,0xe9,0x04,0x0e,0xc4,0x01,0x88,0x46,0x92

				# GFX11: v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 dpp8:[0,1,2,3,4,4,4,4] ; encoding: [0x00,0x00,0x67,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92]
				0x00,0x00,0x67,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92

				# op_sel[1:0] are ignored
				# GFX11: v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] dpp8:[0,1,2,3,4,4,4,4] ; encoding: [0x00,0x60,0x67,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92]
				Joe_Nash: OK, I was confused because the AsmPrinter prints what the hardware will do with that instruction, not what the bits actually disassemble to. This matches sp3, and there are plenty of warnings about how those bits are treated. I think this is good.
				0x00,0x78,0x67,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92

				# GFX11: v_dot2_bf16_bf16_e64_dpp v0, v1, v2, v3 op_sel:[0,0,1,1] dpp8:[0,1,2,3,4,4,4,4] ; encoding: [0x00,0x60,0x67,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92]
				0x00,0x60,0x67,0xd6,0xe9,0x04,0x0e,0x04,0x01,0x88,0x46,0x92

				# GFX11: v_dot2_bf16_bf16_e64_dpp v0, |v1|, -v2, -|v3| op_sel:[0,0,1,1] dpp8:[0,1,2,3,4,4,4,4] ; encoding: [0x00,0x65,0x67,0xd6,0xe9,0x04,0x0e,0xc4,0x01,0x88,0x46,0x92]
				0x00,0x65,0x67,0xd6,0xe9,0x04,0x0e,0xc4,0x01,0x88,0x46,0x92
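
The "op_sel[1:0] are ignored" disassembler checks feed in bytes with Inst{11} and Inst{12} set (`0x78`) and expect the printed encoding to have them cleared (`0x60`). This is a minimal sketch of that relationship only, assuming the first four bytes form a little-endian dword; it does not reproduce how the LLVM disassembler is implemented, and `clear_ignored_op_sel` is a hypothetical name.

```python
# Inst{11} and Inst{12}: the two op_sel bits marked as don't-care ('?')
# for the dot instructions in the .td change. In byte 1 this is 0x18.
IGNORED_OP_SEL_MASK = (1 << 11) | (1 << 12)


def clear_ignored_op_sel(encoding):
    """Return the encoding with the ignored op_sel bits cleared.

    encoding: list of byte values as written in the disassembler tests.
    Only the first dword is touched; any trailing bytes are copied as-is.
    """
    dword = int.from_bytes(bytes(encoding[:4]), "little")
    dword &= ~IGNORED_OP_SEL_MASK
    return list(dword.to_bytes(4, "little")) + list(encoding[4:])


raw = [0x00, 0x78, 0x67, 0xD6, 0xE9, 0x04, 0x0E, 0x04, 0x01, 0x88, 0x46, 0x92]
print([hex(b) for b in clear_ignored_op_sel(raw)])
```

For the input above, byte 1 changes from `0x78` to `0x60` and every other byte is unchanged, which is the encoding shown in the corresponding `# GFX11:` line.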

[AMDGPU] gfx11 Fix VOP3 dot instructions
Closed, Public

Revision Contents

Diff 446758

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

llvm/lib/Target/AMDGPU/VOP3Instructions.td

llvm/lib/Target/AMDGPU/VOPInstructions.td

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.bf16.bf16.ll

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.fdot2.f16.f16.ll

llvm/test/MC/AMDGPU/gfx11_asm_dpp16.s

llvm/test/MC/AMDGPU/gfx11_asm_dpp8.s

llvm/test/MC/AMDGPU/gfx11_vop123.s

llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_all.txt

llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_dpp16.txt

llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3_dpp8.txt
