This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix V_CMP_CLASS_F16_t16_e64 src1 type.
ClosedPublic

Authored by Joe_Nash on Sep 29 2022, 11:16 AM.


Details

Reviewers

dp
foad
arsenm
rampitec

Commits

rG203d0b0ee136: [AMDGPU] Fix V_CMP_CLASS_F16_t16_e64 src1 type.

Summary

For V_CMP_CLASS_F16_t16_e64 and V_CMPX_CLASS_F16_t16_e64,
https://reviews.llvm.org/D133723 changed the value type of src1 from i32 to i16.
These src1 operands are 16 bits wide and therefore need to be encoded as true16
operands, so the _e32 type was correctly set to VGPR_32_Lo128.
In the _e64 form, however, the operand class went from VSrc_b32 to VSrc_b16.
For some reason, we cannot encode inline literals for VSrc_b16; see
5f5f566b265db00f577ead268400d99f34ba9cdd. In this phase of the true16
implementation, VSrc_b16 and VSrc_b32 are still similar apart from that
quirk of inline literals. So set the _e64 src1 operand class back to VSrc_b32
to regain that functionality.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Joe_Nash created this revision. Sep 29 2022, 11:16 AM

Herald added a project: Restricted Project. Sep 29 2022, 11:16 AM

Herald added subscribers: kosarev, kerbowa, hiraditya and 6 others.

Joe_Nash requested review of this revision. Sep 29 2022, 11:16 AM

Herald added a project: Restricted Project. Sep 29 2022, 11:16 AM

Herald added subscribers: llvm-commits, wdng.

The semantics of this instruction are strange, and we should probably improve the inline literal support for 16-bit operands. If you have any thoughts on this patch or on future improvements, please let me know.

A valid class mask is only 10 bits anyway so it's not like it matters
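
For readers unfamiliar with the class mask, here is a minimal Python sketch of how a 10-bit mask selects floating-point classes for an f16 value. It assumes the bit assignment given in the AMD ISA guides (bit 0 = signaling NaN through bit 9 = +infinity). The helper names `f16_class_bit` and `cmp_class_f16` are hypothetical and not part of LLVM, and the sketch models only the mask test, not the hardware.

```python
# Class-mask bit positions, assumed from the AMD ISA documentation.
(S_NAN, Q_NAN, N_INF, N_NORMAL, N_SUBNORMAL,
 N_ZERO, P_ZERO, P_SUBNORMAL, P_NORMAL, P_INF) = (1 << i for i in range(10))


def f16_class_bit(bits):
    """Return the single class bit that describes a raw IEEE half pattern."""
    sign = (bits >> 15) & 1
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp == 0x1F:
        if frac:
            # The top fraction bit distinguishes quiet from signaling NaNs.
            return Q_NAN if frac & 0x200 else S_NAN
        return N_INF if sign else P_INF
    if exp == 0:
        if frac == 0:
            return N_ZERO if sign else P_ZERO
        return N_SUBNORMAL if sign else P_SUBNORMAL
    return N_NORMAL if sign else P_NORMAL


def cmp_class_f16(value_bits, mask):
    """Model of the compare: true if the value's class is selected by the mask."""
    # Only the low 10 bits of the mask carry meaning.
    return bool(f16_class_bit(value_bits) & mask & 0x3FF)
```

As an illustration of why only the low 10 bits matter: if the mask operand were the f16 bit pattern of 0.5 (0x3800), its low 10 bits are all zero, so `cmp_class_f16(0x3800, 0x3800)` selects no class at all.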

This revision is now accepted and ready to land. Sep 29 2022, 11:31 AM

For some reason, we cannot encode inline literals for VSrc_b16.

This is true for inline floating-point constants only. Integer inline constants may be used without limitations, so I do not see any problems using VSrc_b16 for src1.

In D134897#3824780, @dp wrote:

For some reason, we cannot encode inline literals for VSrc_b16.

This is true for inline floating-point constants only. Integer inline constants may be used without limitations, so I do not see any problems using VSrc_b16 for src1.

Before D133723, inline floating-point constants were allowed. Arguably they should be allowed ( https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf "Note that the S1 has a format of f16 since floating point literal constants are interpreted as 16 bit value for this opcode"), though I don't know if they would be emitted by codegen.
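
To make the inline-constant encoding concrete, here is a small Python sketch that pulls the src0 and src1 fields out of the 8-byte VOP3 encodings used in this revision's tests. It assumes the VOP3 layout SRC0 = bits [40:32] and SRC1 = bits [49:41], that operand codes 256..511 are VGPRs, and that 240..247 are the floating-point inline constants. `decode_vop3_srcs` and `operand_name` are hypothetical helpers, not LLVM APIs, and only a few operand ranges are named.

```python
# Floating-point inline constant codes (assumed from the AMD ISA guides).
INLINE_FP = {240: "0.5", 241: "-0.5", 242: "1.0", 243: "-1.0",
             244: "2.0", 245: "-2.0", 246: "4.0", 247: "-4.0"}


def operand_name(code):
    """Map a 9-bit source operand code to a readable name (partial table)."""
    if code >= 256:
        return f"v{code - 256}"      # VGPRs
    if code in INLINE_FP:
        return INLINE_FP[code]       # inline floating-point constants
    if 128 <= code <= 192:
        return str(code - 128)       # inline integers 0..64
    if code <= 105:
        return f"s{code}"            # SGPRs
    return f"#{code}"                # anything this sketch does not name


def decode_vop3_srcs(encoding):
    """Return (src0, src1) names from the first 8 bytes of a VOP3 encoding."""
    word = int.from_bytes(bytes(encoding[:8]), "little")
    src0 = (word >> 32) & 0x1FF
    src1 = (word >> 41) & 0x1FF
    return operand_name(src0), operand_name(src1)


# Encodings taken from the tests in this revision.
print(decode_vop3_srcs([0x0a, 0x00, 0x7d, 0xd4, 0x01, 0xe1, 0x01, 0x00]))  # -> ('v1', '0.5')
print(decode_vop3_srcs([0x0a, 0x00, 0x7d, 0xd4, 0x01, 0x05, 0x02, 0x00]))  # -> ('v1', 'v2')
```

The first encoding is the `v1, 0.5` case: src1 holds operand code 240, the inline constant 0.5, rather than a trailing literal dword.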

Harbormaster completed remote builds in B189469: Diff 463973. Sep 29 2022, 12:25 PM

In D134897#3824823, @Joe_Nash wrote:

In D134897#3824780, @dp wrote:

For some reason, we cannot encode inline literals for VSrc_b16.

This is true for inline floating-point constants only. Integer inline constants may be used without limitations, so I do not see any problems using VSrc_b16 for src1.

Before D133723, inline floating-point constants were allowed. Arguably they should be allowed ( https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf "Note that the S1 has a format of f16 since floating point literal constants are interpreted as 16 bit value for this opcode"), though I don't know if they would be emitted by codegen.

Thanks, I see. I forgot that there are special opcodes with 16 bit integer operands which should be handled differently. I'll take a closer look at the issue.

5f5f566b265d is really a hardware bug, so it's possible it's been fixed on newer hardware (don't think it was ever properly communicated as one to fix though)

In D134897#3824888, @arsenm wrote:

5f5f566b265d is really a hardware bug, so it's possible it's been fixed on newer hardware (don't think it was ever properly communicated as one to fix though)

Last time I read SP3 documentation for GFX11, this issue was presented as a feature, not a bug. There was also a special flag to label opcodes which correctly handled fp inline constants for 16 bit integer operands.

LGTM. Created a separate issue we might want to tackle in the future: https://github.com/llvm/llvm-project/issues/58167.

In D134897#3836885, @dp wrote:

LGTM. Created a separate issue we might want to tackle in the future: https://github.com/llvm/llvm-project/issues/58167.

That makes sense, thanks.

Closed by commit rG203d0b0ee136: [AMDGPU] Fix V_CMP_CLASS_F16_t16_e64 src1 type. (authored by Joe_Nash). Oct 5 2022, 8:15 AM

This revision was automatically updated to reflect the committed changes.

Joe_Nash added a commit: rG203d0b0ee136: [AMDGPU] Fix V_CMP_CLASS_F16_t16_e64 src1 type.

Revision Contents

Path                                                      Size
llvm/lib/Target/AMDGPU/VOPCInstructions.td                2 lines
llvm/test/MC/AMDGPU/gfx11_asm_vop3c.s                     4 lines
llvm/test/MC/AMDGPU/gfx11_asm_vop3cx.s                    3 lines
llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3c.txt     4 lines
llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3cx.txt    3 lines

Diff 465407

llvm/lib/Target/AMDGPU/VOPCInstructions.td

@@ class VOPC_Class_Profile<list<SchedReadWrite> sched, ValueType src0VT, ValueType src1VT = i32> :
   let HasOMod = 0;
 }

 multiclass VOPC_Class_Profile_t16<list<SchedReadWrite> sched> {
   def NAME : VOPC_Class_Profile<sched, f16>;
   def _t16 : VOPC_Class_Profile<sched, f16, i16> {
     let IsTrue16 = 1;
     let Src1RC32 = RegisterOperand<getVregSrcForVT_t16<Src1VT>.ret>;
+    let Src1RC64 = VSrc_b32;
     let Src0DPP = getVregSrcForVT_t16<Src0VT>.ret;
     let Src1DPP = getVregSrcForVT_t16<Src1VT>.ret;
     let Src2DPP = getVregSrcForVT_t16<Src2VT>.ret;
     let Src0ModDPP = getSrcModDPP_t16<Src0VT>.ret;
     let Src1ModDPP = getSrcModDPP_t16<Src1VT>.ret;
     let Src2ModDPP = getSrcModDPP_t16<Src2VT>.ret;
   }
 }
@@ class VOPC_Class_NoSdst_Profile<list<SchedReadWrite> sched, ValueType src0VT, ValueType src1VT = i32> :
   let EmitDst = 0;
 }

 multiclass VOPC_Class_NoSdst_Profile_t16<list<SchedReadWrite> sched> {
   def NAME : VOPC_Class_NoSdst_Profile<sched, f16>;
   def _t16 : VOPC_Class_NoSdst_Profile<sched, f16, i16> {
     let IsTrue16 = 1;
     let Src1RC32 = RegisterOperand<getVregSrcForVT_t16<Src1VT>.ret>;
+    let Src1RC64 = VSrc_b32;
     let Src0DPP = getVregSrcForVT_t16<Src0VT>.ret;
     let Src1DPP = getVregSrcForVT_t16<Src1VT>.ret;
     let Src2DPP = getVregSrcForVT_t16<Src2VT>.ret;
     let Src0ModDPP = getSrcModDPP_t16<Src0VT>.ret;
     let Src1ModDPP = getSrcModDPP_t16<Src1VT>.ret;
     let Src2ModDPP = getSrcModDPP_t16<Src2VT>.ret;
   }
 }

llvm/test/MC/AMDGPU/gfx11_asm_vop3c.s

 v_cmp_class_f16_e64 vcc_hi, 0.5, m0
 // W32: encoding: [0x6b,0x00,0x7d,0xd4,0xf0,0xfa,0x00,0x00]
 // W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction

 v_cmp_class_f16_e64 ttmp15, src_scc, vcc_lo
 // W32: encoding: [0x7b,0x00,0x7d,0xd4,0xfd,0xd4,0x00,0x00]
 // W64-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction

+v_cmp_class_f16_e64 s[10:11], v1, 0.5
+// W64: encoding: [0x0a,0x00,0x7d,0xd4,0x01,0xe1,0x01,0x00]
+// W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction
+
 v_cmp_class_f16_e64 s[10:11], v1, v2
 // W64: encoding: [0x0a,0x00,0x7d,0xd4,0x01,0x05,0x02,0x00]
 // W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction

 v_cmp_class_f16_e64 s[10:11], v255, v2
 // W64: encoding: [0x0a,0x00,0x7d,0xd4,0xff,0x05,0x02,0x00]
 // W32-ERR: :[[@LINE-2]]:{{[0-9]+}}: error: invalid operand for instruction

llvm/test/MC/AMDGPU/gfx11_asm_vop3cx.s

 // GFX11: encoding: [0x7e,0x00,0xfd,0xd4,0xf0,0xfa,0x00,0x00]

 v_cmpx_class_f16_e64 src_scc, vcc_lo
 // GFX11: encoding: [0x7e,0x00,0xfd,0xd4,0xfd,0xd4,0x00,0x00]

 v_cmpx_class_f16_e64 -|0xfe0b|, vcc_hi
 // GFX11: encoding: [0x7e,0x01,0xfd,0xd4,0xff,0xd6,0x00,0x20,0x0b,0xfe,0x00,0x00]

+v_cmpx_class_f16_e64 v1, 0.5
+// GFX11: encoding: [0x7e,0x00,0xfd,0xd4,0x01,0xe1,0x01,0x00]
+
 v_cmpx_class_f32_e64 v1, v2
 // GFX11: encoding: [0x7e,0x00,0xfe,0xd4,0x01,0x05,0x02,0x00]

 v_cmpx_class_f32_e64 v255, v255
 // GFX11: encoding: [0x7e,0x00,0xfe,0xd4,0xff,0xff,0x03,0x00]

 v_cmpx_class_f32_e64 s1, s2
 // GFX11: encoding: [0x7e,0x00,0xfe,0xd4,0x01,0x04,0x00,0x00]

llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3c.txt

 # RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -disassemble -show-encoding < %s | FileCheck -check-prefixes=GFX11,W32 %s
 # RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -mattr=-WavefrontSize32,+WavefrontSize64 -disassemble -show-encoding < %s | FileCheck -check-prefixes=GFX11,W64 %s

 # W32: v_cmp_class_f16_e64 s10, v1, v2 ; encoding: [0x0a,0x00,0x7d,0xd4,0x01,0x05,0x02,0x00]
 # W64: v_cmp_class_f16_e64 s[10:11], v1, v2 ; encoding: [0x0a,0x00,0x7d,0xd4,0x01,0x05,0x02,0x00]
 0x0a,0x00,0x7d,0xd4,0x01,0x05,0x02,0x00

+# W32: v_cmp_class_f16_e64 s10, v1, 0.5 ; encoding: [0x0a,0x00,0x7d,0xd4,0x01,0xe1,0x01,0x00]
+# W64: v_cmp_class_f16_e64 s[10:11], v1, 0.5 ; encoding: [0x0a,0x00,0x7d,0xd4,0x01,0xe1,0x01,0x00]
+0x0a,0x00,0x7d,0xd4,0x01,0xe1,0x01,0x00
+
 # W32: v_cmp_class_f16_e64 s10, v255, v2 ; encoding: [0x0a,0x00,0x7d,0xd4,0xff,0x05,0x02,0x00]
 # W64: v_cmp_class_f16_e64 s[10:11], v255, v2 ; encoding: [0x0a,0x00,0x7d,0xd4,0xff,0x05,0x02,0x00]
 0x0a,0x00,0x7d,0xd4,0xff,0x05,0x02,0x00

 # W32: v_cmp_class_f16_e64 s10, s1, v2 ; encoding: [0x0a,0x00,0x7d,0xd4,0x01,0x04,0x02,0x00]
 # W64: v_cmp_class_f16_e64 s[10:11], s1, v2 ; encoding: [0x0a,0x00,0x7d,0xd4,0x01,0x04,0x02,0x00]
 0x0a,0x00,0x7d,0xd4,0x01,0x04,0x02,0x00

llvm/test/MC/Disassembler/AMDGPU/gfx11_dasm_vop3cx.txt

 # RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -disassemble -show-encoding < %s | FileCheck -check-prefixes=GFX11 %s
 # RUN: llvm-mc -arch=amdgcn -mcpu=gfx1100 -mattr=-WavefrontSize32,+WavefrontSize64 -disassemble -show-encoding < %s | FileCheck -check-prefixes=GFX11 %s

 # GFX11: v_cmpx_class_f16_e64 v1, v2 ; encoding: [0x7e,0x00,0xfd,0xd4,0x01,0x05,0x02,0x00]
 0x7e,0x00,0xfd,0xd4,0x01,0x05,0x02,0x00

+# GFX11: v_cmpx_class_f16_e64 v1, 0.5 ; encoding: [0x7e,0x00,0xfd,0xd4,0x01,0xe1,0x01,0x00]
+0x7e,0x00,0xfd,0xd4,0x01,0xe1,0x01,0x00
+
 # GFX11: v_cmpx_class_f16_e64 v255, v2 ; encoding: [0x7e,0x00,0xfd,0xd4,0xff,0x05,0x02,0x00]
 0x7e,0x00,0xfd,0xd4,0xff,0x05,0x02,0x00

 # GFX11: v_cmpx_class_f16_e64 s1, v2 ; encoding: [0x7e,0x00,0xfd,0xd4,0x01,0x04,0x02,0x00]
 0x7e,0x00,0xfd,0xd4,0x01,0x04,0x02,0x00

 # GFX11: v_cmpx_class_f16_e64 s105, v255 ; encoding: [0x7e,0x00,0xfd,0xd4,0x69,0xfe,0x03,0x00]
 0x7e,0x00,0xfd,0xd4,0x69,0xfe,0x03,0x00
