This is an archive of the discontinued LLVM Phabricator instance.

clang/include/clang/Basic/BuiltinsAMDGPU.def
308	The xf32 suffix indicates data in TF32 format: an 8-bit exponent with FP16’s 10-bit mantissa. It uses 32 bits of storage for 19 bits of data. MFMA instructions got new names in gfx940 (see D121741). The name now includes the block size, which was implicit before, because some of the instructions operate on 2x-16x blocks. Together, adding an 'x' suffix and the new size factor made the names ambiguous, so the decision was made to separate the fields with an underscore. The old names are preserved as aliases for compatibility with existing programs. The same is true for the builtins, as these are used in existing programs. Going forward, the new names will be used.
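As an illustration of the TF32 layout described in this comment (1 sign bit, 8 exponent bits, 10 mantissa bits, stored in a 32-bit container), the following is a minimal C++ sketch that is not part of this revision. The function name `truncate_to_tf32` is hypothetical, and the sketch simply truncates the low mantissa bits; the review does not say how the hardware rounds, so truncation here is an assumption made for simplicity.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the revision): reduce an IEEE-754 binary32
// value to TF32 precision by clearing the 13 low mantissa bits.
// Kept bits: 1 sign + 8 exponent + 10 mantissa = 19 bits of data.
// Assumption: plain truncation; real hardware may round differently.
float truncate_to_tf32(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // read the raw bit pattern
  bits &= 0xFFFFE000u;                  // keep the top 19 bits, clear the low 13
  float out;
  std::memcpy(&out, &bits, sizeof out); // write the pattern back as a float
  return out;
}
```

With this sketch, a value such as `1.0f + 1.0f / 1024.0f` survives unchanged because its only mantissa bit lies within the 10 retained bits, while `1.0f + 1.0f / 2048.0f` collapses to `1.0f`.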

Added 2 more tests.

One more clang test.

rampitec added a child revision: D122191: [AMDGPU] Support gfx940 smfmac instructions. Mar 21 2022, 3:43 PM

Harbormaster completed remote builds in B155505: Diff 417113. Mar 21 2022, 5:52 PM

> xf32 suffix indicates data in TF32 format

Why not tf32 suffix?

Herald added a subscriber: hsmhsm. Mar 24 2022, 3:51 AM

In D122044#3404924, @foad wrote:

> xf32 suffix indicates data in TF32 format
>
> Why not tf32 suffix?

The names come from the HW spec, it is not something I can change.

In D122044#3405607, @rampitec wrote:

> In D122044#3404924, @foad wrote:
>
>> xf32 suffix indicates data in TF32 format
>>
>> Why not tf32 suffix?
>
> The names come from the HW spec, it is not something I can change.

OK, if the intrinsic name matches the instruction name then that is obviously fine.

llvm/lib/Target/AMDGPU/VOP3PInstructions.td
539	I am still confused why the instruction name in `MAIInst<"v_mfma_i32_32x32x16i8"` has no underscore before `i8`...
745	... but here the same instruction name does have an underscore.

rampitec marked an inline comment as done. Mar 24 2022, 10:00 AM

rampitec added inline comments.

llvm/lib/Target/AMDGPU/VOP3PInstructions.td
745	These are asm aliases. SP3 supports the same aliases, with and without underscores. In fact, Justin and I decided we want these aliases to make it easier to adopt old kernels.

rampitec marked an inline comment as done. Mar 24 2022, 10:09 AM

rampitec added inline comments.

llvm/lib/Target/AMDGPU/VOP3PInstructions.td
745	Look at mai-gfx940.s lines 282 and 286, which test that both names are accepted.
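The relationship between the two spellings discussed in this thread can be illustrated with a small C++ sketch. It is purely hypothetical: the revision defines the aliases in TableGen rather than rewriting strings, and the function name `with_suffix_underscore` is invented here. The sketch is only meant to cover the four instruction names added by this revision.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the revision): given a mnemonic such as
// "v_mfma_i32_32x32x16i8", return the spelling with an underscore between
// the K dimension and the data-type suffix, "v_mfma_i32_32x32x16_i8".
// Names that already carry the underscore are returned unchanged.
std::string with_suffix_underscore(const std::string &name) {
  // Find the last 'x' that is directly followed by a digit; the digits
  // after it are the K dimension of the MxNxK shape.
  std::size_t k = std::string::npos;
  for (std::size_t i = 0; i + 1 < name.size(); ++i)
    if (name[i] == 'x' &&
        std::isdigit(static_cast<unsigned char>(name[i + 1])))
      k = i;
  if (k == std::string::npos)
    return name; // no MxNxK shape found

  // Skip over the digits of K.
  std::size_t end = k + 1;
  while (end < name.size() &&
         std::isdigit(static_cast<unsigned char>(name[end])))
    ++end;

  // Nothing to do if the name ends here or is already separated.
  if (end == name.size() || name[end] == '_')
    return name;

  std::string out = name;
  out.insert(end, 1, '_'); // separate the shape from the type suffix
  return out;
}
```

For example, `with_suffix_underscore("v_mfma_f32_16x16x8xf32")` yields `"v_mfma_f32_16x16x8_xf32"`, which shows why the separator helps: without it, the `x` of `xf32` reads like another dimension separator.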

foad added inline comments. Mar 24 2022, 10:15 AM

llvm/lib/Target/AMDGPU/VOP3PInstructions.td
745	OK. No further comments from me :)

LGTM, unless @foad has any additional questions

This revision is now accepted and ready to land. Mar 24 2022, 11:50 AM

This revision was landed with ongoing or failed builds. Mar 24 2022, 12:13 PM

Closed by commit rG27439a764230: [AMDGPU] New gfx940 mfma instructions (authored by rampitec).

This revision was automatically updated to reflect the committed changes.

rampitec added a commit: rG27439a764230: [AMDGPU] New gfx940 mfma instructions.

Herald added a project: Restricted Project. Mar 24 2022, 12:13 PM

Herald added a subscriber: cfe-commits.

jfurtek mentioned this in D151923: [APFloat] Add APFloat semantic support for TF32. Jun 1 2023, 1:27 PM

mehdi_amini mentioned this in rG55c2211a233e: [APFloat] Add APFloat semantic support for TF32. Jun 23 2023, 1:55 AM

Revision Contents

Path                                                                        Size
clang/include/clang/Basic/BuiltinsAMDGPU.def                                5 lines
clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl                            32 lines
clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl                 35 lines
llvm/include/llvm/IR/IntrinsicsAMDGPU.td                                    5 lines
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp                           6 lines
llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td                            4 lines
llvm/lib/Target/AMDGPU/SIInstrInfo.td                                       5 lines
llvm/lib/Target/AMDGPU/SISchedule.td                                        4 lines
llvm/lib/Target/AMDGPU/VOP3PInstructions.td                                 20 lines
llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.mfma.gfx940.mir    119 lines
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll                         83 lines
llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll                      47 lines
llvm/test/MC/AMDGPU/mai-gfx940.s                                            80 lines
llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt                             30 lines

Diff 418013

clang/include/clang/Basic/BuiltinsAMDGPU.def

	TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x4bf16_1k, "V32fV4sV4sV32fIiIiIi", "nc", "mai-insts")			TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x4bf16_1k, "V32fV4sV4sV32fIiIiIi", "nc", "mai-insts")
	TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x4bf16_1k, "V16fV4sV4sV16fIiIiIi", "nc", "mai-insts")			TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x4bf16_1k, "V16fV4sV4sV16fIiIiIi", "nc", "mai-insts")
	TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_4x4x4bf16_1k, "V4fV4sV4sV4fIiIiIi", "nc", "mai-insts")			TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_4x4x4bf16_1k, "V4fV4sV4sV4fIiIiIi", "nc", "mai-insts")
	TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x8bf16_1k, "V16fV4sV4sV16fIiIiIi", "nc", "mai-insts")			TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x8bf16_1k, "V16fV4sV4sV16fIiIiIi", "nc", "mai-insts")
	TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x16bf16_1k, "V4fV4sV4sV4fIiIiIi", "nc", "mai-insts")			TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x16bf16_1k, "V4fV4sV4sV4fIiIiIi", "nc", "mai-insts")
	TARGET_BUILTIN(__builtin_amdgcn_mfma_f64_16x16x4f64, "V4dddV4dIiIiIi", "nc", "mai-insts")			TARGET_BUILTIN(__builtin_amdgcn_mfma_f64_16x16x4f64, "V4dddV4dIiIiIi", "nc", "mai-insts")
	TARGET_BUILTIN(__builtin_amdgcn_mfma_f64_4x4x4f64, "ddddIiIiIi", "nc", "mai-insts")			TARGET_BUILTIN(__builtin_amdgcn_mfma_f64_4x4x4f64, "ddddIiIiIi", "nc", "mai-insts")

				TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_16x16x32_i8, "V4iWiWiV4iIiIiIi", "nc", "mai-insts")
				foad: Why do the new ones have `_` before the `i8`/`xf32` suffix? None of the old ones have it. What does `xf32` mean?
				rampitec (author): The xf32 suffix indicates data in TF32 format: an 8-bit exponent with FP16’s 10-bit mantissa. It uses 32 bits of storage for 19 bits of data. MFMA instructions got new names in gfx940 (see D121741). The name now includes the block size, which was implicit before, because some of the instructions operate on 2x-16x blocks. Together, adding an 'x' suffix and the new size factor made the names ambiguous, so the decision was made to separate the fields with an underscore. The old names are preserved as aliases for compatibility with existing programs. The same is true for the builtins, as these are used in existing programs. Going forward, the new names will be used.
				TARGET_BUILTIN(__builtin_amdgcn_mfma_i32_32x32x16_i8, "V16iWiWiV16iIiIiIi", "nc", "mai-insts")
				TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_16x16x8_xf32, "V4fV2fV2fV4fIiIiIi", "nc", "mai-insts")
				TARGET_BUILTIN(__builtin_amdgcn_mfma_f32_32x32x4_xf32, "V16fV2fV2fV16fIiIiIi", "nc", "mai-insts")

	#undef BUILTIN			#undef BUILTIN
	#undef TARGET_BUILTIN			#undef TARGET_BUILTIN

clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl

	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target
	// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx908 -DMFMA_GFX908_TESTS -S -emit-llvm -o - %s \| FileCheck %s --check-prefix=CHECK-GFX908			// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx908 -DMFMA_GFX908_TESTS -S -emit-llvm -o - %s \| FileCheck %s --check-prefix=CHECK-GFX908
	// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx90a -DMFMA_GFX90A_TESTS -S -emit-llvm -o - %s \| FileCheck %s --check-prefix=CHECK-GFX90A			// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx90a -DMFMA_GFX90A_TESTS -S -emit-llvm -o - %s \| FileCheck %s --check-prefix=CHECK-GFX90A
				// RUN: %clang_cc1 -triple amdgcn-unknown-unknown -target-cpu gfx940 -DMFMA_GFX940_TESTS -S -emit-llvm -o - %s \| FileCheck %s --check-prefix=CHECK-GFX940

	#pragma OPENCL EXTENSION cl_khr_fp64:enable			#pragma OPENCL EXTENSION cl_khr_fp64:enable

				typedef float v2f __attribute__((ext_vector_type(2)));
	typedef float v4f __attribute__((ext_vector_type(4)));			typedef float v4f __attribute__((ext_vector_type(4)));
	typedef float v16f __attribute__((ext_vector_type(16)));			typedef float v16f __attribute__((ext_vector_type(16)));
	typedef float v32f __attribute__((ext_vector_type(32)));			typedef float v32f __attribute__((ext_vector_type(32)));
	typedef half v4h __attribute__((ext_vector_type(4)));			typedef half v4h __attribute__((ext_vector_type(4)));
	typedef half v16h __attribute__((ext_vector_type(16)));			typedef half v16h __attribute__((ext_vector_type(16)));
	typedef half v32h __attribute__((ext_vector_type(32)));			typedef half v32h __attribute__((ext_vector_type(32)));
	typedef int v4i __attribute__((ext_vector_type(4)));			typedef int v4i __attribute__((ext_vector_type(4)));
	typedef int v16i __attribute__((ext_vector_type(16)));			typedef int v16i __attribute__((ext_vector_type(16)));
	// CHECK-GFX90A-LABEL: @test_mfma_f64_4x4x4f64			// CHECK-GFX90A-LABEL: @test_mfma_f64_4x4x4f64
	// CHECK-GFX90A: call double @llvm.amdgcn.mfma.f64.4x4x4f64(double %a, double %b, double %c, i32 0, i32 0, i32 0)			// CHECK-GFX90A: call double @llvm.amdgcn.mfma.f64.4x4x4f64(double %a, double %b, double %c, i32 0, i32 0, i32 0)
	void test_mfma_f64_4x4x4f64(global double* out, double a, double b, double c)			void test_mfma_f64_4x4x4f64(global double* out, double a, double b, double c)
	{			{
	*out = __builtin_amdgcn_mfma_f64_4x4x4f64(a, b, c, 0, 0, 0);			*out = __builtin_amdgcn_mfma_f64_4x4x4f64(a, b, c, 0, 0, 0);
	}			}

	#endif // MFMA_GFX90A_TESTS			#endif // MFMA_GFX90A_TESTS

				#ifdef MFMA_GFX940_TESTS
				// CHECK-GFX940-LABEL: @test_mfma_i32_16x16x32_i8
				// CHECK-GFX940: call <4 x i32> @llvm.amdgcn.mfma.i32.16x16x32.i8(i64 %a, i64 %b, <4 x i32> %c, i32 0, i32 0, i32 0)
				void test_mfma_i32_16x16x32_i8(global v4i* out, long a, long b, v4i c)
				{
				*out = __builtin_amdgcn_mfma_i32_16x16x32_i8(a, b, c, 0, 0, 0);
				}

				// CHECK-GFX940-LABEL: @test_mfma_i32_32x32x16_i8
				// CHECK-GFX940: call <16 x i32> @llvm.amdgcn.mfma.i32.32x32x16.i8(i64 %a, i64 %b, <16 x i32> %c, i32 0, i32 0, i32 0)
				void test_mfma_i32_32x32x16_i8(global v16i* out, long a, long b, v16i c)
				{
				*out = __builtin_amdgcn_mfma_i32_32x32x16_i8(a, b, c, 0, 0, 0);
				}

				// CHECK-GFX940-LABEL: @test_mfma_f32_16x16x8_xf32
				// CHECK-GFX940: call <4 x float> @llvm.amdgcn.mfma.f32.16x16x8.xf32(<2 x float> %a, <2 x float> %b, <4 x float> %c, i32 0, i32 0, i32 0)
				void test_mfma_f32_16x16x8_xf32(global v4f* out, v2f a, v2f b, v4f c)
				{
				*out = __builtin_amdgcn_mfma_f32_16x16x8_xf32(a, b, c, 0, 0, 0);
				}

				// CHECK-GFX940-LABEL: @test_mfma_f32_32x32x4_xf32
				// CHECK-GFX940: call <16 x float> @llvm.amdgcn.mfma.f32.32x32x4.xf32(<2 x float> %a, <2 x float> %b, <16 x float> %c, i32 0, i32 0, i32 0)
				void test_mfma_f32_32x32x4_xf32(global v16f* out, v2f a, v2f b, v16f c)
				{
				*out = __builtin_amdgcn_mfma_f32_32x32x4_xf32(a, b, c, 0, 0, 0);
				}
				#endif // MFMA_GFX940_TESTS

clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl

This file was added.

				// RUN: %clang_cc1 -triple amdgcn-- -target-cpu gfx940 -verify -S -o - %s

				typedef float v2f __attribute__((ext_vector_type(2)));
				typedef float v4f __attribute__((ext_vector_type(4)));
				typedef float v16f __attribute__((ext_vector_type(16)));
				typedef int v4i __attribute__((ext_vector_type(4)));
				typedef int v16i __attribute__((ext_vector_type(16)));

				void test_mfma_i32_16x16x32i8(global v4i* out, long a, long b, v4i c, int d)
				{
				*out = __builtin_amdgcn_mfma_i32_16x16x32_i8(a, b, c, d, 0, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_i32_16x16x32_i8' must be a constant integer}}
				*out = __builtin_amdgcn_mfma_i32_16x16x32_i8(a, b, c, 0, d, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_i32_16x16x32_i8' must be a constant integer}}
				*out = __builtin_amdgcn_mfma_i32_16x16x32_i8(a, b, c, 0, 0, d); // expected-error{{argument to '__builtin_amdgcn_mfma_i32_16x16x32_i8' must be a constant integer}}
				}

				void test_mfma_i32_32x32x16i8(global v16i* out, long a, long b, v16i c, int d)
				{
				*out = __builtin_amdgcn_mfma_i32_32x32x16_i8(a, b, c, d, 0, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_i32_32x32x16_i8' must be a constant integer}}
				*out = __builtin_amdgcn_mfma_i32_32x32x16_i8(a, b, c, 0, d, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_i32_32x32x16_i8' must be a constant integer}}
				*out = __builtin_amdgcn_mfma_i32_32x32x16_i8(a, b, c, 0, 0, d); // expected-error{{argument to '__builtin_amdgcn_mfma_i32_32x32x16_i8' must be a constant integer}}
				}

				void test_mfma_f32_16x16x8xf32(global v4f* out, v2f a, v2f b, v4f c, int d)
				{
				*out = __builtin_amdgcn_mfma_f32_16x16x8_xf32(a, b, c, d, 0, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x8_xf32' must be a constant integer}}
				*out = __builtin_amdgcn_mfma_f32_16x16x8_xf32(a, b, c, 0, d, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x8_xf32' must be a constant integer}}
				*out = __builtin_amdgcn_mfma_f32_16x16x8_xf32(a, b, c, 0, 0, d); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_16x16x8_xf32' must be a constant integer}}
				}

				void test_mfma_f32_32x32x4xf32(global v16f* out, v2f a, v2f b, v16f c, int d)
				{
				*out = __builtin_amdgcn_mfma_f32_32x32x4_xf32(a, b, c, d, 0, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_32x32x4_xf32' must be a constant integer}}
				*out = __builtin_amdgcn_mfma_f32_32x32x4_xf32(a, b, c, 0, d, 0); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_32x32x4_xf32' must be a constant integer}}
				*out = __builtin_amdgcn_mfma_f32_32x32x4_xf32(a, b, c, 0, 0, d); // expected-error{{argument to '__builtin_amdgcn_mfma_f32_32x32x4_xf32' must be a constant integer}}
				}

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

	def int_amdgcn_global_atomic_fadd_v2bf16 : AMDGPUGlobalAtomicRtn<llvm_v2i16_ty>;			def int_amdgcn_global_atomic_fadd_v2bf16 : AMDGPUGlobalAtomicRtn<llvm_v2i16_ty>;
	def int_amdgcn_flat_atomic_fadd_v2bf16 : AMDGPUGlobalAtomicRtn<llvm_v2i16_ty>;			def int_amdgcn_flat_atomic_fadd_v2bf16 : AMDGPUGlobalAtomicRtn<llvm_v2i16_ty>;
	def int_amdgcn_ds_fadd_v2bf16 : Intrinsic<			def int_amdgcn_ds_fadd_v2bf16 : Intrinsic<
	[llvm_v2i16_ty],			[llvm_v2i16_ty],
	[LLVMQualPointerType<llvm_v2i16_ty, 3>, llvm_v2i16_ty],			[LLVMQualPointerType<llvm_v2i16_ty, 3>, llvm_v2i16_ty],
	[IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<0>>]>,			[IntrArgMemOnly, IntrWillReturn, NoCapture<ArgIndex<0>>]>,
	GCCBuiltin<"__builtin_amdgcn_ds_atomic_fadd_v2bf16">;			GCCBuiltin<"__builtin_amdgcn_ds_atomic_fadd_v2bf16">;

				def int_amdgcn_mfma_i32_16x16x32_i8 : AMDGPUMfmaIntrinsic<llvm_v4i32_ty, llvm_i64_ty>;
				def int_amdgcn_mfma_i32_32x32x16_i8 : AMDGPUMfmaIntrinsic<llvm_v16i32_ty, llvm_i64_ty>;
				def int_amdgcn_mfma_f32_16x16x8_xf32 : AMDGPUMfmaIntrinsic<llvm_v4f32_ty, llvm_v2f32_ty>;
				def int_amdgcn_mfma_f32_32x32x4_xf32 : AMDGPUMfmaIntrinsic<llvm_v16f32_ty, llvm_v2f32_ty>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Special Intrinsics for backend internal use only. No frontend			// Special Intrinsics for backend internal use only. No frontend
	// should emit calls to these.			// should emit calls to these.
	// ===----------------------------------------------------------------------===//			// ===----------------------------------------------------------------------===//
	def int_amdgcn_if : Intrinsic<[llvm_i1_ty, llvm_anyint_ty],			def int_amdgcn_if : Intrinsic<[llvm_i1_ty, llvm_anyint_ty],
	[llvm_i1_ty], [IntrConvergent, IntrWillReturn]			[llvm_i1_ty], [IntrConvergent, IntrWillReturn]
	>;			>;


llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

case AMDGPU::G_INTRINSIC: {
case Intrinsic::amdgcn_mfma_f32_32x32x2bf16:		case Intrinsic::amdgcn_mfma_f32_32x32x2bf16:
case Intrinsic::amdgcn_mfma_f32_32x32x4bf16:		case Intrinsic::amdgcn_mfma_f32_32x32x4bf16:
case Intrinsic::amdgcn_mfma_f32_32x32x4bf16_1k:		case Intrinsic::amdgcn_mfma_f32_32x32x4bf16_1k:
case Intrinsic::amdgcn_mfma_f32_16x16x4bf16_1k:		case Intrinsic::amdgcn_mfma_f32_16x16x4bf16_1k:
case Intrinsic::amdgcn_mfma_f32_4x4x4bf16_1k:		case Intrinsic::amdgcn_mfma_f32_4x4x4bf16_1k:
case Intrinsic::amdgcn_mfma_f32_32x32x8bf16_1k:		case Intrinsic::amdgcn_mfma_f32_32x32x8bf16_1k:
case Intrinsic::amdgcn_mfma_f32_16x16x16bf16_1k:		case Intrinsic::amdgcn_mfma_f32_16x16x16bf16_1k:
case Intrinsic::amdgcn_mfma_f64_16x16x4f64:		case Intrinsic::amdgcn_mfma_f64_16x16x4f64:
case Intrinsic::amdgcn_mfma_f64_4x4x4f64: {		case Intrinsic::amdgcn_mfma_f64_4x4x4f64:
		case Intrinsic::amdgcn_mfma_i32_16x16x32_i8:
		case Intrinsic::amdgcn_mfma_i32_32x32x16_i8:
		case Intrinsic::amdgcn_mfma_f32_16x16x8_xf32:
		case Intrinsic::amdgcn_mfma_f32_32x32x4_xf32: {
// Default for MAI intrinsics.		// Default for MAI intrinsics.
// srcC can also be an immediate which can be folded later.		// srcC can also be an immediate which can be folded later.
// FIXME: Should we eventually add an alternative mapping with AGPR src		// FIXME: Should we eventually add an alternative mapping with AGPR src
// for srcA/srcB?		// for srcA/srcB?
//		//
// vdst, srcA, srcB, srcC		// vdst, srcA, srcB, srcC
const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();		const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
OpdsMapping[0] =		OpdsMapping[0] =

llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td

	def : SourceOfDivergence<int_amdgcn_mfma_f32_32x32x4bf16>;			def : SourceOfDivergence<int_amdgcn_mfma_f32_32x32x4bf16>;
	def : SourceOfDivergence<int_amdgcn_mfma_f32_32x32x4bf16_1k>;			def : SourceOfDivergence<int_amdgcn_mfma_f32_32x32x4bf16_1k>;
	def : SourceOfDivergence<int_amdgcn_mfma_f32_16x16x4bf16_1k>;			def : SourceOfDivergence<int_amdgcn_mfma_f32_16x16x4bf16_1k>;
	def : SourceOfDivergence<int_amdgcn_mfma_f32_4x4x4bf16_1k>;			def : SourceOfDivergence<int_amdgcn_mfma_f32_4x4x4bf16_1k>;
	def : SourceOfDivergence<int_amdgcn_mfma_f32_32x32x8bf16_1k>;			def : SourceOfDivergence<int_amdgcn_mfma_f32_32x32x8bf16_1k>;
	def : SourceOfDivergence<int_amdgcn_mfma_f32_16x16x16bf16_1k>;			def : SourceOfDivergence<int_amdgcn_mfma_f32_16x16x16bf16_1k>;
	def : SourceOfDivergence<int_amdgcn_mfma_f64_16x16x4f64>;			def : SourceOfDivergence<int_amdgcn_mfma_f64_16x16x4f64>;
	def : SourceOfDivergence<int_amdgcn_mfma_f64_4x4x4f64>;			def : SourceOfDivergence<int_amdgcn_mfma_f64_4x4x4f64>;
				def : SourceOfDivergence<int_amdgcn_mfma_i32_16x16x32_i8>;
				def : SourceOfDivergence<int_amdgcn_mfma_i32_32x32x16_i8>;
				def : SourceOfDivergence<int_amdgcn_mfma_f32_16x16x8_xf32>;
				def : SourceOfDivergence<int_amdgcn_mfma_f32_32x32x4_xf32>;

	// The dummy boolean output is divergent from the IR's perspective,			// The dummy boolean output is divergent from the IR's perspective,
	// but the mask results are uniform. These produce a divergent and			// but the mask results are uniform. These produce a divergent and
	// uniform result, so the returned struct is collectively divergent.			// uniform result, so the returned struct is collectively divergent.
	// isAlwaysUniform can override the extract of the uniform component.			// isAlwaysUniform can override the extract of the uniform component.
	def : SourceOfDivergence<int_amdgcn_if>;			def : SourceOfDivergence<int_amdgcn_if>;
	def : SourceOfDivergence<int_amdgcn_else>;			def : SourceOfDivergence<int_amdgcn_else>;
	def : SourceOfDivergence<int_amdgcn_loop>;			def : SourceOfDivergence<int_amdgcn_loop>;

	foreach intr = AMDGPUImageDimAtomicIntrinsics in			foreach intr = AMDGPUImageDimAtomicIntrinsics in
	def : SourceOfDivergence<intr>;			def : SourceOfDivergence<intr>;

llvm/lib/Target/AMDGPU/SIInstrInfo.td


	def VOP_V2F32_V2F32_V2F32_V2F32 : VOPProfile <[v2f32, v2f32, v2f32, v2f32]>;			def VOP_V2F32_V2F32_V2F32_V2F32 : VOPProfile <[v2f32, v2f32, v2f32, v2f32]>;
	def VOP_V2F32_V2F32_V2F32 : VOPProfile <[v2f32, v2f32, v2f32, untyped]>;			def VOP_V2F32_V2F32_V2F32 : VOPProfile <[v2f32, v2f32, v2f32, untyped]>;
	def VOP_V2I32_V2I32_V2I32 : VOPProfile <[v2i32, v2i32, v2i32, untyped]>;			def VOP_V2I32_V2I32_V2I32 : VOPProfile <[v2i32, v2i32, v2i32, untyped]>;
	def VOP_V4F32_V4I16_V4I16_V4F32 : VOPProfile <[v4f32, v4i16, v4i16, v4f32]>;			def VOP_V4F32_V4I16_V4I16_V4F32 : VOPProfile <[v4f32, v4i16, v4i16, v4f32]>;
	def VOP_V16F32_V4I16_V4I16_V16F32 : VOPProfile <[v16f32, v4i16, v4i16, v16f32]>;			def VOP_V16F32_V4I16_V4I16_V16F32 : VOPProfile <[v16f32, v4i16, v4i16, v16f32]>;
	def VOP_V32F32_V4I16_V4I16_V32F32 : VOPProfile <[v32f32, v4i16, v4i16, v32f32]>;			def VOP_V32F32_V4I16_V4I16_V32F32 : VOPProfile <[v32f32, v4i16, v4i16, v32f32]>;

				def VOP_V4I32_I64_I64_V4I32 : VOPProfile <[v4i32, i64, i64, v4i32]>;
				def VOP_V16I32_I64_I64_V16I32 : VOPProfile <[v16i32, i64, i64, v16i32]>;
				def VOP_V4F32_V2F32_V2F32_V4F32 : VOPProfile <[v4f32, v2f32, v2f32, v4f32]>;
				def VOP_V16F32_V2F32_V2F32_V16F32 : VOPProfile <[v16f32, v2f32, v2f32, v16f32]>;

	class Commutable_REV <string revOp, bit isOrig> {			class Commutable_REV <string revOp, bit isOrig> {
	string RevOp = revOp;			string RevOp = revOp;
	bit IsOrig = isOrig;			bit IsOrig = isOrig;
	}			}

	class AtomicNoRet <string noRetOp, bit isRet> {			class AtomicNoRet <string noRetOp, bit isRet> {
	string NoRetOp = noRetOp;			string NoRetOp = noRetOp;
	bit IsRet = isRet;			bit IsRet = isRet;

llvm/lib/Target/AMDGPU/SISchedule.td

	def : HWVALUWriteRes<WriteTrans64, 4>;			def : HWVALUWriteRes<WriteTrans64, 4>;
	def : HWVALUWriteRes<WriteIntMul, 1>;			def : HWVALUWriteRes<WriteIntMul, 1>;
	def : HWVALUWriteRes<Write64Bit, 1>;			def : HWVALUWriteRes<Write64Bit, 1>;

	def : InstRW<[WriteCopy], (instrs COPY)>;			def : InstRW<[WriteCopy], (instrs COPY)>;
	def : InstRW<[Write64Bit], (instregex "^V_ACCVGPR_WRITE_B32_e64$")>;			def : InstRW<[Write64Bit], (instregex "^V_ACCVGPR_WRITE_B32_e64$")>;
	def : InstRW<[Write2PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_4X4X")>;			def : InstRW<[Write2PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_4X4X")>;

				def : InstRW<[Write4PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_16X16X8X")>;
	def : InstRW<[Write4PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_16X16X16")>;			def : InstRW<[Write4PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_16X16X16")>;
				def : InstRW<[Write4PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_16X16X32")>;
	def : InstRW<[Write8PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_16X16X[14][FBI]")>;			def : InstRW<[Write8PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_16X16X[14][FBI]")>;

				def : InstRW<[Write8PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_32X32X4XF")>;
	def : InstRW<[Write8PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_32X32X8")>;			def : InstRW<[Write8PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_32X32X8")>;
				def : InstRW<[Write8PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_32X32X16")>;
	def : InstRW<[Write16PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_32X32X[124][FBI]")>;			def : InstRW<[Write16PassMAI, MIMFMARead], (instregex "^V_MFMA_.32_32X32X[124][FBI]")>;

	def : InstRW<[Write4PassDGEMM, MIMFMARead], (instregex "^V_MFMA_.64_4X4X")>;			def : InstRW<[Write4PassDGEMM, MIMFMARead], (instregex "^V_MFMA_.64_4X4X")>;
	def : InstRW<[Write8PassDGEMM, MIMFMARead], (instregex "^V_MFMA_.64_16X16X")>;			def : InstRW<[Write8PassDGEMM, MIMFMARead], (instregex "^V_MFMA_.64_16X16X")>;

	} // End SchedModel = SIDPGFX940FullSpeedModel			} // End SchedModel = SIDPGFX940FullSpeedModel

	let SchedModel = GFX10SpeedModel in {			let SchedModel = GFX10SpeedModel in {

llvm/lib/Target/AMDGPU/VOP3PInstructions.td

def VOPProfileMAI_F32_V4F16_X4 : VOPProfileMAI<VOP_V4F32_V4F16_V4F16_V4F32, AISrc_128_b32, ADst_128, AVSrc_64>;		def VOPProfileMAI_F32_V4F16_X4 : VOPProfileMAI<VOP_V4F32_V4F16_V4F16_V4F32, AISrc_128_b32, ADst_128, AVSrc_64>;
def VOPProfileMAI_F32_V4F16_X16 : VOPProfileMAI<VOP_V16F32_V4F16_V4F16_V16F32, AISrc_512_b32, ADst_512, AVSrc_64>;		def VOPProfileMAI_F32_V4F16_X16 : VOPProfileMAI<VOP_V16F32_V4F16_V4F16_V16F32, AISrc_512_b32, ADst_512, AVSrc_64>;
def VOPProfileMAI_F32_V4F16_X32 : VOPProfileMAI<VOP_V32F32_V4F16_V4F16_V32F32, AISrc_1024_b32, ADst_1024, AVSrc_64>;		def VOPProfileMAI_F32_V4F16_X32 : VOPProfileMAI<VOP_V32F32_V4F16_V4F16_V32F32, AISrc_1024_b32, ADst_1024, AVSrc_64>;
def VOPProfileMAI_F32_V4I16_X4 : VOPProfileMAI<VOP_V4F32_V4I16_V4I16_V4F32, AISrc_128_b32, ADst_128, AVSrc_64>;		def VOPProfileMAI_F32_V4I16_X4 : VOPProfileMAI<VOP_V4F32_V4I16_V4I16_V4F32, AISrc_128_b32, ADst_128, AVSrc_64>;
def VOPProfileMAI_F32_V4I16_X16 : VOPProfileMAI<VOP_V16F32_V4I16_V4I16_V16F32, AISrc_512_b32, ADst_512, AVSrc_64>;		def VOPProfileMAI_F32_V4I16_X16 : VOPProfileMAI<VOP_V16F32_V4I16_V4I16_V16F32, AISrc_512_b32, ADst_512, AVSrc_64>;
def VOPProfileMAI_F32_V4I16_X32 : VOPProfileMAI<VOP_V32F32_V4I16_V4I16_V32F32, AISrc_1024_b32, ADst_1024, AVSrc_64>;		def VOPProfileMAI_F32_V4I16_X32 : VOPProfileMAI<VOP_V32F32_V4I16_V4I16_V32F32, AISrc_1024_b32, ADst_1024, AVSrc_64>;
def VOPProfileMAI_F64_16X16X4F64 : VOPProfileMAI<VOP_V4F64_F64_F64_V4F64, AISrc_256_f64, ADst_256, AVSrc_64>;		def VOPProfileMAI_F64_16X16X4F64 : VOPProfileMAI<VOP_V4F64_F64_F64_V4F64, AISrc_256_f64, ADst_256, AVSrc_64>;
def VOPProfileMAI_F64_4X4X4F64 : VOPProfileMAI<VOP_F64_F64_F64_F64, AISrc_64_f64, ADst_64, AVSrc_64>;		def VOPProfileMAI_F64_4X4X4F64 : VOPProfileMAI<VOP_F64_F64_F64_F64, AISrc_64_f64, ADst_64, AVSrc_64>;
		def VOPProfileMAI_I32_I64_X16 : VOPProfileMAI<VOP_V4I32_I64_I64_V4I32, AISrc_128_b32, ADst_128, AVSrc_64>;
		def VOPProfileMAI_I32_I64_X32 : VOPProfileMAI<VOP_V16I32_I64_I64_V16I32, AISrc_512_b32, ADst_512, AVSrc_64>;
		def VOPProfileMAI_F32_V2F32_X16 : VOPProfileMAI<VOP_V4F32_V2F32_V2F32_V4F32, AISrc_128_b32, ADst_128, AVSrc_64>;
		def VOPProfileMAI_F32_V2F32_X32 : VOPProfileMAI<VOP_V16F32_V2F32_V2F32_V16F32, AISrc_512_b32, ADst_512, AVSrc_64>;

def VOPProfileMAI_F32_F32_X4_VCD : VOPProfileMAI<VOP_V4F32_F32_F32_V4F32, VISrc_128_f32, VDst_128>;		def VOPProfileMAI_F32_F32_X4_VCD : VOPProfileMAI<VOP_V4F32_F32_F32_V4F32, VISrc_128_f32, VDst_128>;
def VOPProfileMAI_F32_F32_X16_VCD : VOPProfileMAI<VOP_V16F32_F32_F32_V16F32, VISrc_512_f32, VDst_512>;		def VOPProfileMAI_F32_F32_X16_VCD : VOPProfileMAI<VOP_V16F32_F32_F32_V16F32, VISrc_512_f32, VDst_512>;
def VOPProfileMAI_F32_F32_X32_VCD : VOPProfileMAI<VOP_V32F32_F32_F32_V32F32, VISrc_1024_f32, VDst_1024>;		def VOPProfileMAI_F32_F32_X32_VCD : VOPProfileMAI<VOP_V32F32_F32_F32_V32F32, VISrc_1024_f32, VDst_1024>;
def VOPProfileMAI_I32_I32_X4_VCD : VOPProfileMAI<VOP_V4I32_I32_I32_V4I32, VISrc_128_b32, VDst_128>;		def VOPProfileMAI_I32_I32_X4_VCD : VOPProfileMAI<VOP_V4I32_I32_I32_V4I32, VISrc_128_b32, VDst_128>;
def VOPProfileMAI_I32_I32_X16_VCD : VOPProfileMAI<VOP_V16I32_I32_I32_V16I32, VISrc_512_b32, VDst_512>;		def VOPProfileMAI_I32_I32_X16_VCD : VOPProfileMAI<VOP_V16I32_I32_I32_V16I32, VISrc_512_b32, VDst_512>;
def VOPProfileMAI_I32_I32_X32_VCD : VOPProfileMAI<VOP_V32I32_I32_I32_V32I32, VISrc_1024_b32, VDst_1024>;		def VOPProfileMAI_I32_I32_X32_VCD : VOPProfileMAI<VOP_V32I32_I32_I32_V32I32, VISrc_1024_b32, VDst_1024>;
def VOPProfileMAI_F32_V2I16_X4_VCD : VOPProfileMAI<VOP_V4F32_V2I16_V2I16_V4F32, VISrc_128_b32, VDst_128>;		def VOPProfileMAI_F32_V2I16_X4_VCD : VOPProfileMAI<VOP_V4F32_V2I16_V2I16_V4F32, VISrc_128_b32, VDst_128>;
def VOPProfileMAI_F32_V2I16_X16_VCD : VOPProfileMAI<VOP_V16F32_V2I16_V2I16_V16F32, VISrc_512_b32, VDst_512>;		def VOPProfileMAI_F32_V2I16_X16_VCD : VOPProfileMAI<VOP_V16F32_V2I16_V2I16_V16F32, VISrc_512_b32, VDst_512>;
def VOPProfileMAI_F32_V2I16_X32_VCD : VOPProfileMAI<VOP_V32F32_V2I16_V2I16_V32F32, VISrc_1024_b32, VDst_1024>;		def VOPProfileMAI_F32_V2I16_X32_VCD : VOPProfileMAI<VOP_V32F32_V2I16_V2I16_V32F32, VISrc_1024_b32, VDst_1024>;
def VOPProfileMAI_F32_V4F16_X4_VCD : VOPProfileMAI<VOP_V4F32_V4F16_V4F16_V4F32, VISrc_128_b32, VDst_128, AVSrc_64>;		def VOPProfileMAI_F32_V4F16_X4_VCD : VOPProfileMAI<VOP_V4F32_V4F16_V4F16_V4F32, VISrc_128_b32, VDst_128, AVSrc_64>;
def VOPProfileMAI_F32_V4F16_X16_VCD : VOPProfileMAI<VOP_V16F32_V4F16_V4F16_V16F32, VISrc_512_b32, VDst_512, AVSrc_64>;		def VOPProfileMAI_F32_V4F16_X16_VCD : VOPProfileMAI<VOP_V16F32_V4F16_V4F16_V16F32, VISrc_512_b32, VDst_512, AVSrc_64>;
def VOPProfileMAI_F32_V4F16_X32_VCD : VOPProfileMAI<VOP_V32F32_V4F16_V4F16_V32F32, VISrc_1024_b32, VDst_1024, AVSrc_64>;		def VOPProfileMAI_F32_V4F16_X32_VCD : VOPProfileMAI<VOP_V32F32_V4F16_V4F16_V32F32, VISrc_1024_b32, VDst_1024, AVSrc_64>;
def VOPProfileMAI_F32_V4I16_X4_VCD : VOPProfileMAI<VOP_V4F32_V4I16_V4I16_V4F32, VISrc_128_b32, VDst_128, AVSrc_64>;		def VOPProfileMAI_F32_V4I16_X4_VCD : VOPProfileMAI<VOP_V4F32_V4I16_V4I16_V4F32, VISrc_128_b32, VDst_128, AVSrc_64>;
def VOPProfileMAI_F32_V4I16_X16_VCD : VOPProfileMAI<VOP_V16F32_V4I16_V4I16_V16F32, VISrc_512_b32, VDst_512, AVSrc_64>;		def VOPProfileMAI_F32_V4I16_X16_VCD : VOPProfileMAI<VOP_V16F32_V4I16_V4I16_V16F32, VISrc_512_b32, VDst_512, AVSrc_64>;
def VOPProfileMAI_F32_V4I16_X32_VCD : VOPProfileMAI<VOP_V32F32_V4I16_V4I16_V32F32, VISrc_1024_b32, VDst_1024, AVSrc_64>;		def VOPProfileMAI_F32_V4I16_X32_VCD : VOPProfileMAI<VOP_V32F32_V4I16_V4I16_V32F32, VISrc_1024_b32, VDst_1024, AVSrc_64>;
def VOPProfileMAI_F64_16X16X4F64_VCD : VOPProfileMAI<VOP_V4F64_F64_F64_V4F64, VISrc_256_f64, VDst_256, AVSrc_64>;		def VOPProfileMAI_F64_16X16X4F64_VCD : VOPProfileMAI<VOP_V4F64_F64_F64_V4F64, VISrc_256_f64, VDst_256, AVSrc_64>;
def VOPProfileMAI_F64_4X4X4F64_VCD : VOPProfileMAI<VOP_F64_F64_F64_F64, VISrc_64_f64, VDst_64, AVSrc_64>;		def VOPProfileMAI_F64_4X4X4F64_VCD : VOPProfileMAI<VOP_F64_F64_F64_F64, VISrc_64_f64, VDst_64, AVSrc_64>;
		def VOPProfileMAI_I32_I64_X16_VCD : VOPProfileMAI<VOP_V4I32_I64_I64_V4I32, VISrc_128_b32, VDst_128, AVSrc_64>;
		def VOPProfileMAI_I32_I64_X32_VCD : VOPProfileMAI<VOP_V16I32_I64_I64_V16I32, VISrc_512_b32, VDst_512, AVSrc_64>;
		def VOPProfileMAI_F32_V2F32_X16_VCD : VOPProfileMAI<VOP_V4F32_V2F32_V2F32_V4F32, VISrc_128_b32, VDst_128, AVSrc_64>;
		def VOPProfileMAI_F32_V2F32_X32_VCD : VOPProfileMAI<VOP_V16F32_V2F32_V2F32_V16F32, VISrc_512_b32, VDst_512, AVSrc_64>;

class MFMATable <bit is_mac, string Name> {		class MFMATable <bit is_mac, string Name> {
bit IsMac = is_mac;		bit IsMac = is_mac;
string FMAOp = Name;		string FMAOp = Name;
}		}

class MAIFrag<SDPatternOperator Op, code pred> : PatFrag <		class MAIFrag<SDPatternOperator Op, code pred> : PatFrag <
(ops node:$src0, node:$src1, node:$src2, node:$cbsz, node:$abid, node:$blgp),		(ops node:$src0, node:$src1, node:$src2, node:$cbsz, node:$abid, node:$blgp),
let Predicates = [isGFX90APlus] in {
defm V_MFMA_F32_4X4X4BF16_1K : MAIInst<"v_mfma_f32_4x4x4bf16_1k", "F32_V4I16_X4", int_amdgcn_mfma_f32_4x4x4bf16_1k>;		defm V_MFMA_F32_4X4X4BF16_1K : MAIInst<"v_mfma_f32_4x4x4bf16_1k", "F32_V4I16_X4", int_amdgcn_mfma_f32_4x4x4bf16_1k>;
defm V_MFMA_F32_32X32X8BF16_1K : MAIInst<"v_mfma_f32_32x32x8bf16_1k", "F32_V4I16_X16", int_amdgcn_mfma_f32_32x32x8bf16_1k>;		defm V_MFMA_F32_32X32X8BF16_1K : MAIInst<"v_mfma_f32_32x32x8bf16_1k", "F32_V4I16_X16", int_amdgcn_mfma_f32_32x32x8bf16_1k>;
defm V_MFMA_F32_16X16X16BF16_1K : MAIInst<"v_mfma_f32_16x16x16bf16_1k", "F32_V4I16_X4", int_amdgcn_mfma_f32_16x16x16bf16_1k>;		defm V_MFMA_F32_16X16X16BF16_1K : MAIInst<"v_mfma_f32_16x16x16bf16_1k", "F32_V4I16_X4", int_amdgcn_mfma_f32_16x16x16bf16_1k>;

defm V_MFMA_F64_16X16X4F64 : MAIInst<"v_mfma_f64_16x16x4f64", "F64_16X16X4F64", int_amdgcn_mfma_f64_16x16x4f64>;		defm V_MFMA_F64_16X16X4F64 : MAIInst<"v_mfma_f64_16x16x4f64", "F64_16X16X4F64", int_amdgcn_mfma_f64_16x16x4f64>;
defm V_MFMA_F64_4X4X4F64 : MAIInst<"v_mfma_f64_4x4x4f64", "F64_4X4X4F64", int_amdgcn_mfma_f64_4x4x4f64>;		defm V_MFMA_F64_4X4X4F64 : MAIInst<"v_mfma_f64_4x4x4f64", "F64_4X4X4F64", int_amdgcn_mfma_f64_4x4x4f64>;
} // End Predicates = [isGFX90APlus]		} // End Predicates = [isGFX90APlus]

		let Predicates = [isGFX940Plus] in {
		defm V_MFMA_I32_32X32X16I8 : MAIInst<"v_mfma_i32_32x32x16i8", "I32_I64_X32", int_amdgcn_mfma_i32_32x32x16_i8>;
		foad (inline comment): I am still confused why the instruction name in `MAIInst<"v_mfma_i32_32x32x16i8"` has no underscore before `i8`...
		defm V_MFMA_I32_16X16X32I8 : MAIInst<"v_mfma_i32_16x16x32i8", "I32_I64_X16", int_amdgcn_mfma_i32_16x16x32_i8>;
		defm V_MFMA_F32_16X16X8XF32 : MAIInst<"v_mfma_f32_16x16x8xf32", "F32_V2F32_X16", int_amdgcn_mfma_f32_16x16x8_xf32>;
		defm V_MFMA_F32_32X32X4XF32 : MAIInst<"v_mfma_f32_32x32x4xf32", "F32_V2F32_X32", int_amdgcn_mfma_f32_32x32x4_xf32>;
		} // End Predicates = [isGFX940Plus]

let SubtargetPredicate = HasPackedFP32Ops, isCommutable = 1 in {		let SubtargetPredicate = HasPackedFP32Ops, isCommutable = 1 in {
defm V_PK_FMA_F32 : VOP3PInst<"v_pk_fma_f32", VOP3_Profile<VOP_V2F32_V2F32_V2F32_V2F32, VOP3_PACKED>, any_fma>;		defm V_PK_FMA_F32 : VOP3PInst<"v_pk_fma_f32", VOP3_Profile<VOP_V2F32_V2F32_V2F32_V2F32, VOP3_PACKED>, any_fma>;
defm V_PK_MUL_F32 : VOP3PInst<"v_pk_mul_f32", VOP3_Profile<VOP_V2F32_V2F32_V2F32, VOP3_PACKED>, any_fmul>;		defm V_PK_MUL_F32 : VOP3PInst<"v_pk_mul_f32", VOP3_Profile<VOP_V2F32_V2F32_V2F32, VOP3_PACKED>, any_fmul>;
defm V_PK_ADD_F32 : VOP3PInst<"v_pk_add_f32", VOP3_Profile<VOP_V2F32_V2F32_V2F32, VOP3_PACKED>, any_fadd>;		defm V_PK_ADD_F32 : VOP3PInst<"v_pk_add_f32", VOP3_Profile<VOP_V2F32_V2F32_V2F32, VOP3_PACKED>, any_fadd>;
defm V_PK_MOV_B32 : VOP3PInst<"v_pk_mov_b32", VOP3_Profile<VOP_V2I32_V2I32_V2I32, VOP3_PACKED>>;		defm V_PK_MOV_B32 : VOP3PInst<"v_pk_mov_b32", VOP3_Profile<VOP_V2I32_V2I32_V2I32, VOP3_PACKED>>;
} // End SubtargetPredicate = HasPackedFP32Ops, isCommutable = 1		} // End SubtargetPredicate = HasPackedFP32Ops, isCommutable = 1

def : MnemonicAlias<"v_accvgpr_read", "v_accvgpr_read_b32">;		def : MnemonicAlias<"v_accvgpr_read", "v_accvgpr_read_b32">;
(184 lines not shown)
defm V_MFMA_F32_32X32X4BF16_1K : VOP3P_Real_MFMA_gfx90a <0x63>;		defm V_MFMA_F32_32X32X4BF16_1K : VOP3P_Real_MFMA_gfx90a <0x63>;
defm V_MFMA_F32_16X16X4BF16_1K : VOP3P_Real_MFMA_gfx90a <0x64>;		defm V_MFMA_F32_16X16X4BF16_1K : VOP3P_Real_MFMA_gfx90a <0x64>;
defm V_MFMA_F32_4X4X4BF16_1K : VOP3P_Real_MFMA_gfx90a <0x65>;		defm V_MFMA_F32_4X4X4BF16_1K : VOP3P_Real_MFMA_gfx90a <0x65>;
defm V_MFMA_F32_32X32X8BF16_1K : VOP3P_Real_MFMA_gfx90a <0x66>;		defm V_MFMA_F32_32X32X8BF16_1K : VOP3P_Real_MFMA_gfx90a <0x66>;
defm V_MFMA_F32_16X16X16BF16_1K : VOP3P_Real_MFMA_gfx90a <0x67>;		defm V_MFMA_F32_16X16X16BF16_1K : VOP3P_Real_MFMA_gfx90a <0x67>;
defm V_MFMA_F64_16X16X4F64 : VOP3P_Real_MFMA_gfx90a <0x6e>;		defm V_MFMA_F64_16X16X4F64 : VOP3P_Real_MFMA_gfx90a <0x6e>;
defm V_MFMA_F64_4X4X4F64 : VOP3P_Real_MFMA_gfx90a <0x6f>;		defm V_MFMA_F64_4X4X4F64 : VOP3P_Real_MFMA_gfx90a <0x6f>;

		defm V_MFMA_I32_32X32X16I8 : VOP3P_Real_MFMA_gfx940 <0x56, "v_mfma_i32_32x32x16_i8">;
		foad (inline comment): ... but here the same instruction name does have an underscore.
		rampitec (author): These are asm aliases. The same aliases, with and without underscores, are supported by SP3. In fact, Justin and I decided we want these aliases to make it easier to adopt old kernels.
		rampitec (author): Look at mai-gfx940.s lines 282 and 286, which test that both names are accepted.
		foad (inline comment): OK. No further comments from me :)
		defm V_MFMA_I32_16X16X32I8 : VOP3P_Real_MFMA_gfx940 <0x57, "v_mfma_i32_16x16x32_i8">;
		defm V_MFMA_F32_16X16X8XF32 : VOP3P_Real_MFMA_gfx940 <0x3e, "v_mfma_f32_16x16x8_xf32">;
		defm V_MFMA_F32_32X32X4XF32 : VOP3P_Real_MFMA_gfx940 <0x3f, "v_mfma_f32_32x32x4_xf32">;

defm V_MFMA_F32_32X32X4BF16_1K : VOP3P_Real_MFMA_gfx940 <0x5d, "v_mfma_f32_32x32x4_2b_bf16">;		defm V_MFMA_F32_32X32X4BF16_1K : VOP3P_Real_MFMA_gfx940 <0x5d, "v_mfma_f32_32x32x4_2b_bf16">;
defm V_MFMA_F32_16X16X4BF16_1K : VOP3P_Real_MFMA_gfx940 <0x5e, "v_mfma_f32_16x16x4_4b_bf16">;		defm V_MFMA_F32_16X16X4BF16_1K : VOP3P_Real_MFMA_gfx940 <0x5e, "v_mfma_f32_16x16x4_4b_bf16">;
defm V_MFMA_F32_4X4X4BF16_1K : VOP3P_Real_MFMA_gfx940 <0x5f, "v_mfma_f32_4x4x4_16b_bf16">;		defm V_MFMA_F32_4X4X4BF16_1K : VOP3P_Real_MFMA_gfx940 <0x5f, "v_mfma_f32_4x4x4_16b_bf16">;
defm V_MFMA_F32_32X32X8BF16_1K : VOP3P_Real_MFMA_gfx940 <0x60, "v_mfma_f32_32x32x8_bf16">;		defm V_MFMA_F32_32X32X8BF16_1K : VOP3P_Real_MFMA_gfx940 <0x60, "v_mfma_f32_32x32x8_bf16">;
defm V_MFMA_F32_16X16X16BF16_1K : VOP3P_Real_MFMA_gfx940 <0x61, "v_mfma_f32_16x16x16_bf16">;		defm V_MFMA_F32_16X16X16BF16_1K : VOP3P_Real_MFMA_gfx940 <0x61, "v_mfma_f32_16x16x16_bf16">;

defm V_MFMA_F64_16X16X4F64 : VOP3P_Real_MFMA_gfx940 <0x6e, "v_mfma_f64_16x16x4_f64">;		defm V_MFMA_F64_16X16X4F64 : VOP3P_Real_MFMA_gfx940 <0x6e, "v_mfma_f64_16x16x4_f64">;
defm V_MFMA_F64_4X4X4F64 : VOP3P_Real_MFMA_gfx940 <0x6f, "v_mfma_f64_4x4x4_4b_f64">;		defm V_MFMA_F64_4X4X4F64 : VOP3P_Real_MFMA_gfx940 <0x6f, "v_mfma_f64_4x4x4_4b_f64">;
(63 lines not shown)

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.mfma.gfx940.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -march=amdgcn -mcpu=gfx940 -run-pass=regbankselect -regbankselect-fast -verify-machineinstrs %s -o - | FileCheck %s -check-prefix=FAST
				# RUN: llc -march=amdgcn -mcpu=gfx940 -run-pass=regbankselect -regbankselect-greedy -verify-machineinstrs %s -o - | FileCheck %s -check-prefix=GREEDY

				---
				name: mfma_i32_16x16x32_i8_vva
				legalized: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $vgpr1_vgpr2, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3

				; FAST-LABEL: name: mfma_i32_16x16x32_i8_vva
				; FAST: liveins: $vgpr1_vgpr2, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3
				; FAST: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; FAST: [[COPY1:%[0-9]+]]:vgpr(s64) = COPY $vgpr2_vgpr3
				; FAST: [[COPY2:%[0-9]+]]:agpr(<4 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3
				; FAST: [[INT:%[0-9]+]]:agpr(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.i32.16x16x32.i8), [[COPY]](s64), [[COPY1]](s64), [[COPY2]](<4 x s32>), 0, 0, 0
				; FAST: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[INT]](<4 x s32>)
				; GREEDY-LABEL: name: mfma_i32_16x16x32_i8_vva
				; GREEDY: liveins: $vgpr1_vgpr2, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3
				; GREEDY: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; GREEDY: [[COPY1:%[0-9]+]]:vgpr(s64) = COPY $vgpr2_vgpr3
				; GREEDY: [[COPY2:%[0-9]+]]:agpr(<4 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3
				; GREEDY: [[INT:%[0-9]+]]:agpr(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.i32.16x16x32.i8), [[COPY]](s64), [[COPY1]](s64), [[COPY2]](<4 x s32>), 0, 0, 0
				; GREEDY: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[INT]](<4 x s32>)
				%0:_(s64) = COPY $vgpr0_vgpr1
				%1:_(s64) = COPY $vgpr2_vgpr3
				%2:_(<4 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3
				%3:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.i32.16x16x32.i8), %0, %1, %2, 0, 0, 0
				$vgpr0_vgpr1_vgpr2_vgpr3 = COPY %3
				...

				---
				name: mfma_i32_32x32x16_i8_vva
				legalized: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $vgpr0_vgpr1, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15

				; FAST-LABEL: name: mfma_i32_32x32x16_i8_vva
				; FAST: liveins: $vgpr0_vgpr1, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
				; FAST: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; FAST: [[COPY1:%[0-9]+]]:vgpr(s64) = COPY $vgpr2_vgpr3
				; FAST: [[COPY2:%[0-9]+]]:agpr(<16 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
				; FAST: [[INT:%[0-9]+]]:agpr(<16 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.i32.32x32x16.i8), [[COPY]](s64), [[COPY1]](s64), [[COPY2]](<16 x s32>), 0, 0, 0
				; FAST: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 = COPY [[INT]](<16 x s32>)
				; GREEDY-LABEL: name: mfma_i32_32x32x16_i8_vva
				; GREEDY: liveins: $vgpr0_vgpr1, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
				; GREEDY: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; GREEDY: [[COPY1:%[0-9]+]]:vgpr(s64) = COPY $vgpr2_vgpr3
				; GREEDY: [[COPY2:%[0-9]+]]:agpr(<16 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
				; GREEDY: [[INT:%[0-9]+]]:agpr(<16 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.i32.32x32x16.i8), [[COPY]](s64), [[COPY1]](s64), [[COPY2]](<16 x s32>), 0, 0, 0
				; GREEDY: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 = COPY [[INT]](<16 x s32>)
				%0:_(s64) = COPY $vgpr0_vgpr1
				%1:_(s64) = COPY $vgpr2_vgpr3
				%2:_(<16 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
				%3:_(<16 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.i32.32x32x16.i8), %0, %1, %2, 0, 0, 0
				$vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 = COPY %3
				...

				---
				name: mfma_f32_16x16x8_xf32_vva
				legalized: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $vgpr1_vgpr2, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3

				; FAST-LABEL: name: mfma_f32_16x16x8_xf32_vva
				; FAST: liveins: $vgpr1_vgpr2, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3
				; FAST: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; FAST: [[COPY1:%[0-9]+]]:vgpr(s64) = COPY $vgpr2_vgpr3
				; FAST: [[COPY2:%[0-9]+]]:agpr(<4 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3
				; FAST: [[INT:%[0-9]+]]:agpr(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.f32.16x16x8.xf32), [[COPY]](s64), [[COPY1]](s64), [[COPY2]](<4 x s32>), 0, 0, 0
				; FAST: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[INT]](<4 x s32>)
				; GREEDY-LABEL: name: mfma_f32_16x16x8_xf32_vva
				; GREEDY: liveins: $vgpr1_vgpr2, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3
				; GREEDY: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; GREEDY: [[COPY1:%[0-9]+]]:vgpr(s64) = COPY $vgpr2_vgpr3
				; GREEDY: [[COPY2:%[0-9]+]]:agpr(<4 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3
				; GREEDY: [[INT:%[0-9]+]]:agpr(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.f32.16x16x8.xf32), [[COPY]](s64), [[COPY1]](s64), [[COPY2]](<4 x s32>), 0, 0, 0
				; GREEDY: $vgpr0_vgpr1_vgpr2_vgpr3 = COPY [[INT]](<4 x s32>)
				%0:_(s64) = COPY $vgpr0_vgpr1
				%1:_(s64) = COPY $vgpr2_vgpr3
				%2:_(<4 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3
				%3:_(<4 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.f32.16x16x8.xf32), %0, %1, %2, 0, 0, 0
				$vgpr0_vgpr1_vgpr2_vgpr3 = COPY %3
				...

				---
				name: mfma_f32_32x32x4_xf32_vva
				legalized: true
				tracksRegLiveness: true
				body: |
				bb.0:
				liveins: $vgpr0_vgpr1, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15

				; FAST-LABEL: name: mfma_f32_32x32x4_xf32_vva
				; FAST: liveins: $vgpr0_vgpr1, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
				; FAST: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; FAST: [[COPY1:%[0-9]+]]:vgpr(s64) = COPY $vgpr2_vgpr3
				; FAST: [[COPY2:%[0-9]+]]:agpr(<16 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
				; FAST: [[INT:%[0-9]+]]:agpr(<16 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.f32.32x32x4.xf32), [[COPY]](s64), [[COPY1]](s64), [[COPY2]](<16 x s32>), 0, 0, 0
				; FAST: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 = COPY [[INT]](<16 x s32>)
				; GREEDY-LABEL: name: mfma_f32_32x32x4_xf32_vva
				; GREEDY: liveins: $vgpr0_vgpr1, $vgpr2_vgpr3, $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
				; GREEDY: [[COPY:%[0-9]+]]:vgpr(s64) = COPY $vgpr0_vgpr1
				; GREEDY: [[COPY1:%[0-9]+]]:vgpr(s64) = COPY $vgpr2_vgpr3
				; GREEDY: [[COPY2:%[0-9]+]]:agpr(<16 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
				; GREEDY: [[INT:%[0-9]+]]:agpr(<16 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.f32.32x32x4.xf32), [[COPY]](s64), [[COPY1]](s64), [[COPY2]](<16 x s32>), 0, 0, 0
				; GREEDY: $vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 = COPY [[INT]](<16 x s32>)
				%0:_(s64) = COPY $vgpr0_vgpr1
				%1:_(s64) = COPY $vgpr2_vgpr3
				%2:_(<16 x s32>) = COPY $agpr0_agpr1_agpr2_agpr3_agpr4_agpr5_agpr6_agpr7_agpr8_agpr9_agpr10_agpr11_agpr12_agpr13_agpr14_agpr15
				%3:_(<16 x s32>) = G_INTRINSIC intrinsic(@llvm.amdgcn.mfma.f32.32x32x4.xf32), %0, %1, %2, 0, 0, 0
				$vgpr0_vgpr1_vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9_vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15 = COPY %3
				...

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -enable-var-scope --check-prefixes=GCN,GFX940 %s
				; RUN: llc -march=amdgcn -mcpu=gfx940 -global-isel -verify-machineinstrs < %s | FileCheck -enable-var-scope --check-prefixes=GCN,GISEL %s
				; RUN: llc -march=amdgcn -mcpu=gfx940 -stress-regalloc=10 -verify-machineinstrs < %s | FileCheck -enable-var-scope --check-prefixes=GCN,GFX940 %s
				; RUN: llc -march=amdgcn -mcpu=gfx940 -stress-regalloc=10 -global-isel -verify-machineinstrs < %s | FileCheck -enable-var-scope --check-prefixes=GCN,GISEL %s

				declare <4 x i32> @llvm.amdgcn.mfma.i32.16x16x32.i8(i64, i64, <4 x i32>, i32, i32, i32)
				declare <16 x i32> @llvm.amdgcn.mfma.i32.32x32x16.i8(i64, i64, <16 x i32>, i32, i32, i32)
				declare <4 x float> @llvm.amdgcn.mfma.f32.16x16x8.xf32(<2 x float>, <2 x float>, <4 x float>, i32, i32, i32)
				declare <16 x float> @llvm.amdgcn.mfma.f32.32x32x4.xf32(<2 x float>, <2 x float>, <16 x float>, i32, i32, i32)

				; GCN-LABEL: {{^}}test_mfma_i32_16x16x32i8:
				; GFX940-DAG: v_mov_b32_e32 v[[ONE:[0-9]+]], 1
				; GFX940-DAG: v_mov_b32_e32 v[[TWO:[0-9]+]], 2
				; GFX940-DAG: v_mov_b32_e32 v[[THREE:[0-9]+]], 3
				; GFX940-DAG: v_mov_b32_e32 v[[FOUR:[0-9]+]], 4
				; GCN-COUNT-4: v_accvgpr_write_b32 a{{[0-9]+}}, s{{[0-9]+}}
				; GFX940: v_mfma_i32_16x16x32_i8 a[{{[0-9]+:[0-9]+}}], v{{\[}}[[TWO]]:[[ONE]]], v{{\[}}[[FOUR]]:[[THREE]]], a[{{[0-9]+:[0-9]+}}] cbsz:1 abid:2 blgp:3
				; GISEL: v_mfma_i32_16x16x32_i8 a[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], a[{{[0-9]+:[0-9]+}}] cbsz:1 abid:2 blgp:3
				; GCN-NOT: v_accvgpr_read_b32
				; GCN: global_store_dwordx4 v{{[0-9]+}}, a[{{[0-9:]+}}]
				define amdgpu_kernel void @test_mfma_i32_16x16x32i8(<4 x i32> addrspace(1)* %arg) #0 {
				bb:
				%in.1 = load <4 x i32>, <4 x i32> addrspace(1)* %arg
				%mai.1 = tail call <4 x i32> @llvm.amdgcn.mfma.i32.16x16x32.i8(i64 4294967298, i64 12884901892, <4 x i32> %in.1, i32 1, i32 2, i32 3)
				store <4 x i32> %mai.1, <4 x i32> addrspace(1)* %arg
				ret void
				}

				; GCN-LABEL: {{^}}test_mfma_i32_32x32x16i8:
				; GFX940-DAG: v_mov_b32_e32 v[[ONE:[0-9]+]], 1
				; GFX940-DAG: v_mov_b32_e32 v[[TWO:[0-9]+]], 2
				; GFX940-DAG: v_mov_b32_e32 v[[THREE:[0-9]+]], 3
				; GFX940-DAG: v_mov_b32_e32 v[[FOUR:[0-9]+]], 4
				; GCN-COUNT-16: v_accvgpr_write_b32 a{{[0-9]+}}, s{{[0-9]+}}
				; GFX940: v_mfma_i32_32x32x16_i8 a[{{[0-9]+:[0-9]+}}], v{{\[}}[[TWO]]:[[ONE]]], v{{\[}}[[FOUR]]:[[THREE]]], a[{{[0-9]+:[0-9]+}}] cbsz:1 abid:2 blgp:3
				; GISEL: v_mfma_i32_32x32x16_i8 a[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], a[{{[0-9]+:[0-9]+}}] cbsz:1 abid:2 blgp:3
				; GCN-NOT: v_accvgpr_read_b32
				; GCN-COUNT-4: global_store_dwordx4 v{{[0-9]+}}, a[{{[0-9:]+}}]
				define amdgpu_kernel void @test_mfma_i32_32x32x16i8(<16 x i32> addrspace(1)* %arg) #0 {
				bb:
				%in.1 = load <16 x i32>, <16 x i32> addrspace(1)* %arg
				%mai.1 = tail call <16 x i32> @llvm.amdgcn.mfma.i32.32x32x16.i8(i64 4294967298, i64 12884901892, <16 x i32> %in.1, i32 1, i32 2, i32 3)
				store <16 x i32> %mai.1, <16 x i32> addrspace(1)* %arg
				ret void
				}

				; GCN-LABEL: {{^}}test_mfma_f32_16x16x8xf32:
				; GFX940-DAG: v_mov_b32_e32 v[[ONE:[0-9]+]], 1.0
				; GFX940-DAG: v_mov_b32_e32 v[[TWO:[0-9]+]], 2.0
				; GFX940-DAG: v_mov_b32_e32 v[[THREE:[0-9]+]], 0x40400000
				; GFX940-DAG: v_mov_b32_e32 v[[FOUR:[0-9]+]], 4.0
				; GCN-COUNT-4: v_accvgpr_write_b32 a{{[0-9]+}}, s{{[0-9]+}}
				; GFX940: v_mfma_f32_16x16x8_xf32 a[{{[0-9]+:[0-9]+}}], v{{\[}}[[ONE]]:[[TWO]]], v{{\[}}[[THREE]]:[[FOUR]]], a[{{[0-9]+:[0-9]+}}] cbsz:1 abid:2 blgp:3
				; GISEL: v_mfma_f32_16x16x8_xf32 a[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], a[{{[0-9]+:[0-9]+}}] cbsz:1 abid:2 blgp:3
				; GCN-NOT: v_accvgpr_read_b32
				; GCN: global_store_dwordx4 v{{[0-9]+}}, a[{{[0-9:]+}}]
				define amdgpu_kernel void @test_mfma_f32_16x16x8xf32(<4 x float> addrspace(1)* %arg) #0 {
				bb:
				%in.1 = load <4 x float>, <4 x float> addrspace(1)* %arg
				%mai.1 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x8.xf32(<2 x float> <float 1.0, float 2.0>, <2 x float> <float 3.0, float 4.0>, <4 x float> %in.1, i32 1, i32 2, i32 3)
				store <4 x float> %mai.1, <4 x float> addrspace(1)* %arg
				ret void
				}

				; GCN-LABEL: {{^}}test_mfma_f32_32x32x4xf32:
				; GFX940-DAG: v_mov_b32_e32 v[[ONE:[0-9]+]], 1.0
				; GFX940-DAG: v_mov_b32_e32 v[[TWO:[0-9]+]], 2.0
				; GFX940-DAG: v_mov_b32_e32 v[[THREE:[0-9]+]], 0x40400000
				; GFX940-DAG: v_mov_b32_e32 v[[FOUR:[0-9]+]], 4.0
				; GCN-COUNT-4: v_accvgpr_write_b32 a{{[0-9]+}}, s{{[0-9]+}}
				; GFX940: v_mfma_f32_32x32x4_xf32 a[{{[0-9]+:[0-9]+}}], v{{\[}}[[ONE]]:[[TWO]]], v{{\[}}[[THREE]]:[[FOUR]]], a[{{[0-9]+:[0-9]+}}] cbsz:1 abid:2 blgp:3
				; GISEL: v_mfma_f32_32x32x4_xf32 a[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], a[{{[0-9]+:[0-9]+}}] cbsz:1 abid:2 blgp:3
				; GCN-NOT: v_accvgpr_read_b32
				; GCN: global_store_dwordx4 v{{[0-9]+}}, a[{{[0-9:]+}}]
				define amdgpu_kernel void @test_mfma_f32_32x32x4xf32(<16 x float> addrspace(1)* %arg) #0 {
				bb:
				%in.1 = load <16 x float>, <16 x float> addrspace(1)* %arg
				%mai.1 = tail call <16 x float> @llvm.amdgcn.mfma.f32.32x32x4.xf32(<2 x float> <float 1.0, float 2.0>, <2 x float> <float 3.0, float 4.0>, <16 x float> %in.1, i32 1, i32 2, i32 3)
				store <16 x float> %mai.1, <16 x float> addrspace(1)* %arg
				ret void
				}

				attributes #0 = { "amdgpu-flat-work-group-size"="1,256" }

llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll

This file was added.

				; RUN: llc -march=amdgcn -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -enable-var-scope --check-prefix=GCN %s
				; RUN: llc -global-isel -march=amdgcn -mcpu=gfx940 -verify-machineinstrs < %s | FileCheck -enable-var-scope --check-prefix=GCN %s

				declare <4 x i32> @llvm.amdgcn.mfma.i32.16x16x32.i8(i64, i64, <4 x i32>, i32, i32, i32)
				declare <16 x i32> @llvm.amdgcn.mfma.i32.32x32x16.i8(i64, i64, <16 x i32>, i32, i32, i32)
				declare <4 x float> @llvm.amdgcn.mfma.f32.16x16x8.xf32(<2 x float>, <2 x float>, <4 x float>, i32, i32, i32)
				declare <16 x float> @llvm.amdgcn.mfma.f32.32x32x4.xf32(<2 x float>, <2 x float>, <16 x float>, i32, i32, i32)

				; GCN-LABEL: {{^}}test_mfma_i32_16x16x32i8:
				; GCN: v_mfma_i32_16x16x32_i8 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
				define amdgpu_kernel void @test_mfma_i32_16x16x32i8(<4 x i32> addrspace(1)* %arg) {
				bb:
				%in.1 = load <4 x i32>, <4 x i32> addrspace(1)* %arg
				%mai.1 = tail call <4 x i32> @llvm.amdgcn.mfma.i32.16x16x32.i8(i64 4294967298, i64 12884901892, <4 x i32> %in.1, i32 0, i32 0, i32 0)
				store <4 x i32> %mai.1, <4 x i32> addrspace(1)* %arg
				ret void
				}

				; GCN-LABEL: {{^}}test_mfma_i32_32x32x16i8:
				; GCN: v_mfma_i32_32x32x16_i8 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
				define amdgpu_kernel void @test_mfma_i32_32x32x16i8(<16 x i32> addrspace(1)* %arg) {
				bb:
				%in.1 = load <16 x i32>, <16 x i32> addrspace(1)* %arg
				%mai.1 = tail call <16 x i32> @llvm.amdgcn.mfma.i32.32x32x16.i8(i64 4294967298, i64 12884901892, <16 x i32> %in.1, i32 0, i32 0, i32 0)
				store <16 x i32> %mai.1, <16 x i32> addrspace(1)* %arg
				ret void
				}

				; GCN-LABEL: {{^}}test_mfma_f32_16x16x8xf32:
				; GCN: v_mfma_f32_16x16x8_xf32 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
				define amdgpu_kernel void @test_mfma_f32_16x16x8xf32(<4 x float> addrspace(1)* %arg) {
				bb:
				%in.1 = load <4 x float>, <4 x float> addrspace(1)* %arg
				%mai.1 = tail call <4 x float> @llvm.amdgcn.mfma.f32.16x16x8.xf32(<2 x float> <float 1.0, float 2.0>, <2 x float> <float 3.0, float 4.0>, <4 x float> %in.1, i32 0, i32 0, i32 0)
				store <4 x float> %mai.1, <4 x float> addrspace(1)* %arg
				ret void
				}

				; GCN-LABEL: {{^}}test_mfma_f32_32x32x4xf32:
				; GCN: v_mfma_f32_32x32x4_xf32 v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}], v[{{[0-9]+:[0-9]+}}]
				define amdgpu_kernel void @test_mfma_f32_32x32x4xf32(<16 x float> addrspace(1)* %arg) {
				bb:
				%in.1 = load <16 x float>, <16 x float> addrspace(1)* %arg
				%mai.1 = tail call <16 x float> @llvm.amdgcn.mfma.f32.32x32x4.xf32(<2 x float> <float 1.0, float 2.0>, <2 x float> <float 3.0, float 4.0>, <16 x float> %in.1, i32 0, i32 0, i32 0)
				store <16 x float> %mai.1, <16 x float> addrspace(1)* %arg
				ret void
				}

llvm/test/MC/AMDGPU/mai-gfx940.s

	(256 lines not shown)
	// GFX90A: error: instruction not supported on this GPU			// GFX90A: error: instruction not supported on this GPU

	v_mfma_f32_32x32x1f32 a[0:31], v0, v1, a[34:65] blgp:7			v_mfma_f32_32x32x1f32 a[0:31], v0, v1, a[34:65] blgp:7
	// GFX940: v_mfma_f32_32x32x1_2b_f32 a[0:31], v0, v1, a[34:65] blgp:7 ; encoding: [0x00,0x80,0xc0,0xd3,0x00,0x03,0x8a,0xe4]			// GFX940: v_mfma_f32_32x32x1_2b_f32 a[0:31], v0, v1, a[34:65] blgp:7 ; encoding: [0x00,0x80,0xc0,0xd3,0x00,0x03,0x8a,0xe4]

	v_mfma_f32_32x32x1f32 v[0:31], v0, v1, v[34:65] blgp:7			v_mfma_f32_32x32x1f32 v[0:31], v0, v1, v[34:65] blgp:7
	// GFX940: v_mfma_f32_32x32x1_2b_f32 v[0:31], v0, v1, v[34:65] blgp:7 ; encoding: [0x00,0x00,0xc0,0xd3,0x00,0x03,0x8a,0xe4]			// GFX940: v_mfma_f32_32x32x1_2b_f32 v[0:31], v0, v1, v[34:65] blgp:7 ; encoding: [0x00,0x00,0xc0,0xd3,0x00,0x03,0x8a,0xe4]

				v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15]
				// GFX940: v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15]
				// GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15]
				// GFX940: v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15]
				// GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5
				// GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5 ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0xa4]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_i32_32x32x16i8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5
				// GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5 ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0xa4]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_i32_32x32x16i8 v[0:15], v[2:3], v[4:5], v[0:15] blgp:5
				// GFX940: v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15] blgp:5 ; encoding: [0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0xa4]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_i32_16x16x32_i8 v[0:3], v[2:3], v[4:5], v[0:3]
				// GFX940: v_mfma_i32_16x16x32_i8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xd7,0xd3,0x02,0x09,0x02,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_i32_16x16x32_i8 a[0:3], v[2:3], v[4:5], a[0:3]
				// GFX940: v_mfma_i32_16x16x32_i8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_i32_16x16x32_i8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5
				// GFX940: v_mfma_i32_16x16x32_i8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0xa4]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_i32_16x16x32i8 v[0:3], v[2:3], v[4:5], v[0:3] blgp:5
				// GFX940: v_mfma_i32_16x16x32_i8 v[0:3], v[2:3], v[4:5], v[0:3] blgp:5 ; encoding: [0x00,0x00,0xd7,0xd3,0x02,0x09,0x02,0xa4]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_i32_16x16x32i8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5
				// GFX940: v_mfma_i32_16x16x32_i8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0xa4]
				// GFX90A: error: instruction not supported on this GPU

	v_mfma_f32_32x32x4_2b_bf16 v[0:31], v[2:3], v[4:5], v[34:65]			v_mfma_f32_32x32x4_2b_bf16 v[0:31], v[2:3], v[4:5], v[34:65]
	// GFX940: v_mfma_f32_32x32x4_2b_bf16 v[0:31], v[2:3], v[4:5], v[34:65] ; encoding: [0x00,0x00,0xdd,0xd3,0x02,0x09,0x8a,0x04]			// GFX940: v_mfma_f32_32x32x4_2b_bf16 v[0:31], v[2:3], v[4:5], v[34:65] ; encoding: [0x00,0x00,0xdd,0xd3,0x02,0x09,0x8a,0x04]
	// GFX90A: error: instruction not supported on this GPU			// GFX90A: error: instruction not supported on this GPU

	v_mfma_f32_32x32x4_2b_bf16 a[0:31], v[2:3], v[4:5], a[34:65]			v_mfma_f32_32x32x4_2b_bf16 a[0:31], v[2:3], v[4:5], a[34:65]
	// GFX940: v_mfma_f32_32x32x4_2b_bf16 a[0:31], v[2:3], v[4:5], a[34:65] ; encoding: [0x00,0x80,0xdd,0xd3,0x02,0x09,0x8a,0x04]			// GFX940: v_mfma_f32_32x32x4_2b_bf16 a[0:31], v[2:3], v[4:5], a[34:65] ; encoding: [0x00,0x80,0xdd,0xd3,0x02,0x09,0x8a,0x04]
	// GFX90A: error: instruction not supported on this GPU			// GFX90A: error: instruction not supported on this GPU

	(101 lines not shown)
	// GFX940: v_mfma_f32_16x16x16_bf16 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xe1,0xd3,0x02,0x09,0x0a,0x04]			// GFX940: v_mfma_f32_16x16x16_bf16 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xe1,0xd3,0x02,0x09,0x0a,0x04]
	// GFX90A: error: instruction not supported on this GPU			// GFX90A: error: instruction not supported on this GPU

	v_mfma_f32_16x16x16bf16_1k v[0:3], v[2:3], v[4:5], v[2:5]			v_mfma_f32_16x16x16bf16_1k v[0:3], v[2:3], v[4:5], v[2:5]
	// GFX940: v_mfma_f32_16x16x16_bf16 v[0:3], v[2:3], v[4:5], v[2:5] ; encoding: [0x00,0x00,0xe1,0xd3,0x02,0x09,0x0a,0x04]			// GFX940: v_mfma_f32_16x16x16_bf16 v[0:3], v[2:3], v[4:5], v[2:5] ; encoding: [0x00,0x00,0xe1,0xd3,0x02,0x09,0x0a,0x04]

	v_mfma_f32_16x16x16bf16_1k a[0:3], v[2:3], v[4:5], a[2:5]			v_mfma_f32_16x16x16bf16_1k a[0:3], v[2:3], v[4:5], a[2:5]
	// GFX940: v_mfma_f32_16x16x16_bf16 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xe1,0xd3,0x02,0x09,0x0a,0x04]			// GFX940: v_mfma_f32_16x16x16_bf16 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xe1,0xd3,0x02,0x09,0x0a,0x04]

				v_mfma_f32_16x16x8_xf32 a[0:3], v[2:3], v[4:5], a[2:5]
				// GFX940: v_mfma_f32_16x16x8_xf32 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xbe,0xd3,0x02,0x09,0x0a,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_f32_16x16x8_xf32 v[0:3], v[2:3], v[4:5], v[2:5]
				// GFX940: v_mfma_f32_16x16x8_xf32 v[0:3], v[2:3], v[4:5], v[2:5] ; encoding: [0x00,0x00,0xbe,0xd3,0x02,0x09,0x0a,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_f32_16x16x8xf32 a[0:3], v[2:3], v[4:5], a[2:5]
				// GFX940: v_mfma_f32_16x16x8_xf32 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xbe,0xd3,0x02,0x09,0x0a,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_f32_16x16x8xf32 v[0:3], v[2:3], v[4:5], v[2:5]
				// GFX940: v_mfma_f32_16x16x8_xf32 v[0:3], v[2:3], v[4:5], v[2:5] ; encoding: [0x00,0x00,0xbe,0xd3,0x02,0x09,0x0a,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_f32_32x32x4_xf32 v[0:15], v[2:3], v[4:5], v[18:33]
				// GFX940: v_mfma_f32_32x32x4_xf32 v[0:15], v[2:3], v[4:5], v[18:33] ; encoding: [0x00,0x00,0xbf,0xd3,0x02,0x09,0x4a,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[18:33]
				// GFX940: v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[18:33] ; encoding: [0x00,0x80,0xbf,0xd3,0x02,0x09,0x4a,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_f32_32x32x4xf32 v[0:15], v[2:3], v[4:5], v[18:33]
				// GFX940: v_mfma_f32_32x32x4_xf32 v[0:15], v[2:3], v[4:5], v[18:33] ; encoding: [0x00,0x00,0xbf,0xd3,0x02,0x09,0x4a,0x04]
				// GFX90A: error: instruction not supported on this GPU

				v_mfma_f32_32x32x4xf32 a[0:15], v[2:3], v[4:5], a[18:33]
				// GFX940: v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[18:33] ; encoding: [0x00,0x80,0xbf,0xd3,0x02,0x09,0x4a,0x04]
				// GFX90A: error: instruction not supported on this GPU
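
The assembler tests above feed both the legacy spelling and the new underscore-separated spelling to llvm-mc and expect the same gfx940 mnemonic and encoding back. A minimal Python sketch of that relationship follows. It is not part of the patch: `GFX940_ALIASES` and `canonical_mnemonic` are hypothetical names introduced only for illustration, and the table holds just the pairs visible in this diff, not the full alias set.

```python
# Hypothetical lookup table (not from the patch). Each pair is copied from the
# TableGen definitions and the mai-gfx940.s checks shown in this diff:
# legacy spelling -> name printed for gfx940.
GFX940_ALIASES = {
    "v_mfma_i32_32x32x16i8": "v_mfma_i32_32x32x16_i8",
    "v_mfma_i32_16x16x32i8": "v_mfma_i32_16x16x32_i8",
    "v_mfma_f32_16x16x8xf32": "v_mfma_f32_16x16x8_xf32",
    "v_mfma_f32_32x32x4xf32": "v_mfma_f32_32x32x4_xf32",
    "v_mfma_f32_16x16x16bf16_1k": "v_mfma_f32_16x16x16_bf16",
    "v_mfma_f32_32x32x1f32": "v_mfma_f32_32x32x1_2b_f32",
}


def canonical_mnemonic(name: str) -> str:
    """Return the gfx940 spelling for a mnemonic.

    Names that are not in the table (including names that are already in
    the new form) are returned unchanged.
    """
    return GFX940_ALIASES.get(name, name)


if __name__ == "__main__":
    for old in ("v_mfma_i32_32x32x16i8", "v_mfma_i32_32x32x16_i8"):
        print(old, "->", canonical_mnemonic(old))
```

Both inputs in the usage loop resolve to `v_mfma_i32_32x32x16_i8`, which mirrors how the checks above expect a single printed name regardless of which spelling was assembled.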

llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt

	# RUN: llvm-mc -arch=amdgcn -mcpu=gfx940 -show-encoding -disassemble %s | FileCheck -check-prefix=GFX940 %s			# RUN: llvm-mc -arch=amdgcn -mcpu=gfx940 -show-encoding -disassemble %s | FileCheck -check-prefix=GFX940 %s

	# GFX940: v_accvgpr_write_b32 a10, s20 ; encoding: [0x0a,0x40,0xd9,0xd3,0x14,0x00,0x00,0x18]			# GFX940: v_accvgpr_write_b32 a10, s20 ; encoding: [0x0a,0x40,0xd9,0xd3,0x14,0x00,0x00,0x18]
	0x0a,0x40,0xd9,0xd3,0x14,0x00,0x00,0x18			0x0a,0x40,0xd9,0xd3,0x14,0x00,0x00,0x18

				# GFX940: v_mfma_i32_32x32x16_i8 v[0:15], v[2:3], v[4:5], v[0:15] ; encoding: [0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04]
				0x00,0x00,0xd6,0xd3,0x02,0x09,0x02,0x04

				# GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04]
				0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0x04

				# GFX940: v_mfma_i32_32x32x16_i8 a[0:15], v[2:3], v[4:5], a[0:15] blgp:5 ; encoding: [0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0xa4]
				0x00,0x80,0xd6,0xd3,0x02,0x09,0x02,0xa4

				# GFX940: v_mfma_i32_16x16x32_i8 v[0:3], v[2:3], v[4:5], v[0:3] ; encoding: [0x00,0x00,0xd7,0xd3,0x02,0x09,0x02,0x04]
				0x00,0x00,0xd7,0xd3,0x02,0x09,0x02,0x04

				# GFX940: v_mfma_i32_16x16x32_i8 a[0:3], v[2:3], v[4:5], a[0:3] ; encoding: [0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0x04]
				0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0x04

				# GFX940: v_mfma_i32_16x16x32_i8 a[0:3], v[2:3], v[4:5], a[0:3] blgp:5 ; encoding: [0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0xa4]
				0x00,0x80,0xd7,0xd3,0x02,0x09,0x02,0xa4

	# GFX940: v_mfma_f32_32x32x4_2b_bf16 v[0:31], v[2:3], v[4:5], v[2:33] ; encoding: [0x00,0x00,0xdd,0xd3,0x02,0x09,0x0a,0x04]
	0x00,0x00,0xdd,0xd3,0x02,0x09,0x0a,0x04

	# GFX940: v_mfma_f32_32x32x4_2b_bf16 a[0:31], v[2:3], v[4:5], a[2:33] ; encoding: [0x00,0x80,0xdd,0xd3,0x02,0x09,0x0a,0x04]
	0x00,0x80,0xdd,0xd3,0x02,0x09,0x0a,0x04

	# GFX940: v_mfma_f32_16x16x4_4b_bf16 v[0:15], v[2:3], v[4:5], v[2:17] ; encoding: [0x00,0x00,0xde,0xd3,0x02,0x09,0x0a,0x04]
	0x00,0x00,0xde,0xd3,0x02,0x09,0x0a,0x04
	# GFX940: v_mfma_f32_32x32x8_bf16 a[0:15], v[2:3], v[4:5], a[2:17] ; encoding: [0x00,0x80,0xe0,0xd3,0x02,0x09,0x0a,0x04]
	0x00,0x80,0xe0,0xd3,0x02,0x09,0x0a,0x04

	# GFX940: v_mfma_f32_16x16x16_bf16 v[0:3], v[2:3], v[4:5], v[2:5] ; encoding: [0x00,0x00,0xe1,0xd3,0x02,0x09,0x0a,0x04]
	0x00,0x00,0xe1,0xd3,0x02,0x09,0x0a,0x04

	# GFX940: v_mfma_f32_16x16x16_bf16 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xe1,0xd3,0x02,0x09,0x0a,0x04]
	0x00,0x80,0xe1,0xd3,0x02,0x09,0x0a,0x04

				# GFX940: v_mfma_f32_16x16x8_xf32 a[0:3], v[2:3], v[4:5], a[2:5] ; encoding: [0x00,0x80,0xbe,0xd3,0x02,0x09,0x0a,0x04]
				0x00,0x80,0xbe,0xd3,0x02,0x09,0x0a,0x04

				# GFX940: v_mfma_f32_16x16x8_xf32 v[0:3], v[2:3], v[4:5], v[2:5] ; encoding: [0x00,0x00,0xbe,0xd3,0x02,0x09,0x0a,0x04]
				0x00,0x00,0xbe,0xd3,0x02,0x09,0x0a,0x04

				# GFX940: v_mfma_f32_32x32x4_xf32 v[0:15], v[2:3], v[4:5], v[2:17] ; encoding: [0x00,0x00,0xbf,0xd3,0x02,0x09,0x0a,0x04]
				0x00,0x00,0xbf,0xd3,0x02,0x09,0x0a,0x04

				# GFX940: v_mfma_f32_32x32x4_xf32 a[0:15], v[2:3], v[4:5], a[2:17] ; encoding: [0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04]
				0x00,0x80,0xbf,0xd3,0x02,0x09,0x0a,0x04

[AMDGPU] New gfx940 mfma instructions
Closed, Public

Revision Contents

Diff 418013

clang/include/clang/Basic/BuiltinsAMDGPU.def

clang/test/CodeGenOpenCL/builtins-amdgcn-mfma.cl

clang/test/SemaOpenCL/builtins-amdgcn-error-gfx940-param.cl

llvm/include/llvm/IR/IntrinsicsAMDGPU.td

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp

llvm/lib/Target/AMDGPU/AMDGPUSearchableTables.td

llvm/lib/Target/AMDGPU/SIInstrInfo.td

llvm/lib/Target/AMDGPU/SISchedule.td

llvm/lib/Target/AMDGPU/VOP3PInstructions.td

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.mfma.gfx940.mir

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.gfx940.ll

llvm/test/CodeGen/AMDGPU/mfma-vgpr-cd-select-gfx940.ll

llvm/test/MC/AMDGPU/mai-gfx940.s

llvm/test/MC/Disassembler/AMDGPU/mai-gfx940.txt
