This is an archive of the discontinued LLVM Phabricator instance.

Differential D75653

[mlir][LLVM] Introduce an intrinsic for llvm.matrix.multiply
Abandoned · Public

Authored by nicolasvasilache on Mar 4 2020, 3:16 PM.

Details

Reviewers

ftynse
aartbik
mehdi_amini
rriddle

Summary

This revision adds a first intrinsic op, for llvm.matrix.multiply.
For now it uses the more general LLVM_OneResultOp, since the goal is to go through the
dedicated MatrixBuilder that @fhahn created recently.

When piped through:

opt -O3 -enable-matrix | llc -O3 -march=x86-64 -mcpu=skylake-avx512

this has been verified to generate ymm instructions.

Additional function attribute support will be needed to generate proper zmm instructions but at least things run end to end.

Benchmarking will be provided separately with the experimental metaprogramming ModelBuilder tool when ready.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nicolasvasilache created this revision. Mar 4 2020, 3:16 PM

Herald added a project: Restricted Project. Mar 4 2020, 3:16 PM

Herald added subscribers: llvm-commits, Joonsoo, liufengdb and 10 others.

Note the TODO cleanup: please hold off on reviewing for now; this is more of a heads-up that this is coming.
We'll also need to pipe this through the JIT properly for execution.

Also adding @fhahn and @simoll in case this is interesting.

Harbormaster completed remote builds in B48126: Diff 248344. Mar 4 2020, 6:01 PM

mehdi_amini added inline comments. Mar 5 2020, 12:06 AM

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
777	Seems like some cleanup needed right now? You likely wrote this internally

ftynse requested changes to this revision. Mar 5 2020, 1:20 AM

ftynse added inline comments.

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
775	Can you elaborate?
801	Could we at least have a textual description to understand what ABMNK mean? These will be the names of accessor functions as well...
801	Also, any reason to choose an I32 attribute?
819	I suppose the Op arguments are also in that order, so it would make sense to reorder them in Arguments<>.
823	This looks sufficiently long to go to .cpp instead.
828	This builder seems to ignore operands. I would also expect such a straightforward builder to be autogenerated.
836	Could we use a more conventional function-type notation: `'(' type($A) ',' type($B) ') -> ' type($res)` (or maybe there's support for that directly in the formatter)

This revision now requires changes to proceed. Mar 5 2020, 1:20 AM

nicolasvasilache retitled this revision from [mlir] Introduce an intrinsic for llvm.matrix.multiply to [mlir]WIP - Introduce an intrinsic for llvm.matrix.multiply. Mar 5 2020, 11:10 AM

Cleanup and simplify now that I can use MatrixBuilder.

format

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
801	to match the LLVM impl: https://reviews.llvm.org/D70456

nicolasvasilache retitled this revision from [mlir]WIP - Introduce an intrinsic for llvm.matrix.multiply to [mlir][LLVM] Introduce an intrinsic for llvm.matrix.multiply. Mar 5 2020, 2:29 PM

nicolasvasilache edited the summary of this revision.

nicolasvasilache marked an inline comment as done.

nicolasvasilache added reviewers: mehdi_amini, rriddle. Mar 5 2020, 2:36 PM

aartbik accepted this revision. Mar 5 2020, 2:52 PM

aartbik added inline comments.

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
790	In the longer run, do we want any sanity checks on the passed in rows/columns and types of the operands?
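
One possible shape for such a check, sketched in standalone C++ rather than as an MLIR verifier: the flattened vector lengths have to agree with the dimension attributes. The helper name `checkMatrixMultiplyShapes` and its signature are hypothetical and not part of this revision. The third dimension is called `rhsColumns` here because the test passes 3 for a 16x3 right-hand side, although this diff names the attribute `rhs_rows`. Element-type agreement is not covered.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of this revision): checks that the lengths of
// the flattened operand and result vectors agree with the dimension attributes.
// Returns an empty string when the shapes are consistent, otherwise a message.
std::string checkMatrixMultiplyShapes(std::uint64_t lhsElements,
                                      std::uint64_t rhsElements,
                                      std::uint64_t resElements,
                                      std::uint32_t lhsRows,
                                      std::uint32_t lhsColumns,
                                      std::uint32_t rhsColumns) {
  // Widen before multiplying so that large dimensions cannot overflow 32 bits.
  const std::uint64_t rows = lhsRows, inner = lhsColumns, cols = rhsColumns;
  if (rows == 0 || inner == 0 || cols == 0)
    return "matrix dimensions must be non-zero";
  if (lhsElements != rows * inner)
    return "lhs vector length must equal lhs rows * lhs columns";
  if (rhsElements != inner * cols)
    return "rhs vector length must equal lhs columns * rhs columns";
  if (resElements != rows * cols)
    return "result vector length must equal lhs rows * rhs columns";
  return std::string();
}
```

With the shapes from the test in this revision (64, 48 and 12 elements; dimensions 4, 16 and 3) the helper returns an empty string.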

Harbormaster failed remote builds in B48272: Diff 248608! Mar 5 2020, 3:21 PM

Harbormaster completed remote builds in B48273: Diff 248609. Mar 5 2020, 3:53 PM

This has landed as cac1ed1f4bf939d70532fb8e80cf81cb50db69db

nicolasvasilache abandoned this revision. Mar 6 2020, 5:36 AM

Revision Contents

Path                                                   Size
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td            21 lines
mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h    1 line
mlir/test/Target/llvmir-intrinsics.mlir                14 lines

Diff 248609

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td

[763 lines not shown]
  def LLVM_experimental_vector_reduce_umax : LLVM_VectorReduction<"umax">;
  def LLVM_experimental_vector_reduce_umin : LLVM_VectorReduction<"umin">;
  def LLVM_experimental_vector_reduce_xor : LLVM_VectorReduction<"xor">;

  def LLVM_experimental_vector_reduce_v2_fadd : LLVM_VectorReductionV2<"fadd">;
  def LLVM_experimental_vector_reduce_v2_fmul : LLVM_VectorReductionV2<"fmul">;

  //
+ // LLVM Matrix operations.
+ //
+
+ /// As specified in the LLVM MatrixBuilder:
+ /// Create a llvm.matrix.multiply call, multiplying matrices LHS and RHS.
+ def LLVM_MatrixMultiplyOp
+     : LLVM_OneResultOp<"intr.matrix.multiply">,
+       Arguments<(
+         ins LLVM_Type:$lhs, LLVM_Type:$rhs,
+             I32Attr:$lhs_rows, I32Attr:$lhs_columns, I32Attr:$rhs_rows)> {
+   string llvmBuilder = [{
+     llvm::MatrixBuilder<decltype(builder)> mb(builder);
+     $res = mb.CreateMatrixMultiply(
+       $lhs, $rhs, $lhs_rows.getZExtValue(), $lhs_columns.getZExtValue(),
+       $rhs_rows.getZExtValue());
+   }];
+   let assemblyFormat = "$lhs `,` $rhs attr-dict "
+     "`:` `(` type($lhs) `,` type($rhs) `)` `->` type($res)";
+ }
+
+ //
  // Atomic operations.
  //

  def AtomicBinOpXchg : I64EnumAttrCase<"xchg", 0>;
  def AtomicBinOpAdd : I64EnumAttrCase<"add", 1>;
  def AtomicBinOpSub : I64EnumAttrCase<"sub", 2>;
  def AtomicBinOpAnd : I64EnumAttrCase<"_and", 3>;
  def AtomicBinOpNand : I64EnumAttrCase<"nand", 4>;
  def AtomicBinOpOr : I64EnumAttrCase<"_or", 5>;
  def AtomicBinOpXor : I64EnumAttrCase<"_xor", 6>;
  def AtomicBinOpMax : I64EnumAttrCase<"max", 7>;
  def AtomicBinOpMin : I64EnumAttrCase<"min", 8>;
  def AtomicBinOpUMax : I64EnumAttrCase<"umax", 9>;
  def AtomicBinOpUMin : I64EnumAttrCase<"umin", 10>;
  def AtomicBinOpFAdd : I64EnumAttrCase<"fadd", 11>;
  def AtomicBinOpFSub : I64EnumAttrCase<"fsub", 12>;
  def AtomicBinOp : I64EnumAttr<
      "AtomicBinOp",
      "llvm.atomicrmw binary operations",
      [AtomicBinOpXchg, AtomicBinOpAdd, AtomicBinOpSub, AtomicBinOpAnd,
       AtomicBinOpNand, AtomicBinOpOr, AtomicBinOpXor, AtomicBinOpMax,
       AtomicBinOpMin, AtomicBinOpUMax, AtomicBinOpUMin, AtomicBinOpFAdd,
       AtomicBinOpFSub]> {
    let cppNamespace = "::mlir::LLVM";
  }

  def AtomicOrderingNotAtomic : I64EnumAttrCase<"not_atomic", 0>;
  def AtomicOrderingUnordered : I64EnumAttrCase<"unordered", 1>;
  def AtomicOrderingMonotonic : I64EnumAttrCase<"monotonic", 2>;
  def AtomicOrderingAcquire : I64EnumAttrCase<"acquire", 4>;
  def AtomicOrderingRelease : I64EnumAttrCase<"release", 5>;
  def AtomicOrderingAcquireRelease : I64EnumAttrCase<"acq_rel", 6>;
  def AtomicOrderingSequentiallyConsistent : I64EnumAttrCase<"seq_cst", 7>;
  def AtomicOrdering : I64EnumAttr<
      "AtomicOrdering",
      "Atomic ordering for LLVM's memory model",
      [AtomicOrderingNotAtomic, AtomicOrderingUnordered, AtomicOrderingMonotonic,
       AtomicOrderingAcquire, AtomicOrderingRelease, AtomicOrderingAcquireRelease,
       AtomicOrderingSequentiallyConsistent]> {
    let cppNamespace = "::mlir::LLVM";
  }

  def LLVM_AtomicRMWOp : LLVM_Op<"atomicrmw">,
      Arguments<(ins AtomicBinOp:$bin_op, LLVM_Type:$ptr, LLVM_Type:$val,
                 AtomicOrdering:$ordering)>,
      Results<(outs LLVM_Type:$res)> {
    let llvmBuilder = [{
      $res = builder.CreateAtomicRMW(getLLVMAtomicBinOp($bin_op), $ptr, $val,
                                     getLLVMAtomicOrdering($ordering));
    }];
    let parser = [{ return parseAtomicRMWOp(parser, result); }];
    let printer = [{ printAtomicRMWOp(p, *this); }];
[29 lines not shown]
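
For orientation, the computation that the emitted llvm.matrix.multiply call stands for can be written as a plain loop nest over the flattened operands. This is only a reference sketch in standalone C++, assuming the column-major flattening that LLVM's matrix intrinsics use by default; `matrixMultiplyReference` is a hypothetical name, and nothing here reflects how LLVM actually lowers the intrinsic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference sketch (assumption: column-major flattening, i.e. element
// (row, col) of an R x C matrix is stored at index col * R + row).
std::vector<float> matrixMultiplyReference(const std::vector<float> &lhs,
                                           const std::vector<float> &rhs,
                                           std::size_t lhsRows,
                                           std::size_t lhsColumns,
                                           std::size_t rhsColumns) {
  assert(lhs.size() == lhsRows * lhsColumns && "lhs length mismatch");
  assert(rhs.size() == lhsColumns * rhsColumns && "rhs length mismatch");

  // The result is an lhsRows x rhsColumns matrix, flattened the same way.
  std::vector<float> res(lhsRows * rhsColumns, 0.0f);
  for (std::size_t col = 0; col < rhsColumns; ++col)
    for (std::size_t k = 0; k < lhsColumns; ++k)
      for (std::size_t row = 0; row < lhsRows; ++row)
        res[col * lhsRows + row] +=
            lhs[k * lhsRows + row] * rhs[col * lhsColumns + k];
  return res;
}
```

Under this layout a 4x16 operand is a 64-element vector, a 16x3 operand a 48-element vector, and the product a 12-element vector, which matches the vector types used in the test of this revision.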

mlir/include/mlir/Target/LLVMIR/ModuleTranslation.h

[17 lines not shown]
  #include "mlir/IR/Block.h"
  #include "mlir/IR/Module.h"
  #include "mlir/IR/Value.h"

  #include "llvm/Frontend/OpenMP/OMPIRBuilder.h"
  #include "llvm/IR/BasicBlock.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/IRBuilder.h"
+ #include "llvm/IR/MatrixBuilder.h"
  #include "llvm/IR/Value.h"

  namespace mlir {
  class Attribute;
  class Location;
  class ModuleOp;
  class Operation;

[92 lines not shown]

mlir/test/Target/llvmir-intrinsics.mlir

[124 lines not shown] llvm.func @vector_reductions(%arg0: !llvm.float, %arg1: !llvm<"<8 x float>">, %arg2: !llvm<"<8 x i32>">) {
    "llvm.intr.experimental.vector.reduce.v2.fadd"(%arg0, %arg1) : (!llvm.float, !llvm<"<8 x float>">) -> !llvm.float
    // CHECK: call float @llvm.experimental.vector.reduce.v2.fmul.f32.v8f32
    "llvm.intr.experimental.vector.reduce.v2.fmul"(%arg0, %arg1) : (!llvm.float, !llvm<"<8 x float>">) -> !llvm.float
    // CHECK: call i32 @llvm.experimental.vector.reduce.xor.v8i32
    "llvm.intr.experimental.vector.reduce.xor"(%arg2) : (!llvm<"<8 x i32>">) -> !llvm.i32
    llvm.return
  }

+ // CHECK-LABEL: @matrix_intrinsics
+ // 4x16 16x3
+ llvm.func @matrix_intrinsics(%A: !llvm<"<64 x float>">, %B: !llvm<"<48 x float>">)
+ // 4x3
+   -> !llvm<"<12 x float>">
+ {
+   // CHECK: call <12 x float> @llvm.matrix.multiply.v12f32.v64f32.v48f32(<64 x float> %0, <48 x float> %1, i32 4, i32 16, i32 3)
+   %C = llvm.intr.matrix.multiply %A, %B
+     { lhs_rows = 4: i32, lhs_columns = 16: i32 , rhs_rows = 3: i32} :
+     (!llvm<"<64 x float>">, !llvm<"<48 x float>">) -> !llvm<"<12 x float>">
+   llvm.return %C: !llvm<"<12 x float>">
+ }
+
  // Check that intrinsics are declared with appropriate types.
  // CHECK-DAG: declare float @llvm.fma.f32(float, float, float)
  // CHECK-DAG: declare <8 x float> @llvm.fma.v8f32(<8 x float>, <8 x float>, <8 x float>) #0
  // CHECK-DAG: declare float @llvm.fmuladd.f32(float, float, float)
  // CHECK-DAG: declare <8 x float> @llvm.fmuladd.v8f32(<8 x float>, <8 x float>, <8 x float>) #0
  // CHECK-DAG: declare void @llvm.prefetch.p0i8(i8* nocapture readonly, i32 immarg, i32 immarg, i32)
  // CHECK-DAG: declare float @llvm.exp.f32(float)
  // CHECK-DAG: declare <8 x float> @llvm.exp.v8f32(<8 x float>) #0
  // CHECK-DAG: declare float @llvm.log.f32(float)
  // CHECK-DAG: declare <8 x float> @llvm.log.v8f32(<8 x float>) #0
  // CHECK-DAG: declare float @llvm.log10.f32(float)
  // CHECK-DAG: declare <8 x float> @llvm.log10.v8f32(<8 x float>) #0
  // CHECK-DAG: declare float @llvm.log2.f32(float)
  // CHECK-DAG: declare <8 x float> @llvm.log2.v8f32(<8 x float>) #0
  // CHECK-DAG: declare float @llvm.fabs.f32(float)
  // CHECK-DAG: declare <8 x float> @llvm.fabs.v8f32(<8 x float>) #0
  // CHECK-DAG: declare float @llvm.sqrt.f32(float)
  // CHECK-DAG: declare <8 x float> @llvm.sqrt.v8f32(<8 x float>) #0
  // CHECK-DAG: declare float @llvm.ceil.f32(float)
  // CHECK-DAG: declare <8 x float> @llvm.ceil.v8f32(<8 x float>) #0
  // CHECK-DAG: declare float @llvm.cos.f32(float)
  // CHECK-DAG: declare <8 x float> @llvm.cos.v8f32(<8 x float>) #0
  // CHECK-DAG: declare float @llvm.copysign.f32(float, float)
+ // CHECK-DAG: declare <12 x float> @llvm.matrix.multiply.v12f32.v64f32.v48f32(<64 x float>, <48 x float>, i32 immarg, i32 immarg, i32 immarg)
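
The overloaded intrinsic name in the matrix CHECK lines can be read off the shapes: its three suffixes are the result, LHS and RHS vector types, in that order. A small sketch of that relationship, assuming f32 elements throughout; `matrixMultiplyIntrinsicName` is a hypothetical helper for illustration only, not an LLVM or MLIR API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: rebuilds the overloaded name seen in the CHECK lines.
// Assumes f32 elements and the suffix order result, lhs, rhs.
std::string matrixMultiplyIntrinsicName(std::size_t lhsRows,
                                        std::size_t lhsColumns,
                                        std::size_t rhsColumns) {
  assert(lhsRows > 0 && lhsColumns > 0 && rhsColumns > 0);
  // A flattened matrix with n elements is a fixed vector type, spelled vNf32.
  auto vecSuffix = [](std::size_t n) {
    return ".v" + std::to_string(n) + "f32";
  };
  return "llvm.matrix.multiply" + vecSuffix(lhsRows * rhsColumns) +
         vecSuffix(lhsRows * lhsColumns) + vecSuffix(lhsColumns * rhsColumns);
}
```

For the 4x16 by 16x3 case in the test, `matrixMultiplyIntrinsicName(4, 16, 3)` yields `llvm.matrix.multiply.v12f32.v64f32.v48f32`.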