In the long run, we want to unify the dot product codegen solutions between
all target architectures, but this intrinsic enables experimenting with AVX-specific implementations in the meantime.
Repository: rG LLVM Github Monorepo
Event Timeline
mlir/include/mlir/Dialect/X86Vector/X86Vector.td
Line 313: Like we argued/settled in other dialects (i.e. neon IIRC), this should have a "codegen public" form x86.avx.dot on vector<2x4xf32> and an x86.avx.intrin.dot form on vector<8xf32>.
Line 325: Then this can become:
  %0 = x86vector.avx.dot %a, %b : vector<2x4xf32>
  %1 = vector.extract %0[0, 0] : vector<2x4xf32>
  %2 = vector.extract %0[1, 0] : vector<2x4xf32>
  %d = addf %1, %2 : f32
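For contrast, a minimal sketch of how the same reduction reads with the flat vector<8xf32> intrinsic form (written here with the x86vector.avx.intr.dot spelling discussed below). This assumes the underlying VDPPS behaviour with an all-ones mask, where each 128-bit lane's dot product is broadcast within its lane, so the two partial results can be read from indices 0 and 4:
  %0 = x86vector.avx.intr.dot %a, %b : vector<8xf32>
  %1 = vector.extract %0[0] : vector<8xf32>
  %2 = vector.extract %0[4] : vector<8xf32>
  %d = addf %1, %2 : f32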
mlir/include/mlir/Dialect/X86Vector/X86Vector.td
Line 313: Ah, I suppose you are referring to https://reviews.llvm.org/D98198. I somehow missed the second part of the discussion during the review. Let me have a careful read. Are you okay with this CL + a later improvement, or do you want me to make those changes first?
I'd like to make sure we are on the same page globally on this.
As a simple step forward, assuming you'd be in agreement, you could just rename the op to x86vector.avx.intr.dot and not change anything else.
The "codegen exposed" version would then be a new op and the lowering to x86vector would be a simple reshape.
The big thing then will be the iterative canonicalizations of reshapes all the way to vector.transfer ops, which we have to do in a portable fashion.
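As an illustration, a minimal sketch of such a reshape-based lowering, assuming the codegen-public op ends up named x86vector.avx.dot on vector<2x4xf32>, the intrinsic op is x86vector.avx.intr.dot on vector<8xf32>, and vector.shape_cast serves as the "simple reshape" (all of these choices are still being settled in the discussion above):
  // Hypothetical lowering of the codegen-public form
  //   %r = x86vector.avx.dot %a, %b : vector<2x4xf32>
  // to the intrinsic form:
  %a_flat = vector.shape_cast %a : vector<2x4xf32> to vector<8xf32>
  %b_flat = vector.shape_cast %b : vector<2x4xf32> to vector<8xf32>
  %dot = x86vector.avx.intr.dot %a_flat, %b_flat : vector<8xf32>
  %r = vector.shape_cast %dot : vector<8xf32> to vector<2x4xf32>
With this shape, the per-lane dot products end up at positions [0, 0] and [1, 0] of the 2x4 result, matching the extract-and-add pattern shown earlier.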
Yes, I am! The low/high lane semantics are always a bit confusing, and I like the solution of exploiting the higher dimensionality of the vector dialect to represent them better.
I will make both intrinsics, and then add the new op in the next revision.