This is an archive of the discontinued LLVM Phabricator instance.

mlir/include/mlir/Dialect/AVX512/AVX512.td
34	There are `_mm512_maskz_compress_ps` etc. intrinsics for zeroed compress, but they map to the same intrinsic as `_mm512_mask_compress_ps`: `mask.compress`. So technically we don't need the zeroed variant as a separate operation and can instead set `src := vector of all zeros`. Should I keep `MaskzCompressOp` or remove it from this commit? In most cases (e.g., a sparse dot product), we would probably use the zeroed variant.
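The question can be made concrete with a scalar C++ model of the compress semantics. The model follows the Intel Intrinsics Guide text quoted in the op description: the zeroed flavor is simply the masked flavor with an all-zero `src`. This is an illustrative sketch only. `maskCompress` and `maskzCompress` are hypothetical names and are not MLIR or LLVM APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Masked compress (hypothetical scalar model): lanes of `a` whose bit in `k`
// is set are packed, in order, into the low lanes of the result; the remaining
// high lanes keep the values of `src` at the same positions.
template <typename T, std::size_t N>
std::array<T, N> maskCompress(const std::array<bool, N> &k,
                              const std::array<T, N> &a,
                              const std::array<T, N> &src) {
  std::array<T, N> dst = src;  // start from the passthrough vector
  std::size_t next = 0;        // next free low lane
  for (std::size_t i = 0; i < N; ++i)
    if (k[i])
      dst[next++] = a[i];
  return dst;
}

// The zeroed ("maskz") flavor: the same computation with an all-zero src.
template <typename T, std::size_t N>
std::array<T, N> maskzCompress(const std::array<bool, N> &k,
                               const std::array<T, N> &a) {
  return maskCompress(k, a, std::array<T, N>{});
}
```

The model can be fed the `%a`, `%k` and `%src` vectors from the integration test in this revision. Its outputs match the test's three CHECK lines.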

Harbormaster completed remote builds in B91175: Diff 326883. Feb 27 2021, 2:05 AM

aartbik added inline comments. Feb 27 2021, 5:37 PM

mlir/include/mlir/Dialect/AVX512/AVX512.td
36	Maybe say "Register k ..." so that it does not look like a typo to people less familiar with the mask registers.
47	Put quotes around names such as "a" so that they do not look like a dangling article.
62	same as above

ftynse added inline comments. Mar 1 2021, 5:14 AM

mlir/include/mlir/Dialect/AVX512/AVX512.td
34	I don't see the value proposition of the zeroed variant, especially since it lowers to the same LLVM IR intrinsic as the non-zeroed one. You may want to avoid looking up the `src` operand and checking whether it is a constant zero vector. If so, there are two options. (1) Add a unit attribute `zero` to the "main" op, make the `src` argument optional, and add a verifier that checks that either the operand or the attribute is present. (2) Add a `constant_src` attribute with _any_ value (not only zero) that can be used instead of the `src` argument, with the same optional-operand and verifier scheme. The latter is closer in spirit to Linalg, which I presume is the dialect that will target this.
47	Surrounding variable names with quotes or backticks is a generally good practice. Doing so for `k` in error messages in `TypesMatchWith` would also address the issue of misreading them as a typo.
mlir/include/mlir/Dialect/LLVMIR/LLVMAVX512.td
36	Naming nit: this is a specific kind of overloading that is fully defined by the first result, there may be other kinds so I wouldn't use `IntrOverloadedOp` here only to find out later that it cannot be used for, e.g., an op that is overloaded by its second operand.
42	Nit: add a comment for `1` for consistency with other literal arguments.
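Here is what "overloaded by the first result" means in practice. The full LLVM intrinsic name is the base name plus a type suffix derived only from the result vector type. The Target test of this revision shows this with `@llvm.x86.avx512.mask.compress.v16f32`. The C++ sketch below models only that `v<N><elem>` suffix, and only for the vector types the op accepts. `Elem`, `elemSuffix` and `overloadedOnFirstResult` are hypothetical helpers, not LLVM's actual name-mangling code.

```cpp
#include <cassert>
#include <string>

// Hypothetical element-type tag covering the element types the op accepts.
enum class Elem { F32, F64, I32, I64 };

// Suffix for a scalar element type, following LLVM's f32/i64 spelling.
std::string elemSuffix(Elem e) {
  switch (e) {
  case Elem::F32: return "f32";
  case Elem::F64: return "f64";
  case Elem::I32: return "i32";
  case Elem::I64: return "i64";
  }
  return "";
}

// Builds the name of an intrinsic whose overload is fully determined by its
// first (and only) result, a fixed-length vector: base + ".v<N><elem>".
std::string overloadedOnFirstResult(const std::string &baseName,
                                    unsigned numElements, Elem elem) {
  return baseName + ".v" + std::to_string(numElements) + elemSuffix(elem);
}
```

An op overloaded on, say, its second operand would need that operand's type here instead. That is the case the naming nit warns about.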

nicolasvasilache accepted this revision. Mar 1 2021, 5:31 AM

nicolasvasilache added inline comments.

mlir/include/mlir/Dialect/AVX512/AVX512.td
34	Hmm, I thought I had replied, but apparently I did not commit the comment. I'd be curious to see whether using one vs. the other results in a significant difference in the final assembly generated by LLVM in interesting cases. That is likely a few commits down the line, once we have a full dot/matvec example. Technically, there may be a difference for LLVM's optimizations. I'm not sure yet whether this matters in practice, but I'd be interested in a confirmation based on data. Regarding where the funneling should go, we could indeed drop one op and use an attribute. However, I think we would then need to make some non-trivial changes to the autogenerated LLVMIR translation (I forget the details). If so, is it worth it?

This revision is now accepted and ready to land. Mar 1 2021, 5:31 AM

Remove maskz.compress. Make src optional.

Herald added a subscriber: dcaballe. Mar 4 2021, 8:08 PM

springerm added inline comments. Mar 4 2021, 8:09 PM

mlir/include/mlir/Dialect/AVX512/AVX512.td
34	I decided to add a `constant_src` attribute. If neither `constant_src` nor `src` is specified, a vector of all zeros is used (same as `maskz.compress`).
mlir/include/mlir/Dialect/LLVMIR/LLVMAVX512.td
36	Added a comment. I can change this if necessary when adding additional ops.
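The rule chosen above can be summarized as a plain C++ model: `src` and `constant_src` are mutually exclusive, and if neither is given the passthrough is all zeros. In this revision, `verify(avx512::MaskCompressOp)` and `MaskCompressOpConversion` implement that rule on real MLIR types. The sketch does not use any MLIR API. `Vec` and `resolvePassthrough` are hypothetical stand-ins, and vector length stands in for the type check.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a vector value or a dense vector attribute.
using Vec = std::vector<float>;

// Picks the passthrough vector: `src` and `constant_src` are mutually
// exclusive, each must match the result length, and if neither is given the
// passthrough defaults to all zeros (the former maskz.compress behavior).
Vec resolvePassthrough(const std::optional<Vec> &src,
                       const std::optional<Vec> &constantSrc,
                       std::size_t dstLength) {
  if (src && constantSrc)
    throw std::invalid_argument("cannot use both src and constant_src");
  if (src) {
    if (src->size() != dstLength)
      throw std::invalid_argument("src and dst must have the same type");
    return *src;
  }
  if (constantSrc) {
    if (constantSrc->size() != dstLength)
      throw std::invalid_argument("constant_src and dst must have the same type");
    return *constantSrc;
  }
  return Vec(dstLength, 0.0f);
}
```

Throwing stands in for `emitError` here. In MLIR the mutual-exclusion check belongs in the verifier, so the lowering pattern can assume at most one source is present.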

springerm edited the summary of this revision. Mar 4 2021, 8:10 PM

Harbormaster completed remote builds in B92207: Diff 328368. Mar 5 2021, 12:43 PM

Closed by commit rGacce0ea70c11: [mlir][AVX512] Add mask.compress to AVX512 dialect. (authored by springerm). Mar 5 2021, 5:03 PM

This revision was automatically updated to reflect the committed changes.

springerm added a commit: rGacce0ea70c11: [mlir][AVX512] Add mask.compress to AVX512 dialect..

Revision Contents

mlir/include/mlir/Dialect/AVX512/AVX512.td (36 lines)
mlir/include/mlir/Dialect/LLVMIR/LLVMAVX512.td (14 lines)
mlir/lib/Conversion/AVX512ToLLVM/ConvertAVX512ToLLVM.cpp (29 lines)
mlir/lib/Dialect/AVX512/IR/AVX512Dialect.cpp (16 lines)
mlir/test/Conversion/AVX512ToLLVM/convert-to-llvm.mlir (13 lines)
mlir/test/Dialect/AVX512/roundtrip.mlir (13 lines)
mlir/test/Integration/Dialect/Vector/CPU/AVX512/test-mask-compress.mlir (27 lines)
mlir/test/Integration/Dialect/Vector/CPU/AVX512/test-vp2intersect-i32.mlir (2 lines)
mlir/test/Target/avx512.mlir (10 lines)

Diff 328703

mlir/include/mlir/Dialect/AVX512/AVX512.td

	[25 unchanged lines not shown]

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// AVX512 op definitions			// AVX512 op definitions
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	class AVX512_Op<string mnemonic, list<OpTrait> traits = []> :			class AVX512_Op<string mnemonic, list<OpTrait> traits = []> :
	Op<AVX512_Dialect, mnemonic, traits> {}			Op<AVX512_Dialect, mnemonic, traits> {}

				def MaskCompressOp : AVX512_Op<"mask.compress", [NoSideEffect,
				// TODO: Support optional arguments in `AllTypesMatch`. "type($src)" could
				// then be removed from assemblyFormat.
				AllTypesMatch<["a", "dst"]>,
				TypesMatchWith<"`k` has the same number of bits as elements in `dst`",
				"dst", "k",
				"VectorType::get({$_self.cast<VectorType>().getShape()[0]}, "
				"IntegerType::get($_self.getContext(), 1))">]> {
				let summary = "Masked compress op";
				let description = [{
				The mask.compress op is an AVX512-specific op that can lower to the
				`llvm.mask.compress` instruction. Instead of `src`, a constant vector
				attribute `constant_src` may be specified. If neither `src` nor
				`constant_src` is specified, the remaining elements in the result vector are
				set to zero.

				#### From the Intel Intrinsics Guide:

				Contiguously store the active integer/floating-point elements in `a` (those
				with their respective bit set in writemask `k`) to `dst`, and pass through the
				remaining elements from `src`.
				}];
				let verifier = [{ return ::verify(*this); }];
				let arguments = (ins VectorOfLengthAndType<[16, 16, 8, 8],
				[I1, I1, I1, I1]>:$k,
				VectorOfLengthAndType<[16, 16, 8, 8],
				[F32, I32, F64, I64]>:$a,
				Optional<VectorOfLengthAndType<[16, 16, 8, 8],
				[F32, I32, F64, I64]>>:$src,
				OptionalAttr<ElementsAttr>:$constant_src);
				let results = (outs VectorOfLengthAndType<[16, 16, 8, 8],
				[F32, I32, F64, I64]>:$dst);
				let assemblyFormat = "$k `,` $a (`,` $src^)? attr-dict"
				" `:` type($dst) (`,` type($src)^)?";
				}

	def MaskRndScaleOp : AVX512_Op<"mask.rndscale", [NoSideEffect,			def MaskRndScaleOp : AVX512_Op<"mask.rndscale", [NoSideEffect,
	AllTypesMatch<["src", "a", "dst"]>,			AllTypesMatch<["src", "a", "dst"]>,
	TypesMatchWith<"imm has the same number of bits as elements in dst",			TypesMatchWith<"imm has the same number of bits as elements in dst",
	"dst", "imm",			"dst", "imm",
	"IntegerType::get($_self.getContext(), "			"IntegerType::get($_self.getContext(), "
	"($_self.cast<VectorType>().getShape()[0]))">]> {			"($_self.cast<VectorType>().getShape()[0]))">]> {
	let summary = "Masked roundscale op";			let summary = "Masked roundscale op";
	let description = [{			let description = [{
	[95 unchanged lines not shown]

mlir/include/mlir/Dialect/LLVMIR/LLVMAVX512.td

	[27 unchanged lines not shown]
	// MLIR LLVM AVX512 intrinsics using the MLIR LLVM Dialect type system			// MLIR LLVM AVX512 intrinsics using the MLIR LLVM Dialect type system
	//----------------------------------------------------------------------------//			//----------------------------------------------------------------------------//

	class LLVMAVX512_IntrOp<string mnemonic, int numResults, list<OpTrait> traits = []> :			class LLVMAVX512_IntrOp<string mnemonic, int numResults, list<OpTrait> traits = []> :
	LLVM_IntrOpBase<LLVMAVX512_Dialect, mnemonic,			LLVM_IntrOpBase<LLVMAVX512_Dialect, mnemonic,
	"x86_avx512_" # !subst(".", "_", mnemonic),			"x86_avx512_" # !subst(".", "_", mnemonic),
	[], [], traits, numResults>;			[], [], traits, numResults>;

				// Defined by first result overload. May have to be extended for other
				// instructions in the future.
				class LLVMAVX512_IntrOverloadedOp<string mnemonic,
				list<OpTrait> traits = []> :
				LLVM_IntrOpBase<LLVMAVX512_Dialect, mnemonic,
				"x86_avx512_" # !subst(".", "_", mnemonic),
				/*list<int> overloadedResults=*/[0],
				/*list<int> overloadedOperands=*/[],
				traits, /*numResults=*/1>;

	def LLVM_x86_avx512_mask_rndscale_ps_512 :			def LLVM_x86_avx512_mask_rndscale_ps_512 :
	LLVMAVX512_IntrOp<"mask.rndscale.ps.512", 1>,			LLVMAVX512_IntrOp<"mask.rndscale.ps.512", 1>,
	Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;			Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;

	def LLVM_x86_avx512_mask_rndscale_pd_512 :			def LLVM_x86_avx512_mask_rndscale_pd_512 :
	LLVMAVX512_IntrOp<"mask.rndscale.pd.512", 1>,			LLVMAVX512_IntrOp<"mask.rndscale.pd.512", 1>,
	Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;			Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;

	def LLVM_x86_avx512_mask_scalef_ps_512 :			def LLVM_x86_avx512_mask_scalef_ps_512 :
	LLVMAVX512_IntrOp<"mask.scalef.ps.512", 1>,			LLVMAVX512_IntrOp<"mask.scalef.ps.512", 1>,
	Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;			Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;

	def LLVM_x86_avx512_mask_scalef_pd_512 :			def LLVM_x86_avx512_mask_scalef_pd_512 :
	LLVMAVX512_IntrOp<"mask.scalef.pd.512", 1>,			LLVMAVX512_IntrOp<"mask.scalef.pd.512", 1>,
	Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;			Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;

				def LLVM_x86_avx512_mask_compress :
				LLVMAVX512_IntrOverloadedOp<"mask.compress">,
				Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type)>;

	def LLVM_x86_avx512_vp2intersect_d_512 :			def LLVM_x86_avx512_vp2intersect_d_512 :
	LLVMAVX512_IntrOp<"vp2intersect.d.512", 2>,			LLVMAVX512_IntrOp<"vp2intersect.d.512", 2>,
	Arguments<(ins LLVM_Type, LLVM_Type)>;			Arguments<(ins LLVM_Type, LLVM_Type)>;

	def LLVM_x86_avx512_vp2intersect_q_512 :			def LLVM_x86_avx512_vp2intersect_q_512 :
	LLVMAVX512_IntrOp<"vp2intersect.q.512", 2>,			LLVMAVX512_IntrOp<"vp2intersect.q.512", 2>,
	Arguments<(ins LLVM_Type, LLVM_Type)>;			Arguments<(ins LLVM_Type, LLVM_Type)>;

	#endif // AVX512_OPS			#endif // AVX512_OPS

mlir/lib/Conversion/AVX512ToLLVM/ConvertAVX512ToLLVM.cpp

[50 unchanged lines not shown; context: matchAndRewrite(Operation *op, ArrayRef<Value> operands, ...)]
if (elementType.isF64())		if (elementType.isF64())
return LLVM::detail::oneToOneRewrite(		return LLVM::detail::oneToOneRewrite(
op, LLVM::x86_avx512_mask_rndscale_pd_512::getOperationName(),		op, LLVM::x86_avx512_mask_rndscale_pd_512::getOperationName(),
operands, *getTypeConverter(), rewriter);		operands, *getTypeConverter(), rewriter);
return failure();		return failure();
}		}
};		};

		struct MaskCompressOpConversion
		    : public ConvertOpToLLVMPattern<MaskCompressOp> {
		  using ConvertOpToLLVMPattern<MaskCompressOp>::ConvertOpToLLVMPattern;

		  LogicalResult
		  matchAndRewrite(MaskCompressOp op, ArrayRef<Value> operands,
		                  ConversionPatternRewriter &rewriter) const override {
		    MaskCompressOp::Adaptor adaptor(operands);
		    auto opType = adaptor.a().getType();

		    Value src;
		    if (op.src()) {
		      src = adaptor.src();
		    } else if (op.constant_src()) {
		      src = rewriter.create<ConstantOp>(op.getLoc(), opType,
		                                        op.constant_srcAttr());
		    } else {
		      Attribute zeroAttr = rewriter.getZeroAttr(opType);
		      src = rewriter.create<ConstantOp>(op->getLoc(), opType, zeroAttr);
		    }

		    rewriter.replaceOpWithNewOp<LLVM::x86_avx512_mask_compress>(
		        op, opType, adaptor.a(), src, adaptor.k());

		    return success();
		  }
		};

struct ScaleFOp512Conversion : public ConvertToLLVMPattern {		struct ScaleFOp512Conversion : public ConvertToLLVMPattern {
explicit ScaleFOp512Conversion(MLIRContext *context,		explicit ScaleFOp512Conversion(MLIRContext *context,
LLVMTypeConverter &typeConverter)		LLVMTypeConverter &typeConverter)
: ConvertToLLVMPattern(MaskScaleFOp::getOperationName(), context,		: ConvertToLLVMPattern(MaskScaleFOp::getOperationName(), context,
typeConverter) {}		typeConverter) {}

LogicalResult		LogicalResult
matchAndRewrite(Operation *op, ArrayRef<Value> operands,		matchAndRewrite(Operation *op, ArrayRef<Value> operands,
[38 unchanged lines not shown]
/// Populate the given list with patterns that convert from AVX512 to LLVM.		/// Populate the given list with patterns that convert from AVX512 to LLVM.
void mlir::populateAVX512ToLLVMConversionPatterns(		void mlir::populateAVX512ToLLVMConversionPatterns(
LLVMTypeConverter &converter, OwningRewritePatternList &patterns) {		LLVMTypeConverter &converter, OwningRewritePatternList &patterns) {
// clang-format off		// clang-format off
patterns.insert<MaskRndScaleOp512Conversion,		patterns.insert<MaskRndScaleOp512Conversion,
ScaleFOp512Conversion,		ScaleFOp512Conversion,
Vp2IntersectOp512Conversion>(&converter.getContext(),		Vp2IntersectOp512Conversion>(&converter.getContext(),
converter);		converter);
		patterns.insert<MaskCompressOpConversion>(converter);
// clang-format on		// clang-format on
}		}

mlir/lib/Dialect/AVX512/IR/AVX512Dialect.cpp

	[19 unchanged lines not shown]

	void avx512::AVX512Dialect::initialize() {			void avx512::AVX512Dialect::initialize() {
	addOperations<			addOperations<
	#define GET_OP_LIST			#define GET_OP_LIST
	#include "mlir/Dialect/AVX512/AVX512.cpp.inc"			#include "mlir/Dialect/AVX512/AVX512.cpp.inc"
	>();			>();
	}			}

				static LogicalResult verify(avx512::MaskCompressOp op) {
				  if (op.src() && op.constant_src())
				    return emitError(op.getLoc(), "cannot use both src and constant_src");

				  if (op.src() && (op.src().getType() != op.dst().getType()))
				    return emitError(op.getLoc(),
				                     "failed to verify that src and dst have same type");

				  if (op.constant_src() &&
				      (op.constant_src()->getType() != op.dst().getType()))
				    return emitError(
				        op.getLoc(),
				        "failed to verify that constant_src and dst have same type");

				  return success();
				}

	#define GET_OP_CLASSES			#define GET_OP_CLASSES
	#include "mlir/Dialect/AVX512/AVX512.cpp.inc"			#include "mlir/Dialect/AVX512/AVX512.cpp.inc"

mlir/test/Conversion/AVX512ToLLVM/convert-to-llvm.mlir

[11 unchanged lines not shown; context: func @avx512_mask_rndscale(%a: vector<16xf32>, %b: vector<8xf64>, %i32: i32, %i16: i16, %i8: i8)]
%2 = avx512.mask.scalef %a, %a, %a, %i16, %i32: vector<16xf32>		%2 = avx512.mask.scalef %a, %a, %a, %i16, %i32: vector<16xf32>
// CHECK: llvm_avx512.mask.scalef.pd.512		// CHECK: llvm_avx512.mask.scalef.pd.512
%3 = avx512.mask.scalef %b, %b, %b, %i8, %i32: vector<8xf64>		%3 = avx512.mask.scalef %b, %b, %b, %i8, %i32: vector<8xf64>

// Keep results alive.		// Keep results alive.
return %0, %1, %2, %3 : vector<16xf32>, vector<8xf64>, vector<16xf32>, vector<8xf64>		return %0, %1, %2, %3 : vector<16xf32>, vector<8xf64>, vector<16xf32>, vector<8xf64>
}		}

		func @avx512_mask_compress(%k1: vector<16xi1>, %a1: vector<16xf32>,
		%k2: vector<8xi1>, %a2: vector<8xi64>)
		-> (vector<16xf32>, vector<16xf32>, vector<8xi64>)
		{
		// CHECK: llvm_avx512.mask.compress
		%0 = avx512.mask.compress %k1, %a1 : vector<16xf32>
		// CHECK: llvm_avx512.mask.compress
		%1 = avx512.mask.compress %k1, %a1 {constant_src = dense<5.0> : vector<16xf32>} : vector<16xf32>
		// CHECK: llvm_avx512.mask.compress
		%2 = avx512.mask.compress %k2, %a2, %a2 : vector<8xi64>, vector<8xi64>
		return %0, %1, %2 : vector<16xf32>, vector<16xf32>, vector<8xi64>
		}

func @avx512_vp2intersect(%a: vector<16xi32>, %b: vector<8xi64>)		func @avx512_vp2intersect(%a: vector<16xi32>, %b: vector<8xi64>)
-> (vector<16xi1>, vector<16xi1>, vector<8xi1>, vector<8xi1>)		-> (vector<16xi1>, vector<16xi1>, vector<8xi1>, vector<8xi1>)
{		{
// CHECK: llvm_avx512.vp2intersect.d.512		// CHECK: llvm_avx512.vp2intersect.d.512
%0, %1 = avx512.vp2intersect %a, %a : vector<16xi32>		%0, %1 = avx512.vp2intersect %a, %a : vector<16xi32>
// CHECK: llvm_avx512.vp2intersect.q.512		// CHECK: llvm_avx512.vp2intersect.q.512
%2, %3 = avx512.vp2intersect %b, %b : vector<8xi64>		%2, %3 = avx512.vp2intersect %b, %b : vector<8xi64>
return %0, %1, %2, %3 : vector<16xi1>, vector<16xi1>, vector<8xi1>, vector<8xi1>		return %0, %1, %2, %3 : vector<16xi1>, vector<16xi1>, vector<8xi1>, vector<8xi1>
}		}

mlir/test/Dialect/AVX512/roundtrip.mlir

[23 unchanged lines not shown; context: func @avx512_vp2intersect(%a: vector<16xi32>, %b: vector<8xi64>)]
-> (vector<16xi1>, vector<16xi1>, vector<8xi1>, vector<8xi1>)		-> (vector<16xi1>, vector<16xi1>, vector<8xi1>, vector<8xi1>)
{		{
// CHECK: avx512.vp2intersect {{.*}} : vector<16xi32>		// CHECK: avx512.vp2intersect {{.*}} : vector<16xi32>
%0, %1 = avx512.vp2intersect %a, %a : vector<16xi32>		%0, %1 = avx512.vp2intersect %a, %a : vector<16xi32>
// CHECK: avx512.vp2intersect {{.*}} : vector<8xi64>		// CHECK: avx512.vp2intersect {{.*}} : vector<8xi64>
%2, %3 = avx512.vp2intersect %b, %b : vector<8xi64>		%2, %3 = avx512.vp2intersect %b, %b : vector<8xi64>
return %0, %1, %2, %3 : vector<16xi1>, vector<16xi1>, vector<8xi1>, vector<8xi1>		return %0, %1, %2, %3 : vector<16xi1>, vector<16xi1>, vector<8xi1>, vector<8xi1>
}		}

		func @avx512_mask_compress(%k1: vector<16xi1>, %a1: vector<16xf32>,
		%k2: vector<8xi1>, %a2: vector<8xi64>)
		-> (vector<16xf32>, vector<16xf32>, vector<8xi64>)
		{
		// CHECK: avx512.mask.compress {{.*}} : vector<16xf32>
		%0 = avx512.mask.compress %k1, %a1 : vector<16xf32>
		// CHECK: avx512.mask.compress {{.*}} : vector<16xf32>
		%1 = avx512.mask.compress %k1, %a1 {constant_src = dense<5.0> : vector<16xf32>} : vector<16xf32>
		// CHECK: avx512.mask.compress {{.*}} : vector<8xi64>
		%2 = avx512.mask.compress %k2, %a2, %a2 : vector<8xi64>, vector<8xi64>
		return %0, %1, %2 : vector<16xf32>, vector<16xf32>, vector<8xi64>
		}

mlir/test/Integration/Dialect/Vector/CPU/AVX512/test-mask-compress.mlir

This file was added.

				// RUN: mlir-opt %s -convert-scf-to-std -convert-vector-to-llvm="enable-avx512" -convert-std-to-llvm | \
				// RUN: mlir-translate --mlir-to-llvmir | \
				// RUN: %lli --entry-function=entry --mattr="avx512bw" --dlopen=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \
				// RUN: FileCheck %s

				func @entry() -> i32 {
				%i0 = constant 0 : i32

				%a = std.constant dense<[1., 0., 0., 2., 4., 3., 5., 7., 8., 1., 5., 5., 3., 1., 0., 7.]> : vector<16xf32>
				%k = std.constant dense<[1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]> : vector<16xi1>
				%r1 = avx512.mask.compress %k, %a : vector<16xf32>
				%r2 = avx512.mask.compress %k, %a {constant_src = dense<5.0> : vector<16xf32>} : vector<16xf32>

				vector.print %r1 : vector<16xf32>
				// CHECK: ( 1, 0, 2, 4, 5, 5, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0 )

				vector.print %r2 : vector<16xf32>
				// CHECK: ( 1, 0, 2, 4, 5, 5, 3, 1, 0, 5, 5, 5, 5, 5, 5, 5 )

				%src = std.constant dense<[0., 2., 1., 8., 6., 4., 4., 3., 2., 8., 5., 6., 3., 7., 6., 9.]> : vector<16xf32>
				%r3 = avx512.mask.compress %k, %a, %src : vector<16xf32>, vector<16xf32>

				vector.print %r3 : vector<16xf32>
				// CHECK: ( 1, 0, 2, 4, 5, 5, 3, 1, 0, 8, 5, 6, 3, 7, 6, 9 )

				return %i0 : i32
				}

mlir/test/Integration/Dialect/Vector/CPU/AVX512/test-vp2intersect-i32.mlir

	// RUN: mlir-opt %s -convert-scf-to-std -convert-vector-to-llvm="enable-avx512" -convert-std-to-llvm | \			// RUN: mlir-opt %s -convert-scf-to-std -convert-vector-to-llvm="enable-avx512" -convert-std-to-llvm | \
	// RUN: mlir-translate --avx512-mlir-to-llvmir | \			// RUN: mlir-translate --mlir-to-llvmir | \
	// RUN: %lli --entry-function=entry --mattr="avx512bw,avx512vp2intersect" --dlopen=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \			// RUN: %lli --entry-function=entry --mattr="avx512bw,avx512vp2intersect" --dlopen=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	// Note: To run this test, your CPU must support AVX512 vp2intersect.			// Note: To run this test, your CPU must support AVX512 vp2intersect.

	func @entry() -> i32 {			func @entry() -> i32 {
	%i0 = constant 0 : i32			%i0 = constant 0 : i32
	%i1 = constant 1: i32			%i1 = constant 1: i32
	[42 unchanged lines not shown]

mlir/test/Target/avx512.mlir

[24 unchanged lines not shown; context: llvm.func @LLVM_x86_avx512_mask_pd_512(%a: vector<8xf64>, ...]
%0 = "llvm_avx512.mask.rndscale.pd.512"(%a, %b, %a, %c, %b) :		%0 = "llvm_avx512.mask.rndscale.pd.512"(%a, %b, %a, %c, %b) :
(vector<8xf64>, i32, vector<8xf64>, i8, i32) -> vector<8xf64>		(vector<8xf64>, i32, vector<8xf64>, i8, i32) -> vector<8xf64>
// CHECK: call <8 x double> @llvm.x86.avx512.mask.scalef.pd.512(<8 x double>		// CHECK: call <8 x double> @llvm.x86.avx512.mask.scalef.pd.512(<8 x double>
%1 = "llvm_avx512.mask.scalef.pd.512"(%a, %a, %a, %c, %b) :		%1 = "llvm_avx512.mask.scalef.pd.512"(%a, %a, %a, %c, %b) :
(vector<8xf64>, vector<8xf64>, vector<8xf64>, i8, i32) -> vector<8xf64>		(vector<8xf64>, vector<8xf64>, vector<8xf64>, i8, i32) -> vector<8xf64>
llvm.return %1: vector<8xf64>		llvm.return %1: vector<8xf64>
}		}

		// CHECK-LABEL: define <16 x float> @LLVM_x86_mask_compress
		llvm.func @LLVM_x86_mask_compress(%k: vector<16xi1>, %a: vector<16xf32>)
		-> vector<16xf32>
		{
		// CHECK: call <16 x float> @llvm.x86.avx512.mask.compress.v16f32(
		%0 = "llvm_avx512.mask.compress"(%a, %a, %k) :
		(vector<16xf32>, vector<16xf32>, vector<16xi1>) -> vector<16xf32>
		llvm.return %0 : vector<16xf32>
		}

// CHECK-LABEL: define { <16 x i1>, <16 x i1> } @LLVM_x86_vp2intersect_d_512		// CHECK-LABEL: define { <16 x i1>, <16 x i1> } @LLVM_x86_vp2intersect_d_512
llvm.func @LLVM_x86_vp2intersect_d_512(%a: vector<16xi32>, %b: vector<16xi32>)		llvm.func @LLVM_x86_vp2intersect_d_512(%a: vector<16xi32>, %b: vector<16xi32>)
-> !llvm.struct<(vector<16 x i1>, vector<16 x i1>)>		-> !llvm.struct<(vector<16 x i1>, vector<16 x i1>)>
{		{
// CHECK: call { <16 x i1>, <16 x i1> } @llvm.x86.avx512.vp2intersect.d.512(<16 x i32>		// CHECK: call { <16 x i1>, <16 x i1> } @llvm.x86.avx512.vp2intersect.d.512(<16 x i32>
%0 = "llvm_avx512.vp2intersect.d.512"(%a, %b) :		%0 = "llvm_avx512.vp2intersect.d.512"(%a, %b) :
(vector<16xi32>, vector<16xi32>) -> !llvm.struct<(vector<16 x i1>, vector<16 x i1>)>		(vector<16xi32>, vector<16xi32>) -> !llvm.struct<(vector<16 x i1>, vector<16 x i1>)>
llvm.return %0 : !llvm.struct<(vector<16 x i1>, vector<16 x i1>)>		llvm.return %0 : !llvm.struct<(vector<16 x i1>, vector<16 x i1>)>
[11 unchanged lines not shown]

[mlir][AVX512] Add mask.compress to AVX512 dialect. (Closed, Public)
