This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Vector] Support masking for more contraction flavors
ClosedPublic

Authored by dcaballe on Feb 16 2023, 6:44 PM.


Details

Reviewers

ThomasRaoux
aartbik
ftynse
nicolasvasilache

Commits

rGc339f9e1c327: [mlir][Vector] Support masking for more contraction flavors

Summary

This patch adds masking support for more contraction flavors, including those
with any combiner operation (add, mul, min, max, and, or, etc.) and
regular matmul contractions.

Combiner operations that perform vertical reductions (and are therefore
not represented with a horizontal reduction operation) can be executed
unmasked. However, the previous value of the accumulator must be
propagated for lanes that shouldn't accumulate. We achieve this by
introducing a select operation after the accumulation that chooses between
the combined value and the previous accumulator value. This design avoids
adding masking support to every arithmetic and logical operation in the
Arith dialect. VP intrinsics do not support pass-thru values either, so we
would have to generate the same sequence when lowering to LLVM. The
op + select pattern is peepholed by some backends with native masking
support for those operations.
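
The lane-wise behavior described above can be modeled outside MLIR. The following is a minimal sketch in plain C++, not the patch's implementation: `Lanes`, `Mask`, `combineAdd`, `selectPassthruModel`, and `maskedAccumulateAdd` are hypothetical stand-ins that operate on `std::array` rather than on MLIR values. The combiner runs unmasked on every lane, and a per-lane select then restores the previous accumulator value for the masked-out lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-width "vector" types used only for this illustration.
constexpr std::size_t kLanes = 4;
using Lanes = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Unmasked vertical combiner (ADD): every lane is computed, including the
// lanes that the mask disables.
Lanes combineAdd(const Lanes &v, const Lanes &acc) {
  Lanes result{};
  for (std::size_t i = 0; i < kLanes; ++i)
    result[i] = v[i] + acc[i];
  return result;
}

// Per-lane select: keep the combined value where the mask is set, otherwise
// fall back to the pass-thru value (here, the previous accumulator).
Lanes selectPassthruModel(const Mask &mask, const Lanes &newValue,
                          const Lanes &passthru) {
  Lanes result{};
  for (std::size_t i = 0; i < kLanes; ++i)
    result[i] = mask[i] ? newValue[i] : passthru[i];
  return result;
}

// Masked accumulation expressed as "unmasked op + select".
Lanes maskedAccumulateAdd(const Lanes &v, const Lanes &acc, const Mask &mask) {
  return selectPassthruModel(mask, combineAdd(v, acc), acc);
}
```

With `v = {1, 2, 3, 4}`, `acc = {10, 20, 30, 40}`, and a mask that enables only lanes 0 and 2, the enabled lanes hold the sums and the other lanes keep their previous accumulator values.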

Consequently, this patch removes masking support from the vector.fma
operation to follow the same approach for all the combiner operations.
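
As a rough illustration of what an unmasked fma followed by a select computes per lane, here is a small plain C++ model. It is a sketch under stated assumptions, not the lowering itself: `FloatLanes`, `LaneMask`, and `maskedFmaModel` are hypothetical names, and `std::fma` stands in for the vector fma.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical fixed-width types used only for this illustration.
constexpr std::size_t kWidth = 4;
using FloatLanes = std::array<float, kWidth>;
using LaneMask = std::array<bool, kWidth>;

// Models "unmasked fma + select": the fused multiply-add is evaluated on all
// lanes, and masked-out lanes keep the previous accumulator value.
FloatLanes maskedFmaModel(const FloatLanes &lhs, const FloatLanes &rhs,
                          const FloatLanes &acc, const LaneMask &mask) {
  FloatLanes result{};
  for (std::size_t i = 0; i < kWidth; ++i) {
    float fused = std::fma(lhs[i], rhs[i], acc[i]); // computed unconditionally
    result[i] = mask[i] ? fused : acc[i];           // select restores acc
  }
  return result;
}
```

Because the select takes the accumulator as its pass-thru operand, the fma itself needs no mask, which matches the approach used for the other combiner operations.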

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dcaballe created this revision. Feb 16 2023, 6:44 PM

Herald added a reviewer: aartbik. Feb 16 2023, 6:44 PM

Herald added a reviewer: ftynse.

Herald added a project: Restricted Project.

Herald added subscribers: Moerafaat, zero9178, bzcheeseman and 24 others.

dcaballe requested review of this revision. Feb 16 2023, 6:44 PM

Herald added a reviewer: nicolasvasilache. Feb 16 2023, 6:44 PM

Herald added a project: Restricted Project.

Herald added subscribers: pcwang-thead, stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B214313: Diff 498233. Feb 16 2023, 7:06 PM

LGTM

This revision is now accepted and ready to land. Feb 21 2023, 11:54 AM

This revision was landed with ongoing or failed builds. Feb 21 2023, 5:52 PM

Closed by commit rGc339f9e1c327: [mlir][Vector] Support masking for more contraction flavors (authored by dcaballe).

This revision was automatically updated to reflect the committed changes.

dcaballe added a commit: rGc339f9e1c327: [mlir][Vector] Support masking for more contraction flavors.

Revision Contents

Path: Size

mlir/include/mlir/Dialect/Vector/IR/VectorOps.h: 15 lines
mlir/include/mlir/Dialect/Vector/IR/VectorOps.td: 1 line
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp: 52 lines
mlir/lib/Dialect/Vector/IR/VectorOps.cpp: 121 lines
mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp: 36 lines
mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir: 140 lines
mlir/test/Dialect/Vector/vector-contract-transforms.mlir: 72 lines

Diff 499335

mlir/include/mlir/Dialect/Vector/IR/VectorOps.h

	/// Return true if we can prove that the transfer operations access disjoint			/// Return true if we can prove that the transfer operations access disjoint
	/// memory.			/// memory.
	bool isDisjointTransferSet(VectorTransferOpInterface transferA,			bool isDisjointTransferSet(VectorTransferOpInterface transferA,
	VectorTransferOpInterface transferB);			VectorTransferOpInterface transferB);

	/// Return the result value of reducing two scalar/vector values with the			/// Return the result value of reducing two scalar/vector values with the
	/// corresponding arith operation.			/// corresponding arith operation.
	Value makeArithReduction(OpBuilder &b, Location loc, CombiningKind kind,			Value makeArithReduction(OpBuilder &b, Location loc, CombiningKind kind,
	Value v1, Value v2);			Value v1, Value acc, Value mask = Value());

	/// Returns true if `attr` has "parallel" iterator type semantics.			/// Returns true if `attr` has "parallel" iterator type semantics.
	inline bool isParallelIterator(Attribute attr) {			inline bool isParallelIterator(Attribute attr) {
	return attr.cast<IteratorTypeAttr>().getValue() == IteratorType::parallel;			return attr.cast<IteratorTypeAttr>().getValue() == IteratorType::parallel;
	}			}

	/// Returns true if `attr` has "reduction" iterator type semantics.			/// Returns true if `attr` has "reduction" iterator type semantics.
	inline bool isReductionIterator(Attribute attr) {			inline bool isReductionIterator(Attribute attr) {
	return attr.cast<IteratorTypeAttr>().getValue() == IteratorType::reduction;			return attr.cast<IteratorTypeAttr>().getValue() == IteratorType::reduction;
	}			}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Vector Masking Utilities			// Vector Masking Utilities
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	/// Create the vector.yield-ended region of a vector.mask op with `maskableOp`			/// Create the vector.yield-ended region of a vector.mask op with `maskableOp`
	/// as masked operation.			/// as masked operation.
	void createMaskOpRegion(OpBuilder &builder, Operation *maskableOp);			void createMaskOpRegion(OpBuilder &builder, Operation *maskableOp);

	/// Creates a vector.mask operation around a maskable operation. Returns the			/// Creates a vector.mask operation around a maskable operation. Returns the
	/// vector.mask operation if the mask provided is valid. Otherwise, returns the			/// vector.mask operation if the mask provided is valid. Otherwise, returns the
	/// maskable operation itself.			/// maskable operation itself.
	Operation *maskOperation(RewriterBase &rewriter, Operation *maskableOp,			Operation *maskOperation(OpBuilder &builder, Operation *maskableOp,
	Value mask);			Value mask, Value passthru = Value());

				/// Creates a vector select operation that picks values from `newValue` or
				/// `passthru` for each result vector lane based on `mask`. This utility is used
				/// to propagate the pass-thru value for masked-out or speculatively executed
				/// lanes. VP intrinsics do not support pass-thru values and every masked-out lane
				/// is set to poison. LLVM backends are usually able to match op + select
				/// patterns and fold them into native target instructions.
				Value selectPassthru(OpBuilder &builder, Value mask, Value newValue,
				Value passthru);

	} // namespace vector			} // namespace vector
	} // namespace mlir			} // namespace mlir

	#define GET_OP_CLASSES			#define GET_OP_CLASSES
	#include "mlir/Dialect/Vector/IR/VectorOps.h.inc"			#include "mlir/Dialect/Vector/IR/VectorOps.h.inc"
	#include "mlir/Dialect/Vector/IR/VectorOpsDialect.h.inc"			#include "mlir/Dialect/Vector/IR/VectorOpsDialect.h.inc"

	#endif // MLIR_DIALECT_VECTOR_IR_VECTOROPS_H			#endif // MLIR_DIALECT_VECTOR_IR_VECTOROPS_H

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td

Show First 20 Lines • Show All 627 Lines • ▼ Show 20 Lines	def Vector_ExtractOp :
let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
let hasFolder = 1;		let hasFolder = 1;
let hasVerifier = 1;		let hasVerifier = 1;
}		}

def Vector_FMAOp :		def Vector_FMAOp :
Op<Vector_Dialect, "fma", [		Op<Vector_Dialect, "fma", [
Pure, AllTypesMatch<["lhs", "rhs", "acc", "result"]>,		Pure, AllTypesMatch<["lhs", "rhs", "acc", "result"]>,
DeclareOpInterfaceMethods<MaskableOpInterface>,
DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>		DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>
] # ElementwiseMappable.traits>,		] # ElementwiseMappable.traits>,
Arguments<(ins VectorOfAnyRankOf<[AnyFloat]>:$lhs,		Arguments<(ins VectorOfAnyRankOf<[AnyFloat]>:$lhs,
VectorOfAnyRankOf<[AnyFloat]>:$rhs,		VectorOfAnyRankOf<[AnyFloat]>:$rhs,
VectorOfAnyRankOf<[AnyFloat]>:$acc)>,		VectorOfAnyRankOf<[AnyFloat]>:$acc)>,
Results<(outs VectorOfAnyRankOf<[AnyFloat]>:$result)> {		Results<(outs VectorOfAnyRankOf<[AnyFloat]>:$result)> {
let summary = "vector fused multiply-add";		let summary = "vector fused multiply-add";
let description = [{		let description = [{

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp

Show First 20 Lines • Show All 698 Lines • ▼ Show 20 Lines	matchAndRewrite(vector::ReductionOp reductionOp, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const override {		ConversionPatternRewriter &rewriter) const override {
auto kind = reductionOp.getKind();		auto kind = reductionOp.getKind();
Type eltType = reductionOp.getDest().getType();		Type eltType = reductionOp.getDest().getType();
Type llvmType = typeConverter->convertType(eltType);		Type llvmType = typeConverter->convertType(eltType);
Value operand = adaptor.getVector();		Value operand = adaptor.getVector();
Value acc = adaptor.getAcc();		Value acc = adaptor.getAcc();
Location loc = reductionOp.getLoc();		Location loc = reductionOp.getLoc();

// Masked reductions are lowered separately.
auto maskableOp = cast<MaskableOpInterface>(reductionOp.getOperation());
if (maskableOp.isMasked())
return failure();

if (eltType.isIntOrIndex()) {		if (eltType.isIntOrIndex()) {
// Integer reductions: add/mul/min/max/and/or/xor.		// Integer reductions: add/mul/min/max/and/or/xor.
Value result;		Value result;
switch (kind) {		switch (kind) {
case vector::CombiningKind::ADD:		case vector::CombiningKind::ADD:
result =		result =
createIntegerReductionArithmeticOpLowering<LLVM::vector_reduce_add,		createIntegerReductionArithmeticOpLowering<LLVM::vector_reduce_add,
LLVM::AddOp>(		LLVM::AddOp>(
▲ Show 20 Lines • Show All 383 Lines • ▼ Show 20 Lines	public:

LogicalResult		LogicalResult
matchAndRewrite(vector::FMAOp fmaOp, OpAdaptor adaptor,		matchAndRewrite(vector::FMAOp fmaOp, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const override {		ConversionPatternRewriter &rewriter) const override {
VectorType vType = fmaOp.getVectorType();		VectorType vType = fmaOp.getVectorType();
if (vType.getRank() > 1)		if (vType.getRank() > 1)
return failure();		return failure();

// Masked fmas are lowered separately.
auto maskableOp = cast<MaskableOpInterface>(fmaOp.getOperation());
if (maskableOp.isMasked())
return failure();

rewriter.replaceOpWithNewOp<LLVM::FMulAddOp>(		rewriter.replaceOpWithNewOp<LLVM::FMulAddOp>(
fmaOp, adaptor.getLhs(), adaptor.getRhs(), adaptor.getAcc());		fmaOp, adaptor.getLhs(), adaptor.getRhs(), adaptor.getAcc());
return success();		return success();
}		}
};		};

/// Conversion pattern that turns a masked vector.fma on a 1-D vector into its
/// LLVM counterpart representation. Non side effecting VP intrinsics are not
/// fully supported by some backends, including x86, and they don't support
/// pass-through values either. For these reasons, we generate an unmasked
/// fma followed by a select instruction to emulate the masking behavior.
/// This pattern is peepholed by some backends with support for masked fma
/// instructions. This pattern does not match vectors of n >= 2 rank.
class MaskedFMAOp1DConversion
: public VectorMaskOpConversionBase<vector::FMAOp> {
public:
using VectorMaskOpConversionBase<vector::FMAOp>::VectorMaskOpConversionBase;

MaskedFMAOp1DConversion(LLVMTypeConverter &converter, bool fullVPIntr)
: VectorMaskOpConversionBase<vector::FMAOp>(converter) {}

virtual LogicalResult matchAndRewriteMaskableOp(
vector::MaskOp maskOp, MaskableOpInterface maskableOp,
ConversionPatternRewriter &rewriter) const override {
auto fmaOp = cast<FMAOp>(maskableOp.getOperation());
Type llvmType = typeConverter->convertType(fmaOp.getVectorType());

Value fmulAddOp = rewriter.create<LLVM::FMulAddOp>(
fmaOp.getLoc(), llvmType, fmaOp.getLhs(), fmaOp.getRhs(),
fmaOp.getAcc());
rewriter.replaceOpWithNewOp<LLVM::SelectOp>(
maskOp, llvmType, maskOp.getMask(), fmulAddOp, fmaOp.getAcc());
return success();
}
};

class VectorInsertElementOpConversion		class VectorInsertElementOpConversion
: public ConvertOpToLLVMPattern<vector::InsertElementOp> {		: public ConvertOpToLLVMPattern<vector::InsertElementOp> {
public:		public:
using ConvertOpToLLVMPattern<vector::InsertElementOp>::ConvertOpToLLVMPattern;		using ConvertOpToLLVMPattern<vector::InsertElementOp>::ConvertOpToLLVMPattern;

LogicalResult		LogicalResult
matchAndRewrite(vector::InsertElementOp insertEltOp, OpAdaptor adaptor,		matchAndRewrite(vector::InsertElementOp insertEltOp, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const override {		ConversionPatternRewriter &rewriter) const override {
▲ Show 20 Lines • Show All 150 Lines • ▼ Show 20 Lines	public:
}		}

LogicalResult matchAndRewrite(FMAOp op,		LogicalResult matchAndRewrite(FMAOp op,
PatternRewriter &rewriter) const override {		PatternRewriter &rewriter) const override {
auto vType = op.getVectorType();		auto vType = op.getVectorType();
if (vType.getRank() < 2)		if (vType.getRank() < 2)
return failure();		return failure();

// Masked fmas are lowered separately.
auto maskableOp = cast<MaskableOpInterface>(op.getOperation());
if (maskableOp.isMasked())
return failure();

auto loc = op.getLoc();		auto loc = op.getLoc();
auto elemType = vType.getElementType();		auto elemType = vType.getElementType();
Value zero = rewriter.create<arith::ConstantOp>(		Value zero = rewriter.create<arith::ConstantOp>(
loc, elemType, rewriter.getZeroAttr(elemType));		loc, elemType, rewriter.getZeroAttr(elemType));
Value desc = rewriter.create<vector::SplatOp>(loc, vType, zero);		Value desc = rewriter.create<vector::SplatOp>(loc, vType, zero);
for (int64_t i = 0, e = vType.getShape().front(); i != e; ++i) {		for (int64_t i = 0, e = vType.getShape().front(); i != e; ++i) {
Value extrLHS = rewriter.create<ExtractOp>(loc, op.getLhs(), i);		Value extrLHS = rewriter.create<ExtractOp>(loc, op.getLhs(), i);
Value extrRHS = rewriter.create<ExtractOp>(loc, op.getRhs(), i);		Value extrRHS = rewriter.create<ExtractOp>(loc, op.getRhs(), i);
▲ Show 20 Lines • Show All 412 Lines • ▼ Show 20 Lines	void mlir::populateVectorToLLVMConversionPatterns(
MLIRContext *ctx = converter.getDialect()->getContext();		MLIRContext *ctx = converter.getDialect()->getContext();
patterns.add<VectorFMAOpNDRewritePattern>(ctx);		patterns.add<VectorFMAOpNDRewritePattern>(ctx);
populateVectorInsertExtractStridedSliceTransforms(patterns);		populateVectorInsertExtractStridedSliceTransforms(patterns);
patterns.add<VectorReductionOpConversion>(converter, reassociateFPReductions);		patterns.add<VectorReductionOpConversion>(converter, reassociateFPReductions);
patterns.add<VectorCreateMaskOpRewritePattern>(ctx, force32BitVectorIndices);		patterns.add<VectorCreateMaskOpRewritePattern>(ctx, force32BitVectorIndices);
patterns		patterns
.add<VectorBitCastOpConversion, VectorShuffleOpConversion,		.add<VectorBitCastOpConversion, VectorShuffleOpConversion,
VectorExtractElementOpConversion, VectorExtractOpConversion,		VectorExtractElementOpConversion, VectorExtractOpConversion,
VectorFMAOp1DConversion, MaskedFMAOp1DConversion,		VectorFMAOp1DConversion, VectorInsertElementOpConversion,
VectorInsertElementOpConversion, VectorInsertOpConversion,		VectorInsertOpConversion, VectorPrintOpConversion,
VectorPrintOpConversion, VectorTypeCastOpConversion,		VectorTypeCastOpConversion, VectorScaleOpConversion,
VectorScaleOpConversion,
VectorLoadStoreConversion<vector::LoadOp, vector::LoadOpAdaptor>,		VectorLoadStoreConversion<vector::LoadOp, vector::LoadOpAdaptor>,
VectorLoadStoreConversion<vector::MaskedLoadOp,		VectorLoadStoreConversion<vector::MaskedLoadOp,
vector::MaskedLoadOpAdaptor>,		vector::MaskedLoadOpAdaptor>,
VectorLoadStoreConversion<vector::StoreOp, vector::StoreOpAdaptor>,		VectorLoadStoreConversion<vector::StoreOp, vector::StoreOpAdaptor>,
VectorLoadStoreConversion<vector::MaskedStoreOp,		VectorLoadStoreConversion<vector::MaskedStoreOp,
vector::MaskedStoreOpAdaptor>,		vector::MaskedStoreOpAdaptor>,
VectorGatherOpConversion, VectorScatterOpConversion,		VectorGatherOpConversion, VectorScatterOpConversion,
VectorExpandLoadOpConversion, VectorCompressStoreOpConversion,		VectorExpandLoadOpConversion, VectorCompressStoreOpConversion,

mlir/lib/Dialect/Vector/IR/VectorOps.cpp

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// FmaOp			// FmaOp
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	std::optional<SmallVector<int64_t, 4>> FMAOp::getShapeForUnroll() {			std::optional<SmallVector<int64_t, 4>> FMAOp::getShapeForUnroll() {
	return llvm::to_vector<4>(getVectorType().getShape());			return llvm::to_vector<4>(getVectorType().getShape());
	}			}

	// MaskableOpInterface methods.

	/// Returns the mask type expected by this operation. Mostly used for
	/// verification purposes. It requires the operation to be vectorized.
	Type FMAOp::getExpectedMaskType() {
	auto vecType = this->getVectorType();
	return VectorType::get(vecType.getShape(),
	IntegerType::get(vecType.getContext(), /*width=*/1));
	}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// BroadcastOp			// BroadcastOp
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	/// Return the dimensions of the result vector that were formerly ones in the			/// Return the dimensions of the result vector that were formerly ones in the
	/// source tensor and thus correspond to "dim-1" broadcasting.			/// source tensor and thus correspond to "dim-1" broadcasting.
	static llvm::SetVector<int64_t>			static llvm::SetVector<int64_t>
	computeBroadcastedUnitDims(ArrayRef<int64_t> srcShape,			computeBroadcastedUnitDims(ArrayRef<int64_t> srcShape,
	▲ Show 20 Lines • Show All 3,991 Lines • ▼ Show 20 Lines
	}			}

	bool WarpExecuteOnLane0Op::areTypesCompatible(Type lhs, Type rhs) {			bool WarpExecuteOnLane0Op::areTypesCompatible(Type lhs, Type rhs) {
	return succeeded(			return succeeded(
	verifyDistributedType(lhs, rhs, getWarpSize(), getOperation()));			verifyDistributedType(lhs, rhs, getWarpSize(), getOperation()));
	}			}

	Value mlir::vector::makeArithReduction(OpBuilder &b, Location loc,			Value mlir::vector::makeArithReduction(OpBuilder &b, Location loc,
	CombiningKind kind, Value v1, Value v2) {			CombiningKind kind, Value v1, Value acc,
				Value mask) {
	Type t1 = getElementTypeOrSelf(v1.getType());			Type t1 = getElementTypeOrSelf(v1.getType());
	Type t2 = getElementTypeOrSelf(v2.getType());			Type tAcc = getElementTypeOrSelf(acc.getType());
				Value result;

	switch (kind) {			switch (kind) {
	case CombiningKind::ADD:			case CombiningKind::ADD:
	if (t1.isIntOrIndex() && t2.isIntOrIndex())			if (t1.isIntOrIndex() && tAcc.isIntOrIndex())
	return b.createOrFold<arith::AddIOp>(loc, v1, v2);			result = b.createOrFold<arith::AddIOp>(loc, v1, acc);
	else if (t1.isa<FloatType>() && t2.isa<FloatType>())			else if (t1.isa<FloatType>() && tAcc.isa<FloatType>())
	return b.createOrFold<arith::AddFOp>(loc, v1, v2);			result = b.createOrFold<arith::AddFOp>(loc, v1, acc);
				else
	llvm_unreachable("invalid value types for ADD reduction");			llvm_unreachable("invalid value types for ADD reduction");
				break;
	case CombiningKind::AND:			case CombiningKind::AND:
	assert(t1.isIntOrIndex() && t2.isIntOrIndex() && "expected int values");			assert(t1.isIntOrIndex() && tAcc.isIntOrIndex() && "expected int values");
	return b.createOrFold<arith::AndIOp>(loc, v1, v2);			result = b.createOrFold<arith::AndIOp>(loc, v1, acc);
				break;
	case CombiningKind::MAXF:			case CombiningKind::MAXF:
	assert(t1.isa<FloatType>() && t2.isa<FloatType>() &&			assert(t1.isa<FloatType>() && tAcc.isa<FloatType>() &&
	"expected float values");			"expected float values");
	return b.createOrFold<arith::MaxFOp>(loc, v1, v2);			result = b.createOrFold<arith::MaxFOp>(loc, v1, acc);
				break;
	case CombiningKind::MINF:			case CombiningKind::MINF:
	assert(t1.isa<FloatType>() && t2.isa<FloatType>() &&			assert(t1.isa<FloatType>() && tAcc.isa<FloatType>() &&
	"expected float values");			"expected float values");
	return b.createOrFold<arith::MinFOp>(loc, v1, v2);			result = b.createOrFold<arith::MinFOp>(loc, v1, acc);
				break;
	case CombiningKind::MAXSI:			case CombiningKind::MAXSI:
	assert(t1.isIntOrIndex() && t2.isIntOrIndex() && "expected int values");			assert(t1.isIntOrIndex() && tAcc.isIntOrIndex() && "expected int values");
	return b.createOrFold<arith::MaxSIOp>(loc, v1, v2);			result = b.createOrFold<arith::MaxSIOp>(loc, v1, acc);
				break;
	case CombiningKind::MINSI:			case CombiningKind::MINSI:
	assert(t1.isIntOrIndex() && t2.isIntOrIndex() && "expected int values");			assert(t1.isIntOrIndex() && tAcc.isIntOrIndex() && "expected int values");
	return b.createOrFold<arith::MinSIOp>(loc, v1, v2);			result = b.createOrFold<arith::MinSIOp>(loc, v1, acc);
				break;
	case CombiningKind::MAXUI:			case CombiningKind::MAXUI:
	assert(t1.isIntOrIndex() && t2.isIntOrIndex() && "expected int values");			assert(t1.isIntOrIndex() && tAcc.isIntOrIndex() && "expected int values");
	return b.createOrFold<arith::MaxUIOp>(loc, v1, v2);			result = b.createOrFold<arith::MaxUIOp>(loc, v1, acc);
				break;
	case CombiningKind::MINUI:			case CombiningKind::MINUI:
	assert(t1.isIntOrIndex() && t2.isIntOrIndex() && "expected int values");			assert(t1.isIntOrIndex() && tAcc.isIntOrIndex() && "expected int values");
	return b.createOrFold<arith::MinUIOp>(loc, v1, v2);			result = b.createOrFold<arith::MinUIOp>(loc, v1, acc);
				break;
	case CombiningKind::MUL:			case CombiningKind::MUL:
	if (t1.isIntOrIndex() && t2.isIntOrIndex())			if (t1.isIntOrIndex() && tAcc.isIntOrIndex())
	return b.createOrFold<arith::MulIOp>(loc, v1, v2);			result = b.createOrFold<arith::MulIOp>(loc, v1, acc);
	else if (t1.isa<FloatType>() && t2.isa<FloatType>())			else if (t1.isa<FloatType>() && tAcc.isa<FloatType>())
	return b.createOrFold<arith::MulFOp>(loc, v1, v2);			result = b.createOrFold<arith::MulFOp>(loc, v1, acc);
				else
	llvm_unreachable("invalid value types for MUL reduction");			llvm_unreachable("invalid value types for MUL reduction");
				break;
	case CombiningKind::OR:			case CombiningKind::OR:
	assert(t1.isIntOrIndex() && t2.isIntOrIndex() && "expected int values");			assert(t1.isIntOrIndex() && tAcc.isIntOrIndex() && "expected int values");
	return b.createOrFold<arith::OrIOp>(loc, v1, v2);			result = b.createOrFold<arith::OrIOp>(loc, v1, acc);
				break;
	case CombiningKind::XOR:			case CombiningKind::XOR:
	assert(t1.isIntOrIndex() && t2.isIntOrIndex() && "expected int values");			assert(t1.isIntOrIndex() && tAcc.isIntOrIndex() && "expected int values");
	return b.createOrFold<arith::XOrIOp>(loc, v1, v2);			result = b.createOrFold<arith::XOrIOp>(loc, v1, acc);
				break;
	};			};
	llvm_unreachable("unknown CombiningKind");
				assert(result && "unknown CombiningKind");
				return selectPassthru(b, mask, result, acc);
	}			}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Vector Masking Utilities			// Vector Masking Utilities
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	/// Create the vector.yield-ended region of a vector.mask op with `maskableOp`			/// Create the vector.yield-ended region of a vector.mask op with `maskableOp`
	/// as masked operation.			/// as masked operation.
	void mlir::vector::createMaskOpRegion(OpBuilder &builder,			void mlir::vector::createMaskOpRegion(OpBuilder &builder,
	Operation *maskableOp) {			Operation *maskableOp) {
	assert(maskableOp->getBlock() && "MaskableOp must be inserted into a block");			assert(maskableOp->getBlock() && "MaskableOp must be inserted into a block");
	Block *insBlock = builder.getInsertionBlock();			Block *insBlock = builder.getInsertionBlock();
	// Create a block and move the op to that block.			// Create a block and move the op to that block.
	insBlock->getOperations().splice(			insBlock->getOperations().splice(
	insBlock->begin(), maskableOp->getBlock()->getOperations(), maskableOp);			insBlock->begin(), maskableOp->getBlock()->getOperations(), maskableOp);
	builder.create<YieldOp>(maskableOp->getLoc(), maskableOp->getResults());			builder.create<YieldOp>(maskableOp->getLoc(), maskableOp->getResults());
	}			}

	/// Creates a vector.mask operation around a maskable operation. Returns the			/// Creates a vector.mask operation around a maskable operation. Returns the
	/// vector.mask operation if the mask provided is valid. Otherwise, returns			/// vector.mask operation if the mask provided is valid. Otherwise, returns
	/// the maskable operation itself.			/// the maskable operation itself.
	Operation *mlir::vector::maskOperation(RewriterBase &rewriter,			Operation *mlir::vector::maskOperation(OpBuilder &builder,
	Operation *maskableOp, Value mask) {			Operation *maskableOp, Value mask,
				Value passthru) {
	if (!mask)			if (!mask)
	return maskableOp;			return maskableOp;
	return rewriter.create<MaskOp>(maskableOp->getLoc(),			if (passthru)
				return builder.create<MaskOp>(maskableOp->getLoc(),
				maskableOp->getResultTypes(), mask, passthru,
				maskableOp, createMaskOpRegion);
				return builder.create<MaskOp>(maskableOp->getLoc(),
	maskableOp->getResultTypes(), mask, maskableOp,			maskableOp->getResultTypes(), mask, maskableOp,
	createMaskOpRegion);			createMaskOpRegion);
	}			}

				/// Creates a vector select operation that picks values from `newValue` or
				/// `passthru` for each result vector lane based on `mask`. This utility is used
				/// to propagate the pass-thru value of vector.mask or for cases where only the
				/// pass-thru value propagation is needed. VP intrinsics do not support
				/// pass-thru values and every masked-out lane is set to poison. LLVM backends are
				/// usually able to match op + select patterns and fold them into native
				/// target instructions.
				Value mlir::vector::selectPassthru(OpBuilder &builder, Value mask,
				Value newValue, Value passthru) {
				if (!mask)
				return newValue;

				return builder.create<arith::SelectOp>(newValue.getLoc(), newValue.getType(),
				mask, newValue, passthru);
				}

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// TableGen'd op method definitions			// TableGen'd op method definitions
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#define GET_ATTRDEF_CLASSES			#define GET_ATTRDEF_CLASSES
	#include "mlir/Dialect/Vector/IR/VectorOpsAttrDefs.cpp.inc"			#include "mlir/Dialect/Vector/IR/VectorOpsAttrDefs.cpp.inc"

	#define GET_OP_CLASSES			#define GET_OP_CLASSES
	#include "mlir/Dialect/Vector/IR/VectorOps.cpp.inc"			#include "mlir/Dialect/Vector/IR/VectorOps.cpp.inc"

mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp

Show First 20 Lines • Show All 145 Lines • ▼ Show 20 Lines	return llvm::to_vector<4>(llvm::map_range(
arrayAttr.getAsRange<IntegerAttr>(),		arrayAttr.getAsRange<IntegerAttr>(),
[](IntegerAttr attr) { return static_cast<IntType>(attr.getInt()); }));		[](IntegerAttr attr) { return static_cast<IntType>(attr.getInt()); }));
}		}

/// Helper to create arithmetic operation associated with a kind of contraction.		/// Helper to create arithmetic operation associated with a kind of contraction.
static std::optional<Value>		static std::optional<Value>
createContractArithOp(Location loc, Value x, Value y, Value acc,		createContractArithOp(Location loc, Value x, Value y, Value acc,
vector::CombiningKind kind, PatternRewriter &rewriter,		vector::CombiningKind kind, PatternRewriter &rewriter,
bool isInt,		bool isInt, Value mask = Value()) {
std::optional<Value> maybeMask = std::nullopt) {
using vector::CombiningKind;		using vector::CombiningKind;
Value mul;		Value mul;

if (isInt) {		if (isInt) {
if (kind == CombiningKind::MINF || kind == CombiningKind::MAXF)		if (kind == CombiningKind::MINF || kind == CombiningKind::MAXF)
// Only valid for floating point types.		// Only valid for floating point types.
return std::nullopt;		return std::nullopt;
mul = rewriter.create<arith::MulIOp>(loc, x, y);		mul = rewriter.create<arith::MulIOp>(loc, x, y);
} else {		} else {
// Float case.		// Float case.
if (kind == CombiningKind::AND || kind == CombiningKind::MINUI ||		if (kind == CombiningKind::AND || kind == CombiningKind::MINUI ||
kind == CombiningKind::MINSI || kind == CombiningKind::MAXUI ||		kind == CombiningKind::MINSI || kind == CombiningKind::MAXUI ||
kind == CombiningKind::MAXSI || kind == CombiningKind::OR ||		kind == CombiningKind::MAXSI || kind == CombiningKind::OR ||
kind == CombiningKind::XOR)		kind == CombiningKind::XOR)
// Only valid for integer types.		// Only valid for integer types.
return std::nullopt;		return std::nullopt;
// Special case for fused multiply-add.		// Special case for fused multiply-add.
if (acc && acc.getType().isa<VectorType>() && kind == CombiningKind::ADD) {		if (acc && acc.getType().isa<VectorType>() && kind == CombiningKind::ADD) {
Operation *fmaOp = rewriter.create<vector::FMAOp>(loc, x, y, acc);		Value fma = rewriter.create<vector::FMAOp>(loc, x, y, acc);
if (maybeMask.has_value() && maybeMask.value())		if (mask)
fmaOp = maskOperation(rewriter, fmaOp, maybeMask.value());		// The fma op doesn't need explicit masking. However, fma ops used in
return fmaOp->getResult(0);		// reductions must preserve previous 'acc' values for masked-out lanes.
		fma = selectPassthru(rewriter, mask, fma, acc);
		return fma;
}		}
mul = rewriter.create<arith::MulFOp>(loc, x, y);		mul = rewriter.create<arith::MulFOp>(loc, x, y);
}		}

assert((!maybeMask.has_value() || !maybeMask.value()) &&
"Unsupported masked case");

if (!acc)		if (!acc)
return std::optional<Value>(mul);		return std::optional<Value>(mul);
return makeArithReduction(rewriter, loc, kind, mul, acc);
		return makeArithReduction(rewriter, loc, kind, mul, acc, mask);
}		}

/// Return the positions of the reductions in the given map.		/// Return the positions of the reductions in the given map.
static SmallVector<int64_t> getReductionIndex(AffineMap map,		static SmallVector<int64_t> getReductionIndex(AffineMap map,
ArrayAttr iteratorTypes) {		ArrayAttr iteratorTypes) {
SmallVector<int64_t> dimsIdx;		SmallVector<int64_t> dimsIdx;
for (unsigned i = 0, e = map.getNumResults(); i < e; i++) {		for (unsigned i = 0, e = map.getNumResults(); i < e; i++) {
if (isReductionIterator(iteratorTypes[map.getDimPosition(i)]))		if (isReductionIterator(iteratorTypes[map.getDimPosition(i)]))
▲ Show 20 Lines • Show All 386 Lines • ▼ Show 20 Lines	if (!rhsType) {
return success();		return success();
}		}

Value result = rewriter.create<arith::ConstantOp>(		Value result = rewriter.create<arith::ConstantOp>(
loc, resType, rewriter.getZeroAttr(resType));		loc, resType, rewriter.getZeroAttr(resType));
for (int64_t d = 0, e = resType.getDimSize(0); d < e; ++d) {		for (int64_t d = 0, e = resType.getDimSize(0); d < e; ++d) {
auto pos = rewriter.getI64ArrayAttr(d);		auto pos = rewriter.getI64ArrayAttr(d);
Value x =		Value x =
rewriter.create<vector::ExtractOp>(loc, eltType, op.getLhs(), pos);		rewriter.create<vector::ExtractOp>(loc, op.getLhs(), pos);
Value a = rewriter.create<vector::BroadcastOp>(loc, rhsType, x);		Value a = rewriter.create<vector::BroadcastOp>(loc, rhsType, x);
Value r = nullptr;		Value r = nullptr;
if (acc)		if (acc)
r = rewriter.create<vector::ExtractOp>(loc, rhsType, acc, pos);		r = rewriter.create<vector::ExtractOp>(loc, acc, pos);
		Value extrMask;
		if (mask)
		extrMask = rewriter.create<vector::ExtractOp>(loc, mask, pos);

std::optional<Value> m = createContractArithOp(		std::optional<Value> m = createContractArithOp(
loc, a, op.getRhs(), r, kind, rewriter, isInt, mask);		loc, a, op.getRhs(), r, kind, rewriter, isInt, extrMask);
if (!m.has_value())		if (!m.has_value())
return failure();		return failure();
result = rewriter.create<vector::InsertOp>(loc, resType, *m, result, pos);		result = rewriter.create<vector::InsertOp>(loc, resType, *m, result, pos);
}		}

rewriter.replaceOp(rootOp, result);		rewriter.replaceOp(rootOp, result);
return success();		return success();
}		}
[28 lines not shown]	if (!contractOp.getMasks().empty())
return failure();		return failure();

if (failed(filter(contractOp)))		if (failed(filter(contractOp)))
return failure();		return failure();

if (vectorTransformOptions.vectorContractLowering !=		if (vectorTransformOptions.vectorContractLowering !=
vector::VectorContractLowering::ParallelArith)		vector::VectorContractLowering::ParallelArith)
return failure();		return failure();

ArrayRef<int64_t> lhsShape = contractOp.getLhsType().getShape();		ArrayRef<int64_t> lhsShape = contractOp.getLhsType().getShape();
ArrayRef<int64_t> rhsShape = contractOp.getRhsType().getShape();		ArrayRef<int64_t> rhsShape = contractOp.getRhsType().getShape();
AffineMap lhsMap = contractOp.getIndexingMapsArray()[0];		AffineMap lhsMap = contractOp.getIndexingMapsArray()[0];
AffineMap rhsMap = contractOp.getIndexingMapsArray()[1];		AffineMap rhsMap = contractOp.getIndexingMapsArray()[1];
SmallVector<int64_t> lhsReductionDims =		SmallVector<int64_t> lhsReductionDims =
getReductionIndex(lhsMap, contractOp.getIteratorTypes());		getReductionIndex(lhsMap, contractOp.getIteratorTypes());
SmallVector<int64_t> rhsReductionDims =		SmallVector<int64_t> rhsReductionDims =
getReductionIndex(rhsMap, contractOp.getIteratorTypes());		getReductionIndex(rhsMap, contractOp.getIteratorTypes());
[910 lines not shown]	UnrolledOuterProductGenerator(RewriterBase &b, vector::ContractionOp op)
: StructuredGenerator<vector::ContractionOp, vector::IteratorType>(b, op),		: StructuredGenerator<vector::ContractionOp, vector::IteratorType>(b, op),
kind(op.getKind()), lhs(op.getLhs()), rhs(op.getRhs()),		kind(op.getKind()), lhs(op.getLhs()), rhs(op.getRhs()),
res(op.getAcc()), lhsType(op.getLhsType()) {		res(op.getAcc()), lhsType(op.getLhsType()) {
auto maskableOp = cast<MaskableOpInterface>(op.getOperation());		auto maskableOp = cast<MaskableOpInterface>(op.getOperation());
if (maskableOp.isMasked())		if (maskableOp.isMasked())
mask = maskableOp.getMaskingOp().getMask();		mask = maskableOp.getMaskingOp().getMask();
}		}

Value t(Value v) {		Value t(Value v, ArrayRef<int64_t> perm = {1, 0}) {
static constexpr std::array<int64_t, 2> perm = {1, 0};
if (!v)		if (!v)
return v;		return v;
return rewriter.create<vector::TransposeOp>(loc, v, perm);		return rewriter.create<vector::TransposeOp>(loc, v, perm);
}		}

Value promote(Value v, Type dstElementType) {		Value promote(Value v, Type dstElementType) {
Type elementType = v.getType();		Type elementType = v.getType();
auto vecType = elementType.dyn_cast<VectorType>();		auto vecType = elementType.dyn_cast<VectorType>();
[38 lines not shown]	struct UnrolledOuterProductGenerator
FailureOr<Value> matmat() {		FailureOr<Value> matmat() {
if (!iters({Par(), Par(), Red()}))		if (!iters({Par(), Par(), Red()}))
return failure();		return failure();
// Set up the parallel/reduction structure in the right form.		// Set up the parallel/reduction structure in the right form.
AffineExpr m, n, k;		AffineExpr m, n, k;
bindDims(rewriter.getContext(), m, n, k);		bindDims(rewriter.getContext(), m, n, k);
// Classical row-major matmul: Just permute the lhs.		// Classical row-major matmul: Just permute the lhs.
if (layout({{m, k}, {k, n}, {m, n}}))		if (layout({{m, k}, {k, n}, {m, n}}))
return outerProd(t(lhs), rhs, res, lhsType.getDimSize(1));		return outerProd(t(lhs), rhs, res, lhsType.getDimSize(1),
		t(mask, {2, 0, 1}));
// TODO: may be better to fail and use some vector<k> -> scalar reduction.		// TODO: may be better to fail and use some vector<k> -> scalar reduction.
if (layout({{m, k}, {n, k}, {m, n}})) {		if (layout({{m, k}, {n, k}, {m, n}})) {
Value tlhs = t(lhs);		Value tlhs = t(lhs);
return outerProd(tlhs, t(rhs), res, lhsType.getDimSize(1));		return outerProd(tlhs, t(rhs), res, lhsType.getDimSize(1));
}		}
// No need to permute anything.		// No need to permute anything.
if (layout({{k, m}, {k, n}, {m, n}}))		if (layout({{k, m}, {k, n}, {m, n}}))
return outerProd(lhs, rhs, res, lhsType.getDimSize(0));		return outerProd(lhs, rhs, res, lhsType.getDimSize(0));
[1,491 lines not shown]
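
Note on the op + select pattern used in createContractArithOp above: the combiner runs unmasked on every lane, and a select then restores the previous accumulator value on masked-out lanes. A minimal sketch of that semantics on plain arrays follows. It is not MLIR code; `selectPassthruSketch` and `maskedMulAddSketch` are hypothetical names chosen for illustration, and the 'add' combiner is only one of the supported kinds.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not an MLIR API): lane-wise select. Lanes whose mask
// bit is false keep the pass-through value, the others take the new value.
template <typename T, std::size_t N>
std::array<T, N> selectPassthruSketch(const std::array<bool, N> &mask,
                                      const std::array<T, N> &newValue,
                                      const std::array<T, N> &passthru) {
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = mask[i] ? newValue[i] : passthru[i];
  return result;
}

// Masked "vector times scalar plus accumulator" with an 'add' combiner.
// Step 1: compute x * y + acc on all lanes, ignoring the mask.
// Step 2: select between the combined value and the old accumulator.
template <typename T, std::size_t N>
std::array<T, N> maskedMulAddSketch(const std::array<T, N> &x, T y,
                                    const std::array<T, N> &acc,
                                    const std::array<bool, N> &mask) {
  std::array<T, N> combined{};
  for (std::size_t i = 0; i < N; ++i)
    combined[i] = x[i] * y + acc[i];
  return selectPassthruSketch(mask, combined, acc);
}
```

Swapping the `+` in step 1 for min, max, and, or, etc. gives the other combiner flavors; the select in step 2 stays the same, which is the point of the design.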

mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir

	[412 lines not shown]
	// CHECK: %[[T16:.*]] = llvm.extractvalue %[[T7]][1] : !llvm.array<2 x vector<3xf32>>			// CHECK: %[[T16:.*]] = llvm.extractvalue %[[T7]][1] : !llvm.array<2 x vector<3xf32>>
	// CHECK: %[[T17:.*]] = llvm.intr.fmuladd(%[[T14]], %[[B]], %[[T16]]) : (vector<3xf32>, vector<3xf32>, vector<3xf32>) -> vector<3xf32>			// CHECK: %[[T17:.*]] = llvm.intr.fmuladd(%[[T14]], %[[B]], %[[T16]]) : (vector<3xf32>, vector<3xf32>, vector<3xf32>) -> vector<3xf32>
	// CHECK: %[[T18:.*]] = llvm.insertvalue %[[T17]], %[[T11]][1] : !llvm.array<2 x vector<3xf32>>			// CHECK: %[[T18:.*]] = llvm.insertvalue %[[T17]], %[[T11]][1] : !llvm.array<2 x vector<3xf32>>
	// CHECK: %[[T19:.*]] = builtin.unrealized_conversion_cast %[[T18]] : !llvm.array<2 x vector<3xf32>> to vector<2x3xf32>			// CHECK: %[[T19:.*]] = builtin.unrealized_conversion_cast %[[T18]] : !llvm.array<2 x vector<3xf32>> to vector<2x3xf32>
	// CHECK: return %[[T19]] : vector<2x3xf32>			// CHECK: return %[[T19]] : vector<2x3xf32>

	// -----			// -----

	func.func @masked_vector_contract(%arg0: vector<2xf32>, %arg1: f32, %arg2: vector<2xf32>, %m: vector<2xi1>) -> vector<2xf32> {			func.func @masked_float_add_outerprod(%arg0: vector<2xf32>, %arg1: f32, %arg2: vector<2xf32>, %m: vector<2xi1>) -> vector<2xf32> {
	%0 = vector.mask %m { vector.outerproduct %arg0, %arg1, %arg2 {kind = #vector.kind<add>} : vector<2xf32>, f32 } : vector<2xi1> -> vector<2xf32>			%0 = vector.mask %m { vector.outerproduct %arg0, %arg1, %arg2 {kind = #vector.kind<add>} : vector<2xf32>, f32 } : vector<2xi1> -> vector<2xf32>
	return %0 : vector<2xf32>			return %0 : vector<2xf32>
	}			}

	// We can't check for the intermediate 'vector.mask { vector.fma }' state so we			// CHECK-LABEL: func.func @masked_float_add_outerprod(
	// just make sure the vector.fma is lowered.			// CHECK-SAME: %[[VAL_0:.*]]: vector<2xf32>, %[[VAL_1:.*]]: f32, %[[VAL_2:.*]]: vector<2xf32>, %[[VAL_3:.*]]: vector<2xi1>) -> vector<2xf32> {
				// CHECK: %[[VAL_8:.*]] = llvm.intr.fmuladd(%[[VAL_0]], %{{.*}}, %[[VAL_2]]) : (vector<2xf32>, vector<2xf32>, vector<2xf32>) -> vector<2xf32>
				// CHECK: %[[VAL_9:.*]] = arith.select %[[VAL_3]], %[[VAL_8]], %[[VAL_2]] : vector<2xi1>, vector<2xf32>

	// CHECK: llvm.intr.fmuladd			// -----
	// CHECK: llvm.select
				func.func @masked_float_mul_outerprod(%arg0: vector<2xf32>, %arg1: f32, %arg2: vector<2xf32>, %m: vector<2xi1>) -> vector<2xf32> {
				%0 = vector.mask %m { vector.outerproduct %arg0, %arg1, %arg2 {kind = #vector.kind<mul>} : vector<2xf32>, f32 } : vector<2xi1> -> vector<2xf32>
				return %0 : vector<2xf32>
				}

				// CHECK-LABEL: func.func @masked_float_mul_outerprod(
				// CHECK-SAME: %[[VAL_0:.*]]: vector<2xf32>, %[[VAL_1:.*]]: f32, %[[VAL_2:.*]]: vector<2xf32>, %[[VAL_3:.*]]: vector<2xi1>) -> vector<2xf32> {
				// CHECK: %[[VAL_8:.*]] = arith.mulf %[[VAL_0]], %{{.*}} : vector<2xf32>
				// CHECK: %[[VAL_9:.*]] = arith.mulf %[[VAL_8]], %[[VAL_2]] : vector<2xf32>
				// CHECK: %[[VAL_10:.*]] = arith.select %[[VAL_3]], %[[VAL_9]], %[[VAL_2]] : vector<2xi1>, vector<2xf32>

				// -----

				func.func @masked_float_max_outerprod(%arg0: vector<2xf32>, %arg1: f32, %arg2: vector<2xf32>, %m: vector<2xi1>) -> vector<2xf32> {
				%0 = vector.mask %m { vector.outerproduct %arg0, %arg1, %arg2 {kind = #vector.kind<maxf>} : vector<2xf32>, f32 } : vector<2xi1> -> vector<2xf32>
				return %0 : vector<2xf32>
				}

				// CHECK-LABEL: func.func @masked_float_max_outerprod(
				// CHECK-SAME: %[[VAL_0:.*]]: vector<2xf32>, %[[VAL_1:.*]]: f32, %[[VAL_2:.*]]: vector<2xf32>, %[[VAL_3:.*]]: vector<2xi1>) -> vector<2xf32> {
				// CHECK: %[[VAL_8:.*]] = arith.mulf %[[VAL_0]], %{{.*}} : vector<2xf32>
				// CHECK: %[[VAL_9:.*]] = arith.maxf %[[VAL_8]], %[[VAL_2]] : vector<2xf32>
				// CHECK: %[[VAL_10:.*]] = arith.select %[[VAL_3]], %[[VAL_9]], %[[VAL_2]] : vector<2xi1>, vector<2xf32>

				// -----

				func.func @masked_float_min_outerprod(%arg0: vector<2xf32>, %arg1: f32, %arg2: vector<2xf32>, %m: vector<2xi1>) -> vector<2xf32> {
				%0 = vector.mask %m { vector.outerproduct %arg0, %arg1, %arg2 {kind = #vector.kind<minf>} : vector<2xf32>, f32 } : vector<2xi1> -> vector<2xf32>
				return %0 : vector<2xf32>
				}

				// CHECK-LABEL: func.func @masked_float_min_outerprod(
				// CHECK-SAME: %[[VAL_0:.*]]: vector<2xf32>, %[[VAL_1:.*]]: f32, %[[VAL_2:.*]]: vector<2xf32>, %[[VAL_3:.*]]: vector<2xi1>) -> vector<2xf32> {
				// CHECK: %[[VAL_8:.*]] = arith.mulf %[[VAL_0]], %{{.*}} : vector<2xf32>
				// CHECK: %[[VAL_9:.*]] = arith.minf %[[VAL_8]], %[[VAL_2]] : vector<2xf32>
				// CHECK: %[[VAL_10:.*]] = arith.select %[[VAL_3]], %[[VAL_9]], %[[VAL_2]] : vector<2xi1>, vector<2xf32>

				// -----

				func.func @masked_int_add_outerprod(%arg0: vector<2xi32>, %arg1: i32, %arg2: vector<2xi32>, %m: vector<2xi1>) -> vector<2xi32> {
				%0 = vector.mask %m { vector.outerproduct %arg0, %arg1, %arg2 {kind = #vector.kind<add>} : vector<2xi32>, i32 } : vector<2xi1> -> vector<2xi32>
				return %0 : vector<2xi32>
				}

				// CHECK-LABEL: func.func @masked_int_add_outerprod(
				// CHECK-SAME: %[[VAL_0:.*]]: vector<2xi32>, %[[VAL_1:.*]]: i32, %[[VAL_2:.*]]: vector<2xi32>, %[[VAL_3:.*]]: vector<2xi1>) -> vector<2xi32> {
				// CHECK: %[[VAL_8:.*]] = arith.muli %[[VAL_0]], %{{.*}} : vector<2xi32>
				// CHECK: %[[VAL_9:.*]] = arith.addi %[[VAL_8]], %[[VAL_2]] : vector<2xi32>
				// CHECK: %[[VAL_10:.*]] = arith.select %[[VAL_3]], %[[VAL_9]], %[[VAL_2]] : vector<2xi1>, vector<2xi32>

				// -----

				func.func @masked_int_mul_outerprod(%arg0: vector<2xi32>, %arg1: i32, %arg2: vector<2xi32>, %m: vector<2xi1>) -> vector<2xi32> {
				%0 = vector.mask %m { vector.outerproduct %arg0, %arg1, %arg2 {kind = #vector.kind<mul>} : vector<2xi32>, i32 } : vector<2xi1> -> vector<2xi32>
				return %0 : vector<2xi32>
				}

				// CHECK-LABEL: func.func @masked_int_mul_outerprod(
				// CHECK-SAME: %[[VAL_0:.*]]: vector<2xi32>, %[[VAL_1:.*]]: i32, %[[VAL_2:.*]]: vector<2xi32>, %[[VAL_3:.*]]: vector<2xi1>) -> vector<2xi32> {
				// CHECK: %[[VAL_8:.*]] = arith.muli %[[VAL_0]], %{{.*}} : vector<2xi32>
				// CHECK: %[[VAL_9:.*]] = arith.muli %[[VAL_8]], %[[VAL_2]] : vector<2xi32>
				// CHECK: %[[VAL_10:.*]] = arith.select %[[VAL_3]], %[[VAL_9]], %[[VAL_2]] : vector<2xi1>, vector<2xi32>

				// -----

				func.func @masked_int_max_outerprod(%arg0: vector<2xi32>, %arg1: i32, %arg2: vector<2xi32>, %m: vector<2xi1>) -> vector<2xi32> {
				%0 = vector.mask %m { vector.outerproduct %arg0, %arg1, %arg2 {kind = #vector.kind<maxsi>} : vector<2xi32>, i32 } : vector<2xi1> -> vector<2xi32>
				return %0 : vector<2xi32>
				}

				// CHECK-LABEL: func.func @masked_int_max_outerprod(
				// CHECK-SAME: %[[VAL_0:.*]]: vector<2xi32>, %[[VAL_1:.*]]: i32, %[[VAL_2:.*]]: vector<2xi32>, %[[VAL_3:.*]]: vector<2xi1>) -> vector<2xi32> {
				// CHECK: %[[VAL_8:.*]] = arith.muli %[[VAL_0]], %{{.*}} : vector<2xi32>
				// CHECK: %[[VAL_9:.*]] = arith.maxsi %[[VAL_8]], %[[VAL_2]] : vector<2xi32>
				// CHECK: %[[VAL_10:.*]] = arith.select %[[VAL_3]], %[[VAL_9]], %[[VAL_2]] : vector<2xi1>, vector<2xi32>

				// -----

				func.func @masked_int_min_outerprod(%arg0: vector<2xi32>, %arg1: i32, %arg2: vector<2xi32>, %m: vector<2xi1>) -> vector<2xi32> {
				%0 = vector.mask %m { vector.outerproduct %arg0, %arg1, %arg2 {kind = #vector.kind<minui>} : vector<2xi32>, i32 } : vector<2xi1> -> vector<2xi32>
				return %0 : vector<2xi32>
				}

				// CHECK-LABEL: func.func @masked_int_min_outerprod(
				// CHECK-SAME: %[[VAL_0:.*]]: vector<2xi32>, %[[VAL_1:.*]]: i32, %[[VAL_2:.*]]: vector<2xi32>, %[[VAL_3:.*]]: vector<2xi1>) -> vector<2xi32> {
				// CHECK: %[[VAL_8:.*]] = arith.muli %[[VAL_0]], %{{.*}} : vector<2xi32>
				// CHECK: %[[VAL_9:.*]] = arith.minui %[[VAL_8]], %[[VAL_2]] : vector<2xi32>
				// CHECK: %[[VAL_10:.*]] = arith.select %[[VAL_3]], %[[VAL_9]], %[[VAL_2]] : vector<2xi1>, vector<2xi32>

				// -----

				func.func @masked_int_and_outerprod(%arg0: vector<2xi32>, %arg1: i32, %arg2: vector<2xi32>, %m: vector<2xi1>) -> vector<2xi32> {
				%0 = vector.mask %m { vector.outerproduct %arg0, %arg1, %arg2 {kind = #vector.kind<and>} : vector<2xi32>, i32 } : vector<2xi1> -> vector<2xi32>
				return %0 : vector<2xi32>
				}

				// CHECK-LABEL: func.func @masked_int_and_outerprod(
				// CHECK-SAME: %[[VAL_0:.*]]: vector<2xi32>, %[[VAL_1:.*]]: i32, %[[VAL_2:.*]]: vector<2xi32>, %[[VAL_3:.*]]: vector<2xi1>) -> vector<2xi32> {
				// CHECK: %[[VAL_8:.*]] = arith.muli %[[VAL_0]], %{{.*}} : vector<2xi32>
				// CHECK: %[[VAL_9:.*]] = arith.andi %[[VAL_8]], %[[VAL_2]] : vector<2xi32>
				// CHECK: %[[VAL_10:.*]] = arith.select %[[VAL_3]], %[[VAL_9]], %[[VAL_2]] : vector<2xi1>, vector<2xi32>

				// -----

				func.func @masked_int_or_outerprod(%arg0: vector<2xi32>, %arg1: i32, %arg2: vector<2xi32>, %m: vector<2xi1>) -> vector<2xi32> {
				%0 = vector.mask %m { vector.outerproduct %arg0, %arg1, %arg2 {kind = #vector.kind<or>} : vector<2xi32>, i32 } : vector<2xi1> -> vector<2xi32>
				return %0 : vector<2xi32>
				}

				// CHECK-LABEL: func.func @masked_int_or_outerprod(
				// CHECK-SAME: %[[VAL_0:.*]]: vector<2xi32>, %[[VAL_1:.*]]: i32, %[[VAL_2:.*]]: vector<2xi32>, %[[VAL_3:.*]]: vector<2xi1>) -> vector<2xi32> {
				// CHECK: %[[VAL_8:.*]] = arith.muli %[[VAL_0]], %{{.*}} : vector<2xi32>
				// CHECK: %[[VAL_9:.*]] = arith.ori %[[VAL_8]], %[[VAL_2]] : vector<2xi32>
				// CHECK: %[[VAL_10:.*]] = arith.select %[[VAL_3]], %[[VAL_9]], %[[VAL_2]] : vector<2xi1>, vector<2xi32>

	// -----			// -----

	func.func @shuffle_0D_direct(%arg0: vector<f32>) -> vector<3xf32> {			func.func @shuffle_0D_direct(%arg0: vector<f32>) -> vector<3xf32> {
	%1 = vector.shuffle %arg0, %arg0 [0, 1, 0] : vector<f32>, vector<f32>			%1 = vector.shuffle %arg0, %arg0 [0, 1, 0] : vector<f32>, vector<f32>
	return %1 : vector<3xf32>			return %1 : vector<3xf32>
	}			}
	// CHECK-LABEL: @shuffle_0D_direct(			// CHECK-LABEL: @shuffle_0D_direct(
	[1,713 lines not shown]

	// CHECK-LABEL: @vector_scalable_extract			// CHECK-LABEL: @vector_scalable_extract
	// CHECK-SAME: %[[VEC:.*]]: vector<[4]xf32>			// CHECK-SAME: %[[VEC:.*]]: vector<[4]xf32>
	func.func @vector_scalable_extract(%vec: vector<[4]xf32>) -> vector<8xf32> {			func.func @vector_scalable_extract(%vec: vector<[4]xf32>) -> vector<8xf32> {
	// CHECK-NEXT: %{{.*}} = llvm.intr.vector.extract %[[VEC]][0] : vector<8xf32> from vector<[4]xf32>			// CHECK-NEXT: %{{.*}} = llvm.intr.vector.extract %[[VEC]][0] : vector<8xf32> from vector<[4]xf32>
	%0 = vector.scalable.extract %vec[0] : vector<8xf32> from vector<[4]xf32>			%0 = vector.scalable.extract %vec[0] : vector<8xf32> from vector<[4]xf32>
	return %0 : vector<8xf32>			return %0 : vector<8xf32>
	}			}

	// -----

	// CHECK-LABEL: func.func @masked_vector_fma(
	// CHECK-SAME: %[[INPUT:.*]]: vector<8xf32>,
	// CHECK-SAME: %[[MASK:.*]]: vector<8xi1>) -> vector<8xf32>
	// CHECK: %[[FMA:.*]] = llvm.intr.fmuladd(%[[INPUT]], %[[INPUT]], %[[INPUT]]) : (vector<8xf32>, vector<8xf32>, vector<8xf32>) -> vector<8xf32>
	// CHECK: llvm.select %[[MASK]], %[[FMA]], %[[INPUT]] : vector<8xi1>, vector<8xf32>

	func.func @masked_vector_fma(%a: vector<8xf32>, %m: vector<8xi1>) -> vector<8xf32> {
	%0 = vector.mask %m { vector.fma %a, %a, %a : vector<8xf32> } : vector<8xi1> -> vector<8xf32>
	return %0 : vector<8xf32>
	}

mlir/test/Dialect/Vector/vector-contract-transforms.mlir

[70 lines not shown]
func.func @extract_contract2(%arg0: vector<2x3xf32>,		func.func @extract_contract2(%arg0: vector<2x3xf32>,
%arg1: vector<3xf32>,		%arg1: vector<3xf32>,
%arg2: vector<2xf32>) -> vector<2xf32> {		%arg2: vector<2xf32>) -> vector<2xf32> {
%0 = vector.contract #matvec_trait %arg0, %arg1, %arg2		%0 = vector.contract #matvec_trait %arg0, %arg1, %arg2
: vector<2x3xf32>, vector<3xf32> into vector<2xf32>		: vector<2x3xf32>, vector<3xf32> into vector<2xf32>
return %0 : vector<2xf32>		return %0 : vector<2xf32>
}		}

		// OUTERPRODUCT-LABEL: func.func @masked_extract_contract2(
		// OUTERPRODUCT-SAME: %[[VAL_0:.*]]: vector<2x3xf32>,
		// OUTERPRODUCT-SAME: %[[VAL_1:.*]]: vector<3xf32>,
		// OUTERPRODUCT-SAME: %[[VAL_2:.*]]: vector<2xf32>,
		// OUTERPRODUCT-SAME: %[[IN_MASK:.*]]: vector<2x3xi1>) -> vector<2xf32>
		// OUTERPRODUCT: %[[T_MASK:.*]] = vector.transpose %[[IN_MASK]], [1, 0] : vector<2x3xi1> to vector<3x2xi1>
		// OUTERPRODUCT: %[[MASK0:.*]] = vector.extract %[[T_MASK]][0] : vector<3x2xi1>
		// OUTERPRODUCT: vector.mask %[[MASK0]] { vector.outerproduct

		// OUTERPRODUCT: %[[MASK1:.*]] = vector.extract %[[T_MASK]][1] : vector<3x2xi1>
		// OUTERPRODUCT: vector.mask %[[MASK1]] { vector.outerproduct

		// OUTERPRODUCT: %[[MASK2:.*]] = vector.extract %[[T_MASK]][2] : vector<3x2xi1>
		// OUTERPRODUCT: vector.mask %[[MASK2]] { vector.outerproduct

		func.func @masked_extract_contract2(%arg0: vector<2x3xf32>,
		%arg1: vector<3xf32>,
		%arg2: vector<2xf32>,
		%m: vector<2x3xi1>) -> vector<2xf32> {
		%0 = vector.mask %m { vector.contract #matvec_trait %arg0, %arg1, %arg2
		: vector<2x3xf32>, vector<3xf32> into vector<2xf32> } : vector<2x3xi1> -> vector<2xf32>
		return %0 : vector<2xf32>
		}

// CHECK-LABEL: func @extract_contract2_int		// CHECK-LABEL: func @extract_contract2_int
// CHECK-SAME: %[[A:.*0]]: vector<2x3xi32>,		// CHECK-SAME: %[[A:.*0]]: vector<2x3xi32>,
// CHECK-SAME: %[[B:.*1]]: vector<3xi32>,		// CHECK-SAME: %[[B:.*1]]: vector<3xi32>,
// CHECK-SAME: %[[C:.*2]]: vector<2xi32>		// CHECK-SAME: %[[C:.*2]]: vector<2xi32>
// CHECK: %[[R:.*]] = arith.constant dense<0> : vector<2xi32>		// CHECK: %[[R:.*]] = arith.constant dense<0> : vector<2xi32>
// CHECK: %[[T0:.*]] = vector.extract %[[A]][0] : vector<2x3xi32>		// CHECK: %[[T0:.*]] = vector.extract %[[A]][0] : vector<2x3xi32>
// CHECK: %[[T2:.*]] = arith.muli %[[T0]], %[[B]] : vector<3xi32>		// CHECK: %[[T2:.*]] = arith.muli %[[T0]], %[[B]] : vector<3xi32>
// CHECK: %[[T3:.*]] = vector.reduction <add>, %[[T2]] : vector<3xi32> into i32		// CHECK: %[[T3:.*]] = vector.reduction <add>, %[[T2]] : vector<3xi32> into i32
[90 lines not shown]
func.func @extract_contract4(%arg0: vector<2x2xf32>,		func.func @extract_contract4(%arg0: vector<2x2xf32>,
%arg1: vector<2x2xf32>,		%arg1: vector<2x2xf32>,
%arg2: vector<2x2xf32>) -> vector<2x2xf32> {		%arg2: vector<2x2xf32>) -> vector<2x2xf32> {
%0 = vector.contract #matmat_trait %arg0, %arg1, %arg2		%0 = vector.contract #matmat_trait %arg0, %arg1, %arg2
: vector<2x2xf32>, vector<2x2xf32> into vector<2x2xf32>		: vector<2x2xf32>, vector<2x2xf32> into vector<2x2xf32>
return %0 : vector<2x2xf32>		return %0 : vector<2x2xf32>
}		}

		// OUTERPRODUCT-LABEL: func.func @masked_extract_contract4(
		// OUTERPRODUCT-SAME: %[[VAL_0:.*]]: vector<3x5xf32>,
		// OUTERPRODUCT-SAME: %[[VAL_1:.*]]: vector<5x7xf32>,
		// OUTERPRODUCT-SAME: %[[VAL_2:.*]]: vector<3x7xf32>,
		// OUTERPRODUCT-SAME: %[[VAL_3:.*]]: vector<3x7x5xi1>) -> vector<3x7xf32> {
		// OUTERPRODUCT: %[[VAL_5:.*]] = vector.transpose %[[VAL_3]], [2, 0, 1] : vector<3x7x5xi1> to vector<5x3x7xi1>
		// OUTERPRODUCT: %[[VAL_8:.*]] = vector.extract %[[VAL_5]][0] : vector<5x3x7xi1>
		// OUTERPRODUCT: %[[VAL_9:.*]] = vector.mask %[[VAL_8]] { vector.outerproduct %{{.*}} {kind = #vector.kind<add>} : vector<3xf32>, vector<7xf32> } : vector<3x7xi1> -> vector<3x7xf32>
		// OUTERPRODUCT: %[[VAL_12:.*]] = vector.extract %[[VAL_5]][1] : vector<5x3x7xi1>
		// OUTERPRODUCT: %[[VAL_13:.*]] = vector.mask %[[VAL_12]] { vector.outerproduct %{{.*}} {kind = #vector.kind<add>} : vector<3xf32>, vector<7xf32> } : vector<3x7xi1> -> vector<3x7xf32>
		// OUTERPRODUCT: %[[VAL_16:.*]] = vector.extract %[[VAL_5]][2] : vector<5x3x7xi1>
		// OUTERPRODUCT: %[[VAL_17:.*]] = vector.mask %[[VAL_16]] { vector.outerproduct %{{.*}} {kind = #vector.kind<add>} : vector<3xf32>, vector<7xf32> } : vector<3x7xi1> -> vector<3x7xf32>
		// OUTERPRODUCT: %[[VAL_20:.*]] = vector.extract %[[VAL_5]][3] : vector<5x3x7xi1>
		// OUTERPRODUCT: %[[VAL_21:.*]] = vector.mask %[[VAL_20]] { vector.outerproduct %{{.*}} {kind = #vector.kind<add>} : vector<3xf32>, vector<7xf32> } : vector<3x7xi1> -> vector<3x7xf32>
		// OUTERPRODUCT: %[[VAL_24:.*]] = vector.extract %[[VAL_5]][4] : vector<5x3x7xi1>
		// OUTERPRODUCT: %[[VAL_25:.*]] = vector.mask %[[VAL_24]] { vector.outerproduct %{{.*}} {kind = #vector.kind<add>} : vector<3xf32>, vector<7xf32> } : vector<3x7xi1> -> vector<3x7xf32>

		func.func @masked_extract_contract4(%arg0: vector<3x5xf32>,
		%arg1: vector<5x7xf32>,
		%arg2: vector<3x7xf32>,
		%m : vector<3x7x5xi1>) -> vector<3x7xf32> {
		%0 = vector.mask %m { vector.contract #matmat_trait %arg0, %arg1, %arg2
		: vector<3x5xf32>, vector<5x7xf32> into vector<3x7xf32> } : vector<3x7x5xi1> -> vector<3x7xf32>
		return %0 : vector<3x7xf32>
		}

#contraction2d_accesses = [		#contraction2d_accesses = [
affine_map<(i, j) -> (i, j)>,		affine_map<(i, j) -> (i, j)>,
affine_map<(i, j) -> (i, j)>,		affine_map<(i, j) -> (i, j)>,
affine_map<(i, j) -> ()>		affine_map<(i, j) -> ()>
]		]
#contraction2d_trait = {		#contraction2d_trait = {
indexing_maps = #contraction2d_accesses,		indexing_maps = #contraction2d_accesses,
iterator_types = ["reduction", "reduction"]		iterator_types = ["reduction", "reduction"]
[999 lines not shown]	%0 = vector.contract {
indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,		indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
affine_map<(d0, d1) -> (d0, d1)>,		affine_map<(d0, d1) -> (d0, d1)>,
affine_map<(d0, d1) -> ()>],		affine_map<(d0, d1) -> ()>],
iterator_types = ["reduction", "reduction"], kind = #vector.kind<add>}		iterator_types = ["reduction", "reduction"], kind = #vector.kind<add>}
%arg0, %arg1, %arg2 : vector<1x1xf32>, vector<1x1xf32> into f32		%arg0, %arg1, %arg2 : vector<1x1xf32>, vector<1x1xf32> into f32
return %0 : f32		return %0 : f32
}		}

func.func @masked_vector_contract(%arg0: vector<2x3xf32>,
%arg1: vector<3xf32>,
%arg2: vector<2xf32>,
%m: vector<2x3xi1>) -> vector<2xf32> {
%0 = vector.mask %m { vector.contract #matvec_trait %arg0, %arg1, %arg2
: vector<2x3xf32>, vector<3xf32> into vector<2xf32> } : vector<2x3xi1> -> vector<2xf32>
return %0 : vector<2xf32>
}

// OUTERPRODUCT-LABEL: func.func @masked_vector_contract(
// OUTERPRODUCT-SAME: %[[VAL_0:.*]]: vector<2x3xf32>,
// OUTERPRODUCT-SAME: %[[VAL_1:.*]]: vector<3xf32>,
// OUTERPRODUCT-SAME: %[[VAL_2:.*]]: vector<2xf32>,
// OUTERPRODUCT-SAME: %[[IN_MASK:.*]]: vector<2x3xi1>) -> vector<2xf32>
// OUTERPRODUCT: %[[T_MASK:.*]] = vector.transpose %[[IN_MASK]], [1, 0] : vector<2x3xi1> to vector<3x2xi1>
// OUTERPRODUCT: %[[MASK0:.*]] = vector.extract %[[T_MASK]][0] : vector<3x2xi1>
// OUTERPRODUCT: vector.mask %[[MASK0]] { vector.outerproduct

// OUTERPRODUCT: %[[MASK1:.*]] = vector.extract %[[T_MASK]][1] : vector<3x2xi1>
// OUTERPRODUCT: vector.mask %[[MASK1]] { vector.outerproduct

// OUTERPRODUCT: %[[MASK2:.*]] = vector.extract %[[T_MASK]][2] : vector<3x2xi1>
// OUTERPRODUCT: vector.mask %[[MASK2]] { vector.outerproduct
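
Note on the mask handling checked by the masked_extract_contract4 test: the contraction mask has shape m x n x k, the lowering transposes it with permutation [2, 0, 1] to k x m x n, and each k-slice then masks one outer product. The sketch below models that on plain arrays, assuming an 'add' combiner and the row-major layout {m, k} x {k, n} -> {m, n}. It is not MLIR code; `Matrix`, `Mask3D`, `transposeMaskSketch` and `maskedMatmulSketch` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstddef>

template <typename T, std::size_t Rows, std::size_t Cols>
using Matrix = std::array<std::array<T, Cols>, Rows>;

// Mask indexed as mask[m][n][k], matching the vector<MxNxKxi1> mask type.
template <std::size_t M, std::size_t N, std::size_t K>
using Mask3D = std::array<std::array<std::array<bool, K>, N>, M>;

// Permutation [2, 0, 1]: result[k][m][n] = mask[m][n][k].
template <std::size_t M, std::size_t N, std::size_t K>
std::array<Matrix<bool, M, N>, K>
transposeMaskSketch(const Mask3D<M, N, K> &mask) {
  std::array<Matrix<bool, M, N>, K> out{};
  for (std::size_t m = 0; m < M; ++m)
    for (std::size_t n = 0; n < N; ++n)
      for (std::size_t k = 0; k < K; ++k)
        out[k][m][n] = mask[m][n][k];
  return out;
}

// One masked outer product per reduction step k. Each step combines on all
// lanes and then selects, so masked-out lanes keep the old accumulator.
template <typename T, std::size_t M, std::size_t N, std::size_t K>
Matrix<T, M, N> maskedMatmulSketch(const Matrix<T, M, K> &lhs,
                                   const Matrix<T, K, N> &rhs,
                                   Matrix<T, M, N> acc,
                                   const Mask3D<M, N, K> &mask) {
  const auto sliceMasks = transposeMaskSketch(mask);
  for (std::size_t k = 0; k < K; ++k)
    for (std::size_t m = 0; m < M; ++m)
      for (std::size_t n = 0; n < N; ++n) {
        const T combined = lhs[m][k] * rhs[k][n] + acc[m][n];
        acc[m][n] = sliceMasks[k][m][n] ? combined : acc[m][n];
      }
  return acc;
}
```

After the transpose, extracting position k from the leading dimension yields exactly the m x n mask that the k-th outer product needs, which is why the test expects one vector.extract per vector.mask.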

Revision Contents

Diff 499335

mlir/include/mlir/Dialect/Vector/IR/VectorOps.h

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp

mlir/lib/Dialect/Vector/IR/VectorOps.cpp

mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp

mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir

mlir/test/Dialect/Vector/vector-contract-transforms.mlir
