This is an archive of the discontinued LLVM Phabricator instance.

Differential D83012

[mlir] [VectorOps] Add choice between dot and axpy lowering of vector.contract
Closed, Public

Authored by aartbik on Jul 1 2020, 5:36 PM.

Details

Reviewers

nicolasvasilache
reidtatge
mehdi_amini
bkramer
ftynse

Commits

rGee01c7a74063: [mlir] [VectorOps] Add choice between dot and axpy lowering of vector.contract

Summary

Default vector.contract lowering essentially yields a series of sdot/ddot
operations. However, for some layouts a series of saxpy/daxpy operations,
chained through fma, is more efficient. This CL introduces a choice between
the two lowering paths. A default heuristic is to follow.

Some preliminary avx2 performance numbers for matrix-times-vector.
Here, dot performs best for 64x64 A x b and saxpy for 64x64 A^T x b.

------------------------------------------------------------
            A x b                          A^T x b
------------------------------------------------------------
GFLOPS    sdot (reassoc)    saxpy    sdot (reassoc)    saxpy
------------------------------------------------------------
1x1        0.6               0.9       0.6             0.9
2x2        2.5               3.2       2.4             3.5
4x4        6.4               8.4       4.9             11.8
8x8       11.7               6.1       5.0             29.6
16x16     20.7              10.8       7.3             43.3
32x32     29.3               7.9       6.4             51.8
64x64     38.9                                         79.3
128x128   32.4                                         40.7
------------------------------------------------------------
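
For readers unfamiliar with the two strategies named in the summary, the following standalone C++ sketch contrasts them for a small matrix-times-vector product. It is not the MLIR lowering itself: `matvecDot` and `matvecAxpy` are hypothetical helper names, the matrix is assumed to be stored as a vector of rows, and `std::fma` merely stands in for the chain of `vector.fma` operations.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative types only (assumption): a dense matrix stored as rows.
using Vector = std::vector<float>;
using Matrix = std::vector<Vector>;

// Dot-style: every output element is one reduction along a row of A.
Vector matvecDot(const Matrix &a, const Vector &b) {
  Vector x(a.size(), 0.0f);
  for (std::size_t i = 0; i < a.size(); ++i) {
    float acc = 0.0f;
    for (std::size_t k = 0; k < b.size(); ++k)
      acc += a[i][k] * b[k];
    x[i] = acc;
  }
  return x;
}

// AXPY-style: for every reduction index k, scale column k of A by the
// scalar b[k] and accumulate it into the whole result vector via FMA.
Vector matvecAxpy(const Matrix &a, const Vector &b) {
  Vector x(a.size(), 0.0f);
  for (std::size_t k = 0; k < b.size(); ++k)
    for (std::size_t i = 0; i < a.size(); ++i)
      x[i] = std::fma(a[i][k], b[k], x[i]);
  return x;
}
```

Both functions compute the same product; they differ only in loop order, which decides whether the vectorizable inner operation is a reduction (dot) or an element-wise multiply-add (axpy).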

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aartbik created this revision. Jul 1 2020, 5:36 PM

Herald added a reviewer: nicolasvasilache. Jul 1 2020, 5:36 PM

Herald added a project: Restricted Project.

Herald added subscribers: msifontes, jurahul, Kayjukh and 13 others.

aartbik edited the summary of this revision. Jul 1 2020, 5:38 PM

aartbik added reviewers: reidtatge, mehdi_amini, bkramer, ftynse.

aartbik edited the summary of this revision. Jul 1 2020, 5:45 PM

aartbik edited the summary of this revision.

fixed typos

Harbormaster completed remote builds in B62606: Diff 274970. Jul 1 2020, 6:23 PM

Harbormaster completed remote builds in B62613: Diff 274978. Jul 1 2020, 6:54 PM

ftynse accepted this revision. Jul 2 2020, 1:35 AM

ftynse added inline comments.

mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144	Ultra-nit: TODOs shouldn't go to doxygen, so use `//` instead of `///` for them.
mlir/lib/Dialect/Vector/VectorTransforms.cpp
1774	Would it make sense to have a fused `matchAndRewrite` and avoid duplicate checks here? We can just have the trailing `else` below that returns failure().
1781	Nit: why is this using getDimSize(), but everything below uses getShape()[] ?

This revision is now accepted and ready to land. Jul 2 2020, 1:35 AM

Thanks Aart, let's land this and we can figure out the refactoring in a subsequent CL depending on what we decide for the semantics of outerproduct.

nicolasvasilache added inline comments.

mlir/include/mlir/Dialect/Vector/VectorOps.h
56	AXPY and OuterProduct are fundamentally the same strategy in the `2dx2d -> 2d` and `1dx2d->1d / 2dx1d->2d`, respectively. It would therefore be good to use a single entry point. Going progressively through `vector.outerproduct` can be beneficial to compose with patterns that operate on `vector.outerproduct`. `vector.outerproduct` then lowers to the appropriate `extract/insert` + `splat + FMA`. For the `matvec` special case, I was thinking of relaxing the semantics of `vector.outerproduct` to take either `1dx1d -> 2d`, `scalar x 1d -> 1d` and `1d x scalar -> 1d`. Thoughts?
mlir/lib/Dialect/Vector/VectorTransforms.cpp
1766	Great, thanks for keeping this consistent with the contraction lowering.
1774	@ftynse this is similar to the `vector.outerproduct` case for which there is additional fallback logic to degrade to the FMA (now called Dot) version.
1781	I'd say let's use `getDimSize` everywhere (including the `outerproduct` part).
1804	With `vector.outerproduct` vector/scalar polymorphism, this piece of code would be exactly l. 1720:

	for (unsigned k = 0; k < reductionSize; ++k) {
	  Value a = rewriter.create<vector::ExtractOp>(op.getLoc(), lhs, k);
	  Value b = rewriter.create<vector::ExtractOp>(op.getLoc(), rhs, k);
	  res = rewriter.create<vector::OuterProductOp>(op.getLoc(), a, b, res);
	}
	rewriter.replaceOp(op, res);

	In other words, AXPY is a non-progressive lowering to broadcast + FMA. Making it progressive would allow some nice refactoring IMO, but it requires agreeing that `vector.outerproduct` should work with either `1dx1d -> 2d`, `scalar x 1d -> 1d` and `1d x scalar -> 1d` forms. @aartbik @ftynse @mehdi_amini thoughts?
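
The generalization proposed in these comments can be sketched outside MLIR as follows. This is only an illustration under stated assumptions: the two `outerProduct` overloads and the `contractViaOuterProduct` driver are hypothetical names, not the vector dialect API, and the left operand is assumed to be already transposed so that its k-th row is the k-th slice along the reduction dimension.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Vector = std::vector<float>;
using Matrix = std::vector<Vector>;

// 1-D x 1-D -> 2-D: acc[i][j] += a[i] * b[j].
Matrix outerProduct(const Vector &a, const Vector &b, Matrix acc) {
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < b.size(); ++j)
      acc[i][j] += a[i] * b[j];
  return acc;
}

// Hypothetical relaxed form, 1-D x scalar -> 1-D: acc[i] += a[i] * s.
// This degenerate outer product is exactly one AXPY step.
Vector outerProduct(const Vector &a, float s, Vector acc) {
  for (std::size_t i = 0; i < a.size(); ++i)
    acc[i] += a[i] * s;
  return acc;
}

// One shared driver: unroll the reduction dimension and accumulate.
// With Rhs = Vector this is a matmul; with Rhs = float it is a matvec.
template <typename Rhs, typename Acc>
Acc contractViaOuterProduct(const Matrix &lhsT, const std::vector<Rhs> &rhs,
                            Acc acc) {
  for (std::size_t k = 0; k < lhsT.size(); ++k)
    acc = outerProduct(lhsT[k], rhs[k], std::move(acc));
  return acc;
}
```

With the scalar overload available, the same loop serves both the matrix-matrix and the matrix-vector contraction, which is the code sharing the reviewers are after.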

ftynse added inline comments. Jul 2 2020, 4:26 AM

mlir/lib/Dialect/Vector/VectorTransforms.cpp
1774	It wouldn't anyhow affect the logic. You can fail the pattern in `matchAndRewrite` and the next pattern will be applied. The default implementation of matchAndRewrite is literally calling match and rewrite in a row, https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/IR/PatternMatch.h#L146.
1804	> but it requires agreeing that vector.outerproduct should work with either 1dx1d -> 2d, scalar x 1d -> 1d and 1d x scalar -> 1d forms.
	I won't object to this, sounds like a reasonable generalization.
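
The point made in the comment on line 1774 can be illustrated with a small standalone model. The classes below are simplified stand-ins, not MLIR's `OpRewritePattern`: `Op`, `Pattern`, `FusedAxpyPattern`, `DotPattern`, and `applyFirstMatching` are hypothetical names, and a string field replaces the real indexing-map checks.

```cpp
#include <memory>
#include <string>
#include <vector>

// Toy operation: 'layout' stands in for the indexing maps, 'result'
// records which lowering was applied.
struct Op {
  std::string layout;
  std::string result;
};

struct Pattern {
  virtual ~Pattern() = default;
  virtual bool match(const Op &) const { return false; }
  virtual void rewrite(Op &) const {}
  // Default behavior: call match, then rewrite, in a row.
  virtual bool matchAndRewrite(Op &op) const {
    if (!match(op))
      return false;
    rewrite(op);
    return true;
  }
};

// Fused form: the case analysis is written once, and the trailing
// 'else' reports failure so that the driver tries the next pattern.
struct FusedAxpyPattern : Pattern {
  bool matchAndRewrite(Op &op) const override {
    if (op.layout == "mat-vec" || op.layout == "mat-trans-vec")
      op.result = "axpy";
    else
      return false;
    return true;
  }
};

// Split form relying on the default matchAndRewrite; always applicable.
struct DotPattern : Pattern {
  bool match(const Op &) const override { return true; }
  void rewrite(Op &op) const override { op.result = "dot"; }
};

// Minimal driver: apply the first pattern that succeeds.
bool applyFirstMatching(const std::vector<std::unique_ptr<Pattern>> &patterns,
                        Op &op) {
  for (const auto &pattern : patterns)
    if (pattern->matchAndRewrite(op))
      return true;
  return false;
}
```

In this model a failing fused pattern behaves exactly like a failing `match`: the driver simply moves on to the fallback.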

aartbik marked 13 inline comments as done. Jul 2 2020, 12:38 PM

aartbik added inline comments.

mlir/include/mlir/Dialect/Vector/VectorOps.h
56	Yes, I agree. If we allow scalars in outerproduct and let that "degenerate" to axpy I can unify this code quite a bit. And I agree that having fewer cases and more progressive lowering will compose much better in the end.
mlir/lib/Dialect/Vector/VectorTransforms.cpp
1774	For now, I kept the three special cases of contract lowering in this form.
1804	That's the plan (since dotproduct with a vector/scalar is just a loop around axpy).

addressed comments

Harbormaster completed remote builds in B62741: Diff 275201. Jul 2 2020, 1:31 PM

Closed by commit rGee01c7a74063: [mlir] [VectorOps] Add choice between dot and axpy lowering of vector.contract (authored by aartbik). Jul 2 2020, 1:31 PM

This revision was automatically updated to reflect the committed changes.

mehdi_amini added inline comments. Jul 4 2020, 9:56 PM

mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144	Nit: we don't add usernames in TODO in LLVM.

rriddle added inline comments. Jul 6 2020, 6:51 PM

mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144	Unresolved?
mlir/lib/Dialect/Vector/VectorTransforms.cpp
1734	Please do not add usernames in TODOs, same below.

mehdi_amini added inline comments. Jul 6 2020, 9:28 PM

mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144	(To be fair: it was a post-commit review)

aartbik marked an inline comment as done. Jul 6 2020, 9:33 PM

aartbik added inline comments.

mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144	And it is not the first named TODO we added to the code :-) I count 437 right now under mlir, and I even count 21 TODO(riverriddle). But I can keep this in mind and clean up going forward.

mehdi_amini added inline comments. Jul 6 2020, 9:35 PM

mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144	These likely predate the integration into LLVM, when MLIR was developed inside Google.

rriddle added inline comments. Jul 6 2020, 10:02 PM

mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144	Exactly. Also, patches welcome if you want to do something about it instead of just counting :)

rriddle added inline comments. Jul 7 2020, 1:43 AM

mlir/include/mlir/Dialect/Vector/VectorTransforms.h
144	Went ahead and did it for you in: 9db53a182705ac1f652c6ee375735bea5539272c

Revision Contents

Path                                                               Size
mlir/include/mlir/Dialect/Vector/VectorOps.h                       8 lines
mlir/include/mlir/Dialect/Vector/VectorTransforms.h                33 lines
mlir/lib/Dialect/Vector/VectorTransforms.cpp                       131 lines
mlir/test/Dialect/Vector/vector-contract-matvec-transforms.mlir    163 lines
mlir/test/lib/Transforms/TestVectorTransforms.cpp                  10 lines

Diff 275218

mlir/include/mlir/Dialect/Vector/VectorOps.h

[40 lines not shown]
 /// use for "slices" ops), this lowering removes all tuple related
 /// operations as well (through DCE and folding). If tuple values
 /// "leak" coming in, however, some tuple related ops will remain.
 void populateVectorSlicesLoweringPatterns(OwningRewritePatternList &patterns,
                                           MLIRContext *context);

 /// Enum to control the lowering of `vector.contract` operations.
 enum class VectorContractLowering {
-  /// Progressively lower to finer grained `vector.contract` and `vector.fma`.
-  FMA = 0,
+  /// Progressively lower to finer grained `vector.contract` and dot-products.
+  Dot = 0,
   /// Lower to `vector.matrix_multiply`, maps 1-1 to LLVM matrix intrinsics.
   Matmul = 1,
   /// Lower to `vector.outerproduct`.
   OuterProduct = 2,
+  /// Lower to series of AXPY chained through FMA.
+  AXPY = 3,
 };
 /// Enum to control the lowering of `vector.transpose` operations.
 enum class VectorTransposeLowering {
   // Lower transpose into element-wise extract and inserts.
   EltWise = 0,
   /// Lower 2-D transpose to `vector.flat_transpose`, maps 1-1 to LLVM matrix
   /// intrinsics.
   Flat = 1,
 };
 /// Structure to control the behavior of vector transform patterns.
 struct VectorTransformsOptions {
-  VectorContractLowering vectorContractLowering = VectorContractLowering::FMA;
+  VectorContractLowering vectorContractLowering = VectorContractLowering::Dot;
   VectorTransposeLowering vectorTransposeLowering =
       VectorTransposeLowering::EltWise;
   VectorTransformsOptions &
   setVectorTransformsOptions(VectorContractLowering opt) {
     vectorContractLowering = opt;
     return *this;
   }
 };
[39 lines not shown]

mlir/include/mlir/Dialect/Vector/VectorTransforms.h

[129 lines not shown]
 public:
   void rewrite(vector::ContractionOp op,
                PatternRewriter &rewriter) const override;

 private:
   /// Options to control the vector patterns.
   vector::VectorTransformsOptions vectorTransformsOptions;
 };

+/// Progressive lowering of a `vector.contract %a, %b, %c` with
+/// matvec semantics to series of AXPY operations that are chained
+/// through FMA operations.
+///
+/// This only kicks in when VectorTransformsOptions is set to AXPY.
+//
+// TODO (ajcbik): this is very similar, but not quite the same as
+// the outerproduct lowering above; merge the two?
+class ContractionOpToAXPYLowering
+    : public OpRewritePattern<vector::ContractionOp> {
+public:
+  using OpRewritePattern<vector::ContractionOp>::OpRewritePattern;
+  ContractionOpToAXPYLowering(
+      vector::VectorTransformsOptions vectorTransformsOptions,
+      MLIRContext *context)
+      : OpRewritePattern<vector::ContractionOp>(context),
+        vectorTransformsOptions(vectorTransformsOptions) {}
+
+  LogicalResult match(vector::ContractionOp op) const override;
+  void rewrite(vector::ContractionOp op,
+               PatternRewriter &rewriter) const override;
+
+private:
+  /// Options to control the vector patterns.
+  vector::VectorTransformsOptions vectorTransformsOptions;
+};
+
 /// Progressive lowering of ContractionOp.
 ///
 /// One:
 /// %x = vector.contract with at least one free/batch dimension
 /// is replaced by:
 /// %a = vector.contract with one less free/batch dimension
 /// %b = vector.contract with one less free/batch dimension
 /// ..
 /// %x = combine %a %b ..
 /// until a pure contraction is reached (no free/batch dimensions),
-/// which is replaced by a fma/reduction op.
+/// which is replaced by a dot-product.
 ///
-/// This only kicks in when either VectorTransformsOptions is set to FMA or when
-/// other contraction patterns fail.
+/// This only kicks in when either VectorTransformsOptions is set
+/// to Dot or when other contraction patterns fail.
 class ContractionOpLowering : public OpRewritePattern<vector::ContractionOp> {
 public:
   using OpRewritePattern<vector::ContractionOp>::OpRewritePattern;

   ContractionOpLowering(vector::VectorTransformsOptions vectorTransformsOptions,
                         MLIRContext *context)
       : OpRewritePattern<vector::ContractionOp>(context),
         vectorTransformsOptions(vectorTransformsOptions) {}
[18 lines not shown]

mlir/lib/Dialect/Vector/VectorTransforms.cpp

[1,554 lines not shown]
 /// This only kicks in when VectorTransformsOptions is set to OuterProduct and
 /// the vector.contract op is a row-major matrix multiply.
 LogicalResult
 ContractionOpToMatmulOpLowering::match(vector::ContractionOp op) const {
   // TODO(ajcbik): implement masks
   if (llvm::size(op.masks()) != 0)
     return failure();

+  if (vectorTransformsOptions.vectorContractLowering !=
+      vector::VectorContractLowering::Matmul)
+    return failure();
+
   auto iteratorTypes = op.iterator_types().getValue();
   if (!isParallelIterator(iteratorTypes[0]) ||
       !isParallelIterator(iteratorTypes[1]) ||
       !isReductionIterator(iteratorTypes[2]))
     return failure();

-  if (vectorTransformsOptions.vectorContractLowering !=
-          vector::VectorContractLowering::Matmul ||
-      !isRowMajorMatmul(op.indexing_maps()))
+  if (!isRowMajorMatmul(op.indexing_maps()))
     return failure();

   return success();
 }

 void ContractionOpToMatmulOpLowering::rewrite(vector::ContractionOp op,
                                               PatternRewriter &rewriter) const {
   VectorType lhsType = op.getLhsType();
   VectorType rhsType = op.getRhsType();
-  unsigned lhsRows = op.getLhsType().getShape()[0];
-  unsigned lhsColumns = op.getLhsType().getShape()[1];
-  unsigned rhsColumns = op.getRhsType().getShape()[1];
+  int64_t lhsRows = lhsType.getDimSize(0);
+  int64_t lhsColumns = lhsType.getDimSize(1);
+  int64_t rhsColumns = rhsType.getDimSize(1);

   Type flattenedLHSType =
       VectorType::get(lhsType.getNumElements(), lhsType.getElementType());
   Type flattenedRHSType =
       VectorType::get(rhsType.getNumElements(), rhsType.getElementType());
   auto lhs = rewriter.create<vector::ShapeCastOp>(op.getLoc(), flattenedLHSType,
                                                   op.lhs());
   auto rhs = rewriter.create<vector::ShapeCastOp>(op.getLoc(), flattenedRHSType,
[60 lines not shown]
   if (maps != infer({{m, k}, {k, n}, {m, n}}) &&
       maps != infer({{k, m}, {n, k}, {n, m}}))
     return failure();
   return success();
 }

 void ContractionOpToOuterProductOpLowering::rewrite(
     vector::ContractionOp op, PatternRewriter &rewriter) const {
   Location loc = op.getLoc();
-  unsigned reductionSize = 0;
+  int64_t reductionSize = 0;
   VectorType lhsType = op.getLhsType();
   Value lhs = op.lhs(), rhs = op.rhs(), res = op.acc();

   // Transpose arguments to make them ready for lowering to OuterProduct. The
   // constraint to match is that we must load full rows at a time with
   // vector::ExtractOp.
   using MapList = ArrayRef<ArrayRef<AffineExpr>>;
   auto infer = [](MapList m) { return AffineMap::inferFromExprList(m); };
   AffineExpr m, n, k;
   bindDims(rewriter.getContext(), m, n, k);
   SmallVector<int64_t, 2> perm{1, 0};
   SmallVector<AffineMap, 4> maps = op.getIndexingMaps();
   // First batch of cases, no need to output permute.
   if (maps == infer({{m, k}, {k, n}, {m, n}})) {
     // This is the classical row-major matmul. Just permute the lhs.
-    reductionSize = lhsType.getShape()[1];
+    reductionSize = lhsType.getDimSize(1);
     lhs = rewriter.create<vector::TransposeOp>(loc, lhs, perm);
   } else if (maps == infer({{m, k}, {n, k}, {m, n}})) {
     // TODO: may be better to fail and use some vector<k> -> scalar reduction.
-    reductionSize = lhsType.getShape()[1];
+    reductionSize = lhsType.getDimSize(1);
     lhs = rewriter.create<vector::TransposeOp>(loc, lhs, perm);
     rhs = rewriter.create<vector::TransposeOp>(loc, rhs, perm);
   } else if (maps == infer({{k, m}, {k, n}, {m, n}})) {
     // No need to permute anything.
-    reductionSize = lhsType.getShape()[0];
+    reductionSize = lhsType.getDimSize(0);
   } else if (maps == infer({{k, m}, {n, k}, {m, n}})) {
     // Just permute the rhs.
-    reductionSize = lhsType.getShape()[0];
+    reductionSize = lhsType.getDimSize(0);
     rhs = rewriter.create<vector::TransposeOp>(loc, rhs, perm);
   }
   // Second batch of cases, reshuffle to avoid output permute.
   else if (maps == infer({{m, k}, {k, n}, {n, m}})) {
     // This is the classical row-major matmul. Just permute the lhs.
-    reductionSize = lhsType.getShape()[1];
+    reductionSize = lhsType.getDimSize(1);
     Value tmp = rhs;
     rhs = rewriter.create<vector::TransposeOp>(loc, lhs, perm);
     lhs = tmp;
   } else if (maps == infer({{m, k}, {n, k}, {n, m}})) {
     // TODO: may be better to fail and use some vector<k> -> scalar reduction.
-    reductionSize = lhsType.getShape()[1];
+    reductionSize = lhsType.getDimSize(1);
     Value tmp = rhs;
     rhs = rewriter.create<vector::TransposeOp>(loc, lhs, perm);
     lhs = rewriter.create<vector::TransposeOp>(loc, tmp, perm);
   } else if (maps == infer({{k, m}, {k, n}, {n, m}})) {
     // No need to permute anything, but still swap lhs and rhs.
-    reductionSize = lhsType.getShape()[0];
+    reductionSize = lhsType.getDimSize(0);
     std::swap(lhs, rhs);
   } else if (maps == infer({{k, m}, {n, k}, {n, m}})) {
     // Just permute the rhs.
-    reductionSize = lhsType.getShape()[0];
+    reductionSize = lhsType.getDimSize(0);
     Value tmp = lhs;
     lhs = rewriter.create<vector::TransposeOp>(loc, rhs, perm);
     rhs = tmp;
   }
   assert(reductionSize > 0);

   // ExtractOp does not allow dynamic indexing, we must unroll explicitly.
   for (unsigned k = 0; k < reductionSize; ++k) {
     Value a = rewriter.create<vector::ExtractOp>(op.getLoc(), lhs, k);
     Value b = rewriter.create<vector::ExtractOp>(op.getLoc(), rhs, k);
     res = rewriter.create<vector::OuterProductOp>(op.getLoc(), a, b, res);
   }
   rewriter.replaceOp(op, res);
 }

		/// Progressive lowering of a `vector.contract %a, %b, %c` with
		/// matvec semantics to series of AXPY operations that are chained
		/// through FMA operations.
		///
		/// This only kicks in when VectorTransformsOptions is set to AXPY.
		//
		// TODO (ajcbik): this is very similar, but not quite the same as
		rriddleUnsubmitted Not Done Reply Inline Actions Please do not add usernames in TODOs, same below. rriddle: Please do not add usernames in TODOs, same below.
		// the outerproduct lowering above; merge the two?
		LogicalResult
		ContractionOpToAXPYLowering::match(vector::ContractionOp op) const {
		// TODO(ajcbik): implement masks
		if (llvm::size(op.masks()) != 0)
		return failure();

		if (vectorTransformsOptions.vectorContractLowering !=
		vector::VectorContractLowering::AXPY)
		return failure();

		auto iteratorTypes = op.iterator_types().getValue();
		if (!isParallelIterator(iteratorTypes[0]) \|\|
		!isReductionIterator(iteratorTypes[1]))
		return failure();

		// See if a series of AXPY operations chained through FMA operations
		// could replace the default DOT implementation.
		using MapList = ArrayRef<ArrayRef<AffineExpr>>;
		auto infer = [](MapList m) { return AffineMap::inferFromExprList(m); };
		AffineExpr m, n;
		bindDims(op.getContext(), m, n);
		SmallVector<AffineMap, 4> maps = op.getIndexingMaps();
		if (maps != infer({{m, n}, {n}, {m}}) && // mat-vec
		maps != infer({{n, m}, {n}, {m}}) && // mat-trans-vec
		maps != infer({{n}, {m, n}, {m}}) && // vec-mat
		maps != infer({{n}, {n, m}, {m}})) // vec-mat-trans
		return failure();
		return success();
		}

		void ContractionOpToAXPYLowering::rewrite(vector::ContractionOp op,
		nicolasvasilacheUnsubmitted Done Reply Inline Actions Great, thanks for keeping this consistent with the contraction lowering. nicolasvasilache: Great, thanks for keeping this consistent with the contraction lowering.
		PatternRewriter &rewriter) const {
		Location loc = op.getLoc();
		VectorType lhsType = op.getLhsType();
		Value lhs = op.lhs(), rhs = op.rhs(), res = op.acc();

		using MapList = ArrayRef<ArrayRef<AffineExpr>>;
		auto infer = [](MapList m) { return AffineMap::inferFromExprList(m); };
		AffineExpr m, n;
		ftynseUnsubmitted Done Reply Inline Actions Would it make sense to have a fused `matchAndRewrite` and avoid duplicate checks here? We can just have the trailing `else` below that returns failure(). ftynse: Would it make sense to have a fused `matchAndRewrite` and avoid duplicate checks here? We can…
		nicolasvasilacheUnsubmitted Done Reply Inline Actions @ftynse this is similar to the `vector.outerproduct` case for which there is additional fallback logic to degrade to the FMA (now called Dot) version. nicolasvasilache: @ftynse this is similar to the `vector.outerproduct` case for which there is additional…
		ftynseUnsubmitted Done Reply Inline Actions It wouldn't anyhow affect the logic. You can fail the pattern in `matchAndRewrite` and the next pattern will be applied. The default implementation of matchAndRewrite is literally calling match and rewrite in a row, https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/IR/PatternMatch.h#L146. ftynse: It wouldn't anyhow affect the logic. You can fail the pattern in `matchAndRewrite` and the next…
		aartbikAuthorUnsubmitted Done Reply Inline Actions For now, I kept the three special cases of contract lowering in this form. aartbik: For now, I kept the three special cases of contract lowering in this form.
		bindDims(op.getContext(), m, n);
		SmallVector<int64_t, 2> perm{1, 0};
		//
		SmallVector<AffineMap, 4> maps = op.getIndexingMaps();
		int64_t reductionSize = 0;
		if (maps == infer({{m, n}, {n}, {m}})) {
		// Case mat-vec: transpose.
		ftynseUnsubmitted Done Reply Inline Actions Nit: why is this using getDimSize(), but everything below uses getShape()[] ? ftynse: Nit: why is this using getDimSize(), but everything below uses getShape()[] ?
		nicolasvasilacheUnsubmitted Done Reply Inline Actions I'd say let's use `getDimSize` everywhere (including the `outerproduct` part). nicolasvasilache: I'd say let's use `getDimSize` everywhere (including the `outerproduct` part).
		reductionSize = lhsType.getDimSize(1);
		lhs = rewriter.create<vector::TransposeOp>(loc, lhs, perm);
		} else if (maps == infer({{n, m}, {n}, {m}})) {
		// Case mat-trans-vec: ready to go.
		reductionSize = lhsType.getDimSize(0);
		} else if (maps == infer({{n}, {m, n}, {m}})) {
		// Case vec-mat: swap and transpose.
		reductionSize = lhsType.getDimSize(0);
		std::swap(lhs, rhs);
		lhs = rewriter.create<vector::TransposeOp>(loc, lhs, perm);
		} else if (maps == infer({{n}, {n, m}, {m}})) {
		// Case vec-mat-trans: swap and ready to go.
		reductionSize = lhsType.getDimSize(0);
		std::swap(lhs, rhs);
		}
		assert(reductionSize > 0);

		// A direct series of AXPY operations, chained through FMA.
		Type resType = op.getResultType();
		for (int64_t k = 0; k < reductionSize; ++k) {
		Value a = rewriter.create<vector::ExtractOp>(loc, lhs, k);
		Value s = rewriter.create<vector::ExtractOp>(loc, rhs, k);
		Value b = rewriter.create<vector::BroadcastOp>(loc, resType, s);
		nicolasvasilache (inline comment): With `vector.outerproduct` vector/scalar polymorphism, this piece of code would be exactly l. 1720:
		    for (unsigned k = 0; k < reductionSize; ++k) {
		      Value a = rewriter.create<vector::ExtractOp>(op.getLoc(), lhs, k);
		      Value b = rewriter.create<vector::ExtractOp>(op.getLoc(), rhs, k);
		      res = rewriter.create<vector::OuterProductOp>(op.getLoc(), a, b, res);
		    }
		    rewriter.replaceOp(op, res);
		In other words, AXPY is a non-progressive lowering to broadcast + FMA. Making it progressive would allow some nice refactoring IMO, but it requires agreeing that `vector.outerproduct` should work with either `1dx1d -> 2d`, `scalar x 1d -> 1d` and `1d x scalar -> 1d` forms. @aartbik @ftynse @mehdi_amini thoughts?
		ftynse (inline comment):
		> but it requires agreeing that vector.outerproduct should work with either 1dx1d -> 2d, scalar x 1d -> 1d and 1d x scalar -> 1d forms.
		I won't object to this, sounds like a reasonable generalization.
		aartbik (author, inline comment): That's the plan (since dotproduct with a vector/scalar is just a loop around axpy).
		res = rewriter.create<vector::FMAOp>(loc, a, b, res);
		}
		rewriter.replaceOp(op, res);
		}
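
For readers less familiar with the BLAS terminology, the difference between the two lowering paths can be illustrated outside MLIR. The following is a minimal plain C++ sketch, not the code in this patch: `matvecDot` and `matvecAxpy` are hypothetical names, the matrix is assumed to be stored as a vector of rows, and scalar loops stand in for the vector ops. The dot form computes one reduction per output element; the AXPY form walks the reduction index `k` on the outside and updates the whole accumulator with column `k` of the matrix scaled by `x[k]`, which is the role the extract/broadcast/FMA chain plays in the loop above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper types for illustration only (not MLIR types).
// a[i][j]: i is the parallel (row) index, j is the reduction (column) index.
using Matrix = std::vector<std::vector<float>>;
using Vector = std::vector<float>;

// Dot-style evaluation: for each output element, reduce over j.
Vector matvecDot(const Matrix &a, const Vector &x, Vector acc) {
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j)
      acc[i] += a[i][j] * x[j];
  return acc;
}

// AXPY-style evaluation: for each reduction index k, add column k of the
// matrix, scaled by the scalar x[k], to the whole accumulator.
Vector matvecAxpy(const Matrix &a, const Vector &x, Vector acc) {
  for (std::size_t k = 0; k < x.size(); ++k)
    for (std::size_t i = 0; i < a.size(); ++i)
      acc[i] += a[i][k] * x[k];
  return acc;
}
```

Both functions compute the same mathematical result; only the loop order, and therefore the memory access pattern and the shape of the vector operations, differs. In floating point the two orders may round differently in general.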

/// Progressive lowering of ContractionOp.		/// Progressive lowering of ContractionOp.
/// One:		/// One:
/// %x = vector.contract with at least one free/batch dimension		/// %x = vector.contract with at least one free/batch dimension
/// is replaced by:		/// is replaced by:
/// %a = vector.contract with one less free/batch dimension		/// %a = vector.contract with one less free/batch dimension
/// %b = vector.contract with one less free/batch dimension		/// %b = vector.contract with one less free/batch dimension
/// ..		/// ..
/// %x = combine %a %b ..		/// %x = combine %a %b ..
/// until a pure contraction is reached (no free/batch dimensions),		/// until a pure contraction is reached (no free/batch dimensions),
/// which is replaced by a dot-product/reduction pair.		/// which is replaced by a dot-product.
///		///
/// TODO(ajcbik): break down into transpose/reshape/cast ops		/// This only kicks in when either VectorTransformsOptions is set
/// when they become available to avoid code dup		/// to DOT or when other contraction patterns fail.
/// TODO(ajcbik): investigate lowering order impact on performance		//
		// TODO(ajcbik): break down into transpose/reshape/cast ops
		// when they become available to avoid code dup
		// TODO(ajcbik): investigate lowering order impact on performance
LogicalResult		LogicalResult
ContractionOpLowering::matchAndRewrite(vector::ContractionOp op,		ContractionOpLowering::matchAndRewrite(vector::ContractionOp op,
PatternRewriter &rewriter) const {		PatternRewriter &rewriter) const {

// TODO(ajcbik): implement masks.		// TODO(ajcbik): implement masks.
if (llvm::size(op.masks()) != 0)		if (llvm::size(op.masks()) != 0)
return failure();		return failure();
// TODO(thomasraoux): support mixed mode contract lowering.		// TODO(thomasraoux): support mixed mode contract lowering.
if (op.getLhsType().getElementType() !=		if (op.getLhsType().getElementType() !=
getElementTypeOrSelf(op.getAccType()) ||		getElementTypeOrSelf(op.getAccType()) ||
op.getRhsType().getElementType() != getElementTypeOrSelf(op.getAccType()))		op.getRhsType().getElementType() != getElementTypeOrSelf(op.getAccType()))
return failure();		return failure();

// TODO(ntv, ajcbik): implement benefits, cost models.		// TODO(ntv, ajcbik): implement benefits, cost models.
MLIRContext *ctx = op.getContext();		MLIRContext *ctx = op.getContext();
ContractionOpToMatmulOpLowering pat1(vectorTransformsOptions, ctx);		ContractionOpToMatmulOpLowering pat1(vectorTransformsOptions, ctx);
if (succeeded(pat1.match(op)))		if (succeeded(pat1.match(op)))
return failure();		return failure();
ContractionOpToOuterProductOpLowering pat2(vectorTransformsOptions, ctx);		ContractionOpToOuterProductOpLowering pat2(vectorTransformsOptions, ctx);
if (succeeded(pat2.match(op)))		if (succeeded(pat2.match(op)))
return failure();		return failure();
		ContractionOpToAXPYLowering pat3(vectorTransformsOptions, ctx);
		if (succeeded(pat3.match(op)))
		return failure();

// Find first batch dimension in LHS/RHS, and lower when found.		// Find first batch dimension in LHS/RHS, and lower when found.
std::vector<std::pair<int64_t, int64_t>> batchDimMap = op.getBatchDimMap();		std::vector<std::pair<int64_t, int64_t>> batchDimMap = op.getBatchDimMap();
if (!batchDimMap.empty()) {		if (!batchDimMap.empty()) {
int64_t lhsIndex = batchDimMap[0].first;		int64_t lhsIndex = batchDimMap[0].first;
int64_t rhsIndex = batchDimMap[0].second;		int64_t rhsIndex = batchDimMap[0].second;
rewriter.replaceOp(op, lowerParallel(op, lhsIndex, rhsIndex, rewriter));		rewriter.replaceOp(op, lowerParallel(op, lhsIndex, rhsIndex, rewriter));
return success();		return success();
[... 169 lines not shown ...]
		patterns.insert<BroadcastOpLowering,
ConstantMaskOpLowering,		ConstantMaskOpLowering,
OuterProductOpLowering,		OuterProductOpLowering,
ShapeCastOp2DDownCastRewritePattern,		ShapeCastOp2DDownCastRewritePattern,
ShapeCastOp2DUpCastRewritePattern,		ShapeCastOp2DUpCastRewritePattern,
ShapeCastOpRewritePattern>(context);		ShapeCastOpRewritePattern>(context);
patterns.insert<TransposeOpLowering,		patterns.insert<TransposeOpLowering,
ContractionOpLowering,		ContractionOpLowering,
ContractionOpToMatmulOpLowering,		ContractionOpToMatmulOpLowering,
ContractionOpToOuterProductOpLowering>(parameters, context);		ContractionOpToOuterProductOpLowering,
		ContractionOpToAXPYLowering>(parameters, context);
// clang-format on		// clang-format on
}		}
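
The interaction between the generic `ContractionOpLowering` and the specialized patterns can be summarized with a small model. This is a hedged sketch in plain C++, not the MLIR pattern API: `ContractInfo`, `specializedMatches` and `chooseLowering` are hypothetical names, and it assumes that each specialized pattern only matches when its own option is selected and the op has a shape it supports. Under that assumption, the generic dot lowering fires exactly when the option is Dot or the selected specialized pattern does not match, as the doc comment on `ContractionOpLowering` states.

```cpp
// Hypothetical mirror of the lowering choices discussed in this revision.
enum class ContractLowering { Dot, Matmul, OuterProduct, AXPY };

// Hypothetical stand-in for a vector.contract op: it records only whether
// the structural preconditions of each specialized pattern hold.
struct ContractInfo {
  bool fitsMatmul;
  bool fitsOuterProduct;
  bool fitsAxpy;
};

// Assumption: a specialized pattern matches only if its option is selected
// and the op has a supported shape.
bool specializedMatches(ContractLowering option, const ContractInfo &info) {
  switch (option) {
  case ContractLowering::Matmul:
    return info.fitsMatmul;
  case ContractLowering::OuterProduct:
    return info.fitsOuterProduct;
  case ContractLowering::AXPY:
    return info.fitsAxpy;
  case ContractLowering::Dot:
    return false;
  }
  return false;
}

// The generic lowering steps aside when a specialized pattern matches and
// otherwise falls back to the dot-product path.
ContractLowering chooseLowering(ContractLowering option,
                                const ContractInfo &info) {
  return specializedMatches(option, info) ? option : ContractLowering::Dot;
}
```

For example, with the AXPY option selected, an op that is not one of the four matrix-vector forms would still be lowered through the dot path in this model.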

mlir/test/Dialect/Vector/vector-contract-matvec-transforms.mlir

This file was added.

				// RUN: mlir-opt %s -test-vector-contraction-conversion=vector-axpy=1 | FileCheck %s

				#matvec_accesses = [
				affine_map<(i, j) -> (i, j)>,
				affine_map<(i, j) -> (j)>,
				affine_map<(i, j) -> (i)>
				]
				#matvec_trait = {
				indexing_maps = #matvec_accesses,
				iterator_types = ["parallel", "reduction"]
				}

				#mattransvec_accesses = [
				affine_map<(i, j) -> (j, i)>,
				affine_map<(i, j) -> (j)>,
				affine_map<(i, j) -> (i)>
				]
				#mattransvec_trait = {
				indexing_maps = #mattransvec_accesses,
				iterator_types = ["parallel", "reduction"]
				}

				#vecmat_accesses = [
				affine_map<(i, j) -> (j)>,
				affine_map<(i, j) -> (i, j)>,
				affine_map<(i, j) -> (i)>
				]
				#vecmat_trait = {
				indexing_maps = #vecmat_accesses,
				iterator_types = ["parallel", "reduction"]
				}

				#vecmattrans_accesses = [
				affine_map<(i, j) -> (j)>,
				affine_map<(i, j) -> (j, i)>,
				affine_map<(i, j) -> (i)>
				]
				#vecmattrans_trait = {
				indexing_maps = #vecmattrans_accesses,
				iterator_types = ["parallel", "reduction"]
				}

				// CHECK-LABEL: func @matvec2x2
				// CHECK-SAME: %[[A:.*0]]: memref<vector<2x2xf32>>
				// CHECK-SAME: %[[B:.*1]]: memref<vector<2xf32>>
				// CHECK-SAME: %[[C:.*2]]: memref<vector<2xf32>>
				// CHECK: %[[C0:.*]] = constant dense<0.000000e+00> : vector<2x2xf32>
				// CHECK: %[[T0:.*]] = load %[[A]][] : memref<vector<2x2xf32>>
				// CHECK: %[[T1:.*]] = load %[[B]][] : memref<vector<2xf32>>
				// CHECK: %[[T2:.*]] = load %[[C]][] : memref<vector<2xf32>>
				// CHECK: %[[T3:.*]] = vector.extract %[[T0]][0, 0] : vector<2x2xf32>
				// CHECK: %[[T4:.*]] = vector.insert %[[T3]], %[[C0]] [0, 0] : f32 into vector<2x2xf32>
				// CHECK: %[[T5:.*]] = vector.extract %[[T0]][1, 0] : vector<2x2xf32>
				// CHECK: %[[T6:.*]] = vector.insert %[[T5]], %[[T4]] [0, 1] : f32 into vector<2x2xf32>
				// CHECK: %[[T7:.*]] = vector.extract %[[T0]][0, 1] : vector<2x2xf32>
				// CHECK: %[[T8:.*]] = vector.insert %[[T7]], %[[T6]] [1, 0] : f32 into vector<2x2xf32>
				// CHECK: %[[T9:.*]] = vector.extract %[[T0]][1, 1] : vector<2x2xf32>
				// CHECK: %[[T10:.*]] = vector.insert %[[T9]], %[[T8]] [1, 1] : f32 into vector<2x2xf32>
				// CHECK: %[[T11:.*]] = vector.extract %[[T10]][0] : vector<2x2xf32>
				// CHECK: %[[T12:.*]] = vector.extract %[[T1]][0] : vector<2xf32>
				// CHECK: %[[T13:.*]] = splat %[[T12]] : vector<2xf32>
				// CHECK: %[[T14:.*]] = vector.fma %[[T11]], %[[T13]], %[[T2]] : vector<2xf32>
				// CHECK: %[[T15:.*]] = vector.extract %[[T10]][1] : vector<2x2xf32>
				// CHECK: %[[T16:.*]] = vector.extract %[[T1]][1] : vector<2xf32>
				// CHECK: %[[T17:.*]] = splat %[[T16]] : vector<2xf32>
				// CHECK: %[[T18:.*]] = vector.fma %[[T15]], %[[T17]], %[[T14]] : vector<2xf32>
				// CHECK: store %[[T18]], %[[C]][] : memref<vector<2xf32>>
				func @matvec2x2(%arg0: memref<vector<2x2xf32>>, %arg1: memref<vector<2xf32>>,
				%arg2: memref<vector<2xf32>>) {
				%A = load %arg0[] : memref<vector<2x2xf32>>
				%x = load %arg1[] : memref<vector<2xf32>>
				%b = load %arg2[] : memref<vector<2xf32>>
				%0 = vector.contract #matvec_trait %A, %x, %b : vector<2x2xf32>, vector<2xf32> into vector<2xf32>
				store %0, %arg2[] : memref<vector<2xf32>>
				return
				}

				// CHECK-LABEL: func @mattransvec2x2
				// CHECK-SAME: %[[A:.*0]]: memref<vector<2x2xf32>>
				// CHECK-SAME: %[[B:.*1]]: memref<vector<2xf32>>
				// CHECK-SAME: %[[C:.*2]]: memref<vector<2xf32>>
				// CHECK: %[[T0:.*]] = load %[[A]][] : memref<vector<2x2xf32>>
				// CHECK: %[[T1:.*]] = load %[[B]][] : memref<vector<2xf32>>
				// CHECK: %[[T2:.*]] = load %[[C]][] : memref<vector<2xf32>>
				// CHECK: %[[T3:.*]] = vector.extract %[[T0]][0] : vector<2x2xf32>
				// CHECK: %[[T4:.*]] = vector.extract %[[T1]][0] : vector<2xf32>
				// CHECK: %[[T5:.*]] = splat %[[T4]] : vector<2xf32>
				// CHECK: %[[T6:.*]] = vector.fma %[[T3]], %[[T5]], %[[T2]] : vector<2xf32>
				// CHECK: %[[T7:.*]] = vector.extract %[[T0]][1] : vector<2x2xf32>
				// CHECK: %[[T8:.*]] = vector.extract %[[T1]][1] : vector<2xf32>
				// CHECK: %[[T9:.*]] = splat %[[T8]] : vector<2xf32>
				// CHECK: %[[T10:.*]] = vector.fma %[[T7]], %[[T9]], %[[T6]] : vector<2xf32>
				// CHECK: store %[[T10]], %[[C]][] : memref<vector<2xf32>>
				func @mattransvec2x2(%arg0: memref<vector<2x2xf32>>, %arg1: memref<vector<2xf32>>,
				%arg2: memref<vector<2xf32>>) {
				%A = load %arg0[] : memref<vector<2x2xf32>>
				%x = load %arg1[] : memref<vector<2xf32>>
				%b = load %arg2[] : memref<vector<2xf32>>
				%0 = vector.contract #mattransvec_trait %A, %x, %b : vector<2x2xf32>, vector<2xf32> into vector<2xf32>
				store %0, %arg2[] : memref<vector<2xf32>>
				return
				}

				// CHECK-LABEL: func @vecmat2x2
				// CHECK-SAME: %[[A:.*0]]: memref<vector<2x2xf32>>
				// CHECK-SAME: %[[B:.*1]]: memref<vector<2xf32>>
				// CHECK-SAME: %[[C:.*2]]: memref<vector<2xf32>>
				// CHECK: %[[C0:.*]] = constant dense<0.000000e+00> : vector<2x2xf32>
				// CHECK: %[[T0:.*]] = load %[[A]][] : memref<vector<2x2xf32>>
				// CHECK: %[[T1:.*]] = load %[[B]][] : memref<vector<2xf32>>
				// CHECK: %[[T2:.*]] = load %[[C]][] : memref<vector<2xf32>>
				// CHECK: %[[T3:.*]] = vector.extract %[[T0]][0, 0] : vector<2x2xf32>
				// CHECK: %[[T4:.*]] = vector.insert %[[T3]], %[[C0]] [0, 0] : f32 into vector<2x2xf32>
				// CHECK: %[[T5:.*]] = vector.extract %[[T0]][1, 0] : vector<2x2xf32>
				// CHECK: %[[T6:.*]] = vector.insert %[[T5]], %[[T4]] [0, 1] : f32 into vector<2x2xf32>
				// CHECK: %[[T7:.*]] = vector.extract %[[T0]][0, 1] : vector<2x2xf32>
				// CHECK: %[[T8:.*]] = vector.insert %[[T7]], %[[T6]] [1, 0] : f32 into vector<2x2xf32>
				// CHECK: %[[T9:.*]] = vector.extract %[[T0]][1, 1] : vector<2x2xf32>
				// CHECK: %[[T10:.*]] = vector.insert %[[T9]], %[[T8]] [1, 1] : f32 into vector<2x2xf32>
				// CHECK: %[[T11:.*]] = vector.extract %[[T10]][0] : vector<2x2xf32>
				// CHECK: %[[T12:.*]] = vector.extract %[[T1]][0] : vector<2xf32>
				// CHECK: %[[T13:.*]] = splat %[[T12]] : vector<2xf32>
				// CHECK: %[[T14:.*]] = vector.fma %[[T11]], %[[T13]], %[[T2]] : vector<2xf32>
				// CHECK: %[[T15:.*]] = vector.extract %[[T10]][1] : vector<2x2xf32>
				// CHECK: %[[T16:.*]] = vector.extract %[[T1]][1] : vector<2xf32>
				// CHECK: %[[T17:.*]] = splat %[[T16]] : vector<2xf32>
				// CHECK: %[[T18:.*]] = vector.fma %[[T15]], %[[T17]], %[[T14]] : vector<2xf32>
				// CHECK: store %[[T18]], %[[C]][] : memref<vector<2xf32>>
				func @vecmat2x2(%arg0: memref<vector<2x2xf32>>, %arg1: memref<vector<2xf32>>,
				%arg2: memref<vector<2xf32>>) {
				%A = load %arg0[] : memref<vector<2x2xf32>>
				%x = load %arg1[] : memref<vector<2xf32>>
				%b = load %arg2[] : memref<vector<2xf32>>
				%0 = vector.contract #vecmat_trait %x, %A, %b : vector<2xf32>, vector<2x2xf32> into vector<2xf32>
				store %0, %arg2[] : memref<vector<2xf32>>
				return
				}

				// CHECK-LABEL: func @vecmattrans2x2
				// CHECK-SAME: %[[A:.*0]]: memref<vector<2x2xf32>>
				// CHECK-SAME: %[[B:.*1]]: memref<vector<2xf32>>
				// CHECK-SAME: %[[C:.*2]]: memref<vector<2xf32>>
				// CHECK: %[[T0:.*]] = load %[[A]][] : memref<vector<2x2xf32>>
				// CHECK: %[[T1:.*]] = load %[[B]][] : memref<vector<2xf32>>
				// CHECK: %[[T2:.*]] = load %[[C]][] : memref<vector<2xf32>>
				// CHECK: %[[T3:.*]] = vector.extract %[[T0]][0] : vector<2x2xf32>
				// CHECK: %[[T4:.*]] = vector.extract %[[T1]][0] : vector<2xf32>
				// CHECK: %[[T5:.*]] = splat %[[T4]] : vector<2xf32>
				// CHECK: %[[T6:.*]] = vector.fma %[[T3]], %[[T5]], %[[T2]] : vector<2xf32>
				// CHECK: %[[T7:.*]] = vector.extract %[[T0]][1] : vector<2x2xf32>
				// CHECK: %[[T8:.*]] = vector.extract %[[T1]][1] : vector<2xf32>
				// CHECK: %[[T9:.*]] = splat %[[T8]] : vector<2xf32>
				// CHECK: %[[T10:.*]] = vector.fma %[[T7]], %[[T9]], %[[T6]] : vector<2xf32>
				// CHECK: store %[[T10]], %[[C]][] : memref<vector<2xf32>>
				func @vecmattrans2x2(%arg0: memref<vector<2x2xf32>>, %arg1: memref<vector<2xf32>>,
				%arg2: memref<vector<2xf32>>) {
				%A = load %arg0[] : memref<vector<2x2xf32>>
				%x = load %arg1[] : memref<vector<2xf32>>
				%b = load %arg2[] : memref<vector<2xf32>>
				%0 = vector.contract #vecmattrans_trait %x, %A, %b : vector<2xf32>, vector<2x2xf32> into vector<2xf32>
				store %0, %arg2[] : memref<vector<2xf32>>
				return
				}
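
The four tests above cover the four operand layouts the AXPY pattern recognizes. As the CHECK lines show, the mat-vec and vec-mat cases first transpose the matrix (the extract/insert sequence), while the two "trans" cases feed the matrix rows straight into the FMA chain. The following plain C++ sketch models that behaviour on 2x2 data. It is an illustration under stated assumptions, not the patch's implementation: `Layout`, `transpose`, `axpyChain` and `lowerMatvec` are hypothetical names, and the operand swap of the vec-mat cases is modelled simply by always passing the matrix first.

```cpp
#include <array>
#include <cstddef>

using Vec2 = std::array<float, 2>;
using Mat2 = std::array<Vec2, 2>;

// The four layouts exercised by the tests (hypothetical enum).
enum class Layout { MatVec, MatTransVec, VecMat, VecMatTrans };

// Element-wise 2x2 transpose, like the extract/insert sequence in the tests.
Mat2 transpose(const Mat2 &m) {
  Mat2 t{};
  for (std::size_t r = 0; r < 2; ++r)
    for (std::size_t c = 0; c < 2; ++c)
      t[c][r] = m[r][c];
  return t;
}

// Chain of AXPY updates: acc += rows[k] * x[k] for each reduction index k.
Vec2 axpyChain(const Mat2 &rows, const Vec2 &x, Vec2 acc) {
  for (std::size_t k = 0; k < 2; ++k)
    for (std::size_t i = 0; i < 2; ++i)
      acc[i] += rows[k][i] * x[k];
  return acc;
}

// The matrix is transposed only when its leading dimension is the parallel
// one (mat-vec and vec-mat); otherwise its rows are used as they are.
Vec2 lowerMatvec(Layout layout, const Mat2 &matrix, const Vec2 &x, Vec2 acc) {
  bool needsTranspose =
      layout == Layout::MatVec || layout == Layout::VecMat;
  return axpyChain(needsTranspose ? transpose(matrix) : matrix, x, acc);
}
```

With the matrix rows (1, 2) and (3, 4), the vector (5, 6) and a zero accumulator, the mat-vec and vec-mat layouts compute A x and the two "trans" layouts compute A^T x.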

mlir/test/lib/Transforms/TestVectorTransforms.cpp

[... 53 lines not shown ...]
		struct TestVectorContractionConversion
Option<bool> lowerToFlatTranspose{		Option<bool> lowerToFlatTranspose{
*this, "vector-flat-transpose",		*this, "vector-flat-transpose",
llvm::cl::desc("Lower 2-D vector.transpose to vector.flat_transpose"),		llvm::cl::desc("Lower 2-D vector.transpose to vector.flat_transpose"),
llvm::cl::init(false)};		llvm::cl::init(false)};
Option<bool> lowerToOuterProduct{		Option<bool> lowerToOuterProduct{
*this, "vector-outerproduct",		*this, "vector-outerproduct",
llvm::cl::desc("Lower vector.contract to vector.outerproduct"),		llvm::cl::desc("Lower vector.contract to vector.outerproduct"),
llvm::cl::init(false)};		llvm::cl::init(false)};
		Option<bool> lowerToAXPY{*this, "vector-axpy",
		llvm::cl::desc("Lower vector.contract to AXPY"),
		llvm::cl::init(false)};

void runOnFunction() override {		void runOnFunction() override {
OwningRewritePatternList patterns;		OwningRewritePatternList patterns;

		// Test on one pattern in isolation.
if (lowerToOuterProduct) {		if (lowerToOuterProduct) {
VectorContractLowering lowering = VectorContractLowering::OuterProduct;		VectorContractLowering lowering = VectorContractLowering::OuterProduct;
VectorTransformsOptions options{lowering};		VectorTransformsOptions options{lowering};
patterns.insert<ContractionOpToOuterProductOpLowering>(options,		patterns.insert<ContractionOpToOuterProductOpLowering>(options,
&getContext());		&getContext());
applyPatternsAndFoldGreedily(getFunction(), patterns);		applyPatternsAndFoldGreedily(getFunction(), patterns);
return;		return;
}		}

VectorContractLowering contractLowering = VectorContractLowering::FMA;		// Test on all contract lowering patterns.
		VectorContractLowering contractLowering = VectorContractLowering::Dot;
if (lowerToFlatMatrix)		if (lowerToFlatMatrix)
contractLowering = VectorContractLowering::Matmul;		contractLowering = VectorContractLowering::Matmul;
		else if (lowerToAXPY)
		contractLowering = VectorContractLowering::AXPY;
VectorTransposeLowering transposeLowering =		VectorTransposeLowering transposeLowering =
VectorTransposeLowering::EltWise;		VectorTransposeLowering::EltWise;
if (lowerToFlatTranspose)		if (lowerToFlatTranspose)
transposeLowering = VectorTransposeLowering::Flat;		transposeLowering = VectorTransposeLowering::Flat;
VectorTransformsOptions options{contractLowering, transposeLowering};		VectorTransformsOptions options{contractLowering, transposeLowering};
populateVectorContractLoweringPatterns(patterns, &getContext(), options);		populateVectorContractLoweringPatterns(patterns, &getContext(), options);
applyPatternsAndFoldGreedily(getFunction(), patterns);		applyPatternsAndFoldGreedily(getFunction(), patterns);
}		}
Revision Contents

Diff 275218
