This is an archive of the discontinued LLVM Phabricator instance.

Differential D111894

[mlir][Linalg] Add a first vectorization pattern for conv1d in NWCxWCF format.
ClosedPublic

Authored by nicolasvasilache on Oct 15 2021, 9:12 AM.

Details

Reviewers

antiagainst
mravishankar
springerm
aartbik
ThomasRaoux
ftynse

Commits

rG6bb7d2474fe4: [mlir][Linalg] Add a first vectorization pattern for conv1d in NWCxWCF format.

Summary

This revision uses the newly refactored StructuredGenerator to create a simple vectorization for conv1d_nwc_wcf.

Note that the pattern is not specific to the op and is technically not even specific to the ConvolutionOpInterface (modulo minor details related to dilations and strides).

The overall design follows the same ideas as the lowering of vector::ContractionOp -> vector::OuterProduct: it seeks to be minimally complex, composable and extensible while avoiding inference analysis. Instead, we metaprogram the maps/indexings we expect and we match against them.
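To make the matching idea concrete, here is a minimal standalone sketch; it is not the MLIR implementation. `ToyStructuredOp`, `ToyGenerator`, `conv1dNwcWcfLayout` and `matchesConv1dNwcWcf` are hypothetical names, and indexing expressions are modeled as plain strings rather than `AffineExpr`. It only shows the shape of the approach: build the iterator types and maps we expect, then compare for equality instead of inferring anything.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins (hypothetical, not the MLIR types): an indexing "map" is a
// list of result expressions written as strings, and an op carries its
// iterator kinds plus one map per operand.
using Map = std::vector<std::string>;
using MapList = std::vector<Map>;

struct ToyStructuredOp {
  std::vector<std::string> iterators; // "parallel" or "reduction"
  MapList maps;                       // lhs, rhs, res
};

// Mirrors the spirit of the iters/layout helpers: no inference, only an
// exact comparison against the form the generator expects.
struct ToyGenerator {
  explicit ToyGenerator(const ToyStructuredOp &op) : op(op) {}
  bool iters(const std::vector<std::string> &expected) const {
    return op.iterators == expected;
  }
  bool layout(const MapList &expected) const { return op.maps == expected; }
  const ToyStructuredOp &op;
};

// Expected conv1d NWC x WCF layout for a given stride and dilation, with
// dimensions ordered (n, w, f, kw, c).
MapList conv1dNwcWcfLayout(int strideW, int dilationW) {
  std::string in = std::to_string(strideW) + "*w+" +
                   std::to_string(dilationW) + "*kw";
  return {{"n", in, "c"}, {"kw", "c", "f"}, {"n", "w", "f"}};
}

// True only if both the iterator types and the three maps match exactly.
bool matchesConv1dNwcWcf(const ToyStructuredOp &op, int strideW,
                         int dilationW) {
  ToyGenerator g(op);
  const std::string par = "parallel", red = "reduction";
  return g.iters({par, par, par, red, red}) &&
         g.layout(conv1dNwcWcfLayout(strideW, dilationW));
}
```

An op built with stride 2 and dilation 3 matches only when queried with the same stride and dilation; any difference in iterator types or maps makes the match fail, and a caller would then try the next expected form or give up.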

This is just a first stab and still needs to be evaluated for performance.
Other tradeoffs are possible that should be explored.
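
As a rough illustration of what the unrolled lowering computes, the following standalone sketch (plain C++, no MLIR; `Tensor3`, `convReference` and `convUnrolled` are hypothetical helpers) compares a direct conv1d in NWC x WCF layout against a version unrolled along kw and w. At each step the unrolled version reads an {n, 1, c} input slice, a {1, c, f} filter slice and the current {n, 1, f} result slice, contracts over c, and writes the slice back.

```cpp
#include <cassert>
#include <vector>

// Dense row-major 3-D array; a hypothetical helper for this sketch only.
struct Tensor3 {
  int d0, d1, d2;
  std::vector<float> data;
  Tensor3(int a, int b, int c) : d0(a), d1(b), d2(c), data(a * b * c, 0.0f) {}
  float &at(int i, int j, int k) { return data[(i * d1 + j) * d2 + k]; }
  float at(int i, int j, int k) const { return data[(i * d1 + j) * d2 + k]; }
};

// Reference: out[n, w, f] += in[n, sw * w + dw * kw, c] * filt[kw, c, f].
Tensor3 convReference(const Tensor3 &in, const Tensor3 &filt, Tensor3 out,
                      int sw, int dw) {
  for (int n = 0; n < out.d0; ++n)
    for (int w = 0; w < out.d1; ++w)
      for (int f = 0; f < out.d2; ++f)
        for (int kw = 0; kw < filt.d0; ++kw)
          for (int c = 0; c < filt.d1; ++c)
            out.at(n, w, f) += in.at(n, sw * w + dw * kw, c) * filt.at(kw, c, f);
  return out;
}

// Unrolled form: kw outermost, then w, one slice-wise contraction per step.
Tensor3 convUnrolled(const Tensor3 &in, const Tensor3 &filt, Tensor3 out,
                     int sw, int dw) {
  const int nSize = in.d0, cSize = in.d2;
  const int kwSize = filt.d0, fSize = filt.d2, wSize = out.d1;
  // The largest input index touched must be in bounds.
  assert(sw * (wSize - 1) + dw * (kwSize - 1) < in.d1);
  for (int kw = 0; kw < kwSize; ++kw) {
    for (int w = 0; w < wSize; ++w) {
      const int x = sw * w + dw * kw; // strided, dilated input position
      // Read the current {n, 1, f} result slice (the updated value, not the
      // initial output).
      std::vector<float> res(nSize * fSize);
      for (int n = 0; n < nSize; ++n)
        for (int f = 0; f < fSize; ++f)
          res[n * fSize + f] = out.at(n, w, f);
      // Contract I{n, 1, c} * F{1, c, f} -> O{n, 1, f}, reducing over c.
      for (int n = 0; n < nSize; ++n)
        for (int f = 0; f < fSize; ++f)
          for (int c = 0; c < cSize; ++c)
            res[n * fSize + f] += in.at(n, x, c) * filt.at(kw, c, f);
      // Write the slice back so the next kw step accumulates on top of it.
      for (int n = 0; n < nSize; ++n)
        for (int f = 0; f < fSize; ++f)
          out.at(n, w, f) = res[n * fSize + f];
    }
  }
  return out;
}
```

Both functions accumulate each output element in the same (kw, c) order, so their results agree exactly in this sketch. It says nothing about performance, which the summary leaves open.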

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nicolasvasilache created this revision. Oct 15 2021, 9:12 AM

Herald added a reviewer: aartbik. Oct 15 2021, 9:12 AM

Herald added subscribers: wenzhicui, wrengr, Chia-hungDuan and 19 others.

nicolasvasilache requested review of this revision. Oct 15 2021, 9:12 AM

Herald added a project: Restricted Project. Oct 15 2021, 9:12 AM

Herald added subscribers: limo1996, stephenneuendorffer.

nicolasvasilache mentioned this in D111722: [mlir][linalg] Add a pattern to vectorize convolution ops. Oct 15 2021, 9:15 AM

nicolasvasilache added a reviewer: ThomasRaoux.

Harbormaster completed remote builds in B129079: Diff 380032. Oct 15 2021, 9:25 AM

Very nice utilities like StructuredGenerator, iters, layout! Thanks a lot for putting this up! It's much clearer to me how to use them now. I just have a few questions inlined.

Do you want to land this? Or is this just for illustration purposes? I can change my patch (which also handles higher-D cases) to follow the structure here.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1465	This TODO is a bit confusing to me. We are already doing this right? You mean not?
1485	We don't need to check the region to see what is the payload computation?
1535	I think we don't want to read from `resShaped`, which is updated later at L1539; we want to read the initial output. We are updating non-overlapping slices so it's fine to read the init output. Reading from the continuously updated `resShaped` could confuse vector transformations, since it interleaves reads and writes.

In D111894#3070764, @antiagainst wrote:

Very nice utilities like StructuredGenerator, iters, layout! Thanks a lot for putting this up! It's much clearer to me how to use them now. I just have a few questions inlined.

Do you want to land this? Or is this just for illustration purposes? I can change my patch (which also handles higher-D cases) to follow the structure here.

Yes, this is meant to land, I have another few changes incoming which allowed me to plug everything together end-to-end and benchmark.
Feel free to rebase on this and add other things that you need on top of this current structure.

nicolasvasilache marked 3 inline comments as done. Oct 19 2021, 7:24 AM

nicolasvasilache added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1465	was meant to convey: "address the fact that ..." rephrased to make it more readable.
1535	Interestingly, this actually improves vector transformations: the vector hoisting patterns do not work as expected if I always read from the same element but continuously write into the output. These patterns really want to see the same tensor being read and written. This is roughly an 8x performance difference.

Address review.
Refactor to properly integrate into vectorizeLinalgOp which is used by other patterns down the line.

Harbormaster completed remote builds in B129551: Diff 380691. Oct 19 2021, 7:49 AM

ftynse accepted this revision. Oct 20 2021, 6:36 AM

This revision is now accepted and ready to land. Oct 20 2021, 6:36 AM

antiagainst accepted this revision. Oct 20 2021, 7:01 AM

antiagainst added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1535	Hmm, that's not what I've seen previously... But maybe things have changed quite a bit. I'll also do some experiments on my side later.

Closed by commit rG6bb7d2474fe4: [mlir][Linalg] Add a first vectorization pattern for conv1d in NWCxWCF format. (authored by nicolasvasilache). Oct 20 2021, 7:03 AM

This revision was automatically updated to reflect the committed changes.

nicolasvasilache added a commit: rG6bb7d2474fe4: [mlir][Linalg] Add a first vectorization pattern for conv1d in NWCxWCF format.

nicolasvasilache added inline comments. Oct 20 2021, 12:54 PM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1535	Note that the behavior I describe is on tensors. In your experiments, did you also do these vectorizations on tensors? I'll send you an internal repro to see this.

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h	6 lines
mlir/include/mlir/Dialect/Utils/StructuredOpsUtils.h	8 lines
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp	216 lines
mlir/lib/Dialect/Vector/VectorTransforms.cpp	47 lines
mlir/test/Dialect/Linalg/vectorize-convolution.mlir	108 lines
mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp	1 line

Diff 380941

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h

	[... 35 lines not shown ...]
	/// reshapes.			/// reshapes.
	bool skipUnitDimReshape(const OpResult &producer, OpOperand &consumer);			bool skipUnitDimReshape(const OpResult &producer, OpOperand &consumer);

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Transformations exposed as function calls.			// Transformations exposed as function calls.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	using LinalgLoops = SmallVector<Operation *, 4>;			using LinalgLoops = SmallVector<Operation *, 4>;

	/// Populates patterns for vectorization of all ConvN-D ops.			/// [DEPRECATED] Populates patterns for vectorization of all ConvN-D ops.
	void populateConvVectorizationPatterns(			void populateConvVectorizationPatterns(
	MLIRContext *context, SmallVectorImpl<RewritePatternSet> &patterns,			MLIRContext *context, SmallVectorImpl<RewritePatternSet> &patterns,
	ArrayRef<int64_t> tileSizes);			ArrayRef<int64_t> tileSizes);

				/// Populates patterns for vectorizing convolution ops.
				void populateConvolutionVectorizationPatterns(RewritePatternSet &patterns,
				PatternBenefit benefit = 1);

	/// Populate patterns that convert `ElementwiseMappable` ops to linalg			/// Populate patterns that convert `ElementwiseMappable` ops to linalg
	/// parallel loops.			/// parallel loops.
	void populateElementwiseToLinalgConversionPatterns(RewritePatternSet &patterns);			void populateElementwiseToLinalgConversionPatterns(RewritePatternSet &patterns);

	/// Function type which is used to control when to stop fusion. It is expected			/// Function type which is used to control when to stop fusion. It is expected
	/// that OpOperand is not modified in the callback. The OpOperand is not marked			/// that OpOperand is not modified in the callback. The OpOperand is not marked
	/// as const to allow callers to use non-const methods.			/// as const to allow callers to use non-const methods.
	using ControlElementwiseOpsFusionFn =			using ControlElementwiseOpsFusionFn =
	[... 1,157 lines not shown ...]

mlir/include/mlir/Dialect/Utils/StructuredOpsUtils.h

[... 19 lines not shown ...]
#include "mlir/IR/AffineMap.h"		#include "mlir/IR/AffineMap.h"
#include "mlir/IR/BuiltinAttributes.h"		#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/Location.h"		#include "mlir/IR/Location.h"
#include "mlir/Support/LLVM.h"		#include "mlir/Support/LLVM.h"
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"

namespace mlir {		namespace mlir {

class PatternRewriter;		class OpBuilder;

/// Tests whether the given maps describe a row major matmul. The test is		/// Tests whether the given maps describe a row major matmul. The test is
/// permutation-invariant. Note that this only checks the affine maps from an		/// permutation-invariant. Note that this only checks the affine maps from an
/// operation, so does not perform any checks on the math being performed within		/// operation, so does not perform any checks on the math being performed within
/// the reduction.		/// the reduction.
bool isRowMajorMatmul(ArrayAttr indexingMaps);		bool isRowMajorMatmul(ArrayAttr indexingMaps);

/// Tests whether the given maps describe a column major matmul. The test is		/// Tests whether the given maps describe a column major matmul. The test is
[... 119 lines not shown ...]	public:
};		};
struct Red : public IteratorType {		struct Red : public IteratorType {
Red() : IteratorType(getReductionIteratorTypeName()) {}		Red() : IteratorType(getReductionIteratorTypeName()) {}
};		};
struct Win : public IteratorType {		struct Win : public IteratorType {
Win() : IteratorType(getWindowIteratorTypeName()) {}		Win() : IteratorType(getWindowIteratorTypeName()) {}
};		};

StructuredGenerator(PatternRewriter &rewriter, StructuredOpInterface op)		StructuredGenerator(OpBuilder &builder, StructuredOpInterface op)
: rewriter(rewriter), ctx(op.getContext()), loc(op.getLoc()),		: builder(builder), ctx(op.getContext()), loc(op.getLoc()),
iterators(op.iterator_types()), maps(op.getIndexingMaps()), op(op) {}		iterators(op.iterator_types()), maps(op.getIndexingMaps()), op(op) {}

bool iters(ArrayRef<IteratorType> its) {		bool iters(ArrayRef<IteratorType> its) {
if (its.size() != iterators.size())		if (its.size() != iterators.size())
return false;		return false;
for (int i = 0, e = its.size(); i != e; ++i) {		for (int i = 0, e = its.size(); i != e; ++i) {
if (!its[i].isOfType(iterators[i]))		if (!its[i].isOfType(iterators[i]))
return false;		return false;
}		}
return true;		return true;
}		}

bool layout(MapList l) {		bool layout(MapList l) {
auto infer = [](MapList m) { return AffineMap::inferFromExprList(m); };		auto infer = [](MapList m) { return AffineMap::inferFromExprList(m); };
return maps == infer(l);		return maps == infer(l);
}		}

protected:		protected:
PatternRewriter &rewriter;		OpBuilder &builder;
MLIRContext *ctx;		MLIRContext *ctx;
Location loc;		Location loc;
ArrayAttr iterators;		ArrayAttr iterators;
SmallVector<AffineMap, 4> maps;		SmallVector<AffineMap, 4> maps;
Operation *op;		Operation *op;
};		};

} // end namespace mlir		} // end namespace mlir

#endif // MLIR_UTILS_STRUCTUREDOPSUTILS_H		#endif // MLIR_UTILS_STRUCTUREDOPSUTILS_H

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

[... 14 lines not shown ...]
#include "mlir/Dialect/Arithmetic/IR/Arithmetic.h"		#include "mlir/Dialect/Arithmetic/IR/Arithmetic.h"
#include "mlir/Dialect/Linalg/Analysis/DependenceAnalysis.h"		#include "mlir/Dialect/Linalg/Analysis/DependenceAnalysis.h"
#include "mlir/Dialect/Linalg/IR/LinalgOps.h"		#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"		#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/Dialect/Linalg/Utils/Utils.h"		#include "mlir/Dialect/Linalg/Utils/Utils.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"		#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/Dialect/Utils/StructuredOpsUtils.h"		#include "mlir/Dialect/Utils/StructuredOpsUtils.h"
#include "mlir/Dialect/Vector/VectorOps.h"		#include "mlir/Dialect/Vector/VectorOps.h"
		#include "mlir/Dialect/Vector/VectorTransforms.h"
#include "mlir/IR/AffineExpr.h"		#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/Matchers.h"		#include "mlir/IR/Matchers.h"
#include "mlir/IR/PatternMatch.h"		#include "mlir/IR/PatternMatch.h"
#include "mlir/Pass/Pass.h"		#include "mlir/Pass/Pass.h"
#include "mlir/Support/LLVM.h"		#include "mlir/Support/LLVM.h"
#include "mlir/Transforms/RegionUtils.h"		#include "mlir/Transforms/RegionUtils.h"
#include "llvm/ADT/ScopeExit.h"		#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/Sequence.h"		#include "llvm/ADT/Sequence.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/TypeSwitch.h"		#include "llvm/ADT/TypeSwitch.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <type_traits>		#include <type_traits>

using namespace mlir;		using namespace mlir;
using namespace mlir::linalg;		using namespace mlir::linalg;

using llvm::dbgs;		using llvm::dbgs;

#define DEBUG_TYPE "linalg-vectorization"		#define DEBUG_TYPE "linalg-vectorization"

#define DBGS() (llvm::dbgs() << '[' << DEBUG_TYPE << "] ")		#define DBGS() (llvm::dbgs() << '[' << DEBUG_TYPE << "] ")
#define LDBG(X) LLVM_DEBUG(DBGS() << X)		#define LDBG(X) LLVM_DEBUG(DBGS() << X)

		// Forward declarations.
		static LogicalResult vectorizeContraction(OpBuilder &b, LinalgOp linalgOp,
		SmallVectorImpl<Value> &newResults);
		static FailureOr<Operation *>
		vectorizeConvolution(OpBuilder &b, ConvolutionOpInterface convOp);

/// Return the unique instance of OpType in `block` if it is indeed unique.		/// Return the unique instance of OpType in `block` if it is indeed unique.
/// Return null if none or more than 1 instances exist.		/// Return null if none or more than 1 instances exist.
template <typename OpType>		template <typename OpType>
static OpType getSingleOpOfType(Block &block) {		static OpType getSingleOpOfType(Block &block) {
OpType res;		OpType res;
block.walk([&](OpType op) {		block.walk([&](OpType op) {
if (res) {		if (res) {
res = nullptr;		res = nullptr;
[... 87 lines not shown ...]
/// ordering between reduction dimensions and is currently unsupported in		/// ordering between reduction dimensions and is currently unsupported in
/// Linalg. This limitation is motivated by the fact that e.g. min(max(X)) !=		/// Linalg. This limitation is motivated by the fact that e.g. min(max(X)) !=
/// max(min(X))		/// max(min(X))
// TODO: use in LinalgOp verification, there is a circular dependency atm.		// TODO: use in LinalgOp verification, there is a circular dependency atm.
static Operation *matchLinalgReduction(OpOperand *outputOperand) {		static Operation *matchLinalgReduction(OpOperand *outputOperand) {
auto linalgOp = cast<LinalgOp>(outputOperand->getOwner());		auto linalgOp = cast<LinalgOp>(outputOperand->getOwner());
unsigned outputPos =		unsigned outputPos =
outputOperand->getOperandNumber() - linalgOp.getNumInputs();		outputOperand->getOperandNumber() - linalgOp.getNumInputs();
// Only single combiner operatios are supported for now.		// Only single combiner operations are supported for now.
SmallVector<Operation *, 4> combinerOps;		SmallVector<Operation *, 4> combinerOps;
if (!matchReduction(linalgOp.getRegionOutputArgs(), outputPos, combinerOps) ||		if (!matchReduction(linalgOp.getRegionOutputArgs(), outputPos, combinerOps) ||
combinerOps.size() != 1)		combinerOps.size() != 1)
return nullptr;		return nullptr;

// Return the combiner operation.		// Return the combiner operation.
return combinerOps[0];		return combinerOps[0];
}		}
[... 411 lines not shown ...]	if (result.status == VectorizationStatus::NewOp) {
LDBG("new vector op: " << *result.newOp;);		LDBG("new vector op: " << *result.newOp;);
bvm.map(op.getResults(), result.newOp->getResults());		bvm.map(op.getResults(), result.newOp->getResults());
}		}
}		}

return success();		return success();
}		}

		/// Helper function to vectorize a `linalgOp` with contraction semantics in a
		/// generic fashion.
		/// This helper is needed atm because the truly generic implementation requires
		/// good vector.multi_reduce folding patterns that are currently NYI.
		// TODO: drop reliance on a specific pattern.
static LogicalResult vectorizeContraction(OpBuilder &b, LinalgOp linalgOp,		static LogicalResult vectorizeContraction(OpBuilder &b, LinalgOp linalgOp,
SmallVectorImpl<Value> &newResults) {		SmallVectorImpl<Value> &newResults) {
assert(isaContractionOpInterface(linalgOp) &&		assert(isaContractionOpInterface(linalgOp) &&
"expected vectorizeContraction preconditions to be met");		"expected vectorizeContraction preconditions to be met");
Location loc = linalgOp.getLoc();		Location loc = linalgOp.getLoc();
// Vectorize other ops as vector contraction.		// Vectorize other ops as vector contraction.
// TODO: interface.		// TODO: interface.
LDBG(""		LDBG(""
[... 73 lines not shown ...]	LogicalResult mlir::linalg::vectorizeLinalgOpPrecondition(Operation *op) {
if (linalgOp.hasDynamicShape()) {		if (linalgOp.hasDynamicShape()) {
LDBG("precondition failed: dynamic shape");		LDBG("precondition failed: dynamic shape");
return failure();		return failure();
}		}
if (isElementwise(op))		if (isElementwise(op))
return success();		return success();
if (isaContractionOpInterface(linalgOp))		if (isaContractionOpInterface(linalgOp))
return success();		return success();
		// TODO: isaConvolutionOpInterface that can also infer from generic features.
		// But we will still need stride/dilation attributes that will be annoying to
		// reverse-engineer...
		if (isa<ConvolutionOpInterface>(op))
		return success();
// TODO: the common vector shape is equal to the static loop sizes only when		// TODO: the common vector shape is equal to the static loop sizes only when
// all indexing maps are projected permutations. For convs and stencils the		// all indexing maps are projected permutations. For convs and stencils the
// logic will need to evolve.		// logic will need to evolve.
if (!allIndexingsAreProjectedPermutation(linalgOp)) {		if (!allIndexingsAreProjectedPermutation(linalgOp)) {
LDBG("precondition failed: not projected permutations");		LDBG("precondition failed: not projected permutations");
return failure();		return failure();
}		}
if (failed(reductionPreconditions(linalgOp))) {		if (failed(reductionPreconditions(linalgOp))) {
LDBG("precondition failed: reduction preconditions");		LDBG("precondition failed: reduction preconditions");
return failure();		return failure();
}		}
return success();		return success();
}		}

LogicalResult		LogicalResult
mlir::linalg::vectorizeLinalgOp(OpBuilder &b, Operation *op,		mlir::linalg::vectorizeLinalgOp(OpBuilder &b, Operation *op,
SmallVectorImpl<Value> &newResults) {		SmallVectorImpl<Value> &newResults) {
if (failed(vectorizeLinalgOpPrecondition(op)))		if (failed(vectorizeLinalgOpPrecondition(op)))
return failure();		return failure();

auto linalgOp = cast<LinalgOp>(op);		auto linalgOp = cast<LinalgOp>(op);
if (isaContractionOpInterface(linalgOp))		if (isaContractionOpInterface(linalgOp))
return vectorizeContraction(b, linalgOp, newResults);		return vectorizeContraction(b, linalgOp, newResults);

		// TODO: isaConvolutionOpInterface that can also infer from generic features.
		// But we will still need stride/dilation attributes that will be annoying to
		// reverse-engineer...
		if (auto convOp = dyn_cast<ConvolutionOpInterface>(op)) {
		FailureOr<Operation *> resultOrFail = vectorizeConvolution(b, convOp);
		if (failed(resultOrFail))
		return failure();
		Operation *newOp = *resultOrFail;
		llvm::append_range(newResults, newOp->getResults());
		return success();
		}

LDBG(""		LDBG(""
<< "Vectorize linalg op as a generic by broadcasting to "		<< "Vectorize linalg op as a generic by broadcasting to "
"maximal common shape: "		"maximal common shape: "
<< *op);		<< *op);
return vectorizeAsLinalgGeneric(b, linalgOp, newResults,		return vectorizeAsLinalgGeneric(b, linalgOp, newResults,
/*broadcastToMaximalCommonShape=*/true);		/*broadcastToMaximalCommonShape=*/true);
}		}

[... 717 lines not shown ...]	rewriter.create<vector::TransferWriteOp>(
xferOp.getLoc(), xferOp.vector(), out, xferOp.indices(),		xferOp.getLoc(), xferOp.vector(), out, xferOp.indices(),
xferOp.permutation_map(), ArrayAttr());		xferOp.permutation_map(), ArrayAttr());

rewriter.eraseOp(copyOp);		rewriter.eraseOp(copyOp);
rewriter.eraseOp(xferOp);		rewriter.eraseOp(xferOp);

return success();		return success();
}		}

		//===----------------------------------------------------------------------===//
		// Convolution vectorization patterns
		//===----------------------------------------------------------------------===//
		namespace {
		/// Generate a vector implementation for:
		/// ```
		/// Op def: ( n, w, c, kw, f )
		/// Iters: ({Par(), Par(), Par(), Red(), Red()})
		/// Layout: {{n, strideW * w + dilationW * kw, c}, {kw, c, f}, {n, w, f}}
		/// ```
		/// w and kw are unrolled.
		/// TODO: do not unroll w (resp. kw) when the strideW ( resp. dilationW) is > 1.
		[Inline comment] antiagainst: This TODO is a bit confusing to me. We are already doing this right? You mean not?
		[Inline comment] nicolasvasilache (author): was meant to convey: "address the fact that ..." rephrased to make it more readable.
		struct Conv1D_NWC_WCF_Generator : public StructuredGenerator<LinalgOp> {
		Conv1D_NWC_WCF_Generator(OpBuilder &builder, LinalgOp linalgOp, int strideW,
		int dilationW)
		: StructuredGenerator<LinalgOp>(builder, linalgOp), valid(false),
		strideW(strideW), dilationW(dilationW) {
		// Determine whether `linalgOp` can be generated with this generator
		if (linalgOp.getNumInputs() != 2 || linalgOp.getNumOutputs() != 1)
		return;
		lhsShaped = linalgOp.inputs()[0];
		rhsShaped = linalgOp.inputs()[1];
		resShaped = linalgOp.outputs()[0];
		lhsShapedType = lhsShaped.getType().dyn_cast<ShapedType>();
		rhsShapedType = rhsShaped.getType().dyn_cast<ShapedType>();
		resShapedType = resShaped.getType().dyn_cast<ShapedType>();
		if (!lhsShapedType || !rhsShapedType || !resShapedType)
		return;
		if (lhsShapedType.getRank() != 3 || rhsShapedType.getRank() != 3 ||
		resShapedType.getRank() != 3)
		return;

		[Inline comment] antiagainst: We don't need to check the region to see what is the payload computation?
		// Check for reduction `add` preceded by `mul`.
		Operation *reduceOp = matchLinalgReduction(linalgOp.getOutputOperand(0));
		if (!reduceOp)
		return;
		llvm::Optional<vector::CombiningKind> maybeKind;
		maybeKind = getKindForOp(reduceOp);
		if (!maybeKind || *maybeKind != vector::CombiningKind::ADD)
		return;
		maybeKind = getKindForOp(&(linalgOp->getRegion(0).front().front()));
		if (!maybeKind || *maybeKind != vector::CombiningKind::MUL)
		return;

		// The op is now known to be valid.
		valid = true;
		}

		/// Generate a vector implementation for:
		/// ```
		/// Op def: ( n, w, c, kw, f )
		/// Iters: ({Par(), Par(), Par(), Red(), Red()})
		/// Layout: {{n, strideW * w + dilationW * kw, c}, {kw, c, f}, {n, w, f}}
		/// ```
		/// w and kw are unrolled.
		/// TODO: w (resp. kw) is unrolled when the strideW ( resp. dilationW) is > 1.
		FailureOr<Operation *> conv() {
		if (!valid)
		return failure();

		int nSize = lhsShapedType.getShape()[0];
		int wSize = resShapedType.getShape()[1];
		int cSize = lhsShapedType.getShape()[2];
		int kwSize = rhsShapedType.getShape()[0];
		int fSize = rhsShapedType.getShape()[2];

		vector::TransferWriteOp write;
		Value zero = builder.create<arith::ConstantIndexOp>(loc, 0);

		// Unroll along kw and read slices of lhs and rhs.
		// Alternatively we could preload both 3-d slices and extract smaller slices
		// iteratively without touching memory. But this will quickly spill.
		for (int64_t kw = 0; kw < kwSize; ++kw) {
		// Read rhs slice of size {1, c, f} @ [kw, 0, 0].
		Value kwVal = builder.create<arith::ConstantIndexOp>(loc, kw);
		VectorType rhsType =
		VectorType::get({1, cSize, fSize}, rhsShapedType.getElementType());
		Value rhs = builder.create<vector::TransferReadOp>(
		loc, rhsType, rhsShaped, ValueRange{kwVal, zero, zero});

		for (int64_t w = 0; w < wSize; ++w) {
		// Read lhs slice of size {n, 1, c} @ [0, sw * w + dw * kw, 0].
		[Inline comment] antiagainst: I think we don't want to read from `resShaped`, which is updated later at L1539; we want to read the initial output. We are updating non-overlapping slices so it's fine to read the init output. Reading from the continuously updated `resShaped` could confuse vector transformations, since it interleaves reads and writes.
		[Inline comment] nicolasvasilache (author): Interestingly, this actually improves vector transformations: the vector hoisting patterns do not work as expected if I always read from the same element but continuously write into the output. These patterns really want to see the same tensor being read and written. This is roughly an 8x performance difference.
		[Inline comment, not done] antiagainst: Hmm, that's not what I've seen previously... But maybe things have changed quite a bit. I'll also do some experiments on my side later.
		[Inline comment] nicolasvasilache (author): Note that the behavior I describe is on tensors. In your experiments, did you also do these vectorizations on tensors? I'll send you an internal repro to see this.
		Value lhsStridedIdx = builder.create<arith::ConstantIndexOp>(
		loc, strideW * w + dilationW * kw);
		VectorType lhsType =
		VectorType::get({nSize, 1, cSize}, lhsShapedType.getElementType());
		Value lhs = builder.create<vector::TransferReadOp>(
		loc, lhsType, lhsShaped, ValueRange{zero, lhsStridedIdx, zero});

		// Read res slice: {n, 1, f} @ [0, w, 0].
		Value wVal = builder.create<arith::ConstantIndexOp>(loc, w);
		VectorType resType =
		VectorType::get({nSize, 1, fSize}, resShapedType.getElementType());
		// When operating on tensors, reading from the updated value is required
		// for vector.transfer_read/write hoisting to function as expected.
		Value res = builder.create<vector::TransferReadOp>(
		loc, resType, resShaped, ValueRange{zero, wVal, zero});

		// Compute contraction: I{n, 1, c} * F{1, c, f} -> O{n, 1, f}
		StringRef par = Par().strRef, red = Red().strRef;
		AffineExpr n, one, f, c;
		bindDims(ctx, n, one, f, c);
		// clang-format off
		res = builder.create<vector::ContractionOp>(
		loc, lhs, rhs, res,
		/*indexingMaps=*/MapList{{n, one, c}, {one, c, f}, {n, one, f}},
		/*iteratorTypes=*/ArrayRef<StringRef>{par, par, par, red});
		// clang-format on

		// Write back res slice: {n, 1, f} @ [0, w, 0].
		write = builder.create<vector::TransferWriteOp>(
		loc, res, resShaped, ValueRange{zero, wVal, zero});
		if (write.getNumResults() == 1)
		resShaped = write->getResult(0);
		}
		}

		return write.getOperation();
		}

		/// Entry point that transposes into the common form:
		/// {{n, strideW * w + dilationW * kw, c}, {kw, c, f}, {n, w, f}}
		FailureOr<Operation *> generateConv() {
		AffineExpr n, w, f, kw, c;
		bindDims(ctx, n, w, f, kw, c);

		if (!iters({Par(), Par(), Par(), Red(), Red()}))
		return failure();

		// No transposition needed.
		if (layout({/*lhsIndex*/ {n, strideW * w + dilationW * kw, c},
		/*rhsIndex*/ {kw, c, f},
		/*resIndex*/ {n, w, f}}))
		return conv();
		return failure();
		}

		private:
		bool valid;
		int strideW, dilationW;
		Value lhsShaped, rhsShaped, resShaped;
		ShapedType lhsShapedType, rhsShapedType, resShapedType;
		};
		} // namespace

		/// Helper function to vectorize a `linalgOp` with convolution semantics.
		// TODO: extend the generic vectorization to support windows and drop this.
		static FailureOr<Operation *>
		vectorizeConvolution(OpBuilder &b, ConvolutionOpInterface convOp) {
		// TODO: these are legitimately part of ConvolutionOpInterface.
		auto strides = convOp->getAttrOfType<DenseIntElementsAttr>("strides");
		auto dilations = convOp->getAttrOfType<DenseIntElementsAttr>("dilations");
		auto stride = strides ? *strides.getValues<uint64_t>().begin() : 1;
		auto dilation = dilations ? *dilations.getValues<uint64_t>().begin() : 1;
		LinalgOp linalgOp = cast<LinalgOp>(convOp.getOperation());
		Conv1D_NWC_WCF_Generator e(b, linalgOp, stride, dilation);
		return e.generateConv();
		}

		struct VectorizeConvolution
		: public OpInterfaceRewritePattern<ConvolutionOpInterface> {
		using OpInterfaceRewritePattern::OpInterfaceRewritePattern;

		LogicalResult matchAndRewrite(ConvolutionOpInterface convOp,
		PatternRewriter &rewriter) const override {
		FailureOr<Operation *> resultOrFail =
		vectorizeConvolution(rewriter, convOp);
		if (failed(resultOrFail))
		return failure();
		Operation *newOp = *resultOrFail;
		if (newOp->getNumResults() == 0) {
		rewriter.eraseOp(convOp.getOperation());
		return success();
		}
		assert(newOp->getNumResults() == 1 && "expected single result");
		rewriter.replaceOp(convOp.getOperation(), newOp->getResult(0));
		return success();
		}
		};

		void mlir::linalg::populateConvolutionVectorizationPatterns(
		RewritePatternSet &patterns, PatternBenefit benefit) {
		patterns.add<VectorizeConvolution>(patterns.getContext(), benefit);
		}

mlir/lib/Dialect/Vector/VectorTransforms.cpp

[... 1,251 lines not shown ...]	struct Red : public IteratorType {
Red() : IteratorType(getReductionIteratorTypeName()) {}		Red() : IteratorType(getReductionIteratorTypeName()) {}
};		};

/// Generate a vector implementation for matmat, matvec and tmatvec.		/// Generate a vector implementation for matmat, matvec and tmatvec.
/// This unrolls outer-products along the reduction dimension.		/// This unrolls outer-products along the reduction dimension.
struct UnrolledOuterProductGenerator		struct UnrolledOuterProductGenerator
: public StructuredGenerator<vector::ContractionOp> {		: public StructuredGenerator<vector::ContractionOp> {

UnrolledOuterProductGenerator(PatternRewriter &rewriter,		UnrolledOuterProductGenerator(OpBuilder &builder, vector::ContractionOp op)
vector::ContractionOp op)		: StructuredGenerator<vector::ContractionOp>(builder, op),
: StructuredGenerator<vector::ContractionOp>(rewriter, op),
kind(op.kind()), lhs(op.lhs()), rhs(op.rhs()), res(op.acc()),		kind(op.kind()), lhs(op.lhs()), rhs(op.rhs()), res(op.acc()),
lhsType(op.getLhsType()) {}		lhsType(op.getLhsType()) {}

Value t(Value v) {		Value t(Value v) {
static constexpr std::array<int64_t, 2> perm = {1, 0};		static constexpr std::array<int64_t, 2> perm = {1, 0};
return rewriter.create<vector::TransposeOp>(loc, v, perm);		return builder.create<vector::TransposeOp>(loc, v, perm);
}		}

LogicalResult outer_prod(Value lhs, Value rhs, Value res, int reductionSize) {		Value outer_prod(Value lhs, Value rhs, Value res, int reductionSize) {
assert(reductionSize > 0);		assert(reductionSize > 0);
for (int64_t k = 0; k < reductionSize; ++k) {		for (int64_t k = 0; k < reductionSize; ++k) {
Value a = rewriter.create<vector::ExtractOp>(loc, lhs, k);		Value a = builder.create<vector::ExtractOp>(loc, lhs, k);
Value b = rewriter.create<vector::ExtractOp>(loc, rhs, k);		Value b = builder.create<vector::ExtractOp>(loc, rhs, k);
res = rewriter.create<vector::OuterProductOp>(loc, res.getType(), a, b,		res = builder.create<vector::OuterProductOp>(loc, res.getType(), a, b,
res, kind);		res, kind);
}		}
rewriter.replaceOp(op, res);		return res;
return success();
}		}

/// Two outer parallel, one inner reduction (matmat flavor).		/// Two outer parallel, one inner reduction (matmat flavor).
LogicalResult matmat() {		FailureOr<Value> matmat() {
if (!iters({Par(), Par(), Red()}))		if (!iters({Par(), Par(), Red()}))
return failure();		return failure();
// Set up the parallel/reduction structure in the right form.		// Set up the parallel/reduction structure in the right form.
AffineExpr m, n, k;		AffineExpr m, n, k;
bindDims(rewriter.getContext(), m, n, k);		bindDims(builder.getContext(), m, n, k);
// Classical row-major matmul: Just permute the lhs.		// Classical row-major matmul: Just permute the lhs.
if (layout({{m, k}, {k, n}, {m, n}}))		if (layout({{m, k}, {k, n}, {m, n}}))
return outer_prod(t(lhs), rhs, res, lhsType.getDimSize(1));		return outer_prod(t(lhs), rhs, res, lhsType.getDimSize(1));
// TODO: may be better to fail and use some vector<k> -> scalar reduction.		// TODO: may be better to fail and use some vector<k> -> scalar reduction.
if (layout({{m, k}, {n, k}, {m, n}})) {		if (layout({{m, k}, {n, k}, {m, n}})) {
Value tlhs = t(lhs);		Value tlhs = t(lhs);
return outer_prod(tlhs, t(rhs), res, lhsType.getDimSize(1));		return outer_prod(tlhs, t(rhs), res, lhsType.getDimSize(1));
}		}
[... 15 unchanged lines of matmat() not shown ...]
if (layout({{k, m}, {k, n}, {n, m}}))		if (layout({{k, m}, {k, n}, {n, m}}))
return outer_prod(rhs, lhs, res, lhsType.getDimSize(0));		return outer_prod(rhs, lhs, res, lhsType.getDimSize(0));
if (layout({{k, m}, {n, k}, {n, m}}))		if (layout({{k, m}, {n, k}, {n, m}}))
return outer_prod(t(rhs), lhs, res, lhsType.getDimSize(0));		return outer_prod(t(rhs), lhs, res, lhsType.getDimSize(0));
return failure();		return failure();
}		}

/// One outer parallel, one inner reduction (matvec flavor)		/// One outer parallel, one inner reduction (matvec flavor)
LogicalResult matvec() {		FailureOr<Value> matvec() {
if (!iters({Par(), Red()}))		if (!iters({Par(), Red()}))
return failure();		return failure();
AffineExpr m, k;		AffineExpr m, k;
bindDims(rewriter.getContext(), m, k);		bindDims(builder.getContext(), m, k);

// Case mat-vec: transpose.		// Case mat-vec: transpose.
if (layout({{m, k}, {k}, {m}}))		if (layout({{m, k}, {k}, {m}}))
return outer_prod(t(lhs), rhs, res, lhsType.getDimSize(1));		return outer_prod(t(lhs), rhs, res, lhsType.getDimSize(1));
// Case mat-trans-vec: ready to go.		// Case mat-trans-vec: ready to go.
if (layout({{k, m}, {k}, {m}}))		if (layout({{k, m}, {k}, {m}}))
return outer_prod(lhs, rhs, res, lhsType.getDimSize(0));		return outer_prod(lhs, rhs, res, lhsType.getDimSize(0));
// Case vec-mat: swap and transpose.		// Case vec-mat: swap and transpose.
if (layout({{k}, {m, k}, {m}}))		if (layout({{k}, {m, k}, {m}}))
return outer_prod(t(rhs), lhs, res, lhsType.getDimSize(0));		return outer_prod(t(rhs), lhs, res, lhsType.getDimSize(0));
// Case vec-mat-trans: swap and ready to go.		// Case vec-mat-trans: swap and ready to go.
if (layout({{k}, {k, m}, {m}}))		if (layout({{k}, {k, m}, {m}}))
return outer_prod(rhs, lhs, res, lhsType.getDimSize(0));		return outer_prod(rhs, lhs, res, lhsType.getDimSize(0));
return failure();		return failure();
}		}

//		//
// One outer reduction, one inner parallel (tmatvec flavor)		// One outer reduction, one inner parallel (tmatvec flavor)
//		//
LogicalResult tmatvec() {		FailureOr<Value> tmatvec() {
if (!iters({Red(), Par()}))		if (!iters({Red(), Par()}))
return failure();		return failure();
AffineExpr k, m;		AffineExpr k, m;
bindDims(rewriter.getContext(), k, m);		bindDims(builder.getContext(), k, m);

// Case mat-vec: transpose.		// Case mat-vec: transpose.
if (layout({{m, k}, {k}, {m}}))		if (layout({{m, k}, {k}, {m}}))
return outer_prod(t(lhs), rhs, res, lhsType.getDimSize(1));		return outer_prod(t(lhs), rhs, res, lhsType.getDimSize(1));
// Case mat-trans-vec: ready to go.		// Case mat-trans-vec: ready to go.
if (layout({{k, m}, {k}, {m}}))		if (layout({{k, m}, {k}, {m}}))
return outer_prod(lhs, rhs, res, lhsType.getDimSize(0));		return outer_prod(lhs, rhs, res, lhsType.getDimSize(0));
// Case vec-mat: swap and transpose.		// Case vec-mat: swap and transpose.
[... 36 unchanged lines not shown; the next hunk is inside ContractionOpToOuterProductOpLowering::matchAndRewrite ...]
if (vectorTransformOptions.vectorContractLowering !=		if (vectorTransformOptions.vectorContractLowering !=
vector::VectorContractLowering::OuterProduct)		vector::VectorContractLowering::OuterProduct)
return failure();		return failure();

if (failed(filter(op)))		if (failed(filter(op)))
return failure();		return failure();

UnrolledOuterProductGenerator e(rewriter, op);		UnrolledOuterProductGenerator e(rewriter, op);
if (succeeded(e.matmat()))		FailureOr<Value> matmatRes = e.matmat();
		if (succeeded(matmatRes)) {
		rewriter.replaceOp(op, *matmatRes);
return success();		return success();
if (succeeded(e.matvec()))		}
		FailureOr<Value> matvecRes = e.matvec();
		if (succeeded(matvecRes)) {
		rewriter.replaceOp(op, *matvecRes);
return success();		return success();
if (succeeded(e.tmatvec()))		}
		FailureOr<Value> tmatvecRes = e.tmatvec();
		if (succeeded(tmatvecRes)) {
		rewriter.replaceOp(op, *tmatvecRes);
return success();		return success();
		}

return failure();		return failure();
}		}

LogicalResult		LogicalResult
ContractionOpToDotLowering::matchAndRewrite(vector::ContractionOp op,		ContractionOpToDotLowering::matchAndRewrite(vector::ContractionOp op,
PatternRewriter &rewriter) const {		PatternRewriter &rewriter) const {
// TODO: implement masks		// TODO: implement masks
[... remaining lines not shown ...]
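The hunk above changes the generator so that `matmat()`, `matvec()` and `tmatvec()` hand back the computed value and leave the `replaceOp` call to the caller. A minimal sketch of that shape in plain C++ follows. It is not MLIR code: `Matrix`, `transpose`, `outerProd` and this `matmat` are hypothetical stand-ins, `std::optional` plays the role of `FailureOr<Value>`, and only the row-major `{m, k} x {k, n} -> {m, n}` layout is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a 2-D vector value; not an MLIR type.
using Matrix = std::vector<std::vector<float>>;

// Plays the role of the t() helper in the diff: swap the two dimensions.
Matrix transpose(const Matrix &m) {
  Matrix out(m[0].size(), std::vector<float>(m.size()));
  for (std::size_t i = 0; i < m.size(); ++i)
    for (std::size_t j = 0; j < m[i].size(); ++j)
      out[j][i] = m[i][j];
  return out;
}

// Unroll along the reduction dimension k and accumulate one outer product
// per step: res += lhs[k] (outer) rhs[k]. lhs is k x m, rhs is k x n,
// res is m x n. The result is returned instead of being written anywhere.
Matrix outerProd(const Matrix &lhs, const Matrix &rhs, Matrix res) {
  assert(!lhs.empty() && lhs.size() == rhs.size());
  for (std::size_t k = 0; k < lhs.size(); ++k)
    for (std::size_t i = 0; i < res.size(); ++i)
      for (std::size_t j = 0; j < res[i].size(); ++j)
        res[i][j] += lhs[k][i] * rhs[k][j];
  return res;
}

// Row-major matmul flavor: transpose lhs so that the reduction dimension is
// outermost, then unroll. std::nullopt models failure() when shapes mismatch.
std::optional<Matrix> matmat(const Matrix &lhs, const Matrix &rhs,
                             const Matrix &acc) {
  if (lhs.empty() || rhs.empty() || acc.empty())
    return std::nullopt;
  if (lhs[0].size() != rhs.size() || acc.size() != lhs.size() ||
      acc[0].size() != rhs[0].size())
    return std::nullopt;
  return outerProd(transpose(lhs), rhs, acc);
}
```

Because the helper only produces a value, a caller can check the optional and then decide what to do with the result, which mirrors how `matchAndRewrite` in the diff now calls `rewriter.replaceOp(op, *matmatRes)` itself.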

mlir/test/Dialect/Linalg/vectorize-convolution.mlir

This file was added.

				// RUN: mlir-opt -split-input-file -test-linalg-transform-patterns=test-linalg-to-vector-patterns %s | FileCheck %s

				func @conv1d_nwc_4x2x8_memref(%input: memref<4x6x3xf32>, %filter: memref<1x3x8xf32>, %output: memref<4x2x8xf32>) {
				linalg.conv_1d_nwc_wcf
				{dilations = dense<1> : tensor<1xi64>, strides = dense<3> : tensor<1xi64>}
				ins(%input, %filter : memref<4x6x3xf32>, memref<1x3x8xf32>)
				outs(%output : memref<4x2x8xf32>)
				return
				}

				// CHECK: #[[INPUT_MAP:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
				// CHECK: #[[FILTER_MAP:.+]] = affine_map<(d0, d1, d2, d3) -> (d1, d3, d2)>
				// CHECK: #[[OUTPUT_MAP:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>

				// CHECK: func @conv1d_nwc_4x2x8_memref
				// CHECK-SAME: (%[[INPUT:.+]]: memref<4x6x3xf32>, %[[FILTER:.+]]: memref<1x3x8xf32>, %[[OUTPUT:.+]]: memref<4x2x8xf32>)

				// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index
				// CHECK-DAG: %[[C3:.+]] = arith.constant 3 : index
				// CHECK-DAG: %[[F0:.+]] = arith.constant 0.000000e+00 : f32

				/// w == 0, kw == 0
				// CHECK: %[[V_FILTER:.+]] = vector.transfer_read %[[FILTER]][%[[C0]], %[[C0]], %[[C0]]], %[[F0]]
				// CHECK: %[[V_INPUT0:.+]] = vector.transfer_read %[[INPUT]][%[[C0]], %[[C0]], %[[C0]]], %[[F0]]
				// CHECK: %[[V_OUTPUT_0:.+]] = vector.transfer_read %[[OUTPUT]][%[[C0]], %[[C0]], %[[C0]]], %[[F0]]
				// CHECK: %[[CONTRACT0:.+]] = vector.contract {
				// CHECK-SAME: indexing_maps = [#[[INPUT_MAP]], #[[FILTER_MAP]], #[[OUTPUT_MAP]]],
				// CHECK-SAME: iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
				// CHECK-SAME: %[[V_INPUT0]], %[[V_FILTER]], %[[V_OUTPUT_0]]
				// CHECK-SAME: : vector<4x1x3xf32>, vector<1x3x8xf32> into vector<4x1x8xf32>
				// CHECK: vector.transfer_write %[[CONTRACT0]], %[[OUTPUT]][%[[C0]], %[[C0]], %[[C0]]]

				/// w == 1, kw == 0
				// CHECK: %[[V_INPUT3:.+]] = vector.transfer_read %[[INPUT]][%[[C0]], %[[C3]], %[[C0]]], %[[F0]]
				// CHECK: %[[V_OUTPUT_1:.+]] = vector.transfer_read %[[OUTPUT]][%[[C0]], %[[C1]], %[[C0]]], %[[F0]]
				// CHECK: %[[CONTRACT1:.+]] = vector.contract {
				// CHECK-SAME: indexing_maps = [#[[INPUT_MAP]], #[[FILTER_MAP]], #[[OUTPUT_MAP]]],
				// CHECK-SAME: iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
				// CHECK-SAME: %[[V_INPUT3]], %[[V_FILTER]], %[[V_OUTPUT_1]]
				// CHECK-SAME: : vector<4x1x3xf32>, vector<1x3x8xf32> into vector<4x1x8xf32>
				// CHECK: vector.transfer_write %[[CONTRACT1]], %[[OUTPUT]][%[[C0]], %[[C1]], %[[C0]]]

				// -----

				func @conv1d_nwc_4x2x8_memref(%input: memref<4x6x3xf32>, %filter: memref<2x3x8xf32>, %output: memref<4x2x8xf32>) {
				linalg.conv_1d_nwc_wcf
				{dilations = dense<2> : tensor<1xi64>, strides = dense<3> : tensor<1xi64>}
				ins(%input, %filter : memref<4x6x3xf32>, memref<2x3x8xf32>)
				outs(%output : memref<4x2x8xf32>)
				return
				}

				// CHECK: #[[INPUT_MAP:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
				// CHECK: #[[FILTER_MAP:.+]] = affine_map<(d0, d1, d2, d3) -> (d1, d3, d2)>
				// CHECK: #[[OUTPUT_MAP:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>

				// CHECK: func @conv1d_nwc_4x2x8_memref
				// CHECK-SAME: (%[[INPUT:.+]]: memref<4x6x3xf32>, %[[FILTER:.+]]: memref<2x3x8xf32>, %[[OUTPUT:.+]]: memref<4x2x8xf32>)

				// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index
				// CHECK-DAG: %[[C2:.+]] = arith.constant 2 : index
				// CHECK-DAG: %[[C3:.+]] = arith.constant 3 : index
				// CHECK-DAG: %[[C5:.+]] = arith.constant 5 : index
				// CHECK-DAG: %[[F0:.+]] = arith.constant 0.000000e+00 : f32

				/// w == 0, kw == 0
				// CHECK: %[[V_FILTER_A:.+]] = vector.transfer_read %[[FILTER]][%[[C0]], %[[C0]], %[[C0]]], %[[F0]]
				// CHECK: %[[V_INPUT0_A:.+]] = vector.transfer_read %[[INPUT]][%[[C0]], %[[C0]], %[[C0]]], %[[F0]]
				// CHECK: %[[V_OUTPUT_0_A:.+]] = vector.transfer_read %[[OUTPUT]][%[[C0]], %[[C0]], %[[C0]]], %[[F0]]
				// CHECK: %[[CONTRACT0_A:.+]] = vector.contract {
				// CHECK-SAME: indexing_maps = [#[[INPUT_MAP]], #[[FILTER_MAP]], #[[OUTPUT_MAP]]],
				// CHECK-SAME: iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
				// CHECK-SAME: %[[V_INPUT0_A]], %[[V_FILTER_A]], %[[V_OUTPUT_0_A]]
				// CHECK-SAME: : vector<4x1x3xf32>, vector<1x3x8xf32> into vector<4x1x8xf32>
				// CHECK: vector.transfer_write %[[CONTRACT0_A]], %[[OUTPUT]][%[[C0]], %[[C0]], %[[C0]]]

				/// w == 1, kw == 0
				// CHECK: %[[V_INPUT3_A:.+]] = vector.transfer_read %[[INPUT]][%[[C0]], %[[C3]], %[[C0]]], %[[F0]]
				// CHECK: %[[V_OUTPUT_1_A:.+]] = vector.transfer_read %[[OUTPUT]][%[[C0]], %[[C1]], %[[C0]]], %[[F0]]
				// CHECK: %[[CONTRACT1_A:.+]] = vector.contract {
				// CHECK-SAME: indexing_maps = [#[[INPUT_MAP]], #[[FILTER_MAP]], #[[OUTPUT_MAP]]],
				// CHECK-SAME: iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
				// CHECK-SAME: %[[V_INPUT3_A]], %[[V_FILTER_A]], %[[V_OUTPUT_1_A]]
				// CHECK-SAME: : vector<4x1x3xf32>, vector<1x3x8xf32> into vector<4x1x8xf32>
				// CHECK: vector.transfer_write %[[CONTRACT1_A]], %[[OUTPUT]][%[[C0]], %[[C1]], %[[C0]]]

				/// w == 0, kw == 1
				// CHECK: %[[V_FILTER_B:.+]] = vector.transfer_read %[[FILTER]][%[[C1]], %[[C0]], %[[C0]]], %[[F0]]
				// CHECK: %[[V_INPUT0_B:.+]] = vector.transfer_read %[[INPUT]][%[[C0]], %[[C2]], %[[C0]]], %[[F0]]
				// CHECK: %[[V_OUTPUT_0_B:.+]] = vector.transfer_read %[[OUTPUT]][%[[C0]], %[[C0]], %[[C0]]], %[[F0]]
				// CHECK: %[[CONTRACT0_B:.+]] = vector.contract {
				// CHECK-SAME: indexing_maps = [#[[INPUT_MAP]], #[[FILTER_MAP]], #[[OUTPUT_MAP]]],
				// CHECK-SAME: iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
				// CHECK-SAME: %[[V_INPUT0_B]], %[[V_FILTER_B]], %[[V_OUTPUT_0_B]]
				// CHECK-SAME: : vector<4x1x3xf32>, vector<1x3x8xf32> into vector<4x1x8xf32>
				// CHECK: vector.transfer_write %[[CONTRACT0_B]], %[[OUTPUT]][%[[C0]], %[[C0]], %[[C0]]]

				/// w == 1, kw == 1
				// CHECK: %[[V_INPUT3_B:.+]] = vector.transfer_read %[[INPUT]][%[[C0]], %[[C5]], %[[C0]]], %[[F0]]
				// CHECK: %[[V_OUTPUT_1_B:.+]] = vector.transfer_read %[[OUTPUT]][%[[C0]], %[[C1]], %[[C0]]], %[[F0]]
				// CHECK: %[[CONTRACT1_B:.+]] = vector.contract {
				// CHECK-SAME: indexing_maps = [#[[INPUT_MAP]], #[[FILTER_MAP]], #[[OUTPUT_MAP]]],
				// CHECK-SAME: iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
				// CHECK-SAME: %[[V_INPUT3_B]], %[[V_FILTER_B]], %[[V_OUTPUT_1_B]]
				// CHECK-SAME: : vector<4x1x3xf32>, vector<1x3x8xf32> into vector<4x1x8xf32>
				// CHECK: vector.transfer_write %[[CONTRACT1_B]], %[[OUTPUT]][%[[C0]], %[[C1]], %[[C0]]]
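The `transfer_read` offsets in the CHECK lines follow from the convolution indexing: output position `w` and filter tap `kw` read the input at `w * stride + kw * dilation` along the W dimension. The sketch below recomputes those offsets in plain C++. `Slice` and `conv1dSlices` are hypothetical names, and the loop order (`kw` outermost, `w` innermost) is inferred from the order of the CHECK lines, not taken from the implementation, which is not shown in this section.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One (w, kw) slice: which W offset each operand is read at.
struct Slice {
  int64_t w;        // output position along W
  int64_t kw;       // filter tap along KW
  int64_t inputW;   // W offset of the input read
  int64_t filterKw; // KW offset of the filter read
  int64_t outputW;  // W offset of the output read/write
};

// Enumerate the slices with kw outermost and w innermost (assumed order).
std::vector<Slice> conv1dSlices(int64_t wSize, int64_t kwSize, int64_t stride,
                                int64_t dilation) {
  std::vector<Slice> slices;
  for (int64_t kw = 0; kw < kwSize; ++kw)
    for (int64_t w = 0; w < wSize; ++w)
      slices.push_back({w, kw, w * stride + kw * dilation, kw, w});
  return slices;
}
```

For the second test case (output width 2, filter width 2, stride 3, dilation 2) this gives input offsets 0, 3, 2 and 5, matching the `%[[C0]]`, `%[[C3]]`, `%[[C2]]` and `%[[C5]]` indices used for `%[[INPUT]]` above. The largest offset, 5, stays within the input width of 6.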

mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp

	[... preceding lines not shown ...]

	static void applyLinalgToVectorPatterns(FuncOp funcOp) {			static void applyLinalgToVectorPatterns(FuncOp funcOp) {
	RewritePatternSet patterns(funcOp.getContext());			RewritePatternSet patterns(funcOp.getContext());
	patterns.add<LinalgVectorizationPattern>(			patterns.add<LinalgVectorizationPattern>(
	funcOp.getContext(),			funcOp.getContext(),
	LinalgTransformationFilter()			LinalgTransformationFilter()
	.addOpFilter<ContractionOpInterface, FillOp, CopyOp, GenericOp>());			.addOpFilter<ContractionOpInterface, FillOp, CopyOp, GenericOp>());
	populatePadTensorOpVectorizationPatterns(patterns);			populatePadTensorOpVectorizationPatterns(patterns);
				populateConvolutionVectorizationPatterns(patterns);
	(void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));			(void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));
	}			}

	static void applyPadTensorToGenericPatterns(FuncOp funcOp) {			static void applyPadTensorToGenericPatterns(FuncOp funcOp) {
	RewritePatternSet patterns(funcOp.getContext());			RewritePatternSet patterns(funcOp.getContext());
	patterns.add<PadTensorOpTransformationPattern>(funcOp.getContext());			patterns.add<PadTensorOpTransformationPattern>(funcOp.getContext());
	(void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));			(void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));
	}			}
	[... remaining lines not shown ...]

Revision Contents

Diff 380941

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h

mlir/include/mlir/Dialect/Utils/StructuredOpsUtils.h

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

mlir/lib/Dialect/Vector/VectorTransforms.cpp

mlir/test/Dialect/Linalg/vectorize-convolution.mlir

mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp
