This is an archive of the discontinued LLVM Phabricator instance.

Differential D101165

[mlir][Linalg] Generalize linalg vectorization
ClosedPublic

Authored by nicolasvasilache on Apr 23 2021, 7:28 AM.

Details

Reviewers

ThomasRaoux
asaadaldien
aartbik
bkramer

Commits

rGb6113db955aa: [mlir][Linalg] Generalize linalg vectorization

Summary

This revision adds support for vectorizing more general linalg operations with projected permutation maps.

This is achieved by eagerly broadcasting the intermediate vector to the common size
of the iteration domain of the linalg op. This allows a much more natural expression of
generalized vectorization but may introduce additional computations until all the
proper canonicalizations are implemented.

This generalization modifies the vector.transfer_read/write permutation logic and
exposes the fact that the logic employed in vector.contract was too ad-hoc.

As a consequence, changes occur in the permutation / transposition logic for contraction. In turn this prompts supporting more cases in the lowering of contract
to matrix intrinsics, which is required to make the corresponding tests pass.
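
To make the eager-broadcast strategy from the summary concrete, here is a minimal standalone C++ sketch. It uses plain `std::vector` rather than MLIR, and the function name `matmulViaBroadcast` and the row-major layout are assumptions made for illustration. It evaluates a matmul by broadcasting both operands to the full M x N x K iteration domain, multiplying elementwise, and then reducing along the K dimension. Both broadcast operands are materialized at full domain size, which is the kind of additional computation the summary says may remain until canonicalizations are in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only: `a` is M x K and `b` is K x N, both row-major.
std::vector<float> matmulViaBroadcast(const std::vector<float> &a,
                                      const std::vector<float> &b,
                                      std::size_t m, std::size_t n,
                                      std::size_t k) {
  assert(a.size() == m * k && b.size() == k * n);
  const std::size_t domain = m * n * k;

  // Step 1: eagerly broadcast both operands to the M x N x K iteration domain.
  std::vector<float> lhs(domain), rhs(domain);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t l = 0; l < k; ++l) {
        const std::size_t idx = (i * n + j) * k + l;
        lhs[idx] = a[i * k + l]; // A does not depend on j.
        rhs[idx] = b[l * n + j]; // B does not depend on i.
      }

  // Step 2: elementwise multiply over the whole domain.
  std::vector<float> prod(domain);
  for (std::size_t idx = 0; idx < domain; ++idx)
    prod[idx] = lhs[idx] * rhs[idx];

  // Step 3: "add" reduction along the last (K) dimension, leaving M x N.
  std::vector<float> c(m * n, 0.0f);
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t j = 0; j < n; ++j)
      for (std::size_t l = 0; l < k; ++l)
        c[i * n + j] += prod[(i * n + j) * k + l];
  return c;
}
```

For a 2x2 case, `matmulViaBroadcast({1, 2, 3, 4}, {5, 6, 7, 8}, 2, 2, 2)` yields the usual matmul result.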

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
140 ms	x64 debian > MLIR.Dialect/Vector::vector-contract-transforms.mlir
210 ms	x64 windows > MLIR.Dialect/Vector::vector-contract-transforms.mlir

Event Timeline

nicolasvasilache created this revision. Apr 23 2021, 7:28 AM

Herald added a reviewer: aartbik. Apr 23 2021, 7:28 AM

Herald added subscribers: dcaballe, cota, mravishankar and 17 others.

nicolasvasilache requested review of this revision. Apr 23 2021, 7:28 AM

Herald added a project: Restricted Project. Apr 23 2021, 7:28 AM

Herald added subscribers: limo1996, stephenneuendorffer.

Harbormaster completed remote builds in B100572: Diff 340026. Apr 23 2021, 8:50 AM

ThomasRaoux added inline comments. Apr 23 2021, 8:51 AM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
183	You need to check that vecType != nullptr.
201–202	Do we have a way to test for this in a precondition so that we don't reach this part when running vectorization on a linalg op with this case? That would break things like IREE, where vectorization is run on all generic ops.
mlir/lib/Dialect/Vector/VectorOps.cpp
2202	It doesn't depend on the TransferReadOp? Should this be static?
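
The precondition question raised on lines 201–202 can be sketched as follows. This is a standalone C++ illustration, not the code in this revision: the helper names `matchCombiningKind` and `reductionPrecondition` are hypothetical, and operations are identified by plain strings purely for the example. The idea is that an unsupported reduction is reported as a failed precondition instead of reaching an unreachable branch.

```cpp
#include <cassert>
#include <string>

// Mirrors the single combining kind handled by the revision (add).
enum class CombiningKind { Add };

// Hypothetical helper: returns true and sets `kind` when the reduction op is
// supported, and returns false otherwise instead of aborting.
bool matchCombiningKind(const std::string &opName, CombiningKind &kind) {
  if (opName == "addi" || opName == "addf") {
    kind = CombiningKind::Add;
    return true;
  }
  return false;
}

// Hypothetical precondition: callers would skip vectorization when this
// returns false, so the rewrite never sees an unsupported reduction.
bool reductionPrecondition(const std::string &reductionOpName) {
  CombiningKind unused;
  return matchCombiningKind(reductionOpName, unused);
}
```

With such a check, a pass that runs vectorization on every generic op would leave, say, a multiplicative reduction untouched rather than crash.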

nicolasvasilache added a reviewer: bkramer. Apr 27 2021, 8:41 AM

Fix incorrect order of reshape / transpose.

asaadaldien added inline comments. Apr 28 2021, 2:14 PM

mlir/include/mlir/Dialect/Vector/VectorOps.td
295	I think we should also have an accumulation argument `Variadic<AnyType>:$acc`

nicolasvasilache marked 3 inline comments as done. Apr 28 2021, 2:50 PM

nicolasvasilache added inline comments.

mlir/lib/Dialect/Vector/VectorOps.cpp
2202	It is a static method in TransferReadOp, see the .td decl.

Address review.

nicolasvasilache marked an inline comment as done. Apr 28 2021, 2:57 PM

nicolasvasilache added inline comments.

mlir/include/mlir/Dialect/Vector/VectorOps.td
295	We can but let's add it in a separate CL when we have a concrete use case: this revision does not need it and is already complex enough to my taste. When you do the lowering, if you see an opportunity to add the acc please do so.

nicolasvasilache marked an inline comment as done. Apr 28 2021, 3:06 PM

LGTM! I benchmarked this on IREE's CPU backend and it's causing no regression.

My only concern is that other downstream systems (cc: @ThomasRaoux for SPIR-V) might fail with an illegal op because they do not lower vector.multi_reduction, so how about optionally enabling/disabling the vectorization if any of the iterators is a reduction?

Harbormaster completed remote builds in B101496: Diff 341313. Apr 28 2021, 3:52 PM

Harbormaster completed remote builds in B101504: Diff 341324. Apr 28 2021, 5:03 PM

In D101165#2724184, @asaadaldien wrote:

LGTM! I benchmarked this on IREE's CPU backend and it's causing no regression.

My only concern is that other downstream systems (cc: @ThomasRaoux for SPIR-V) might fail with an illegal op because they do not lower vector.multi_reduction, so how about optionally enabling/disabling the vectorization if any of the iterators is a reduction?

I think it is easy enough to do this filtering on the IREE side so I don't think we have to add it here.
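
The downstream filtering suggested in this reply could look roughly like the sketch below. It is a standalone C++ illustration under stated assumptions: iterator types are modeled as plain strings ("parallel" / "reduction"), and `hasReductionIterator` and `shouldVectorize` are hypothetical names, not functions from IREE or MLIR.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns true if any iterator of the op is a reduction iterator.
bool hasReductionIterator(const std::vector<std::string> &iteratorTypes) {
  return std::any_of(
      iteratorTypes.begin(), iteratorTypes.end(),
      [](const std::string &type) { return type == "reduction"; });
}

// Hypothetical downstream filter: a backend that cannot lower
// vector.multi_reduction passes allowReductions = false and thereby skips
// vectorization of any op that has a reduction iterator.
bool shouldVectorize(const std::vector<std::string> &iteratorTypes,
                     bool allowReductions) {
  return allowReductions || !hasReductionIterator(iteratorTypes);
}
```

A backend without a lowering for vector.multi_reduction would call `shouldVectorize(types, /*allowReductions=*/false)` and keep reduction ops in their original form.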

ThomasRaoux accepted this revision. Apr 28 2021, 10:54 PM

This revision is now accepted and ready to land. Apr 28 2021, 10:54 PM

also + @sgrechanik re. reduction detection that we'll want to unify

Closed by commit rGb6113db955aa: [mlir][Linalg] Generalize linalg vectorization (authored by nicolasvasilache). Apr 29 2021, 12:48 AM

This revision was automatically updated to reflect the committed changes.

nicolasvasilache added a commit: rGb6113db955aa: [mlir][Linalg] Generalize linalg vectorization.

Revision Contents

Path	Size
mlir/include/mlir/Analysis/SliceAnalysis.h	4 lines
mlir/include/mlir/Dialect/Vector/VectorOps.td	76 lines
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp	302 lines
mlir/lib/Dialect/Vector/VectorOps.cpp	62 lines
mlir/lib/Dialect/Vector/VectorTransforms.cpp	82 lines
mlir/test/Dialect/Linalg/transform-patterns-matmul-to-vector.mlir	2 lines
mlir/test/Dialect/Linalg/vectorization.mlir	124 lines

Diff 340026

mlir/include/mlir/Analysis/SliceAnalysis.h

[... 27 lines not shown ...]
using TransitiveFilter = llvm::function_ref<bool(Operation *)>;		using TransitiveFilter = llvm::function_ref<bool(Operation *)>;

/// Fills `forwardSlice` with the computed forward slice (i.e. all		/// Fills `forwardSlice` with the computed forward slice (i.e. all
/// the transitive uses of op), without including that operation.		/// the transitive uses of op), without including that operation.
///		///
/// This additionally takes a TransitiveFilter which acts as a frontier:		/// This additionally takes a TransitiveFilter which acts as a frontier:
/// when looking at uses transitively, an operation that does not pass the		/// when looking at uses transitively, an operation that does not pass the
/// filter is never propagated through. This allows in particular to carve out		/// filter is never propagated through. This allows in particular to carve out
/// the scope within a ForInst or the scope within an IfInst.		/// the scope within a ForOp or the scope within an IfOp.
///		///
/// The implementation traverses the use chains in postorder traversal for		/// The implementation traverses the use chains in postorder traversal for
/// efficiency reasons: if an operation is already in `forwardSlice`, no		/// efficiency reasons: if an operation is already in `forwardSlice`, no
/// need to traverse its uses again. Since use-def chains form a DAG, this		/// need to traverse its uses again. Since use-def chains form a DAG, this
/// terminates.		/// terminates.
///		///
/// Upon return to the root call, `forwardSlice` is filled with a		/// Upon return to the root call, `forwardSlice` is filled with a
/// postorder list of uses (i.e. a reverse topological order). To get a proper		/// postorder list of uses (i.e. a reverse topological order). To get a proper
[... 32 lines not shown ...]	void getForwardSlice(Value root, SetVector<Operation *> *forwardSlice,
TransitiveFilter filter = nullptr /* pass-through*/);		TransitiveFilter filter = nullptr /* pass-through*/);

/// Fills `backwardSlice` with the computed backward slice (i.e.		/// Fills `backwardSlice` with the computed backward slice (i.e.
/// all the transitive defs of op), without including that operation.		/// all the transitive defs of op), without including that operation.
///		///
/// This additionally takes a TransitiveFilter which acts as a frontier:		/// This additionally takes a TransitiveFilter which acts as a frontier:
/// when looking at defs transitively, an operation that does not pass the		/// when looking at defs transitively, an operation that does not pass the
/// filter is never propagated through. This allows in particular to carve out		/// filter is never propagated through. This allows in particular to carve out
/// the scope within a ForInst or the scope within an IfInst.		/// the scope within a ForOp or the scope within an IfOp.
///		///
/// The implementation traverses the def chains in postorder traversal for		/// The implementation traverses the def chains in postorder traversal for
/// efficiency reasons: if an operation is already in `backwardSlice`, no		/// efficiency reasons: if an operation is already in `backwardSlice`, no
/// need to traverse its definitions again. Since useuse-def chains form a DAG,		/// need to traverse its definitions again. Since useuse-def chains form a DAG,
/// this terminates.		/// this terminates.
///		///
/// Upon return to the root call, `backwardSlice` is filled with a		/// Upon return to the root call, `backwardSlice` is filled with a
/// postorder list of defs. This happens to be a topological order, from the		/// postorder list of defs. This happens to be a topological order, from the
[... 117 lines not shown ...]

mlir/include/mlir/Dialect/Vector/VectorOps.td

[... 280 lines not shown ...]	def Vector_ReductionOp :
}];		}];
let extraClassDeclaration = [{		let extraClassDeclaration = [{
VectorType getVectorType() {		VectorType getVectorType() {
return vector().getType().cast<VectorType>();		return vector().getType().cast<VectorType>();
}		}
}];		}];
}		}

		def Vector_MultiDimReductionOp :
		Vector_Op<"multi_reduction", [NoSideEffect,
		PredOpTrait<"source operand and result have same element type",
		TCresVTEtIsSameAsOpBase<0, 0>>]>,
		Arguments<(ins Vector_CombiningKindAttr:$kind,
		AnyVector:$source,
		I64ArrayAttr:$reduction_dims)>,
		Results<(outs AnyType:$dest)> {
		let summary = "Multi-dimensional reduction operation";
		let description = [{
		Reduces an n-D vector into an (n-k)-D vector using the given operation
		(add/mul/min/max for int/fp and and/or/xor for int only).

		Example:

		```mlir
		%1 = vector.multi_reduction "add", %0 [1, 3] :
		vector<4x8x16x32xf32> into vector<4x16xf32>
		```
		}];
		let builders = [
		OpBuilder<(ins "Value":$source, "ArrayRef<bool>":$reductionMask,
		"CombiningKind":$kind)>
		];
		let extraClassDeclaration = [{
		static StringRef getKindAttrName() { return "kind"; }
		static StringRef getReductionDimsAttrName() { return "reduction_dims"; }

		VectorType getSourceVectorType() {
		return source().getType().cast<VectorType>();
		}
		VectorType getDestVectorType() {
		return dest().getType().cast<VectorType>();
		}

		SmallVector<bool> getReductionMask() {
		SmallVector<bool> res(getSourceVectorType().getRank(), false);
		for (auto ia : reduction_dims().getAsRange<IntegerAttr>())
		res[ia.getInt()] = true;
		return res;
		}
		static SmallVector<bool> getReductionMask(
		ArrayRef<int64_t> reductionDims, unsigned sourceRank) {
		SmallVector<bool> res(sourceRank, false);
		for (auto idx : reductionDims)
		res[idx] = true;
		return res;
		}

		static SmallVector<int64_t> inferDestShape(
		ArrayRef<int64_t> shape, ArrayRef<bool> reducedDimsMask) {
		assert(shape.size() == reducedDimsMask.size() &&
		"shape and maks of different sizes");
		SmallVector<int64_t> res;
		for (auto it : llvm::zip(reducedDimsMask, shape))
		if (!std::get<0>(it))
		res.push_back(std::get<1>(it));
		return res;
		}
		}];
		let assemblyFormat =
		"$kind `,` $source attr-dict $reduction_dims `:` type($source) `to` type($dest)";
		}
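
As a worked example of the shape logic in `getReductionMask` and `inferDestShape` above, the following standalone C++ sketch re-expresses both helpers with `std::vector` instead of the LLVM containers (that substitution is an assumption made so the example compiles without MLIR). It reproduces the case from the op description: reducing dimensions 1 and 3 of a 4x8x16x32 vector leaves a 4x16 vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marks each source dimension that is reduced.
std::vector<bool> getReductionMask(const std::vector<int64_t> &reductionDims,
                                   std::size_t sourceRank) {
  std::vector<bool> mask(sourceRank, false);
  for (int64_t dim : reductionDims) {
    assert(dim >= 0 && static_cast<std::size_t>(dim) < sourceRank);
    mask[static_cast<std::size_t>(dim)] = true;
  }
  return mask;
}

// Keeps only the dimensions that are not reduced, in their original order.
std::vector<int64_t> inferDestShape(const std::vector<int64_t> &shape,
                                    const std::vector<bool> &reducedDimsMask) {
  assert(shape.size() == reducedDimsMask.size() &&
         "shape and mask of different sizes");
  std::vector<int64_t> dest;
  for (std::size_t i = 0; i < shape.size(); ++i)
    if (!reducedDimsMask[i])
      dest.push_back(shape[i]);
  return dest;
}
```

Calling `inferDestShape({4, 8, 16, 32}, getReductionMask({1, 3}, 4))` gives the shape `{4, 16}`.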

def Vector_BroadcastOp :		def Vector_BroadcastOp :
Vector_Op<"broadcast", [NoSideEffect,		Vector_Op<"broadcast", [NoSideEffect,
PredOpTrait<"source operand and result have same element type",		PredOpTrait<"source operand and result have same element type",
TCresVTEtIsSameAsOpBase<0, 0>>]>,		TCresVTEtIsSameAsOpBase<0, 0>>]>,
Arguments<(ins AnyType:$source)>,		Arguments<(ins AnyType:$source)>,
Results<(outs AnyVector:$vector)> {		Results<(outs AnyVector:$vector)> {
let summary = "broadcast operation";		let summary = "broadcast operation";
let description = [{		let description = [{
[... 1,015 lines not shown ...]	OpBuilder<(ins "Type":$vector, "Value":$source,
"ValueRange":$indices, "AffineMapAttr":$permutationMap, "Value":$padding,		"ValueRange":$indices, "AffineMapAttr":$permutationMap, "Value":$padding,
"ArrayAttr":$inBounds)>,		"ArrayAttr":$inBounds)>,
// Builder that does not set mask.		// Builder that does not set mask.
OpBuilder<(ins "Type":$vector, "Value":$source,		OpBuilder<(ins "Type":$vector, "Value":$source,
"ValueRange":$indices, "AffineMap":$permutationMap, "Value":$padding,		"ValueRange":$indices, "AffineMap":$permutationMap, "Value":$padding,
"ArrayAttr":$inBounds)>		"ArrayAttr":$inBounds)>
];		];

		let extraClassDeclaration = [{
		/// Return a new `result` map with `0` inserted in the proper positions so
		/// that vector.transfer_read `result` produces a vector of same element
		/// type as `vt` and shape `targetShape.
		/// Assume that `map` is a permutation map for a vector.transfer_read op,
		/// `vt` the vector type produced by the vector.transfer_read and
		/// `targetShape` is the desired `targetShape` for a broadcast version of
		/// `vt`.
		static AffineMap insertBroadcasts(AffineMap map, VectorType vt,
		ArrayRef<int64_t> targetShape);
		}];

let hasFolder = 1;		let hasFolder = 1;
}		}

def Vector_TransferWriteOp :		def Vector_TransferWriteOp :
Vector_Op<"transfer_write", [		Vector_Op<"transfer_write", [
DeclareOpInterfaceMethods<VectorTransferOpInterface>,		DeclareOpInterfaceMethods<VectorTransferOpInterface>,
DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>,		DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>,
DeclareOpInterfaceMethods<MemoryEffectsOpInterface>,		DeclareOpInterfaceMethods<MemoryEffectsOpInterface>,
[... 1,080 lines not shown ...]

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

//===- Vectorization.cpp - Implementation of linalg Vectorization ---------===//		//===- Vectorization.cpp - Implementation of linalg Vectorization ---------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements the linalg dialect Vectorization transformations.		// This file implements the linalg dialect Vectorization transformations.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		#include "mlir/Analysis/SliceAnalysis.h"
#include "mlir/Dialect/Linalg/Analysis/DependenceAnalysis.h"		#include "mlir/Dialect/Linalg/Analysis/DependenceAnalysis.h"
#include "mlir/Dialect/Linalg/IR/LinalgOps.h"		#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"		#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/Dialect/Linalg/Utils/Utils.h"		#include "mlir/Dialect/Linalg/Utils/Utils.h"
#include "mlir/Dialect/StandardOps/EDSC/Intrinsics.h"		#include "mlir/Dialect/StandardOps/EDSC/Intrinsics.h"
#include "mlir/Dialect/Utils/StructuredOpsUtils.h"		#include "mlir/Dialect/Utils/StructuredOpsUtils.h"
#include "mlir/Dialect/Vector/EDSC/Intrinsics.h"		#include "mlir/Dialect/Vector/EDSC/Intrinsics.h"
#include "mlir/Dialect/Vector/VectorOps.h"		#include "mlir/Dialect/Vector/VectorOps.h"
#include "mlir/IR/AffineExpr.h"		#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/Matchers.h"		#include "mlir/IR/Matchers.h"
#include "mlir/IR/PatternMatch.h"		#include "mlir/IR/PatternMatch.h"
#include "mlir/Pass/Pass.h"		#include "mlir/Pass/Pass.h"
#include "mlir/Support/LLVM.h"		#include "mlir/Support/LLVM.h"
#include "mlir/Transforms/RegionUtils.h"		#include "mlir/Transforms/RegionUtils.h"
#include "llvm/ADT/ScopeExit.h"		#include "llvm/ADT/ScopeExit.h"
		#include "llvm/ADT/TypeSwitch.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <type_traits>		#include <type_traits>

using namespace mlir;		using namespace mlir;
using namespace mlir::edsc;		using namespace mlir::edsc;
using namespace mlir::edsc::intrinsics;		using namespace mlir::edsc::intrinsics;
using namespace mlir::linalg;		using namespace mlir::linalg;
[... 12 lines not shown ...]	if (res) {
return WalkResult::interrupt();		return WalkResult::interrupt();
}		}
res = op;		res = op;
return WalkResult::advance();		return WalkResult::advance();
});		});
return res;		return res;
}		}

		/// Given an indexing `map` coming from a LinalgOp indexing, restricted to a
		/// projectedPermutation, compress the unused dimensions to serve as a
		/// permutation_map for a vector transfer operation.
		/// For example, given a linalg op such as:
		///
		/// ```
		/// %0 = linalg.generic {
		/// indexing_maps = affine_map<(d0, d1, d2, d3, d4) -> (d4, d0, d2)>,
		/// indexing_maps = affine_map<(d0, d1, d2, d3, d4) -> (d1, d3)>
		/// }
		/// ins(%0 : tensor<2x3x4xf32>)
		/// outs(%1 : tensor<5x6xf32>)
		/// ```
		///
		/// the iteration domain size of the linalg op is 3x5x4x6x2. The first affine
		/// map is reindexed to `affine_map<(d0, d1, d2) -> (d2, d0, d1)>`, the second
		/// affine map is reindexed to `affine_map<(d0, d1) -> (d0, d1)>`.
		static AffineMap reindexIndexingMap(AffineMap map) {
		assert(map.isProjectedPermutation() && "expected projected permutation");
		auto res = compressUnusedDims(map);
		assert(res.getNumDims() == res.getNumResults() &&
		"expected reindexed map with same number of dims and results");
		return res;
		}
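
The reindexing described in the comment above can be checked with a small standalone C++ sketch. It models a projected permutation as the list of iteration-dimension indices appearing in the map results; this representation and the name `compressUnusedDimsSketch` are assumptions for illustration and do not reproduce MLIR's `compressUnusedDims` implementation. Unused dimensions are dropped and the remaining ones are renumbered in increasing order.

```cpp
#include <cassert>
#include <vector>

// `results[i]` is the index of the iteration dimension used by the i-th
// result of a projected permutation over `numDims` dimensions.
std::vector<unsigned>
compressUnusedDimsSketch(const std::vector<unsigned> &results,
                         unsigned numDims) {
  // Record which iteration dimensions the map actually uses.
  std::vector<bool> used(numDims, false);
  for (unsigned dim : results) {
    assert(dim < numDims && "result refers to a non-existent dimension");
    used[dim] = true;
  }
  // Renumber the used dimensions densely, preserving their relative order.
  std::vector<unsigned> newIndex(numDims, 0);
  unsigned next = 0;
  for (unsigned dim = 0; dim < numDims; ++dim)
    if (used[dim])
      newIndex[dim] = next++;
  // Rewrite each result in terms of the compressed numbering.
  std::vector<unsigned> compressed;
  for (unsigned dim : results)
    compressed.push_back(newIndex[dim]);
  return compressed;
}
```

For the two maps in the comment, `(d4, d0, d2)` over five dimensions becomes `(d2, d0, d1)` and `(d1, d3)` becomes `(d0, d1)`.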

/// Helper data structure to represent the result of vectorization.		/// Helper data structure to represent the result of vectorization.
/// In certain specific cases, like terminators, we do not want to propagate/		/// In certain specific cases, like terminators, we do not want to propagate/
enum VectorizationStatus {		enum VectorizationStatus {
/// Op failed to vectorize.		/// Op failed to vectorize.
Failure = 0,		Failure = 0,
/// Op vectorized and custom function took care of replacement logic		/// Op vectorized and custom function took care of replacement logic
NoReplace,		NoReplace,
/// Op vectorized into a new Op whose results will replace original Op's		/// Op vectorized into a new Op whose results will replace original Op's
[... 14 lines not shown ...]
/// ShapedType of `v`.		/// ShapedType of `v`.
static VectorType extractVectorTypeFromShapedValue(Value v) {		static VectorType extractVectorTypeFromShapedValue(Value v) {
auto st = v.getType().cast<ShapedType>();		auto st = v.getType().cast<ShapedType>();
if (st.isa<MemRefType>() && st.getShape().empty())		if (st.isa<MemRefType>() && st.getShape().empty())
return VectorType();		return VectorType();
return VectorType::get(st.getShape(), st.getElementType());		return VectorType::get(st.getShape(), st.getElementType());
}		}

		/// Given an `outputOperand` of a LinalgOp, compute the intersection of the
		/// forward slice starting from `outputOperand` and the backward slice
		/// starting from the corresponding linalg.yield operand.
		/// This intersection is assumed to have a single binary operation that is
		/// the reduction operation. Multiple reduction operations would impose an
		/// ordering between reduction dimensions and is currently unsupported in
		/// Linalg. This limitation is motivated by the fact that e.g.
		/// min(max(X)) != max(min(X))
		// TODO: use in LinalgOp verification, there is a circular dependency atm.
		static Operation *getSingleBinaryOpAssumedReduction(OpOperand &outputOperand) {
		auto linalgOp = cast<LinalgOp>(outputOperand.getOwner());
		auto yieldOp = cast<YieldOp>(linalgOp->getRegion(0).front().getTerminator());
		unsigned yieldNum =
		outputOperand.getOperandNumber() - linalgOp.getNumInputs();
		llvm::SetVector<Operation *> backwardSlice, forwardSlice;
		BlockArgument bbArg = linalgOp->getRegion(0).front().getArgument(
		outputOperand.getOperandNumber());
		Value yieldVal = yieldOp->getOperand(yieldNum);
		getBackwardSlice(yieldVal, &backwardSlice, [&](Operation *op) {
		return op->getParentOp() == linalgOp;
		});
		backwardSlice.insert(yieldVal.getDefiningOp());
		getForwardSlice(bbArg, &forwardSlice,
		[&](Operation *op) { return op->getParentOp() == linalgOp; });
		// Search for the (assumed unique) elementwiseMappable op at the intersection
		// of forward and backward slices.
		Operation *reductionOp = nullptr;
		for (Operation *op : llvm::reverse(backwardSlice)) {
		if (!forwardSlice.contains(op))
		continue;
		if (OpTrait::hasElementwiseMappableTraits(op)) {
		if (reductionOp) {
		// Reduction detection fails: found more than 1 elementwise-mappable op.
		return nullptr;
		}
		reductionOp = op;
		}
		}
		// TODO: also assert no other subsequent ops break the reduction.
		return reductionOp;
		}

		/// If `value` of assumed VectorType has a shape different than `shape`, try to
		/// build and return a new vector.broadcast to `shape`.
		/// Otherwise, just return `value`.
		// TODO: this is best effort atm and there is currently no guarantee of
		// correctness for the broadcast semantics.
		static Value broadcastIfNeeded(OpBuilder &builder, Value value,
		ArrayRef<int64_t> shape) {
		unsigned numDimsGtOne = std::count_if(shape.begin(), shape.end(),
		[](int64_t val) { return val > 1; });
		auto vecType = value.getType().dyn_cast<VectorType>();
		if (shape.empty() ||
		(vecType != nullptr &&
		(vecType.getShape() == shape || vecType.getRank() > numDimsGtOne)))
		return value;
		auto newVecType = VectorType::get(shape, vecType ? vecType.getElementType()
		: value.getType());
		return builder.create<vector::BroadcastOp>(
		builder.getInsertionPoint()->getLoc(), newVecType, value);
		}

		/// If value of assumed VectorType has a shape different than `shape`, build and
		/// return a new vector.broadcast to `shape`.
		/// Otherwise, just return value.
		static Value reduceIfNeeded(OpBuilder &builder, VectorType targetVectorType,
		Value value, OpOperand &outputOperand) {
		assert(targetVectorType.getShape() ==
		outputOperand.get().getType().cast<ShapedType>().getShape());
		auto vecType = value.getType().dyn_cast<VectorType>();
		if (vecType.getShape() == targetVectorType.getShape())
		return value;
		// At this point, we know we need to reduce. Detect the reduction operator.
		// TODO: Use the generic reduction detection util.
		Operation *reductionOp = getSingleBinaryOpAssumedReduction(outputOperand);
		assert(reductionOp && "expected reduction op.");
		auto linalgOp = cast<LinalgOp>(outputOperand.getOwner());
		unsigned pos = 0;
		MLIRContext *ctx = builder.getContext();
		SmallVector<AffineExpr> exprs;
		for (auto s : linalgOp.iterator_types())
		if (isParallelIterator(s))
		exprs.push_back(getAffineDimExpr(pos++, ctx));
		auto loc = reductionOp->getLoc();
		// TODO: reuse common CombiningKing logic and support more than add.
		auto kind = llvm::TypeSwitch<Operation *, vector::CombiningKind>(reductionOp)
		.Case<AddIOp, AddFOp>(
		[&](auto op) { return vector::CombiningKind::ADD; })
		.Default([&](auto op) {
		llvm_unreachable("Unsupported reduction");
		return vector::CombiningKind::ADD;
		});
		unsigned idx = 0;
		SmallVector<bool> reductionMask(linalgOp.iterator_types().size(), false);
		for (auto attr : linalgOp.iterator_types()) {
		if (isReductionIteratorType(attr))
		reductionMask[idx] = true;
		++idx;
		}
		return builder.create<vector::MultiDimReductionOp>(loc, value, reductionMask,
		kind);
		}

/// Build a vector.transfer_read from `source` at indices set to all `0`.		/// Build a vector.transfer_read from `source` at indices set to all `0`.
/// If source has rank zero, build an memref.load.		/// If source has rank zero, build an memref.load.
/// Return the produced value.		/// Return the produced value.
static Value buildVectorRead(OpBuilder &builder, Value source,		static Value buildVectorRead(OpBuilder &builder, Value source,
VectorType vectorType, AffineMap map) {		VectorType vectorType, AffineMap map) {
edsc::ScopedContext scope(builder);		edsc::ScopedContext scope(builder);
auto shapedType = source.getType().cast<ShapedType>();		auto shapedType = source.getType().cast<ShapedType>();
if (vectorType) {
SmallVector<Value> indices(shapedType.getRank(), std_constant_index(0));		SmallVector<Value> indices(shapedType.getRank(), std_constant_index(0));
if (map)
return vector_transfer_read(vectorType, source, indices, map);		return vector_transfer_read(vectorType, source, indices, map);
return vector_transfer_read(vectorType, source, indices);
}
return memref_load(source);
}		}

/// Build a vector.transfer_write of `value` into `dest` at indices set to all		/// Build a vector.transfer_write of `value` into `outputOperand` at indices set
/// `0`. If `dest` has null rank, build an memref.store.		/// to all `0`; where `outputOperand` is an output operand of the LinalgOp
		/// currently being vectorized. If `dest` has null rank, build an memref.store.
/// Return the produced value or null if no value is produced.		/// Return the produced value or null if no value is produced.
static Value buildVectorWrite(OpBuilder &builder, Value value, Value dest) {		static Value buildVectorWrite(OpBuilder &builder, Value value,
		OpOperand &outputOperand) {
edsc::ScopedContext scope(builder);		edsc::ScopedContext scope(builder);
Operation *write;		Operation *write;
auto shapedType = dest.getType().cast<ShapedType>();		auto shapedType = outputOperand.get().getType().cast<ShapedType>();
if (VectorType vectorType = extractVectorTypeFromShapedValue(dest)) {		if (VectorType vectorType =
		extractVectorTypeFromShapedValue(outputOperand.get())) {
		auto linalgOp = cast<LinalgOp>(outputOperand.getOwner());
		AffineMap map = reindexIndexingMap(
		linalgOp.getIndexingMap(outputOperand.getOperandNumber()));
SmallVector<Value> indices(shapedType.getRank(), std_constant_index(0));		SmallVector<Value> indices(shapedType.getRank(), std_constant_index(0));
if (vectorType != value.getType())		value = broadcastIfNeeded(builder, value, vectorType.getShape());
value = vector_broadcast(vectorType, value);		value = reduceIfNeeded(builder, vectorType, value, outputOperand);
write = vector_transfer_write(value, dest, indices);		write = vector_transfer_write(value, outputOperand.get(), indices, map);
} else {		} else {
write = memref_store(value, dest);		write = memref_store(value, outputOperand.get());
}		}
LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: vectorized op: " << *write);		LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: vectorized op: " << *write);
if (!write->getResults().empty())		if (!write->getResults().empty())
return write->getResult(0);		return write->getResult(0);
return Value();		return Value();
}		}

/// If value of assumed VectorType has a shape different than `shape`, buil and
/// return a new vector.broadcast to `shape`.
/// Otherwise, just return value.
static Value broadcastIfNeeded(OpBuilder &builder, Value value,
ArrayRef<int64_t> shape) {
auto vecType = value.getType().dyn_cast<VectorType>();
if (shape.empty() || (vecType != nullptr && vecType.getShape() == shape))
return value;
auto newVecType = VectorType::get(shape, vecType ? vecType.getElementType()
: value.getType());
return builder.create<vector::BroadcastOp>(
builder.getInsertionPoint()->getLoc(), newVecType, value);
}

// Custom vectorization function type. Produce a vector form of Operation*		// Custom vectorization function type. Produce a vector form of Operation*
// assuming all its vectorized operands are already in the BlockAndValueMapping.		// assuming all its vectorized operands are already in the BlockAndValueMapping.
// Return nullptr if the Operation cannot be vectorized.		// Return nullptr if the Operation cannot be vectorized.
using CustomVectorizationHook = std::function<VectorizationResult(		using CustomVectorizationHook = std::function<VectorizationResult(
Operation *, const BlockAndValueMapping &)>;		Operation *, const BlockAndValueMapping &)>;

/// Helper function to vectorize the terminator of a `linalgOp`. New result		/// Helper function to vectorize the terminator of a `linalgOp`. New result
/// vector values are appended to `newResults`. Return		/// vector values are appended to `newResults`. Return
/// VectorizationStatus::NoReplace to signal the vectorization algorithm that it		/// VectorizationStatus::NoReplace to signal the vectorization algorithm that it
/// should not try to map produced operations and instead return the results		/// should not try to map produced operations and instead return the results
/// using the `newResults` vector making them available to the		/// using the `newResults` vector making them available to the
/// vectorization algorithm for RAUW. This function is meant to be used as a		/// vectorization algorithm for RAUW. This function is meant to be used as a
/// CustomVectorizationHook.		/// CustomVectorizationHook.
static VectorizationResult		static VectorizationResult
vectorizeLinalgYield(OpBuilder &builder, Operation *op,		vectorizeLinalgYield(OpBuilder &builder, Operation *op,
const BlockAndValueMapping &bvm, LinalgOp linalgOp,		const BlockAndValueMapping &bvm, LinalgOp linalgOp,
SmallVectorImpl<Value> &newResults) {		SmallVectorImpl<Value> &newResults) {
auto yieldOp = dyn_cast<linalg::YieldOp>(op);		auto yieldOp = dyn_cast<linalg::YieldOp>(op);
if (!yieldOp)		if (!yieldOp)
return VectorizationResult{VectorizationStatus::Failure, nullptr};		return VectorizationResult{VectorizationStatus::Failure, nullptr};
for (auto outputs : llvm::enumerate(yieldOp.values())) {		for (auto outputs : llvm::enumerate(yieldOp.values())) {
// TODO: Scan for an opportunity for reuse.		// TODO: Scan for an opportunity for reuse.
// TODO: use a map.		// TODO: use a map.
Value vectorValue = bvm.lookup(outputs.value());		Value vectorValue = bvm.lookup(outputs.value());
Value newResult = buildVectorWrite(builder, vectorValue,		Value newResult = buildVectorWrite(
linalgOp.getOutput(outputs.index()));		builder, vectorValue, linalgOp.getOutputOpOperands()[outputs.index()]);
if (newResult)		if (newResult)
newResults.push_back(newResult);		newResults.push_back(newResult);
}		}
return VectorizationResult{VectorizationStatus::NoReplace, nullptr};		return VectorizationResult{VectorizationStatus::NoReplace, nullptr};
}		}

/// Helper function to vectorize the index operations of a `linalgOp`. Return		/// Helper function to vectorize the index operations of a `linalgOp`. Return
/// VectorizationStatus::NewOp to signal the vectorization algorithm that it		/// VectorizationStatus::NewOp to signal the vectorization algorithm that it
[...]
for (unsigned i = 0, e = linalgOp.getNumOutputs(); i < e; i++) {
if (!linalgOp.getOutputIndexingMap(i).isIdentity())		if (!linalgOp.getOutputIndexingMap(i).isIdentity())
return false;		return false;
}		}
if (linalgOp->getNumRegions() != 1)		if (linalgOp->getNumRegions() != 1)
return false;		return false;
return hasOnlyScalarElementwiseOp(linalgOp->getRegion(0));		return hasOnlyScalarElementwiseOp(linalgOp->getRegion(0));
}		}

// Calculate the map to apply to transfer_read to convert the input shape into
// the output shape.
static AffineMap getTransferReadMap(LinalgOp linalgOp, unsigned argIndex) {
AffineMap linalgMap = linalgOp.getIndexingMap(argIndex);
MLIRContext *context = linalgMap.getContext();
AffineExpr zero = mlir::getAffineConstantExpr(0, context);
SmallVector<AffineExpr, 4> exprs(linalgMap.getNumInputs(), zero);
for (unsigned i : llvm::seq(unsigned(0), linalgMap.getNumResults())) {
exprs[linalgMap.getDimPosition(i)] = getAffineDimExpr(i, context);
}
return AffineMap::get(linalgMap.getNumResults(), /*symbolCount=*/0, exprs,
context);
}

/// Generic vectorization function that rewrites the body of a `linalgOp` into		/// Generic vectorization function that rewrites the body of a `linalgOp` into
/// vector form. Generic vectorization proceeds as follows:		/// vector form. Generic vectorization proceeds as follows:
/// 1. Verify the `linalgOp` has one non-empty region.		/// 1. Verify the `linalgOp` has one non-empty region.
/// 2. Values defined above the region are mapped to themselves and will be		/// 2. Values defined above the region are mapped to themselves and will be
/// broadcasted on a per-need basis by their consumers.		/// broadcasted on a per-need basis by their consumers.
/// 3. Each region argument is vectorized into a vector.transfer_read (or 0-d		/// 3. Each region argument is vectorized into a vector.transfer_read (or 0-d
/// load).		/// load).
/// TODO: Reuse opportunities for RAR dependencies.		/// TODO: Reuse opportunities for RAR dependencies.
/// 4a. Register CustomVectorizationHook for YieldOp to capture the results.		/// 4a. Register CustomVectorizationHook for YieldOp to capture the results.
/// 4b. Register CustomVectorizationHook for IndexOp to access the iteration		/// 4b. Register CustomVectorizationHook for IndexOp to access the iteration
/// indices.		/// indices.
/// 5. Iteratively call vectorizeOneOp on the region operations.		/// 5. Iteratively call vectorizeOneOp on the region operations.
		///
		/// When `broadcastToMaximalCommonShape` is set to true, eager broadcasting is
		/// performed to the maximal common vector size implied by the `linalgOp`
		/// iteration space. This eager broadcasting is introduced in the
		/// permutation_map of the vector.transfer_read operations. The eager
		/// broadcasting makes it trivial to determine where broadcast, transposes and
		/// reductions should occur, without any bookkeeping. The tradeoff is that, in
		/// the absence of good canonicalizations, the amount of work increases.
		/// This is not deemed a problem as we expect canonicalizations and foldings to
		/// aggressively clean up the useless work.
LogicalResult vectorizeAsLinalgGeneric(		LogicalResult vectorizeAsLinalgGeneric(
OpBuilder &builder, LinalgOp linalgOp, SmallVectorImpl<Value> &newResults,		OpBuilder &builder, LinalgOp linalgOp, SmallVectorImpl<Value> &newResults,
		bool broadcastToMaximalCommonShape = false,
ArrayRef<CustomVectorizationHook> customVectorizationHooks = {}) {		ArrayRef<CustomVectorizationHook> customVectorizationHooks = {}) {
// 1. Fail to vectorize if the operation does not have one non-empty region.		// 1. Fail to vectorize if the operation does not have one non-empty region.
if (linalgOp->getNumRegions() != 1 || linalgOp->getRegion(0).empty())		if (linalgOp->getNumRegions() != 1 || linalgOp->getRegion(0).empty())
return failure();		return failure();
auto &block = linalgOp->getRegion(0).front();		auto &block = linalgOp->getRegion(0).front();

BlockAndValueMapping bvm;
// 2. Values defined above the region can only be broadcast for now. Make them		// 2. Values defined above the region can only be broadcast for now. Make them
// map to themselves.		// map to themselves.
		BlockAndValueMapping bvm;
SetVector<Value> valuesSet;		SetVector<Value> valuesSet;
mlir::getUsedValuesDefinedAbove(linalgOp->getRegion(0), valuesSet);		mlir::getUsedValuesDefinedAbove(linalgOp->getRegion(0), valuesSet);
bvm.map(valuesSet.getArrayRef(), valuesSet.getArrayRef());		bvm.map(valuesSet.getArrayRef(), valuesSet.getArrayRef());

		if (linalgOp.getNumOutputs() == 0)
		return failure();

		// TODO: the common vector shape is equal to the static loop sizes only when
		// all indexing maps are projected permutations. For convs and stencils the
		// logic will need to evolve.
		SmallVector<int64_t> commonVectorShape = linalgOp.computeStaticLoopSizes();

// 3. Turn all BBArgs into vector.transfer_read / load.		// 3. Turn all BBArgs into vector.transfer_read / load.
SmallVector<AffineMap> indexings;		SmallVector<AffineMap> indexings;
for (auto bbarg : block.getArguments()) {		for (auto bbarg : block.getArguments()) {
Value vectorArg = linalgOp.getShapedOperand(bbarg.getArgNumber());		Value shapedArg = linalgOp.getShapedOperand(bbarg.getArgNumber());
AffineMap map;		ShapedType shapedType = shapedArg.getType().cast<ShapedType>();
VectorType vectorType = extractVectorTypeFromShapedValue(vectorArg);		// TODO: 0-d vectors.
if (isElementwise(linalgOp) &&		if (shapedType.getShape().empty()) {
!linalgOp.getIndexingMap(bbarg.getArgNumber()).isMinorIdentity()) {		Value loaded =
// Currently assume we don't support output permutations.		builder.create<memref::LoadOp>(linalgOp.getLoc(), shapedArg);
assert(linalgOp.getNumOutputs() > 0 &&		LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: new vectorized bbarg("
linalgOp.getOutputIndexingMap(0).isIdentity());		<< bbarg.getArgNumber() << "): " << loaded);
ArrayRef<int64_t> outputShape =		bvm.map(bbarg, loaded);
linalgOp.getOutputShapedType(0).getShape();		bvm.map(shapedArg, loaded);
vectorType = VectorType::get(outputShape, vectorType.getElementType());		continue;
map = getTransferReadMap(linalgOp, bbarg.getArgNumber());		}
		AffineMap map = inversePermutation(
		reindexIndexingMap(linalgOp.getIndexingMap(bbarg.getArgNumber())));
		VectorType vectorType = VectorType::get(map.compose(shapedType.getShape()),
		shapedType.getElementType());
		if (broadcastToMaximalCommonShape) {
		map = vector::TransferReadOp::insertBroadcasts(map, vectorType,
		commonVectorShape);
		vectorType =
		VectorType::get(commonVectorShape, vectorType.getElementType());
}		}
Value vectorRead = buildVectorRead(builder, vectorArg, vectorType, map);		Value vectorRead = buildVectorRead(builder, shapedArg, vectorType, map);
LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: new vectorized bbarg("		LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: new vectorized bbarg("
<< bbarg.getArgNumber() << "): " << vectorRead);		<< bbarg.getArgNumber() << "): " << vectorRead);
bvm.map(bbarg, vectorRead);		bvm.map(bbarg, vectorRead);
bvm.map(vectorArg, vectorRead);		bvm.map(shapedArg, vectorRead);
}		}

auto hooks = llvm::to_vector<4>(customVectorizationHooks);		auto hooks = llvm::to_vector<4>(customVectorizationHooks);
// 4a. Register CustomVectorizationHook for yieldOp.		// 4a. Register CustomVectorizationHook for yieldOp.
CustomVectorizationHook vectorizeYield =		CustomVectorizationHook vectorizeYield =
[&](Operation *op,		[&](Operation *op,
const BlockAndValueMapping &bvm) -> VectorizationResult {		const BlockAndValueMapping &bvm) -> VectorizationResult {
return vectorizeLinalgYield(builder, op, bvm, linalgOp, newResults);		return vectorizeLinalgYield(builder, op, bvm, linalgOp, newResults);
[...]
CustomVectorizationHook vectorizeContraction =
if (!isa<MulIOp, MulFOp>(op))		if (!isa<MulIOp, MulFOp>(op))
return VectorizationResult{VectorizationStatus::Failure, nullptr};		return VectorizationResult{VectorizationStatus::Failure, nullptr};
auto outShape = linalgOp.getOutputShapedType(0).getShape();		auto outShape = linalgOp.getOutputShapedType(0).getShape();
auto vType = outShape.empty()		auto vType = outShape.empty()
? op->getResult(0).getType()		? op->getResult(0).getType()
: VectorType::get(outShape, op->getResult(0).getType());		: VectorType::get(outShape, op->getResult(0).getType());
auto zero =		auto zero =
builder.create<ConstantOp>(loc, vType, builder.getZeroAttr(vType));		builder.create<ConstantOp>(loc, vType, builder.getZeroAttr(vType));
		// Indexing maps at the time of vector.transfer_read are adjusted to order
		// vector dimensions in the same order as the canonical linalg op iteration
		// space order.
		// The indexings for the contraction therefore need to be adjusted.
		// TODO: consider dropping contraction special casing altogether, this will
		// require more advanced canonicalizations involving vector.multi_reduction
		// that are not yet available.
		SmallVector<AffineMap> indexingMaps{
		inversePermutation(reindexIndexingMap(linalgOp.getIndexingMap(0)))
		.compose(linalgOp.getIndexingMap(0)),
		inversePermutation(reindexIndexingMap(linalgOp.getIndexingMap(1)))
		.compose(linalgOp.getIndexingMap(1)),
		inversePermutation(reindexIndexingMap(linalgOp.getIndexingMap(2)))
		.compose(linalgOp.getIndexingMap(2))};
Operation *contract = builder.create<vector::ContractionOp>(		Operation *contract = builder.create<vector::ContractionOp>(
loc, bvm.lookup(op->getOperand(0)), bvm.lookup(op->getOperand(1)), zero,		loc, bvm.lookup(op->getOperand(0)), bvm.lookup(op->getOperand(1)), zero,
linalgOp.indexing_maps(), linalgOp.iterator_types());		builder.getAffineMapArrayAttr(indexingMaps), linalgOp.iterator_types());
return VectorizationResult{VectorizationStatus::NewOp, contract};		return VectorizationResult{VectorizationStatus::NewOp, contract};
};		};
return vectorizeAsLinalgGeneric(builder, linalgOp, newResults,		return vectorizeAsLinalgGeneric(builder, linalgOp, newResults,
		/*broadcastToMaximalCommonShape=*/false,
{vectorizeContraction});		{vectorizeContraction});
}		}

		static bool allIndexingsAreProjectedPermutation(LinalgOp op) {
		return llvm::all_of(op.getIndexingMaps(),
		[](AffineMap m) { return m.isProjectedPermutation(); });
		}
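
The precondition above accepts any op whose indexing maps are all projected permutations. As a rough illustration of that property, here is a minimal standalone sketch that does not use MLIR. The encoding (each map result is the index of the iteration dimension it reads, or -1 for any other expression) and the name `isProjectedPermutationSketch` are assumptions made for this example; the actual `AffineMap::isProjectedPermutation` may differ in detail between MLIR versions.

```cpp
#include <vector>

// Hypothetical encoding (not MLIR's): results[i] is the iteration dimension
// read by the i-th map result, or -1 if that result is not a plain dimension
// (e.g. a constant or a sum of dimensions).
bool isProjectedPermutationSketch(unsigned numDims,
                                  const std::vector<int> &results) {
  // A projection cannot have more results than there are dimensions.
  if (results.size() > numDims)
    return false;
  std::vector<bool> seen(numDims, false);
  for (int r : results) {
    // Every result must be a plain, in-range dimension...
    if (r < 0 || static_cast<unsigned>(r) >= numDims)
      return false;
    // ...and no dimension may be used twice.
    if (seen[r])
      return false;
    seen[r] = true;
  }
  return true;
}
```

Under this encoding the matmul maps `(m, n, k) -> (m, k)`, `(k, n)` and `(m, n)` all qualify, whereas a map that repeats a dimension or adds two dimensions (as in a convolution) does not.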

LogicalResult mlir::linalg::vectorizeLinalgOpPrecondition(Operation *op) {		LogicalResult mlir::linalg::vectorizeLinalgOpPrecondition(Operation *op) {
auto linalgOp = cast<linalg::LinalgOp>(op);		auto linalgOp = cast<linalg::LinalgOp>(op);
// All types must be static shape to go to vector.		// All types must be static shape to go to vector.
for (Value operand : linalgOp.getShapedOperands())		for (Value operand : linalgOp.getShapedOperands())
if (!operand.getType().cast<ShapedType>().hasStaticShape())		if (!operand.getType().cast<ShapedType>().hasStaticShape())
return failure();		return failure();
for (Type outputTensorType : linalgOp.getOutputTensorTypes())		for (Type outputTensorType : linalgOp.getOutputTensorTypes())
if (!outputTensorType.cast<ShapedType>().hasStaticShape())		if (!outputTensorType.cast<ShapedType>().hasStaticShape())
return failure();		return failure();
if (isElementwise(op))		if (isElementwise(op))
return success();		return success();
return success(isaContractionOpInterface(linalgOp));		if (isaContractionOpInterface(linalgOp))
		return success();
		// TODO: the common vector shape is equal to the static loop sizes only when
		// all indexing maps are projected permutations. For convs and stencils the
		// logic will need to evolve.
		// TODO: probably need some extra checks for reduction followed by consumer
		// ops that may not commute (e.g. linear reduction + non-linear instructions).
		if (allIndexingsAreProjectedPermutation(linalgOp))
		return success();
		return failure();
}		}

LogicalResult		LogicalResult
mlir::linalg::vectorizeLinalgOp(OpBuilder &builder, Operation *op,		mlir::linalg::vectorizeLinalgOp(OpBuilder &builder, Operation *op,
SmallVectorImpl<Value> &newResults) {		SmallVectorImpl<Value> &newResults) {
if (failed(vectorizeLinalgOpPrecondition(op)))		if (failed(vectorizeLinalgOpPrecondition(op)))
return failure();		return failure();

edsc::ScopedContext scope(builder, op->getLoc());		edsc::ScopedContext scope(builder, op->getLoc());
if (isElementwise(op)) {		auto linalgOp = cast<LinalgOp>(op);
LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: "
<< "Vectorize linalg op as a generic: " << *op);		if (isaContractionOpInterface(linalgOp))
return vectorizeAsLinalgGeneric(builder, cast<LinalgOp>(op), newResults);		return vectorizeContraction(builder, linalgOp, newResults);
}

return vectorizeContraction(builder, cast<LinalgOp>(op), newResults);		LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: "
		<< "Vectorize linalg op as a generic by broadcasting to "
		"maximal common shape: "
		<< *op);
		return vectorizeAsLinalgGeneric(builder, linalgOp, newResults,
		/*broadcastToMaximalCommonShape=*/true);
}		}

//----------------------------------------------------------------------------//		//----------------------------------------------------------------------------//
// Misc. vectorization patterns.		// Misc. vectorization patterns.
//----------------------------------------------------------------------------//		//----------------------------------------------------------------------------//

/// Rewrite a PadTensorOp into a sequence of InitTensorOp, TransferReadOp and		/// Rewrite a PadTensorOp into a sequence of InitTensorOp, TransferReadOp and
/// TransferWriteOp. For now, this only applies when all low and high paddings		/// TransferWriteOp. For now, this only applies when all low and high paddings
[...]

mlir/lib/Dialect/Vector/VectorOps.cpp

[...]
}		}

ArrayAttr vector::getVectorSubscriptAttr(Builder &builder,		ArrayAttr vector::getVectorSubscriptAttr(Builder &builder,
ArrayRef<int64_t> values) {		ArrayRef<int64_t> values) {
return builder.getI64ArrayAttr(values);		return builder.getI64ArrayAttr(values);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// MultiDimReductionOp
		//===----------------------------------------------------------------------===//

		void vector::MultiDimReductionOp::build(OpBuilder &builder,
		OperationState &result, Value source,
		ArrayRef<bool> reductionMask,
		CombiningKind kind) {
		result.addOperands(source);
		auto sourceVectorType = source.getType().cast<VectorType>();
		auto targetShape = MultiDimReductionOp::inferDestShape(
		sourceVectorType.getShape(), reductionMask);
		auto targetVectorType =
		VectorType::get(targetShape, sourceVectorType.getElementType());
		result.addTypes(targetVectorType);

		SmallVector<int64_t> reductionDims;
		for (auto en : llvm::enumerate(reductionMask))
		if (en.value())
		reductionDims.push_back(en.index());
		result.addAttribute(getReductionDimsAttrName(),
		builder.getI64ArrayAttr(reductionDims));
		result.addAttribute(getKindAttrName(),
		CombiningKindAttr::get(kind, builder.getContext()));
		}

		static LogicalResult verify(MultiDimReductionOp op) {
		auto reductionMask = op.getReductionMask();
		auto targetShape = MultiDimReductionOp::inferDestShape(
		op.getSourceVectorType().getShape(), reductionMask);
		auto targetVectorType =
		VectorType::get(targetShape, op.getSourceVectorType().getElementType());
		if (targetVectorType != op.getDestVectorType())
		return op.emitError("invalid output vector type: ")
		<< op.getDestVectorType() << " (expected: " << targetVectorType
		<< ")";
		return success();
		}
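
The builder and verifier above both derive the result type from the source shape and a per-dimension reduction mask. The following standalone sketch shows that derivation on plain `std::vector`s. It assumes that `inferDestShape` keeps exactly the dimensions whose mask entry is false; the declaration lives in the .td file, which is not shown in this hunk, so treat that as an assumption. The names `inferDestShapeSketch` and `reductionDimsSketch` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed behavior: drop every dimension that is being reduced.
std::vector<int64_t>
inferDestShapeSketch(const std::vector<int64_t> &sourceShape,
                     const std::vector<bool> &reductionMask) {
  assert(sourceShape.size() == reductionMask.size() && "mismatching ranks");
  std::vector<int64_t> destShape;
  for (std::size_t i = 0; i < sourceShape.size(); ++i)
    if (!reductionMask[i])
      destShape.push_back(sourceShape[i]);
  return destShape;
}

// Mirrors the loop in the builder: collect the indices of the set mask bits,
// which the builder stores as the reduction dims attribute.
std::vector<int64_t> reductionDimsSketch(const std::vector<bool> &reductionMask) {
  std::vector<int64_t> dims;
  for (std::size_t i = 0; i < reductionMask.size(); ++i)
    if (reductionMask[i])
      dims.push_back(static_cast<int64_t>(i));
  return dims;
}
```

For a source of shape 8x16x32 with only the middle dimension reduced, this yields a destination shape of 8x32 and reduction dims `[1]`.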

		//===----------------------------------------------------------------------===//
// ReductionOp		// ReductionOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

static LogicalResult verify(ReductionOp op) {		static LogicalResult verify(ReductionOp op) {
// Verify for 1-D vector.		// Verify for 1-D vector.
int64_t rank = op.getVectorType().getRank();		int64_t rank = op.getVectorType().getRank();
if (rank != 1)		if (rank != 1)
return op.emitOpError("unsupported reduction rank: ") << rank;		return op.emitOpError("unsupported reduction rank: ") << rank;
[...]
void ExtractStridedSliceOp::getCanonicalizationPatterns(
results.add<StridedSliceConstantMaskFolder, StridedSliceConstantFolder,		results.add<StridedSliceConstantMaskFolder, StridedSliceConstantFolder,
StridedSliceBroadcast>(context);		StridedSliceBroadcast>(context);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// TransferReadOp		// TransferReadOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		AffineMap TransferReadOp::insertBroadcasts(AffineMap map, VectorType vt,
		ThomasRaoux (inline comment): It doesn't depend on the TransferReadOp? Should this be static?
		nicolasvasilache (author, inline reply): It is a static method in TransferReadOp, see the .td decl.
		ArrayRef<int64_t> targetShape) {
		unsigned targetRank = targetShape.size();
		assert(vt.getShape().size() <= targetRank && "mismatching ranks");
		if (vt.getShape().size() == targetRank)
		return map;
		MLIRContext *ctx = map.getContext();
		SmallVector<AffineExpr> exprs;
		exprs.reserve(targetRank);
		for (unsigned idx = 0, vtidx = 0; idx < targetRank; ++idx) {
		// If shapes match, just keep the existing indexing and advance ranks.
		if (vtidx < vt.getShape().size() &&
		vt.getShape()[vtidx] == targetShape[idx]) {
		exprs.push_back(map.getResult(vtidx));
		++vtidx;
		continue;
		}
		// Otherwise insert a broadcast.
		exprs.push_back(getAffineConstantExpr(0, ctx));
		}
		return AffineMap::get(map.getNumDims(), /*numSymbols=*/0, exprs, ctx);
		}
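
To make the matching rule in `insertBroadcasts` easier to follow, here is a minimal standalone sketch of the same loop over plain vectors. It is not the MLIR implementation: map results are encoded as dimension indices, the constant-0 broadcast expression is encoded as the hypothetical sentinel `kBroadcast`, and the function name is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the affine constant 0 that marks a broadcast.
constexpr int kBroadcast = -1;

std::vector<int>
insertBroadcastsSketch(const std::vector<int> &mapResults,
                       const std::vector<int64_t> &vectorShape,
                       const std::vector<int64_t> &targetShape) {
  assert(vectorShape.size() <= targetShape.size() && "mismatching ranks");
  assert(mapResults.size() == vectorShape.size() && "one result per dim");
  // Same rank: nothing to insert.
  if (vectorShape.size() == targetShape.size())
    return mapResults;
  std::vector<int> exprs;
  exprs.reserve(targetShape.size());
  std::size_t vtidx = 0;
  for (std::size_t idx = 0; idx < targetShape.size(); ++idx) {
    // If the next unmatched vector dim has the target size, keep its indexing.
    if (vtidx < vectorShape.size() &&
        vectorShape[vtidx] == targetShape[idx]) {
      exprs.push_back(mapResults[vtidx]);
      ++vtidx;
      continue;
    }
    // Otherwise this target dim is a broadcast.
    exprs.push_back(kBroadcast);
  }
  return exprs;
}
```

With `mapResults = {0}`, `vectorShape = {256}` and `targetShape = {4, 256}`, the sketch returns `{kBroadcast, 0}`, which corresponds to the `memref<256xf32>` read that becomes `vector<4x256xf32>` in the `generic_vectorize` test. Note that dimensions are matched greedily by size alone, so when several target dimensions share a size (for example reading a size-4 vector into 4x4) the first one wins.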

template <typename EmitFun>		template <typename EmitFun>
static LogicalResult verifyPermutationMap(AffineMap permutationMap,		static LogicalResult verifyPermutationMap(AffineMap permutationMap,
EmitFun emitOpError) {		EmitFun emitOpError) {
SmallVector<bool, 8> seen(permutationMap.getNumInputs(), false);		SmallVector<bool, 8> seen(permutationMap.getNumInputs(), false);
for (auto expr : permutationMap.getResults()) {		for (auto expr : permutationMap.getResults()) {
auto dim = expr.dyn_cast<AffineDimExpr>();		auto dim = expr.dyn_cast<AffineDimExpr>();
auto zero = expr.dyn_cast<AffineConstantExpr>();		auto zero = expr.dyn_cast<AffineConstantExpr>();
if (zero) {		if (zero) {
[...]

mlir/lib/Dialect/Vector/VectorTransforms.cpp

[...]
static Value createMul(Location loc, Value x, Value y, bool isInt,
return rewriter.create<MulFOp>(loc, x, y);		return rewriter.create<MulFOp>(loc, x, y);
}		}

namespace mlir {		namespace mlir {

/// Progressively lower a `vector.contract %a, %b, %c` with row-major matmul		/// Progressively lower a `vector.contract %a, %b, %c` with row-major matmul
/// semantics to:		/// semantics to:
/// ```		/// ```
/// %flattened_a = vector.shape_cast %a		/// %mta = maybe_transpose
/// %flattened_b = vector.shape_cast %b		/// %mtb = maybe_transpose
		/// %flattened_a = vector.shape_cast %mta
		/// %flattened_b = vector.shape_cast %mtb
/// %flattened_d = vector.matmul %flattened_a, %flattened_b		/// %flattened_d = vector.matmul %flattened_a, %flattened_b
/// %d = vector.shape_cast %%flattened_d		/// %mtd = vector.shape_cast %flattened_d
		/// %d = maybe_untranspose %mtd
/// %e = add %c, %d		/// %e = add %c, %d
/// ```		/// ```
/// `vector.matmul` later lowers to `llvm.matrix.multiply`.		/// `vector.matmul` later lowers to `llvm.matrix.multiply`.
//		//
/// This only kicks in when VectorTransformsOptions is set to OuterProduct and		/// This only kicks in when VectorTransformsOptions is set to `Matmul`.
/// the vector.contract op is a row-major matrix multiply.		/// vector.transpose operations are inserted if the vector.contract op is not a
LogicalResult ContractionOpToMatmulOpLowering::matchAndRewrite(		/// row-major matrix multiply.
vector::ContractionOp op, PatternRewriter &rewriter) const {		LogicalResult
		ContractionOpToMatmulOpLowering::matchAndRewrite(vector::ContractionOp op,
		PatternRewriter &rew) const {
// TODO: implement masks		// TODO: implement masks
if (llvm::size(op.masks()) != 0)		if (llvm::size(op.masks()) != 0)
return failure();		return failure();
if (vectorTransformsOptions.vectorContractLowering !=		if (vectorTransformsOptions.vectorContractLowering !=
vector::VectorContractLowering::Matmul)		vector::VectorContractLowering::Matmul)
return failure();		return failure();
if (failed(filter(op)))		if (failed(filter(op)))
return failure();		return failure();

auto iteratorTypes = op.iterator_types().getValue();		auto iteratorTypes = op.iterator_types().getValue();
if (!isParallelIterator(iteratorTypes[0]) ||		if (!isParallelIterator(iteratorTypes[0]) ||
!isParallelIterator(iteratorTypes[1]) ||		!isParallelIterator(iteratorTypes[1]) ||
!isReductionIterator(iteratorTypes[2]))		!isReductionIterator(iteratorTypes[2]))
return failure();		return failure();

if (!isRowMajorMatmul(op.indexing_maps()))
return failure();

Type elementType = op.getLhsType().getElementType();		Type elementType = op.getLhsType().getElementType();
if (!elementType.isIntOrFloat())		if (!elementType.isIntOrFloat())
return failure();		return failure();

VectorType lhsType = op.getLhsType();		// Perform lhs + rhs transpositions to conform to matmul row-major semantics.
VectorType rhsType = op.getRhsType();		// Bail out if the contraction cannot be put in this form.
		MLIRContext *ctx = op.getContext();
		Location loc = op.getLoc();
		AffineExpr m, n, k;
		bindDims(rew.getContext(), m, n, k);
		// LHS must be A(m, k) or A(k, m).
		Value lhs = op.lhs();
		auto lhsMap = op.indexing_maps()[0].cast<AffineMapAttr>().getValue();
		if (lhsMap == AffineMap::get(3, 0, {k, m}, ctx))
		lhs = rew.create<vector::TransposeOp>(loc, lhs, ArrayRef<int64_t>{1, 0});
		else if (lhsMap != AffineMap::get(3, 0, {m, k}, ctx))
		return failure();

		// RHS must be B(k, n) or B(n, k).
		Value rhs = op.rhs();
		auto rhsMap = op.indexing_maps()[1].cast<AffineMapAttr>().getValue();
		if (rhsMap == AffineMap::get(3, 0, {n, k}, ctx))
		rhs = rew.create<vector::TransposeOp>(loc, rhs, ArrayRef<int64_t>{1, 0});
		else if (rhsMap != AffineMap::get(3, 0, {k, n}, ctx))
		return failure();

		// At this point lhs and rhs are in row-major.
		VectorType lhsType = lhs.getType().cast<VectorType>();
		VectorType rhsType = rhs.getType().cast<VectorType>();
int64_t lhsRows = lhsType.getDimSize(0);		int64_t lhsRows = lhsType.getDimSize(0);
int64_t lhsColumns = lhsType.getDimSize(1);		int64_t lhsColumns = lhsType.getDimSize(1);
int64_t rhsColumns = rhsType.getDimSize(1);		int64_t rhsColumns = rhsType.getDimSize(1);

Type flattenedLHSType =		Type flattenedLHSType =
VectorType::get(lhsType.getNumElements(), lhsType.getElementType());		VectorType::get(lhsType.getNumElements(), lhsType.getElementType());
		lhs = rew.create<vector::ShapeCastOp>(loc, flattenedLHSType, lhs);

Type flattenedRHSType =		Type flattenedRHSType =
VectorType::get(rhsType.getNumElements(), rhsType.getElementType());		VectorType::get(rhsType.getNumElements(), rhsType.getElementType());
auto lhs = rewriter.create<vector::ShapeCastOp>(op.getLoc(), flattenedLHSType,		rhs = rew.create<vector::ShapeCastOp>(loc, flattenedRHSType, rhs);
op.lhs());
auto rhs = rewriter.create<vector::ShapeCastOp>(op.getLoc(), flattenedRHSType,		Value mul = rew.create<vector::MatmulOp>(loc, lhs, rhs, lhsRows, lhsColumns,
op.rhs());		rhsColumns);
		mul = rew.create<vector::ShapeCastOp>(loc, op.acc().getType(), mul);
Value mul = rewriter.create<vector::MatmulOp>(op.getLoc(), lhs, rhs, lhsRows,		Value res = elementType.isa<IntegerType>()
lhsColumns, rhsColumns);		? static_cast<Value>(rew.create<AddIOp>(loc, op.acc(), mul))
mul = rewriter.create<vector::ShapeCastOp>(op.getLoc(), op.acc().getType(),		: static_cast<Value>(rew.create<AddFOp>(loc, op.acc(), mul));
mul);
if (elementType.isa<IntegerType>())		// ACC must be C(m, n) or C(n, m).
rewriter.replaceOpWithNewOp<AddIOp>(op, op.acc(), mul);		auto accMap = op.indexing_maps()[2].cast<AffineMapAttr>().getValue();
else		if (accMap == AffineMap::get(3, 0, {n, m}, ctx))
rewriter.replaceOpWithNewOp<AddFOp>(op, op.acc(), mul);		res = rew.create<vector::TransposeOp>(loc, res, ArrayRef<int64_t>{1, 0});
		else if (accMap != AffineMap::get(3, 0, {m, n}, ctx))
		llvm_unreachable("invalid contraction semantics");

		rew.replaceOp(op, res);
return success();		return success();
}		}
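
The new lowering decides, per operand, whether a transpose is needed to reach row-major matmul form. The standalone sketch below isolates that decision. It is only an illustration: each indexing map is encoded as a pair of dimension indices over `(m, n, k)`, and `TransposePlan`, `Map2D` and `planMatmulTransposes` are hypothetical names. One deliberate difference from the diff: an unsupported accumulator map makes the sketch return `std::nullopt`, whereas the code above treats that case as unreachable.

```cpp
#include <array>
#include <optional>

// Iteration dimensions in the order bound by bindDims(ctx, m, n, k).
constexpr int M = 0, N = 1, K = 2;

// A 2-result indexing map over (m, n, k), encoded as two dim indices.
using Map2D = std::array<int, 2>;

// Which values need a vector.transpose [1, 0] around the vector.matmul.
struct TransposePlan {
  bool lhs = false;
  bool rhs = false;
  bool result = false;
};

std::optional<TransposePlan> planMatmulTransposes(Map2D lhs, Map2D rhs,
                                                  Map2D acc) {
  TransposePlan plan;
  // LHS must be A(m, k) or A(k, m).
  if (lhs == Map2D{K, M})
    plan.lhs = true;
  else if (lhs != Map2D{M, K})
    return std::nullopt;
  // RHS must be B(k, n) or B(n, k).
  if (rhs == Map2D{N, K})
    plan.rhs = true;
  else if (rhs != Map2D{K, N})
    return std::nullopt;
  // ACC must be C(m, n) or C(n, m).
  if (acc == Map2D{N, M})
    plan.result = true;
  else if (acc != Map2D{M, N})
    return std::nullopt;
  return plan;
}
```

For the maps checked in the updated `vectorization_test` (`$mk`, `$nk`, `$mn`), only the right-hand side would need a transpose under this sketch.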

/// Progressively lower a `vector.contract %a, %b, %c` with row-major matmul		/// Progressively lower a `vector.contract %a, %b, %c` with row-major matmul
/// semantics to a reduction_size-unrolled sequence:		/// semantics to a reduction_size-unrolled sequence:
/// ```		/// ```
/// %at = vector.transpose %a, [1, 0]		/// %at = vector.transpose %a, [1, 0]
/// %bRow0 = vector.extract %b[0]		/// %bRow0 = vector.extract %b[0]
[...]

mlir/test/Dialect/Linalg/transform-patterns-matmul-to-vector.mlir

[...]
	// CHECK: vector.transfer_write {{.*}} : vector<8x12xf32>, memref<8x12xf32>			// CHECK: vector.transfer_write {{.*}} : vector<8x12xf32>, memref<8x12xf32>
	//			//
	// CHECK: linalg.copy			// CHECK: linalg.copy
	// CHECK: linalg.copy			// CHECK: linalg.copy
	// CHECK: linalg.copy			// CHECK: linalg.copy
	//			//
	// CHECK: vector.contract			// CHECK: vector.contract
	// CHECK-SAME: iterator_types = ["parallel", "parallel", "reduction"]			// CHECK-SAME: iterator_types = ["parallel", "parallel", "reduction"]
	// CHECK-SAME: : vector<8x16xf32>, vector<16x12xf32> into vector<8x12xf32>			// CHECK-SAME: : vector<8x16xf32>, vector<12x16xf32> into vector<8x12xf32>
	//			//
	// CHECK: linalg.copy			// CHECK: linalg.copy

mlir/test/Dialect/Linalg/vectorization.mlir

[...]
#matmul_trait = {
indexing_maps = [		indexing_maps = [
affine_map<(m, n, k) -> (m, k)>,		affine_map<(m, n, k) -> (m, k)>,
affine_map<(m, n, k) -> (k, n)>,		affine_map<(m, n, k) -> (k, n)>,
affine_map<(m, n, k) -> (m, n)>		affine_map<(m, n, k) -> (m, n)>
],		],
iterator_types = ["parallel", "parallel", "reduction"]		iterator_types = ["parallel", "parallel", "reduction"]
}		}

		// CHECK-DAG: #[[$trans_2d:.*]] = affine_map<(d0, d1) -> (d1, d0)>
// CHECK-DAG: #[[$mk:.*]] = affine_map<(d0, d1, d2) -> (d0, d2)>		// CHECK-DAG: #[[$mk:.*]] = affine_map<(d0, d1, d2) -> (d0, d2)>
// CHECK-DAG: #[[$kn:.*]] = affine_map<(d0, d1, d2) -> (d2, d1)>		// CHECK-DAG: #[[$nk:.*]] = affine_map<(d0, d1, d2) -> (d1, d2)>
// CHECK-DAG: #[[$mn:.*]] = affine_map<(d0, d1, d2) -> (d0, d1)>		// CHECK-DAG: #[[$mn:.*]] = affine_map<(d0, d1, d2) -> (d0, d1)>

// CHECK-LABEL: func @vectorization_test		// CHECK-LABEL: func @vectorization_test
func @vectorization_test(%A: memref<8x16xf32>, %B: memref<16x32xf32>,		func @vectorization_test(%A: memref<8x16xf32>, %B: memref<16x32xf32>,
%C: memref<8x32xf32>) {		%C: memref<8x32xf32>) {
// CHECK: vector.transfer_read %{{.*}} : memref<8x16xf32>, vector<8x16xf32>		// CHECK: vector.transfer_read %{{.*}} : memref<8x16xf32>, vector<8x16xf32>
// CHECK: vector.transfer_read %{{.*}} : memref<16x32xf32>, vector<16x32xf32>		// CHECK: vector.transfer_read %{{.*}} : memref<16x32xf32>, vector<32x16xf32>
// CHECK: vector.transfer_read %{{.*}} : memref<8x32xf32>, vector<8x32xf32>		// CHECK: vector.transfer_read %{{.*}} : memref<8x32xf32>, vector<8x32xf32>
// CHECK: vector.contract {indexing_maps = [#[[$mk]], #[[$kn]], #[[$mn]]]		// CHECK: vector.contract {indexing_maps = [#[[$mk]], #[[$nk]], #[[$mn]]]
// CHECK-SAME: vector<8x16xf32>, vector<16x32xf32> into vector<8x32xf32>		// CHECK-SAME: vector<8x16xf32>, vector<32x16xf32> into vector<8x32xf32>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}} : vector<8x32xf32>, memref<8x32xf32>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}} : vector<8x32xf32>, memref<8x32xf32>
linalg.generic #matmul_trait		linalg.generic #matmul_trait
ins(%A, %B : memref<8x16xf32>, memref<16x32xf32>)		ins(%A, %B : memref<8x16xf32>, memref<16x32xf32>)
outs(%C : memref<8x32xf32>) {		outs(%C : memref<8x32xf32>) {
^bb(%a: f32, %b: f32, %c: f32) :		^bb(%a: f32, %b: f32, %c: f32) :
%d = mulf %a, %b: f32		%d = mulf %a, %b: f32
%e = addf %c, %d: f32		%e = addf %c, %d: f32
linalg.yield %e : f32		linalg.yield %e : f32
[...]
#matmul_trait = {
indexing_maps = [		indexing_maps = [
affine_map<(m, n, k) -> (m, k)>,		affine_map<(m, n, k) -> (m, k)>,
affine_map<(m, n, k) -> (k, n)>,		affine_map<(m, n, k) -> (k, n)>,
affine_map<(m, n, k) -> (m, n)>		affine_map<(m, n, k) -> (m, n)>
],		],
iterator_types = ["parallel", "parallel", "reduction"]		iterator_types = ["parallel", "parallel", "reduction"]
}		}

		// CHECK-DAG: #[[$trans_2d:.*]] = affine_map<(d0, d1) -> (d1, d0)>
// CHECK-DAG: #[[$mk:.*]] = affine_map<(d0, d1, d2) -> (d0, d2)>		// CHECK-DAG: #[[$mk:.*]] = affine_map<(d0, d1, d2) -> (d0, d2)>
// CHECK-DAG: #[[$kn:.*]] = affine_map<(d0, d1, d2) -> (d2, d1)>		// CHECK-DAG: #[[$nk:.*]] = affine_map<(d0, d1, d2) -> (d1, d2)>
// CHECK-DAG: #[[$mn:.*]] = affine_map<(d0, d1, d2) -> (d0, d1)>		// CHECK-DAG: #[[$mn:.*]] = affine_map<(d0, d1, d2) -> (d0, d1)>

// CHECK-LABEL: func @vectorization_test_integer		// CHECK-LABEL: func @vectorization_test_integer
func @vectorization_test_integer(%A: memref<8x16xi32>, %B: memref<16x32xi32>,		func @vectorization_test_integer(%A: memref<8x16xi32>, %B: memref<16x32xi32>,
%C: memref<8x32xi32>) {		%C: memref<8x32xi32>) {
// CHECK: vector.transfer_read %{{.*}} : memref<8x16xi32>, vector<8x16xi32>		// CHECK: vector.transfer_read %{{.*}} : memref<8x16xi32>, vector<8x16xi32>
// CHECK: vector.transfer_read %{{.*}} : memref<16x32xi32>, vector<16x32xi32>		// CHECK: vector.transfer_read %{{.*}} : memref<16x32xi32>, vector<32x16xi32>
// CHECK: vector.transfer_read %{{.*}} : memref<8x32xi32>, vector<8x32xi32>		// CHECK: vector.transfer_read %{{.*}} : memref<8x32xi32>, vector<8x32xi32>
// CHECK: vector.contract {indexing_maps = [#[[$mk]], #[[$kn]], #[[$mn]]],		// CHECK: vector.contract {indexing_maps = [#[[$mk]], #[[$nk]], #[[$mn]]],
// CHECK-SAME: vector<8x16xi32>, vector<16x32xi32> into vector<8x32xi32>		// CHECK-SAME: vector<8x16xi32>, vector<32x16xi32> into vector<8x32xi32>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}} : vector<8x32xi32>, memref<8x32xi32>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}} : vector<8x32xi32>, memref<8x32xi32>
linalg.generic #matmul_trait		linalg.generic #matmul_trait
ins(%A, %B : memref<8x16xi32>, memref<16x32xi32>)		ins(%A, %B : memref<8x16xi32>, memref<16x32xi32>)
outs(%C : memref<8x32xi32>) {		outs(%C : memref<8x32xi32>) {
^bb(%a: i32, %b: i32, %c: i32) :		^bb(%a: i32, %b: i32, %c: i32) :
%d = muli %a, %b: i32		%d = muli %a, %b: i32
%e = addi %c, %d: i32		%e = addi %c, %d: i32
linalg.yield %e : i32		linalg.yield %e : i32
[129 unchanged lines not shown]		func @generic_vectorize(%arg0: memref<4x256xf32>,
ins(%arg1, %arg2: memref<4x256xf32>, memref<256xf32>)		ins(%arg1, %arg2: memref<4x256xf32>, memref<256xf32>)
outs(		outs(
%arg0, %arg0, %arg0, %arg0, %arg0, %arg0, %arg0, %arg0, %arg0, %arg0 :		%arg0, %arg0, %arg0, %arg0, %arg0, %arg0, %arg0, %arg0, %arg0, %arg0 :
memref<4x256xf32>, memref<4x256xf32>, memref<4x256xf32>, memref<4x256xf32>,		memref<4x256xf32>, memref<4x256xf32>, memref<4x256xf32>, memref<4x256xf32>,
memref<4x256xf32>, memref<4x256xf32>, memref<4x256xf32>, memref<4x256xf32>,		memref<4x256xf32>, memref<4x256xf32>, memref<4x256xf32>, memref<4x256xf32>,
memref<4x256xf32>, memref<4x256xf32>) {		memref<4x256xf32>, memref<4x256xf32>) {
^bb0(%arg3 : f32, %arg4 : f32, %arg5: f32, %arg6: f32, %arg7: f32, %arg8: f32,		^bb0(%arg3 : f32, %arg4 : f32, %arg5: f32, %arg6: f32, %arg7: f32, %arg8: f32,
// CHECK: %[[V2:.*]] = vector.transfer_read %[[ARG1]][%[[C0]], %[[C0]]], {{.*}} : memref<4x256xf32>, vector<4x256xf32>		// CHECK: %[[V2:.*]] = vector.transfer_read %[[ARG1]][%[[C0]], %[[C0]]], {{.*}} : memref<4x256xf32>, vector<4x256xf32>
// CHECK: %[[V0:.*]] = vector.transfer_read %[[ARG2]][%[[C0]]], {{.*}} : memref<256xf32>, vector<256xf32>		// CHECK: %[[V0:.*]] = vector.transfer_read %[[ARG2]][%[[C0]]], {{.*}} : memref<256xf32>, vector<4x256xf32>
// CHECK: %[[V3:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : memref<4x256xf32>, vector<4x256xf32>		// CHECK: %[[V3:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : memref<4x256xf32>, vector<4x256xf32>
// CHECK: %[[V1:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : memref<4x256xf32>, vector<4x256xf32>		// CHECK: %[[V1:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : memref<4x256xf32>, vector<4x256xf32>
%arg9 : f32, %arg10 : f32, %arg11 : f32, %arg12 : f32, %arg13 : f32,		%arg9 : f32, %arg10 : f32, %arg11 : f32, %arg12 : f32, %arg13 : f32,
%arg14 : f32):		%arg14 : f32):
// CHECK: %[[V0B:.*]] = vector.broadcast %[[V0]] : vector<256xf32> to vector<4x256xf32>		// CHECK: %[[ADD:.*]] = addf %[[V0]], %[[V1]] : vector<4x256xf32>
// CHECK: %[[ADD:.*]] = addf %[[V0B]], %[[V1]] : vector<4x256xf32>
%6 = addf %arg4, %arg6 : f32		%6 = addf %arg4, %arg6 : f32
// CHECK: %[[CMP:.*]] = cmpf ogt, %[[V2]], %[[V1]] : vector<4x256xf32>		// CHECK: %[[CMP:.*]] = cmpf ogt, %[[V2]], %[[V1]] : vector<4x256xf32>
%7 = cmpf ogt, %arg3, %arg6 : f32		%7 = cmpf ogt, %arg3, %arg6 : f32
// CHECK: %[[ARG3B:.*]] = vector.broadcast %[[ARG3]] : f32 to vector<4x256xf32>		// CHECK: %[[ARG3B:.*]] = vector.broadcast %[[ARG3]] : f32 to vector<4x256xf32>
%8 = constant 2.0 : f32		%8 = constant 2.0 : f32
// CHECK: %[[DIV:.*]] = divf %[[V3]], %[[ARG3B]] : vector<4x256xf32>		// CHECK: %[[DIV:.*]] = divf %[[V3]], %[[ARG3B]] : vector<4x256xf32>
%9 = divf %arg5, %i : f32		%9 = divf %arg5, %i : f32
// CHECK: %[[EXP:.*]] = math.exp2 %[[V3]] : vector<4x256xf32>		// CHECK: %[[EXP:.*]] = math.exp2 %[[V3]] : vector<4x256xf32>
%10 = math.exp2 %arg5 : f32		%10 = math.exp2 %arg5 : f32
// CHECK: %[[MUL:.*]] = mulf %[[V3]], %[[CST0]] : vector<4x256xf32>		// CHECK: %[[MUL:.*]] = mulf %[[V3]], %[[CST0]] : vector<4x256xf32>
%11 = mulf %arg5, %8 : f32		%11 = mulf %arg5, %8 : f32
// CHECK: %[[RSQRT:.*]] = math.rsqrt %[[V3]] : vector<4x256xf32>		// CHECK: %[[RSQRT:.*]] = math.rsqrt %[[V3]] : vector<4x256xf32>
%12 = math.rsqrt %arg5 : f32		%12 = math.rsqrt %arg5 : f32
// CHECK: %[[SEL:.*]] = select %[[CMP]], %[[V3]], %[[V1]] : vector<4x256xi1>, vector<4x256xf32>		// CHECK: %[[SEL:.*]] = select %[[CMP]], %[[V3]], %[[V1]] : vector<4x256xi1>, vector<4x256xf32>
%13 = select %7, %arg5, %arg6 : f32		%13 = select %7, %arg5, %arg6 : f32
// CHECK: %[[V0B:.*]] = vector.broadcast %[[V0]] : vector<256xf32> to vector<4x256xf32>		// CHECK: %[[SUB:.*]] = subf %[[V3]], %[[V0]] : vector<4x256xf32>
// CHECK: %[[SUB:.*]] = subf %[[V3]], %[[V0B]] : vector<4x256xf32>
%14 = subf %arg5, %arg4 : f32		%14 = subf %arg5, %arg4 : f32
// CHECK: %[[TAN:.*]] = math.tanh %[[V3]] : vector<4x256xf32>		// CHECK: %[[TAN:.*]] = math.tanh %[[V3]] : vector<4x256xf32>
%15 = math.tanh %arg5 : f32		%15 = math.tanh %arg5 : f32
// CHECK: vector.transfer_write %[[ADD]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, memref<4x256xf32>		// CHECK: vector.transfer_write %[[ADD]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, memref<4x256xf32>
// CHECK: vector.transfer_write %[[CST0]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, memref<4x256xf32>		// CHECK: vector.transfer_write %[[CST0]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, memref<4x256xf32>
// CHECK: vector.transfer_write %[[CST1]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, memref<4x256xf32>		// CHECK: vector.transfer_write %[[CST1]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, memref<4x256xf32>
// CHECK: vector.transfer_write %[[DIV]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, memref<4x256xf32>		// CHECK: vector.transfer_write %[[DIV]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, memref<4x256xf32>
// CHECK: vector.transfer_write %[[EXP]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, memref<4x256xf32>		// CHECK: vector.transfer_write %[[EXP]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, memref<4x256xf32>
[42 unchanged lines not shown]		outs(
tensor<4x256xf32>, tensor<4x256xf32>) {		tensor<4x256xf32>, tensor<4x256xf32>) {
^bb0(%arg3 : f32, %arg4 : f32, %arg5: f32, %arg6: f32, %arg7: f32, %arg8: f32,		^bb0(%arg3 : f32, %arg4 : f32, %arg5: f32, %arg6: f32, %arg7: f32, %arg8: f32,
%arg9 : f32, %arg10 : f32, %arg11 : f32, %arg12 : f32, %arg13 : f32,		%arg9 : f32, %arg10 : f32, %arg11 : f32, %arg12 : f32, %arg13 : f32,
%arg14 : f32):		%arg14 : f32):
// CHECK-DAG: %[[CST0:.*]] = constant dense<2.000000e+00> : vector<4x256xf32>		// CHECK-DAG: %[[CST0:.*]] = constant dense<2.000000e+00> : vector<4x256xf32>
// CHECK-DAG: %[[CST1:.*]] = constant dense<1.000000e+00> : vector<4x256xf32>		// CHECK-DAG: %[[CST1:.*]] = constant dense<1.000000e+00> : vector<4x256xf32>
// CHECK-DAG: %[[C0:.*]] = constant 0 : index		// CHECK-DAG: %[[C0:.*]] = constant 0 : index
// CHECK: %[[V2:.*]] = vector.transfer_read %[[ARG1]][%[[C0]], %[[C0]]], {{.*}} : tensor<4x256xf32>, vector<4x256xf32>		// CHECK: %[[V2:.*]] = vector.transfer_read %[[ARG1]][%[[C0]], %[[C0]]], {{.*}} : tensor<4x256xf32>, vector<4x256xf32>
// CHECK: %[[V0:.*]] = vector.transfer_read %[[ARG2]][%[[C0]]], {{.*}} : tensor<256xf32>, vector<256xf32>		// CHECK: %[[V0:.*]] = vector.transfer_read %[[ARG2]][%[[C0]]], {{.*}} : tensor<256xf32>, vector<4x256xf32>
// CHECK: %[[V3:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : tensor<4x256xf32>, vector<4x256xf32>		// CHECK: %[[V3:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : tensor<4x256xf32>, vector<4x256xf32>
// CHECK: %[[V1:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : tensor<4x256xf32>, vector<4x256xf32>		// CHECK: %[[V1:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : tensor<4x256xf32>, vector<4x256xf32>
// CHECK: %[[V0B:.*]] = vector.broadcast %[[V0]] : vector<256xf32> to vector<4x256xf32>		// CHECK: %[[ADD:.*]] = addf %[[V0]], %[[V1]] : vector<4x256xf32>
// CHECK: %[[ADD:.*]] = addf %[[V0B]], %[[V1]] : vector<4x256xf32>
%6 = addf %arg4, %arg6 : f32		%6 = addf %arg4, %arg6 : f32
// CHECK: %[[CMP:.*]] = cmpf ogt, %[[V2]], %[[V1]] : vector<4x256xf32>		// CHECK: %[[CMP:.*]] = cmpf ogt, %[[V2]], %[[V1]] : vector<4x256xf32>
%7 = cmpf ogt, %arg3, %arg6 : f32		%7 = cmpf ogt, %arg3, %arg6 : f32
// CHECK: %[[ARG3B:.*]] = vector.broadcast %[[ARG3]] : f32 to vector<4x256xf32>		// CHECK: %[[ARG3B:.*]] = vector.broadcast %[[ARG3]] : f32 to vector<4x256xf32>
%8 = constant 2.0 : f32		%8 = constant 2.0 : f32
// CHECK: %[[DIV:.*]] = divf %[[V3]], %[[ARG3B]] : vector<4x256xf32>		// CHECK: %[[DIV:.*]] = divf %[[V3]], %[[ARG3B]] : vector<4x256xf32>
%9 = divf %arg5, %i : f32		%9 = divf %arg5, %i : f32
// CHECK: %[[EXP:.*]] = math.exp2 %[[V3]] : vector<4x256xf32>		// CHECK: %[[EXP:.*]] = math.exp2 %[[V3]] : vector<4x256xf32>
%10 = math.exp2 %arg5 : f32		%10 = math.exp2 %arg5 : f32
// CHECK: %[[MUL:.*]] = mulf %[[V3]], %[[CST0]] : vector<4x256xf32>		// CHECK: %[[MUL:.*]] = mulf %[[V3]], %[[CST0]] : vector<4x256xf32>
%11 = mulf %arg5, %8 : f32		%11 = mulf %arg5, %8 : f32
// CHECK: %[[RSQRT:.*]] = math.rsqrt %[[V3]] : vector<4x256xf32>		// CHECK: %[[RSQRT:.*]] = math.rsqrt %[[V3]] : vector<4x256xf32>
%12 = math.rsqrt %arg5 : f32		%12 = math.rsqrt %arg5 : f32
// CHECK: %[[SEL:.*]] = select %[[CMP]], %[[V3]], %[[V1]] : vector<4x256xi1>, vector<4x256xf32>		// CHECK: %[[SEL:.*]] = select %[[CMP]], %[[V3]], %[[V1]] : vector<4x256xi1>, vector<4x256xf32>
%13 = select %7, %arg5, %arg6 : f32		%13 = select %7, %arg5, %arg6 : f32
// CHECK: %[[V0B:.*]] = vector.broadcast %[[V0]] : vector<256xf32> to vector<4x256xf32>		// CHECK: %[[SUB:.*]] = subf %[[V3]], %[[V0]] : vector<4x256xf32>
// CHECK: %[[SUB:.*]] = subf %[[V3]], %[[V0B]] : vector<4x256xf32>
%14 = subf %arg5, %arg4 : f32		%14 = subf %arg5, %arg4 : f32
// CHECK: %[[TAN:.*]] = math.tanh %[[V3]] : vector<4x256xf32>		// CHECK: %[[TAN:.*]] = math.tanh %[[V3]] : vector<4x256xf32>
%15 = math.tanh %arg5 : f32		%15 = math.tanh %arg5 : f32
// CHECK: %[[R0:.*]] = vector.transfer_write %[[ADD]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, tensor<4x256xf32>		// CHECK: %[[R0:.*]] = vector.transfer_write %[[ADD]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, tensor<4x256xf32>
// CHECK: %[[R1:.*]] = vector.transfer_write %[[CST0]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, tensor<4x256xf32>		// CHECK: %[[R1:.*]] = vector.transfer_write %[[CST0]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, tensor<4x256xf32>
// CHECK: %[[R2:.*]] = vector.transfer_write %[[CST1]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, tensor<4x256xf32>		// CHECK: %[[R2:.*]] = vector.transfer_write %[[CST1]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, tensor<4x256xf32>
// CHECK: %[[R3:.*]] = vector.transfer_write %[[DIV]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, tensor<4x256xf32>		// CHECK: %[[R3:.*]] = vector.transfer_write %[[DIV]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, tensor<4x256xf32>
// CHECK: %[[R4:.*]] = vector.transfer_write %[[EXP]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, tensor<4x256xf32>		// CHECK: %[[R4:.*]] = vector.transfer_write %[[EXP]], %[[ARG0]][%[[C0]], %[[C0]]] {{.*}} : vector<4x256xf32>, tensor<4x256xf32>
[56 unchanged lines not shown]
// CHECK-SAME: (%[[ARG0:.*]]: tensor<8x4xf32>, %[[ARG1:.*]]: tensor<4x12xf32>,		// CHECK-SAME: (%[[ARG0:.*]]: tensor<8x4xf32>, %[[ARG1:.*]]: tensor<4x12xf32>,
// CHECK-SAME: %[[ARG2:.*]]: tensor<8x12xf32>) -> tensor<8x12xf32>		// CHECK-SAME: %[[ARG2:.*]]: tensor<8x12xf32>) -> tensor<8x12xf32>
func @matmul_tensors(		func @matmul_tensors(
%arg0: tensor<8x4xf32>, %arg1: tensor<4x12xf32>, %arg2: tensor<8x12xf32>)		%arg0: tensor<8x4xf32>, %arg1: tensor<4x12xf32>, %arg2: tensor<8x12xf32>)
-> tensor<8x12xf32> {		-> tensor<8x12xf32> {
// CHECK-DAG: %[[C0:.*]] = constant 0 : index		// CHECK-DAG: %[[C0:.*]] = constant 0 : index
// CHECK-DAG: %[[VEC_C0:.*]] = constant dense<0.000000e+00> : vector<8x12xf32>		// CHECK-DAG: %[[VEC_C0:.*]] = constant dense<0.000000e+00> : vector<8x12xf32>
// CHECK-DAG: %[[V0:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : tensor<8x4xf32>, vector<8x4xf32>		// CHECK-DAG: %[[V0:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : tensor<8x4xf32>, vector<8x4xf32>
// CHECK-DAG: %[[V1:.*]] = vector.transfer_read %[[ARG1]][%[[C0]], %[[C0]]], {{.*}} : tensor<4x12xf32>, vector<4x12xf32>		// CHECK-DAG: %[[V1:.*]] = vector.transfer_read %[[ARG1]][%[[C0]], %[[C0]]], {{.*}} : tensor<4x12xf32>, vector<12x4xf32>
// CHECK-DAG: %[[V2:.*]] = vector.transfer_read %[[ARG2]][%[[C0]], %[[C0]]], {{.*}} : tensor<8x12xf32>, vector<8x12xf32>		// CHECK-DAG: %[[V2:.*]] = vector.transfer_read %[[ARG2]][%[[C0]], %[[C0]]], {{.*}} : tensor<8x12xf32>, vector<8x12xf32>
//		//
// linalg contraction lowers to %tmp = vector.contract %a, %b, %c0 followed by addf %c, %tmp.		// linalg contraction lowers to %tmp = vector.contract %a, %b, %c0 followed by addf %c, %tmp.
// a later canonicalization fuses the add into vector.contract.		// a later canonicalization fuses the add into vector.contract.
// CHECK: %[[C:.*]] = vector.contract {{.*}} iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %[[V0]], %[[V1]], %[[VEC_C0]] : vector<8x4xf32>, vector<4x12xf32> into vector<8x12xf32>		// CHECK: %[[C:.*]] = vector.contract
		// CHECK-SAME: iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>}
		// CHECK-SAME: %[[V0]], %[[V1]], %[[VEC_C0]] :
		// CHECK-SAME: vector<8x4xf32>, vector<12x4xf32> into vector<8x12xf32>
// CHECK: %[[C2:.*]] = addf %[[V2]], %[[C]] : vector<8x12xf32>		// CHECK: %[[C2:.*]] = addf %[[V2]], %[[C]] : vector<8x12xf32>
// CHECK: %[[W:.*]] = vector.transfer_write %[[C2]], %[[ARG2]][%[[C0]], %[[C0]]] {in_bounds = [true, true]} : vector<8x12xf32>, tensor<8x12xf32>		// CHECK: %[[W:.*]] = vector.transfer_write %[[C2]], %[[ARG2]][%[[C0]], %[[C0]]] {in_bounds = [true, true]} : vector<8x12xf32>, tensor<8x12xf32>
%0 = linalg.matmul ins(%arg0, %arg1: tensor<8x4xf32>, tensor<4x12xf32>)		%0 = linalg.matmul ins(%arg0, %arg1: tensor<8x4xf32>, tensor<4x12xf32>)
outs(%arg2: tensor<8x12xf32>)		outs(%arg2: tensor<8x12xf32>)
-> tensor<8x12xf32>		-> tensor<8x12xf32>
// CHECK: return %[[W]] : tensor<8x12xf32>		// CHECK: return %[[W]] : tensor<8x12xf32>
return %0 : tensor<8x12xf32>		return %0 : tensor<8x12xf32>
}		}
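
The right-hand column of the matmul tests now reads the second operand transposed (for example `vector<12x4xf32>` from `tensor<4x12xf32>`). A minimal sketch of why the contraction result is unchanged, assuming the RHS indexing map is `(d0, d1, d2) -> (d1, d2)` as in the `$nk` map checked earlier in this file. The helper names `matmul`, `transpose` and `contract_mk_nk` are illustrative only and are not part of MLIR:

```python
def matmul(a, b):
    """Reference matmul: a is M x K, b is K x N."""
    m_size, k_size, n_size = len(a), len(b), len(b[0])
    return [[sum(a[m][k] * b[k][n] for k in range(k_size))
             for n in range(n_size)]
            for m in range(m_size)]


def transpose(b):
    """Swap the two dimensions of a 2-D list."""
    return [list(row) for row in zip(*b)]


def contract_mk_nk(lhs, rhs_nk):
    """Contraction with maps (d0, d2), (d1, d2) -> (d0, d1).

    lhs is M x K and rhs_nk is N x K; d2 is the reduction dimension.
    """
    k_size = len(lhs[0])
    return [[sum(lhs_row[k] * rhs_row[k] for k in range(k_size))
             for rhs_row in rhs_nk]
            for lhs_row in lhs]


a = [[1, 2], [3, 4]]            # 2 x 2
b = [[5, 6, 7], [8, 9, 10]]     # 2 x 3
# Reading b transposed and contracting over its last dimension
# gives the same values as the plain matmul.
same = contract_mk_nk(a, transpose(b)) == matmul(a, b)
```

Only the layout of the RHS vector and its indexing map change; the values written back are the same.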

// -----		// -----

// CHECK-LABEL: func @matmul_i8_i8_i32		// CHECK-LABEL: func @matmul_i8_i8_i32
// CHECK-SAME: %[[ARG0:[a-z0-9]+]]: memref<4x6xi8>		// CHECK-SAME: %[[ARG0:[a-z0-9]+]]: memref<4x6xi8>
// CHECK-SAME: %[[ARG1:[a-z0-9]+]]: memref<6x12xi8>		// CHECK-SAME: %[[ARG1:[a-z0-9]+]]: memref<6x12xi8>
// CHECK-SAME: %[[ARG2:[a-z0-9]+]]: memref<4x12xi32>		// CHECK-SAME: %[[ARG2:[a-z0-9]+]]: memref<4x12xi32>
func @matmul_i8_i8_i32(%a: memref<4x6xi8>, %b: memref<6x12xi8>, %c: memref<4x12xi32>) {		func @matmul_i8_i8_i32(%a: memref<4x6xi8>, %b: memref<6x12xi8>, %c: memref<4x12xi32>) {
// CHECK-DAG: %[[C0:.*]] = constant 0 : index		// CHECK-DAG: %[[C0:.*]] = constant 0 : index
// CHECK-DAG: %[[VEC_C0:.*]] = constant dense<0> : vector<4x12xi32>		// CHECK-DAG: %[[VEC_C0:.*]] = constant dense<0> : vector<4x12xi32>
// CHECK-DAG: %[[V0:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : memref<4x6xi8>, vector<4x6xi8>		// CHECK-DAG: %[[V0:.*]] = vector.transfer_read %[[ARG0]][%[[C0]], %[[C0]]], {{.*}} : memref<4x6xi8>, vector<4x6xi8>
// CHECK-DAG: %[[V1:.*]] = vector.transfer_read %[[ARG1]][%[[C0]], %[[C0]]], {{.*}} : memref<6x12xi8>, vector<6x12xi8>		// CHECK-DAG: %[[V1:.*]] = vector.transfer_read %[[ARG1]][%[[C0]], %[[C0]]], {{.*}} : memref<6x12xi8>, vector<12x6xi8>
// CHECK-DAG: %[[V2:.*]] = vector.transfer_read %[[ARG2]][%[[C0]], %[[C0]]], {{.*}} : memref<4x12xi32>, vector<4x12xi32>		// CHECK-DAG: %[[V2:.*]] = vector.transfer_read %[[ARG2]][%[[C0]], %[[C0]]], {{.*}} : memref<4x12xi32>, vector<4x12xi32>
// CHECK-DAG: %[[V0_32:.*]] = sexti %[[V0]] : vector<4x6xi8> to vector<4x6xi32>		// CHECK-DAG: %[[V0_32:.*]] = sexti %[[V0]] : vector<4x6xi8> to vector<4x6xi32>
// CHECK-DAG: %[[V1_32:.*]] = sexti %[[V1]] : vector<6x12xi8> to vector<6x12xi32>		// CHECK-DAG: %[[V1_32:.*]] = sexti %[[V1]] : vector<12x6xi8> to vector<12x6xi32>
//		//
// linalg contraction lowers to %tmp = vector.contract %a, %b, %c0 followed by addf %c, %tmp.		// linalg contraction lowers to %tmp = vector.contract %a, %b, %c0 followed by addf %c, %tmp.
// a later canonicalization fuses the add into vector.contract.		// a later canonicalization fuses the add into vector.contract.
// CHECK: %[[C:.*]] = vector.contract {{.*}} iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %[[V0_32]], %[[V1_32]], %[[VEC_C0]]		// CHECK: %[[C:.*]] = vector.contract
// CHECK-SAME: vector<4x6xi32>, vector<6x12xi32> into vector<4x12xi32>		// CHECK-SAME: iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>}
		// CHECK-SAME: %[[V0_32]], %[[V1_32]], %[[VEC_C0]]
		// CHECK-SAME: vector<4x6xi32>, vector<12x6xi32> into vector<4x12xi32>
// CHECK: %[[RES:.*]] = addi %[[V2]], %[[C]] : vector<4x12xi32>		// CHECK: %[[RES:.*]] = addi %[[V2]], %[[C]] : vector<4x12xi32>
// CHECK: vector.transfer_write %[[RES]], %[[ARG2]][%[[C0]], %[[C0]]] {in_bounds = [true, true]}		// CHECK: vector.transfer_write %[[RES]], %[[ARG2]][%[[C0]], %[[C0]]] {in_bounds = [true, true]}
// CHECK-SAME: vector<4x12xi32>, memref<4x12xi32>		// CHECK-SAME: vector<4x12xi32>, memref<4x12xi32>
linalg.matmul_i8_i8_i32 ins(%a, %b : memref<4x6xi8>, memref<6x12xi8>)		linalg.matmul_i8_i8_i32 ins(%a, %b : memref<4x6xi8>, memref<6x12xi8>)
outs(%c: memref<4x12xi32>)		outs(%c: memref<4x12xi32>)
return		return
}		}

[13 unchanged lines not shown]		%0 = linalg.pad_tensor %arg0 low[0, %c0, 0] high[0, 0, %c0] {
^bb0(%arg1: index, %arg2: index, %arg3: index):		^bb0(%arg1: index, %arg2: index, %arg3: index):
linalg.yield %pad_value : f32		linalg.yield %pad_value : f32
} : tensor<?x?x?xf32> to tensor<2x3x4xf32>		} : tensor<?x?x?xf32> to tensor<2x3x4xf32>

// CHECK: return %[[WRITTEN]] : tensor<2x3x4xf32>		// CHECK: return %[[WRITTEN]] : tensor<2x3x4xf32>
return %0 : tensor<2x3x4xf32>		return %0 : tensor<2x3x4xf32>
}		}

		// -----

// CHECK-LABEL: func @pad_static_high_padding		// CHECK-LABEL: func @pad_static_high_padding
// CHECK: linalg.pad_tensor		// CHECK: linalg.pad_tensor
func @pad_static_high_padding(%arg0: tensor<?x?x?xf32>, %pad_value: f32) -> tensor<2x3x4xf32> {		func @pad_static_high_padding(%arg0: tensor<?x?x?xf32>, %pad_value: f32) -> tensor<2x3x4xf32> {
%0 = linalg.pad_tensor %arg0 low[0, 0, 0] high[0, 1, 0] {		%0 = linalg.pad_tensor %arg0 low[0, 0, 0] high[0, 1, 0] {
^bb0(%arg1: index, %arg2: index, %arg3: index):		^bb0(%arg1: index, %arg2: index, %arg3: index):
linalg.yield %pad_value : f32		linalg.yield %pad_value : f32
} : tensor<?x?x?xf32> to tensor<2x3x4xf32>		} : tensor<?x?x?xf32> to tensor<2x3x4xf32>
return %0 : tensor<2x3x4xf32>		return %0 : tensor<2x3x4xf32>
}		}

		// -----

// CHECK-LABEL: func @pad_dynamic		// CHECK-LABEL: func @pad_dynamic
// CHECK: linalg.pad_tensor		// CHECK: linalg.pad_tensor
func @pad_dynamic(%arg0: tensor<1x2x2x?xf32>, %low: index, %high: index,		func @pad_dynamic(%arg0: tensor<1x2x2x?xf32>, %low: index, %high: index,
%pad_value: f32) -> tensor<6x?x?x?xf32> {		%pad_value: f32) -> tensor<6x?x?x?xf32> {
%0 = linalg.pad_tensor %arg0 low[2, %low, 3, 3] high[3, 3, %high, 2] {		%0 = linalg.pad_tensor %arg0 low[2, %low, 3, 3] high[3, 3, %high, 2] {
^bb0(%arg1: index, %arg2: index, %arg3: index, %arg4: index):		^bb0(%arg1: index, %arg2: index, %arg3: index, %arg4: index):
linalg.yield %pad_value : f32		linalg.yield %pad_value : f32
} : tensor<1x2x2x?xf32> to tensor<6x?x?x?xf32>		} : tensor<1x2x2x?xf32> to tensor<6x?x?x?xf32>
return %0 : tensor<6x?x?x?xf32>		return %0 : tensor<6x?x?x?xf32>
}		}

		// -----

		// CHECK-DAG: #[[$M0:.*]] = affine_map<(d0, d1) -> (d0, d1, 0)>

		// CHECK-LABEL: func @sum_exp
		func @sum_exp(%input: tensor<4x16x8xf32>, %output: tensor<4x16xf32>)
		-> tensor<4x16xf32>
		{
		// CHECK: vector.transfer_read {{.*}} : tensor<4x16x8xf32>, vector<4x16x8xf32>
		// CHECK: vector.transfer_read {{.*}} {permutation_map = #[[$M0]]} : tensor<4x16xf32>, vector<4x16x8xf32>
		// CHECK: math.exp {{.*}} : vector<4x16x8xf32>
		// CHECK: addf {{.*}} : vector<4x16x8xf32>
		// CHECK: vector.multi_reduction #vector.kind<add>, %{{.*}} [2] : vector<4x16x8xf32> to vector<4x16xf32>
		// CHECK: vector.transfer_write {{.*}} : vector<4x16xf32>, tensor<4x16xf32>
		// CHECK: return {{.*}} : tensor<4x16xf32>
		%0 = linalg.generic {
		indexing_maps = [
		affine_map<(d0, d1, d2) -> (d0, d1, d2)>,
		affine_map<(d0, d1, d2) -> (d0, d1)>
		],
		iterator_types = ["parallel", "parallel", "reduction"]
		} ins(%input : tensor<4x16x8xf32>) outs(%output : tensor<4x16xf32>) {
		^bb0(%arg0: f32, %arg1: f32): // no predecessors
		%1 = math.exp %arg0 : f32
		%2 = addf %1, %arg1 : f32
		linalg.yield %2 : f32
		} -> tensor<4x16xf32>
		return %0 : tensor<4x16xf32>
		}

		// -----

		// CHECK-DAG: #[[$M1:.*]] = affine_map<(d0, d1) -> (d1, d0, 0, 0)>
		// CHECK-DAG: #[[$M2:.*]] = affine_map<(d0, d1) -> (0, 0, d1, d0)>
		// CHECK-DAG: #[[$M3:.*]] = affine_map<(d0, d1) -> (d1, 0, 0, d0)>
		// CHECK-DAG: #[[$M4:.*]] = affine_map<(d0, d1) -> (d1, d0)>

		// CHECK-LABEL: func @sum_exp_2
		func @sum_exp_2(%input: tensor<3x2xf32>, %input_2: tensor<5x4xf32>, %output: tensor<5x2xf32>)
		-> tensor<5x2xf32>
		{
		// CHECK: vector.transfer_read {{.*}} {permutation_map = #[[$M1]]} : tensor<3x2xf32>, vector<2x3x4x5xf32>
		// CHECK: vector.transfer_read {{.*}} {permutation_map = #[[$M2]]} : tensor<5x4xf32>, vector<2x3x4x5xf32>
		// CHECK: vector.transfer_read {{.*}} {permutation_map = #[[$M3]]} : tensor<5x2xf32>, vector<2x3x4x5xf32>
		// CHECK: math.exp {{.*}} : vector<2x3x4x5xf32>
		// CHECK: math.exp {{.*}} : vector<2x3x4x5xf32>
		// CHECK: addf {{.*}} : vector<2x3x4x5xf32>
		// CHECK: addf {{.*}} : vector<2x3x4x5xf32>
		// CHECK: vector.multi_reduction #vector.kind<add>, {{.*}} [1, 2] : vector<2x3x4x5xf32> to vector<2x5xf32>
		// CHECK: vector.transfer_write {{.*}} {permutation_map = #[[$M4]]} : vector<2x5xf32>, tensor<5x2xf32>
		// CHECK: return {{.*}} : tensor<5x2xf32>
		%0 = linalg.generic {
		indexing_maps = [
		affine_map<(d0, d1, d2, d3) -> (d1, d0)>,
		affine_map<(d0, d1, d2, d3) -> (d3, d2)>,
		affine_map<(d0, d1, d2, d3) -> (d3, d0)>
		],
		iterator_types = ["parallel", "reduction", "reduction", "parallel"]
		} ins(%input, %input_2 : tensor<3x2xf32>, tensor<5x4xf32>) outs(%output : tensor<5x2xf32>) {
		^bb0(%arg0: f32, %arg1: f32, %arg2: f32): // no predecessors
		%1 = math.exp %arg0 : f32
		%2 = math.exp %arg1 : f32
		%3 = addf %1, %2 : f32
		%4 = addf %3, %arg2 : f32
		linalg.yield %4 : f32
		} -> tensor<5x2xf32>
		return %0 : tensor<5x2xf32>
		}
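
The new `sum_exp` tests read every operand at the full shape of the iteration domain, using permutation maps whose `0` results mark broadcast dimensions. A rough sketch of that read semantics, assuming each non-zero map result names the source dimension that feeds the corresponding vector dimension. `broadcast_read` is a hypothetical helper, not an MLIR API, and `None` stands in for a `0` (broadcast) result:

```python
from itertools import product


def broadcast_read(src, perm_map, vec_shape):
    """Model a transfer_read with a projected permutation map.

    perm_map[v] is the source dimension feeding vector dimension v,
    or None when vector dimension v is a broadcast.
    Assumes every source dimension appears in perm_map.
    Returns a dict from vector index tuples to values.
    """
    src_rank = 1 + max(d for d in perm_map if d is not None)
    out = {}
    for idx in product(*(range(n) for n in vec_shape)):
        src_idx = [0] * src_rank
        for v, d in enumerate(perm_map):
            if d is not None:
                src_idx[d] = idx[v]
        elem = src
        for i in src_idx:
            elem = elem[i]
        out[idx] = elem
    return out


# Mirrors #[[$M1]] = (d0, d1) -> (d1, d0, 0, 0):
# a 3x2 source read as a 2x3x4x5 vector.
src = [[0, 1], [10, 11], [20, 21]]
vec = broadcast_read(src, [1, 0, None, None], (2, 3, 4, 5))
```

Here `vec[(a, b, c, d)]` equals `src[b][a]` for every `c` and `d`, which is the transposed and broadcast read that the `$M1` CHECK line expects in shape.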

Revision Contents

Diff 340026

mlir/include/mlir/Analysis/SliceAnalysis.h

mlir/include/mlir/Dialect/Vector/VectorOps.td

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

mlir/lib/Dialect/Vector/VectorOps.cpp

mlir/lib/Dialect/Vector/VectorTransforms.cpp

mlir/test/Dialect/Linalg/transform-patterns-matmul-to-vector.mlir

mlir/test/Dialect/Linalg/vectorization.mlir
