This is an archive of the discontinued LLVM Phabricator instance.


Differential D111442

[mlir][Vector] Let vector.multi_reduction reduce down to a scalar.
ClosedPublic

Authored by nicolasvasilache on Oct 8 2021, 10:27 AM.


Details

Reviewers

pifon2a
ftynse
aartbik
dcaballe
ThomasRaoux

Commits

rG31270eb16501: [mlir][Vector] Let vector.multi_reduction reduce down to a scalar.

Summary

vector.multi_reduction currently does not allow reducing down to a scalar.
This creates corner cases that are hard to handle during vectorization.
This revision extends the semantics and adds the transforms, lowerings and canonicalizations needed to lower vector.multi_reduction to other abstractions, all the way to LLVM.

In the future, when 0-d vectors are also allowed, scalars will still be relevant: 0-d vectors and scalars are not equivalent on all hardware.

In the process, splice out the implementation patterns related to vector.multi_reduction into a new file.
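
To make the extended semantics concrete, here is a minimal standalone sketch of the result-shape rule described in this summary: reduced dimensions are dropped, and an empty remaining shape stands for a scalar. It is plain C++ rather than the MLIR API. The names `inferReducedShape` and `reducesToScalar` are hypothetical and only loosely mirror the `inferDestShape`/`inferDestType` helpers in this revision.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Standalone model (not the MLIR API): returns the shape left after dropping
// every dimension listed in reductionDims. An empty result stands for a scalar.
std::vector<int64_t> inferReducedShape(const std::vector<int64_t> &sourceShape,
                                       const std::vector<int64_t> &reductionDims) {
  std::vector<bool> reduced(sourceShape.size(), false);
  for (int64_t d : reductionDims) {
    assert(d >= 0 && d < static_cast<int64_t>(sourceShape.size()) &&
           "reduction dim out of range");
    reduced[d] = true;
  }
  std::vector<int64_t> result;
  for (std::size_t i = 0; i < sourceShape.size(); ++i)
    if (!reduced[i])
      result.push_back(sourceShape[i]);
  return result;
}

// True when every dimension is reduced, i.e. the modeled result is a scalar.
bool reducesToScalar(const std::vector<int64_t> &sourceShape,
                     const std::vector<int64_t> &reductionDims) {
  return inferReducedShape(sourceShape, reductionDims).empty();
}
```

For the two ops in the updated documentation example, `inferReducedShape({4, 8, 16, 32}, {1, 3})` yields `{4, 16}`, and `reducesToScalar({4, 16}, {0, 1})` is true.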

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nicolasvasilache created this revision. Oct 8 2021, 10:27 AM

Herald added subscribers: wenzhicui, wrengr, Chia-hungDuan and 19 others. Oct 8 2021, 10:27 AM

nicolasvasilache requested review of this revision. Oct 8 2021, 10:27 AM

Herald added a project: Restricted Project. Oct 8 2021, 10:27 AM

Herald added a subscriber: stephenneuendorffer.

Harbormaster completed remote builds in B127822: Diff 378294. Oct 8 2021, 10:40 AM

springerm added a subscriber: springerm. Oct 11 2021, 12:59 AM

springerm added inline comments.

mlir/lib/Dialect/Vector/VectorMultiDimReductionTransforms.cpp
29–50	This will probably get out-of-sync with the rest of the file pretty fast... Any reason to put this here instead of class/struct comment?
34	comma
38–39	line break not necessary
55	transpose
145	I don't quite understand the notion of "parallel". Does it just mean "don't reduce but concatenate"?
146	ArrayRef<bool>
187–189	Shouldn't this be checking for some kind of equality of "outer" dims?
225	nd -> 2d?

Just a general comment. This is quite interesting that this kind of transformation/canonicalization is happening now on different levels of the lowering stack. There are transforms on HLO level, that convert reductions to row/column reductions or to a 1D reduction. We can do the same transformation in Linalg, which would not prevent tile-n-fuse of input producers happening. Or we might need both.
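
As a point of reference for the row/column terminology in this comment, the following sketch contrasts the two 2-D forms on a plain `std::vector` matrix. It is an illustration only, not code from this revision: `Matrix`, `reduceInner` and `reduceOuter` are hypothetical names, addition is assumed as the combining kind, and the matrix is assumed to be rectangular.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Innermost ("row") reduction: one horizontal sum per row.
std::vector<float> reduceInner(const Matrix &m) {
  std::vector<float> out;
  out.reserve(m.size());
  for (const std::vector<float> &row : m) {
    float acc = 0.0f;
    for (float v : row)
      acc += v;
    out.push_back(acc);
  }
  return out;
}

// Outermost ("column") reduction: accumulate whole rows elementwise.
std::vector<float> reduceOuter(const Matrix &m) {
  if (m.empty())
    return {};
  std::vector<float> acc = m.front();
  for (std::size_t i = 1; i < m.size(); ++i) {
    assert(m[i].size() == acc.size() && "expected a rectangular matrix");
    for (std::size_t j = 0; j < acc.size(); ++j)
      acc[j] += m[i][j];
  }
  return acc;
}
```

The inner form needs a horizontal reduction per row, while the outer form only needs elementwise operations on whole rows. This is the same distinction the lowering patterns in this revision draw between innermost and outermost reduction dimensions.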

mlir/lib/Dialect/Vector/VectorMultiDimReductionTransforms.cpp
2	nit: Multi-reduction
10	I am not sure what "target-independent rewrites as 1->N patterns" means.
25	nit: maybe just "vector-multi-reduction" or "vector-reduction"?
123	nit: multi

This revision is now accepted and ready to land. Oct 11 2021, 1:24 AM

Thanks, Nicolas! LGTM. I added a bunch of nits below.
On a personal note, I would like to better understand when the reduction transformation that involves transposition is worth it in terms of performance. Or is the goal of that transformation to fill the implementation gap in the runtime for the time being?
Anyways, probably a discussion for some other day :)

mlir/include/mlir/Dialect/Vector/VectorOps.td
303–304	We should extend the `into an (n-k)-D vector` to add the scalar case
mlir/lib/Dialect/Vector/VectorMultiDimReductionTransforms.cpp
10	rewrites -> rewrites of MultiDimReduction op?
29	If this is file summary, it should go to the file section (line 9)
29–50	+1. I would just add a brief summary to the file section and move the details to the file section.
35	nit: nd reads a bit weird... maybe nd -> n-D? Same for 2d? There is also a 1-d below.
55	`//` -> `///` here and in all the classes/methods below.
81	nit: pre-increment per coding standards.
154	nit: spell out this `auto` and some others above and below (integers, Value, Location, etc.) would help readability a lot.

nicolasvasilache added a reviewer: ThomasRaoux. Oct 11 2021, 3:01 PM

Address review.

Formatting.

mlir/lib/Dialect/Vector/VectorMultiDimReductionTransforms.cpp
29	moved to .h
145	In practice it is "not-reduce", other places in the file used "parallel", likely as analogy with Linalg. If we feel this is confusing and we want to improve this, we should do a global followup cleanup.
146	nope, that would create memory errors, we need ownership. SmallVector it is
154	ints I generally try to avoid so that I don't inadvertently introduce casts, updating the rest.
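
A small standalone sketch of the parallel/reduction split discussed in this exchange, where "parallel" simply means "not reduced". It also computes the permutation a transpose would need in order to group the reduction dimensions at one end. This is a plain C++ model, not the pattern from the revision; `reductionPermutation` is a hypothetical name, and the flag name is borrowed from `useInnerDimsForReduction` in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Standalone model: builds the permutation that moves reduction dims to the
// innermost (or outermost) positions, keeping the relative order inside the
// parallel group and inside the reduction group.
std::vector<int64_t> reductionPermutation(const std::vector<bool> &reductionMask,
                                          bool useInnerDimsForReduction) {
  std::vector<int64_t> parallelDims, reductionDims;
  for (int64_t i = 0, e = reductionMask.size(); i < e; ++i)
    (reductionMask[i] ? reductionDims : parallelDims).push_back(i);

  const std::vector<int64_t> &first =
      useInnerDimsForReduction ? parallelDims : reductionDims;
  const std::vector<int64_t> &second =
      useInnerDimsForReduction ? reductionDims : parallelDims;

  std::vector<int64_t> perm;
  perm.insert(perm.end(), first.begin(), first.end());
  perm.insert(perm.end(), second.begin(), second.end());
  assert(perm.size() == reductionMask.size());
  return perm;
}
```

With the mask `{false, true, false, true}` (dimensions 1 and 3 reduced), the inner form gives `{0, 2, 1, 3}` and the outer form gives `{1, 3, 0, 2}`.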

This revision was landed with ongoing or failed builds. Oct 12 2021, 4:04 AM

Closed by commit rG31270eb16501: [mlir][Vector] Let vector.multi_reduction reduce down to a scalar. (authored by nicolasvasilache).

This revision was automatically updated to reflect the committed changes.

nicolasvasilache added a commit: rG31270eb16501: [mlir][Vector] Let vector.multi_reduction reduce down to a scalar.

Harbormaster completed remote builds in B128319: Diff 378959. Oct 12 2021, 4:26 AM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Vector/VectorOps.h	24 lines
mlir/include/mlir/Dialect/Vector/VectorOps.td	35 lines
mlir/lib/Dialect/Vector/CMakeLists.txt	1 line
mlir/lib/Dialect/Vector/VectorMultiDimReductionTransforms.cpp	409 lines
mlir/lib/Dialect/Vector/VectorOps.cpp	29 lines
mlir/lib/Dialect/Vector/VectorTransforms.cpp	327 lines
mlir/test/Dialect/Vector/canonicalize.mlir	11 lines
mlir/test/Dialect/Vector/ops.mlir	8 lines
mlir/test/Dialect/Vector/vector-multi-reduction-lowering.mlir	16 lines

Diff 378963

mlir/include/mlir/Dialect/Vector/VectorOps.h

	[73 unchanged lines not shown]
	/// broadcasts and transposes.			/// broadcasts and transposes.
	void populateVectorTransferPermutationMapLoweringPatterns(			void populateVectorTransferPermutationMapLoweringPatterns(
	RewritePatternSet &patterns);			RewritePatternSet &patterns);

	/// These patterns materialize masks for various vector ops such as transfers.			/// These patterns materialize masks for various vector ops such as transfers.
	void populateVectorMaskMaterializationPatterns(RewritePatternSet &patterns,			void populateVectorMaskMaterializationPatterns(RewritePatternSet &patterns,
	bool enableIndexOptimizations);			bool enableIndexOptimizations);

	// Collect a set of patterns to convert vector.multi_reduction op into			/// Collect a set of patterns to convert vector.multi_reduction op into
	// a sequence of vector.reduction ops.			/// a sequence of vector.reduction ops. The patterns comprise:
				/// - InnerOuterDimReductionConversion: rewrites vector.multi_reduction such
				/// that all reduction dimensions are either innermost or outermost, by adding
				/// the proper vector.transpose operations.
				/// - ReduceMultiDimReductionRank: once in innermost or outermost reduction
				/// form, rewrites n-D vector.multi_reduction into 2-D vector.multi_reduction,
				/// by introducing vector.shape_cast ops to collapse + multi-reduce + expand
				/// back.
				/// - TwoDimMultiReductionToElementWise: once in 2-D vector.multi_reduction
				/// form, with an outermost reduction dimension, unroll the outer dimension
				/// to obtain a sequence of 1-D vector ops. This also has an opportunity for
				/// tree-reduction (in the future).
				/// - TwoDimMultiReductionToReduction: once in 2-D vector.multi_reduction form,
				/// with an innermost reduction dimension, unroll the outer dimension to
				/// obtain a sequence of extract + vector.reduction + insert. This can further
				/// lower to horizontal reduction ops.
				/// - OneDimMultiReductionToTwoDim: for cases that reduce to 1-D vector<k>
				/// reduction (and are thus missing either a parallel or a reduction), we lift
				/// them back up to 2-D with a simple vector.shape_cast to vector<1xk> so that
				/// the other patterns can kick in, thus fully exiting out of the
				/// vector.multi_reduction abstraction.
	void populateVectorMultiReductionLoweringPatterns(			void populateVectorMultiReductionLoweringPatterns(
	RewritePatternSet &patterns, bool useInnerDimsForReduction = false);			RewritePatternSet &patterns, bool useInnerDimsForReduction = false);

	/// Collect a set of patterns to propagate insert_map/extract_map in the ssa			/// Collect a set of patterns to propagate insert_map/extract_map in the ssa
	/// chain.			/// chain.
	void populatePropagateVectorDistributionPatterns(RewritePatternSet &patterns);			void populatePropagateVectorDistributionPatterns(RewritePatternSet &patterns);

	/// An attribute that specifies the combining function for `vector.contract`,			/// An attribute that specifies the combining function for `vector.contract`,
	[125 unchanged lines not shown]
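
The rank-reduction step named in the new header comment (ReduceMultiDimReductionRank) can be sketched as follows. This is a standalone C++ model under stated assumptions, not the MLIR pattern itself: it assumes the parallel and reduction dimensions are already grouped contiguously (the real pattern checks this and bails otherwise), and `Flattened` and `flattenTo2D` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Collapsed shape and reduction mask (at most 2-D) of the modeled op.
struct Flattened {
  std::vector<int64_t> shape;
  std::vector<bool> mask;
};

// Collapses all parallel dims into one dim and all reduction dims into one dim.
// Assumes each group is already contiguous in the source shape.
Flattened flattenTo2D(const std::vector<int64_t> &shape,
                      const std::vector<bool> &mask,
                      bool useInnerDimsForReduction) {
  assert(shape.size() == mask.size() && "shape and mask of different sizes");
  int64_t parallelSize = 1, reductionSize = 1;
  bool hasParallel = false, hasReduction = false;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (mask[i]) {
      reductionSize *= shape[i];
      hasReduction = true;
    } else {
      parallelSize *= shape[i];
      hasParallel = true;
    }
  }

  Flattened out;
  if (hasParallel) {
    out.shape.push_back(parallelSize);
    out.mask.push_back(false);
  }
  if (hasReduction) {
    out.shape.push_back(reductionSize);
    out.mask.push_back(true);
  }
  // For outermost reductions the reduction dim comes first.
  if (!useInnerDimsForReduction && out.shape.size() == 2) {
    std::swap(out.shape[0], out.shape[1]);
    out.mask = {true, false};
  }
  return out;
}
```

For example, a 4x16x8x32 source with the last two dimensions reduced collapses to shape `{64, 256}` with mask `{false, true}`. When every dimension is reduced, only a single reduction dimension remains, which corresponds to the scalar-result case this revision introduces.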

mlir/include/mlir/Dialect/Vector/VectorOps.td

[294 unchanged lines not shown]
def Vector_MultiDimReductionOp :
Vector_Op<"multi_reduction", [NoSideEffect,		Vector_Op<"multi_reduction", [NoSideEffect,
PredOpTrait<"source operand and result have same element type",		PredOpTrait<"source operand and result have same element type",
TCresVTEtIsSameAsOpBase<0, 0>>]>,		TCresVTEtIsSameAsOpBase<0, 0>>]>,
Arguments<(ins Vector_CombiningKindAttr:$kind,		Arguments<(ins Vector_CombiningKindAttr:$kind,
AnyVector:$source,		AnyVector:$source,
I64ArrayAttr:$reduction_dims)>,		I64ArrayAttr:$reduction_dims)>,
Results<(outs AnyType:$dest)> {		Results<(outs AnyType:$dest)> {
let summary = "Multi-dimensional reduction operation";		let summary = "Multi-dimensional reduction operation";
let description = [{		let description = [{
Reduces an n-D vector into an (n-k)-D vector using the given operation		Reduces an n-D vector into an (n-k)-D vector (or a scalar when k == n)
		// [Inline comment] dcaballe: We should extend the `into an (n-k)-D vector` to add the scalar case
(add/mul/min/max for int/fp and and/or/xor for int only).		using the given operation (add/mul/min/max for int/fp and and/or/xor for
		int only).

Example:		Example:

```mlir		```mlir
%1 = vector.multi_reduction "add", %0 [1, 3] :		%1 = vector.multi_reduction "add", %0 [1, 3] :
vector<4x8x16x32xf32> into vector<4x16xf32>		vector<4x8x16x32xf32> into vector<4x16xf32>
		%2 = vector.multi_reduction "add", %1 [0, 1] :
		vector<4x16xf32> into f32
```		```
}];		}];
let builders = [		let builders = [
OpBuilder<(ins "Value":$source, "ArrayRef<bool>":$reductionMask,		OpBuilder<(ins "Value":$source, "ArrayRef<bool>":$reductionMask,
"CombiningKind":$kind)>		"CombiningKind":$kind)>
];		];
let extraClassDeclaration = [{		let extraClassDeclaration = [{
static StringRef getKindAttrName() { return "kind"; }		static StringRef getKindAttrName() { return "kind"; }
static StringRef getReductionDimsAttrName() { return "reduction_dims"; }		static StringRef getReductionDimsAttrName() { return "reduction_dims"; }

VectorType getSourceVectorType() {		VectorType getSourceVectorType() {
return source().getType().cast<VectorType>();		return source().getType().cast<VectorType>();
}		}
VectorType getDestVectorType() {		Type getDestType() {
return dest().getType().cast<VectorType>();		return dest().getType();
		}

		bool isReducedDim(int64_t d) {
		assert(d >= 0 && d < static_cast<int64_t>(getReductionMask().size()) &&
		"d overflows the number of dims");
		return getReductionMask()[d];
}		}

SmallVector<bool> getReductionMask() {		SmallVector<bool> getReductionMask() {
SmallVector<bool> res(getSourceVectorType().getRank(), false);		SmallVector<bool> res(getSourceVectorType().getRank(), false);
for (auto ia : reduction_dims().getAsRange<IntegerAttr>())		for (auto ia : reduction_dims().getAsRange<IntegerAttr>())
res[ia.getInt()] = true;		res[ia.getInt()] = true;
return res;		return res;
}		}
static SmallVector<bool> getReductionMask(		static SmallVector<bool> getReductionMask(
ArrayRef<int64_t> reductionDims, unsigned sourceRank) {		ArrayRef<int64_t> reductionDims, unsigned sourceRank) {
SmallVector<bool> res(sourceRank, false);		SmallVector<bool> res(sourceRank, false);
for (auto idx : reductionDims)		for (auto idx : reductionDims)
res[idx] = true;		res[idx] = true;
return res;		return res;
}		}

static SmallVector<int64_t> inferDestShape(		static SmallVector<int64_t> inferDestShape(
ArrayRef<int64_t> shape, ArrayRef<bool> reducedDimsMask) {		ArrayRef<int64_t> sourceShape, ArrayRef<bool> reducedDimsMask) {
assert(shape.size() == reducedDimsMask.size() &&		assert(sourceShape.size() == reducedDimsMask.size() &&
"shape and maks of different sizes");		"sourceShape and maks of different sizes");
SmallVector<int64_t> res;		SmallVector<int64_t> res;
for (auto it : llvm::zip(reducedDimsMask, shape))		for (auto it : llvm::zip(reducedDimsMask, sourceShape))
if (!std::get<0>(it))		if (!std::get<0>(it))
res.push_back(std::get<1>(it));		res.push_back(std::get<1>(it));
return res;		return res;
}		}

		static Type inferDestType(
		ArrayRef<int64_t> sourceShape, ArrayRef<bool> reducedDimsMask, Type elementType) {
		auto targetShape = inferDestShape(sourceShape, reducedDimsMask);
		// TODO: update to also allow 0-d vectors when available.
		if (targetShape.empty())
		return elementType;
		return VectorType::get(targetShape, elementType);
		}
}];		}];
let assemblyFormat =		let assemblyFormat =
"$kind `,` $source attr-dict $reduction_dims `:` type($source) `to` type($dest)";		"$kind `,` $source attr-dict $reduction_dims `:` type($source) `to` type($dest)";
		let hasFolder = 1;
}		}

def Vector_BroadcastOp :		def Vector_BroadcastOp :
Vector_Op<"broadcast", [NoSideEffect,		Vector_Op<"broadcast", [NoSideEffect,
PredOpTrait<"source operand and result have same element type",		PredOpTrait<"source operand and result have same element type",
TCresVTEtIsSameAsOpBase<0, 0>>]>,		TCresVTEtIsSameAsOpBase<0, 0>>]>,
Arguments<(ins AnyType:$source)>,		Arguments<(ins AnyType:$source)>,
Results<(outs AnyVector:$vector)> {		Results<(outs AnyVector:$vector)> {
[1,953 unchanged lines not shown]

mlir/lib/Dialect/Vector/CMakeLists.txt

	add_mlir_dialect_library(MLIRVector			add_mlir_dialect_library(MLIRVector
	VectorOps.cpp			VectorOps.cpp
				VectorMultiDimReductionTransforms.cpp
	VectorTransferOpTransforms.cpp			VectorTransferOpTransforms.cpp
	VectorTransforms.cpp			VectorTransforms.cpp
	VectorUtils.cpp			VectorUtils.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Vector			${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Vector

	DEPENDS			DEPENDS
	[17 unchanged lines not shown]

mlir/lib/Dialect/Vector/VectorMultiDimReductionTransforms.cpp

This file was added.

				//===- VectorMultiDimReductionTransforms.cpp - Multi-Reduction Transforms -===//
				//
				// [Inline comment] pifon2a: nit: Multi-reduction
				/// Part of the LLVM Project, under the Apache License v2.0 with LLVM
				/// Exceptions. See https://llvm.org/LICENSE.txt for license information.
				/// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				/// This file implements target-independent rewrites of MultiDimReductionOp.
				//
				// [Inline comment] pifon2a: I am not sure what "target-independent rewrites as 1->N patterns" means.
				// [Inline comment] dcaballe: rewrites -> rewrites of MultiDimReduction op?
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/Vector/VectorOps.h"
				#include "mlir/Dialect/Vector/VectorTransforms.h"
				#include "mlir/Dialect/Vector/VectorUtils.h"
				#include "mlir/IR/AffineExpr.h"
				#include "mlir/IR/AffineMap.h"
				#include "mlir/IR/Attributes.h"
				#include "mlir/IR/Builders.h"
				#include "mlir/IR/BuiltinOps.h"
				#include "mlir/IR/ImplicitLocOpBuilder.h"
				#include "mlir/IR/TypeUtilities.h"

				#define DEBUG_TYPE "vector-multi-reduction"

				// [Inline comment] pifon2a: nit: maybe just "vector-multi-reduction" or "vector-reduction"?
				using namespace mlir;

				/// This file implements the following transformations as composable atomic
				/// patterns.
				// [Inline comment] dcaballe: If this is file summary, it should go to the file section (line 9)
				// [Inline comment] nicolasvasilache (author): moved to .h

				/// Converts vector.multi_reduction into inner-most/outer-most reduction form
				/// by using vector.transpose
				class InnerOuterDimReductionConversion
				: public OpRewritePattern<vector::MultiDimReductionOp> {
				// [Inline comment] springerm: comma
				public:
				// [Inline comment] dcaballe: nit: nd reads a bit weird... maybe nd -> n-D? Same for 2d? There is also a 1-d below.
				using OpRewritePattern<vector::MultiDimReductionOp>::OpRewritePattern;

				explicit InnerOuterDimReductionConversion(MLIRContext *context,
				bool useInnerDimsForReduction)
				// [Inline comment] springerm: line break not necessary
				: mlir::OpRewritePattern<vector::MultiDimReductionOp>(context),
				useInnerDimsForReduction(useInnerDimsForReduction) {}

				LogicalResult matchAndRewrite(vector::MultiDimReductionOp multiReductionOp,
				PatternRewriter &rewriter) const override {
				auto src = multiReductionOp.source();
				auto loc = multiReductionOp.getLoc();
				auto srcRank = multiReductionOp.getSourceVectorType().getRank();

				// Separate reduction and parallel dims
				auto reductionDimsRange =
				// [Inline comment] springerm: This will probably get out-of-sync with the rest of the file pretty fast... Any reason to put this here instead of class/struct comment?
				// [Inline comment] dcaballe: +1. I would just add a brief summary to the file section and move the details to the file section.
				multiReductionOp.reduction_dims().getAsValueRange<IntegerAttr>();
				auto reductionDims = llvm::to_vector<4>(llvm::map_range(
				reductionDimsRange, [](APInt a) { return a.getZExtValue(); }));
				llvm::SmallDenseSet<int64_t> reductionDimsSet(reductionDims.begin(),
				reductionDims.end());
				// [Inline comment] springerm: transpose
				// [Inline comment] dcaballe: `//` -> `///` here and in all the classes/methods below.
				int64_t reductionSize = reductionDims.size();
				SmallVector<int64_t, 4> parallelDims;
				for (int64_t i = 0; i < srcRank; ++i)
				if (!reductionDimsSet.contains(i))
				parallelDims.push_back(i);

				// Add transpose only if inner-most/outer-most dimensions are not parallel
				if (useInnerDimsForReduction &&
				(parallelDims ==
				llvm::to_vector<4>(llvm::seq<int64_t>(0, parallelDims.size()))))
				return failure();

				if (!useInnerDimsForReduction &&
				(parallelDims !=
				llvm::to_vector<4>(llvm::seq<int64_t>(0, parallelDims.size()))))
				return failure();

				SmallVector<int64_t, 4> indices;
				if (useInnerDimsForReduction) {
				indices.append(parallelDims.begin(), parallelDims.end());
				indices.append(reductionDims.begin(), reductionDims.end());
				} else {
				indices.append(reductionDims.begin(), reductionDims.end());
				indices.append(parallelDims.begin(), parallelDims.end());
				}
				auto transposeOp = rewriter.create<vector::TransposeOp>(loc, src, indices);
				// [Inline comment] dcaballe: nit: pre-increment per coding standards.
				SmallVector<bool> reductionMask(srcRank, false);
				for (int i = 0; i < reductionSize; ++i) {
				if (useInnerDimsForReduction)
				reductionMask[srcRank - i - 1] = true;
				else
				reductionMask[i] = true;
				}
				rewriter.replaceOpWithNewOp<vector::MultiDimReductionOp>(
				multiReductionOp, transposeOp.result(), reductionMask,
				multiReductionOp.kind());
				return success();
				}

				private:
				const bool useInnerDimsForReduction;
				};

				/// Reduces the rank of vector.multi_reduction nd -> 2d given all reduction
				/// dimensions are either inner most or outer most.
				class ReduceMultiDimReductionRank
				: public OpRewritePattern<vector::MultiDimReductionOp> {
				public:
				using OpRewritePattern<vector::MultiDimReductionOp>::OpRewritePattern;

				explicit ReduceMultiDimReductionRank(MLIRContext *context,
				bool useInnerDimsForReduction)
				: mlir::OpRewritePattern<vector::MultiDimReductionOp>(context),
				useInnerDimsForReduction(useInnerDimsForReduction) {}

				LogicalResult matchAndRewrite(vector::MultiDimReductionOp multiReductionOp,
				PatternRewriter &rewriter) const override {
				auto srcRank = multiReductionOp.getSourceVectorType().getRank();
				auto srcShape = multiReductionOp.getSourceVectorType().getShape();
				auto loc = multiReductionOp.getLoc();

				// If rank less than 2, nothing to do.
				if (srcRank < 2)
				return failure();

				// If already rank-2 ["parallel", "reduce"] or ["reduce", "parallel"] bail.
				SmallVector<bool> reductionMask = multiReductionOp.getReductionMask();
				if (srcRank == 2 && reductionMask.front() != reductionMask.back())
				// [Inline comment] pifon2a: nit: multi
				return failure();

				// 1. Separate reduction and parallel dims.
				SmallVector<int64_t, 4> parallelDims, parallelShapes;
				SmallVector<int64_t, 4> reductionDims, reductionShapes;
				for (auto it : llvm::enumerate(reductionMask)) {
				int64_t i = it.index();
				bool isReduction = it.value();
				if (isReduction) {
				reductionDims.push_back(i);
				reductionShapes.push_back(srcShape[i]);
				} else {
				parallelDims.push_back(i);
				parallelShapes.push_back(srcShape[i]);
				}
				}

				// 2. Compute flattened parallel and reduction sizes.
				int flattenedParallelDim = 0;
				int flattenedReductionDim = 0;
				if (parallelShapes.size() > 0) {
				flattenedParallelDim = 1;
				// [Inline comment] springerm: I don't quite understand the notion of "parallel". Does it just mean "don't reduce but concatenate"?
				// [Inline comment] nicolasvasilache (author): In practice it is "not-reduce", other places in the file used "parallel", likely as analogy with Linalg. If we feel this is confusing and we want to improve this, we should do a global followup cleanup.
				for (auto d : parallelShapes)
				// [Inline comment] springerm: ArrayRef<bool>
				// [Inline comment] nicolasvasilache (author): nope, that would create memory errors, we need ownership. SmallVector it is
				flattenedParallelDim *= d;
				}
				if (reductionShapes.size() > 0) {
				flattenedReductionDim = 1;
				for (auto d : reductionShapes)
				flattenedReductionDim *= d;
				}
				// We must at least have some parallel or some reduction.
				// [Inline comment] dcaballe: nit: spell out this `auto` and some others above and below (integers, Value, Location, etc.) would help readability a lot.
				// [Inline comment] nicolasvasilache (author): ints I generally try to avoid so that I don't inadvertently introduce casts, updating the rest.
				assert((flattenedParallelDim || flattenedReductionDim) &&
				"expected at least one parallel or reduction dim");

				// 3. Fail if reduction/parallel dims are not contiguous.
				// Check parallelDims are exactly [0 .. size).
				int64_t counter = 0;
				if (useInnerDimsForReduction &&
				llvm::any_of(parallelDims, [&](int64_t i) { return i != counter++; }))
				return failure();
				// Check parallelDims are exactly {reductionDims.size()} + [0 .. size).
				counter = reductionDims.size();
				if (!useInnerDimsForReduction &&
				llvm::any_of(parallelDims, [&](int64_t i) { return i != counter++; }))
				return failure();

				// 4. Shape cast to collapse consecutive parallel (resp. reduction dim) into
				// a single parallel (resp. reduction) dim.
				SmallVector<bool, 2> mask;
				SmallVector<int64_t, 2> vectorShape;
				if (flattenedParallelDim) {
				mask.push_back(false);
				vectorShape.push_back(flattenedParallelDim);
				}
				if (flattenedReductionDim) {
				mask.push_back(true);
				vectorShape.push_back(flattenedReductionDim);
				}
				if (!useInnerDimsForReduction && vectorShape.size() == 2) {
				std::swap(mask.front(), mask.back());
				std::swap(vectorShape.front(), vectorShape.back());
				}
				auto castedType = VectorType::get(
				vectorShape, multiReductionOp.getSourceVectorType().getElementType());
				Value cast = rewriter.create<vector::ShapeCastOp>(
				loc, castedType, multiReductionOp.source());
				// [Inline comment] springerm: Shouldn't this be checking for some kind of equality of "outer" dims?

				// 5. Creates the flattened form of vector.multi_reduction with inner/outer
				// most dim as reduction.
				auto newOp = rewriter.create<vector::MultiDimReductionOp>(
				loc, cast, mask, multiReductionOp.kind());

				// 6. If there are no parallel shapes, the result is a scalar.
				// TODO: support 0-d vectors when available.
				if (parallelShapes.empty()) {
				rewriter.replaceOp(multiReductionOp, newOp.dest());
				return success();
				}

				// 7. Creates shape cast for the output n-D -> 2-D
				VectorType outputCastedType = VectorType::get(
				parallelShapes,
				multiReductionOp.getSourceVectorType().getElementType());
				rewriter.replaceOpWithNewOp<vector::ShapeCastOp>(
				multiReductionOp, outputCastedType, newOp.dest());
				return success();
				}

				private:
				const bool useInnerDimsForReduction;
				};

				/// Unrolls vector.multi_reduction with outermost reductions
				/// and combines results
				struct TwoDimMultiReductionToElementWise
				: public OpRewritePattern<vector::MultiDimReductionOp> {
				using OpRewritePattern<vector::MultiDimReductionOp>::OpRewritePattern;

				LogicalResult matchAndRewrite(vector::MultiDimReductionOp multiReductionOp,
				PatternRewriter &rewriter) const override {
				auto srcRank = multiReductionOp.getSourceVectorType().getRank();
				// Rank-2 ["parallel", "reduce"] or bail.
				// [Inline comment] springerm: nd -> 2d?
				if (srcRank != 2)
				return failure();

				if (multiReductionOp.isReducedDim(1) || !multiReductionOp.isReducedDim(0))
				return failure();

				auto loc = multiReductionOp.getLoc();
				ArrayRef<int64_t> srcShape =
				multiReductionOp.getSourceVectorType().getShape();

				Type elementType = getElementTypeOrSelf(multiReductionOp.getDestType());
				if (!elementType.isIntOrIndexOrFloat())
				return failure();

				Value condition;
				Value result =
				rewriter.create<vector::ExtractOp>(loc, multiReductionOp.source(), 0)
				.getResult();
				for (int64_t i = 1; i < srcShape[0]; i++) {
				auto operand =
				rewriter.create<vector::ExtractOp>(loc, multiReductionOp.source(), i);
				switch (multiReductionOp.kind()) {
				case vector::CombiningKind::ADD:
				if (elementType.isIntOrIndex())
				result = rewriter.create<AddIOp>(loc, operand, result);
				else
				result = rewriter.create<AddFOp>(loc, operand, result);
				break;
				case vector::CombiningKind::MUL:
				if (elementType.isIntOrIndex())
				result = rewriter.create<MulIOp>(loc, operand, result);
				else
				result = rewriter.create<MulFOp>(loc, operand, result);
				break;
				case vector::CombiningKind::MINUI:
				result = rewriter.create<MinUIOp>(loc, operand, result);
				break;
				case vector::CombiningKind::MINSI:
				result = rewriter.create<MinSIOp>(loc, operand, result);
				break;
				case vector::CombiningKind::MINF:
				result = rewriter.create<MinFOp>(loc, operand, result);
				break;
				case vector::CombiningKind::MAXUI:
				result = rewriter.create<MaxUIOp>(loc, operand, result);
				break;
				case vector::CombiningKind::MAXSI:
				result = rewriter.create<MaxSIOp>(loc, operand, result);
				break;
				case vector::CombiningKind::MAXF:
				result = rewriter.create<MaxFOp>(loc, operand, result);
				break;
				case vector::CombiningKind::AND:
				result = rewriter.create<AndOp>(loc, operand, result);
				break;
				case vector::CombiningKind::OR:
				result = rewriter.create<OrOp>(loc, operand, result);
				break;
				case vector::CombiningKind::XOR:
				result = rewriter.create<XOrOp>(loc, operand, result);
				break;
				}
				}

				rewriter.replaceOp(multiReductionOp, result);
				return success();
				}
				};

				/// Converts 2d vector.multi_reduction with inner most reduction dimension into
				/// a sequence of vector.reduction ops.
				struct TwoDimMultiReductionToReduction
				: public OpRewritePattern<vector::MultiDimReductionOp> {
				using OpRewritePattern<vector::MultiDimReductionOp>::OpRewritePattern;

				LogicalResult matchAndRewrite(vector::MultiDimReductionOp multiReductionOp,
				PatternRewriter &rewriter) const override {
				auto srcRank = multiReductionOp.getSourceVectorType().getRank();
				if (srcRank != 2)
				return failure();

				if (multiReductionOp.isReducedDim(0) || !multiReductionOp.isReducedDim(1))
				return failure();

				auto loc = multiReductionOp.getLoc();
				Value result = rewriter.create<ConstantOp>(
				loc, multiReductionOp.getDestType(),
				rewriter.getZeroAttr(multiReductionOp.getDestType()));
				int outerDim = multiReductionOp.getSourceVectorType().getShape()[0];

				// TODO: Add vector::CombiningKind attribute instead of string to
				// vector.reduction.
				auto getKindStr = [](vector::CombiningKind kind) {
				switch (kind) {
				case vector::CombiningKind::ADD:
				return "add";
				case vector::CombiningKind::MUL:
				return "mul";
				case vector::CombiningKind::MINUI:
				return "minui";
				case vector::CombiningKind::MINSI:
				return "minsi";
				case vector::CombiningKind::MINF:
				return "minf";
				case vector::CombiningKind::MAXUI:
				return "maxui";
				case vector::CombiningKind::MAXSI:
				return "maxsi";
				case vector::CombiningKind::MAXF:
				return "maxf";
				case vector::CombiningKind::AND:
				return "and";
				case vector::CombiningKind::OR:
				return "or";
				case vector::CombiningKind::XOR:
				return "xor";
				}
				llvm_unreachable("unknown combining kind");
				};

				for (int i = 0; i < outerDim; ++i) {
				auto v = rewriter.create<vector::ExtractOp>(
				loc, multiReductionOp.source(), ArrayRef<int64_t>{i});
				auto reducedValue = rewriter.create<vector::ReductionOp>(
				loc, getElementTypeOrSelf(multiReductionOp.getDestType()),
				rewriter.getStringAttr(getKindStr(multiReductionOp.kind())), v,
				ValueRange{});
				result = rewriter.create<vector::InsertElementOp>(loc, reducedValue,
				result, i);
				}
				rewriter.replaceOp(multiReductionOp, result);
				return success();
				}
				};
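
Note on the pattern above: the unrolled loop extracts one row per outer index, reduces it to a scalar, and inserts that scalar into a zero-initialized result. The sketch below models this in plain C++ with no MLIR dependency. `Matrix`, `reduceRow` and `reduceInnerDim` are hypothetical names for illustration only, and seeding each row's accumulator with its first element is an assumption about what a 1-D reduction computes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a 2-D vector value (outer x inner).
using Matrix = std::vector<std::vector<int64_t>>;
using Combiner = std::function<int64_t(int64_t, int64_t)>;

// Stand-in for a 1-D reduction: fold one row with the combiner, seeded
// with the row's first element (assumption, see lead-in).
int64_t reduceRow(const std::vector<int64_t> &row, const Combiner &combine) {
  assert(!row.empty() && "cannot reduce an empty row");
  int64_t acc = row.front();
  for (std::size_t j = 1; j < row.size(); ++j)
    acc = combine(acc, row[j]);
  return acc;
}

// Models the unrolled loop: start from a zero-filled result and write one
// reduced value per outer index.
std::vector<int64_t> reduceInnerDim(const Matrix &src,
                                    const Combiner &combine) {
  std::vector<int64_t> result(src.size(), 0);
  for (std::size_t i = 0; i < src.size(); ++i)
    result[i] = reduceRow(src[i], combine);
  return result;
}
```

Every slot of the result is overwritten, so the zero initializer only serves as a placeholder value of the right type and shape.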

				/// Converts 1d vector.multi_reduction with a single reduction dimension to a 2d
				/// form with both a single parallel and reduction dimension.
				/// This is achieved with a simple vector.shape_cast that inserts a leading 1.
				/// The case with a single parallel dimension is a noop and folds away
				/// separately.
				struct OneDimMultiReductionToTwoDim
				: public OpRewritePattern<vector::MultiDimReductionOp> {
				using OpRewritePattern<vector::MultiDimReductionOp>::OpRewritePattern;

				LogicalResult matchAndRewrite(vector::MultiDimReductionOp multiReductionOp,
				PatternRewriter &rewriter) const override {
				auto srcRank = multiReductionOp.getSourceVectorType().getRank();
				// Rank-1 or bail.
				if (srcRank != 1)
				return failure();

				auto loc = multiReductionOp.getLoc();
				auto srcVectorType = multiReductionOp.getSourceVectorType();
				auto srcShape = srcVectorType.getShape();
				auto castedType = VectorType::get(ArrayRef<int64_t>{1, srcShape.back()},
				srcVectorType.getElementType());
				assert(!multiReductionOp.getDestType().isa<VectorType>() &&
				"multi_reduction with a single dimension expects a scalar result");

				// If the unique dim is reduced and we insert a parallel in front, we need a
				// {false, true} mask.
				SmallVector<bool, 2> mask{false, true};

				/// vector.extract(vector.multi_reduce(vector.shape_cast(v, 1xk)), 0)
				Value cast = rewriter.create<vector::ShapeCastOp>(
				loc, castedType, multiReductionOp.source());
				Value reduced = rewriter.create<vector::MultiDimReductionOp>(
				loc, cast, mask, multiReductionOp.kind());
				rewriter.replaceOpWithNewOp<vector::ExtractOp>(multiReductionOp, reduced,
				ArrayRef<int64_t>{0});
				return success();
				}
				};
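
Note on the pattern above: a rank-1 reduction is rewritten as a shape cast to 1xk, a 2-D reduction with mask {false, true}, and an extract at index 0. The following plain C++ sketch illustrates why these three steps yield the same scalar. All function names are hypothetical and nothing here uses MLIR APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<float>;
using Rows = std::vector<Row>;
using Combine = float (*)(float, float);

// Models the shape cast k -> 1xk: wrap the data in a single outer row.
Rows castTo1xK(const Row &v) { return Rows(1, v); }

// Models a 2-D reduction with mask {false, true}: the outer dimension is
// kept and the inner dimension is reduced.
Row reduceInner(const Rows &m, Combine combine) {
  Row out;
  for (const Row &row : m) {
    assert(!row.empty() && "cannot reduce an empty row");
    float acc = row.front();
    for (std::size_t j = 1; j < row.size(); ++j)
      acc = combine(acc, row[j]);
    out.push_back(acc);
  }
  return out;
}

// Models the full rewrite: cast, reduce, then extract element 0.
float reduce1D(const Row &v, Combine combine) {
  Row reduced = reduceInner(castTo1xK(v), combine);
  assert(reduced.size() == 1 && "a 1xk input must reduce to one element");
  return reduced[0];
}
```

The leading unit dimension carries no data, so the intermediate result always has exactly one element and the final extract recovers the scalar.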

				void mlir::vector::populateVectorMultiReductionLoweringPatterns(
				RewritePatternSet &patterns, bool useInnerDimsForReduction) {
				patterns.add<InnerOuterDimReductionConversion, ReduceMultiDimReductionRank,
				OneDimMultiReductionToTwoDim>(patterns.getContext(),
				useInnerDimsForReduction);
				if (useInnerDimsForReduction)
				patterns.add<TwoDimMultiReductionToReduction>(patterns.getContext());
				else
				patterns.add<TwoDimMultiReductionToElementWise>(patterns.getContext());
				}

mlir/lib/Dialect/Vector/VectorOps.cpp

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	void vector::MultiDimReductionOp::build(OpBuilder &builder,			void vector::MultiDimReductionOp::build(OpBuilder &builder,
	OperationState &result, Value source,			OperationState &result, Value source,
	ArrayRef<bool> reductionMask,			ArrayRef<bool> reductionMask,
	CombiningKind kind) {			CombiningKind kind) {
	result.addOperands(source);			result.addOperands(source);
	auto sourceVectorType = source.getType().cast<VectorType>();			auto sourceVectorType = source.getType().cast<VectorType>();
	auto targetShape = MultiDimReductionOp::inferDestShape(			auto targetType = MultiDimReductionOp::inferDestType(
	sourceVectorType.getShape(), reductionMask);			sourceVectorType.getShape(), reductionMask,
	auto targetVectorType =			sourceVectorType.getElementType());
	VectorType::get(targetShape, sourceVectorType.getElementType());			result.addTypes(targetType);
	result.addTypes(targetVectorType);

	SmallVector<int64_t> reductionDims;			SmallVector<int64_t> reductionDims;
	for (auto en : llvm::enumerate(reductionMask))			for (auto en : llvm::enumerate(reductionMask))
	if (en.value())			if (en.value())
	reductionDims.push_back(en.index());			reductionDims.push_back(en.index());
	result.addAttribute(getReductionDimsAttrName(),			result.addAttribute(getReductionDimsAttrName(),
	builder.getI64ArrayAttr(reductionDims));			builder.getI64ArrayAttr(reductionDims));
	result.addAttribute(getKindAttrName(),			result.addAttribute(getKindAttrName(),
	CombiningKindAttr::get(kind, builder.getContext()));			CombiningKindAttr::get(kind, builder.getContext()));
	}			}

	static LogicalResult verify(MultiDimReductionOp op) {			static LogicalResult verify(MultiDimReductionOp op) {
	auto reductionMask = op.getReductionMask();			auto reductionMask = op.getReductionMask();
	auto targetShape = MultiDimReductionOp::inferDestShape(			auto targetType = MultiDimReductionOp::inferDestType(
	op.getSourceVectorType().getShape(), reductionMask);			op.getSourceVectorType().getShape(), reductionMask,
	auto targetVectorType =			op.getSourceVectorType().getElementType());
	VectorType::get(targetShape, op.getSourceVectorType().getElementType());			// TODO: update to support 0-d vectors when available.
	if (targetVectorType != op.getDestVectorType())			if (targetType != op.getDestType())
	return op.emitError("invalid output vector type: ")			return op.emitError("invalid output vector type: ")
	<< op.getDestVectorType() << " (expected: " << targetVectorType			<< op.getDestType() << " (expected: " << targetType << ")";
	<< ")";
	return success();			return success();
	}			}

				OpFoldResult MultiDimReductionOp::fold(ArrayRef<Attribute> operands) {
				// Single parallel dim, this is a noop.
				if (getSourceVectorType().getRank() == 1 && !isReducedDim(0))
				return source();
				return {};
				}
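
Note on the builder, verifier and folder above: the result type now depends on whether any parallel dimension survives. The body of `inferDestType` is not part of this excerpt, so the sketch below is only an assumption about its behavior, written in plain C++: keep the non-reduced dimensions, and treat an empty remaining shape as a scalar result. `DestType`, `inferDest` and `foldsToSource` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the inferred result: either a scalar or a vector
// with the given shape.
struct DestType {
  bool isScalar;
  std::vector<int64_t> shape;
};

// Keep every dimension whose mask bit is false. If none remain, the result
// is a scalar (assumed behavior, see lead-in).
DestType inferDest(const std::vector<int64_t> &srcShape,
                   const std::vector<bool> &reductionMask) {
  assert(srcShape.size() == reductionMask.size() && "mask must match rank");
  DestType dest{false, {}};
  for (std::size_t i = 0; i < srcShape.size(); ++i)
    if (!reductionMask[i])
      dest.shape.push_back(srcShape[i]);
  dest.isScalar = dest.shape.empty();
  return dest;
}

// Models the fold condition: a rank-1 source whose only dimension is
// parallel is returned unchanged.
bool foldsToSource(const std::vector<int64_t> &srcShape,
                   const std::vector<bool> &reductionMask) {
  return srcShape.size() == 1 && !reductionMask[0];
}
```

With these definitions, reducing dimensions 1 and 3 of a 4x8x16x32 source leaves a 4x16 vector, and reducing both dimensions of a 4x16 source yields a scalar, which matches the `@multi_reduction` test in ops.mlir.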

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// ReductionOp			// ReductionOp
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	static LogicalResult verify(ReductionOp op) {			static LogicalResult verify(ReductionOp op) {
	// Verify for 1-D vector.			// Verify for 1-D vector.
	int64_t rank = op.getVectorType().getRank();			int64_t rank = op.getVectorType().getRank();
	if (rank != 1)			if (rank != 1)

mlir/lib/Dialect/Vector/VectorTransforms.cpp

case CombiningKind::MUL:
combinedResult = rewriter.create<MulFOp>(loc, mul, acc);		combinedResult = rewriter.create<MulFOp>(loc, mul, acc);
break;		break;
case CombiningKind::MINF:		case CombiningKind::MINF:
combinedResult = rewriter.create<MinFOp>(loc, mul, acc);		combinedResult = rewriter.create<MinFOp>(loc, mul, acc);
break;		break;
case CombiningKind::MAXF:		case CombiningKind::MAXF:
combinedResult = rewriter.create<MaxFOp>(loc, mul, acc);		combinedResult = rewriter.create<MaxFOp>(loc, mul, acc);
break;		break;
case CombiningKind::ADD: // Already handled this special case above.		case CombiningKind::ADD: // Already handled this special case above.
case CombiningKind::AND: // Only valid for integer types.		case CombiningKind::AND: // Only valid for integer types.
case CombiningKind::MINUI: // Only valid for integer types.		case CombiningKind::MINUI: // Only valid for integer types.
case CombiningKind::MINSI: // Only valid for integer types.		case CombiningKind::MINSI: // Only valid for integer types.
case CombiningKind::MAXUI: // Only valid for integer types.		case CombiningKind::MAXUI: // Only valid for integer types.
case CombiningKind::MAXSI: // Only valid for integer types.		case CombiningKind::MAXSI: // Only valid for integer types.
case CombiningKind::OR: // Only valid for integer types.		case CombiningKind::OR: // Only valid for integer types.
case CombiningKind::XOR: // Only valid for integer types.		case CombiningKind::XOR: // Only valid for integer types.
return Optional<Value>();		return Optional<Value>();
}		}
return Optional<Value>(combinedResult);		return Optional<Value>(combinedResult);
}		}
};		};

/// Progressive lowering of ConstantMaskOp.		/// Progressive lowering of ConstantMaskOp.
/// One:		/// One:
LogicalResult matchAndRewrite(vector::CreateMaskOp op,
}		}
return failure();		return failure();
}		}

private:		private:
const bool enableIndexOptimizations;		const bool enableIndexOptimizations;
};		};

// Converts vector.multi_reduction into inner-most/outer-most reduction form
// by using vector.transpose
class InnerOuterDimReductionConversion
: public OpRewritePattern<vector::MultiDimReductionOp> {
public:
using OpRewritePattern<vector::MultiDimReductionOp>::OpRewritePattern;

explicit InnerOuterDimReductionConversion(MLIRContext *context,
bool useInnerDimsForReduction)
: mlir::OpRewritePattern<vector::MultiDimReductionOp>(context),
useInnerDimsForReduction(useInnerDimsForReduction) {}

LogicalResult matchAndRewrite(vector::MultiDimReductionOp multiReductionOp,
PatternRewriter &rewriter) const override {
auto src = multiReductionOp.source();
auto loc = multiReductionOp.getLoc();
auto srcRank = multiReductionOp.getSourceVectorType().getRank();

// Separate reduction and parallel dims
auto reductionDimsRange =
multiReductionOp.reduction_dims().getAsValueRange<IntegerAttr>();
auto reductionDims = llvm::to_vector<4>(llvm::map_range(
reductionDimsRange, [](APInt a) { return a.getZExtValue(); }));
llvm::SmallDenseSet<int64_t> reductionDimsSet(reductionDims.begin(),
reductionDims.end());
int64_t reductionSize = reductionDims.size();
SmallVector<int64_t, 4> parallelDims;
for (int64_t i = 0; i < srcRank; i++) {
if (!reductionDimsSet.contains(i))
parallelDims.push_back(i);
}

// Add transpose only if inner-most/outer-most dimensions are not parallel
if (useInnerDimsForReduction &&
(parallelDims ==
llvm::to_vector<4>(llvm::seq<int64_t>(0, parallelDims.size()))))
return failure();

if (!useInnerDimsForReduction &&
(parallelDims !=
llvm::to_vector<4>(llvm::seq<int64_t>(0, parallelDims.size()))))
return failure();

SmallVector<int64_t, 4> indices;
if (useInnerDimsForReduction) {
indices.append(parallelDims.begin(), parallelDims.end());
indices.append(reductionDims.begin(), reductionDims.end());
} else {
indices.append(reductionDims.begin(), reductionDims.end());
indices.append(parallelDims.begin(), parallelDims.end());
}
auto transposeOp = rewriter.create<vector::TransposeOp>(loc, src, indices);
SmallVector<bool> reductionMask(srcRank, false);
for (int i = 0; i < reductionSize; ++i) {
if (useInnerDimsForReduction)
reductionMask[srcRank - i - 1] = true;
else
reductionMask[i] = true;
}
rewriter.replaceOpWithNewOp<vector::MultiDimReductionOp>(
multiReductionOp, transposeOp.result(), reductionMask,
multiReductionOp.kind());
return success();
}

private:
const bool useInnerDimsForReduction;
};

// Reduces the rank of vector.multi_reduction nd -> 2d given all reduction
// dimensions are either inner most or outer most.
class ReduceMultiDimReductionRank
: public OpRewritePattern<vector::MultiDimReductionOp> {
public:
using OpRewritePattern<vector::MultiDimReductionOp>::OpRewritePattern;

explicit ReduceMultiDimReductionRank(MLIRContext *context,
bool useInnerDimsForReduction)
: mlir::OpRewritePattern<vector::MultiDimReductionOp>(context),
useInnerDimsForReduction(useInnerDimsForReduction) {}

LogicalResult matchAndRewrite(vector::MultiDimReductionOp multiReductionOp,
PatternRewriter &rewriter) const override {
auto srcRank = multiReductionOp.getSourceVectorType().getRank();
auto srcShape = multiReductionOp.getSourceVectorType().getShape();
auto loc = multiReductionOp.getLoc();
if (srcRank == 2)
return failure();

// Separate reduction and parallel dims
auto reductionDimsRange =
multiReductionOp.reduction_dims().getAsValueRange<IntegerAttr>();
auto reductionDims = llvm::to_vector<4>(llvm::map_range(
reductionDimsRange, [](APInt a) { return a.getZExtValue(); }));
llvm::SmallDenseSet<int64_t> reductionDimsSet(reductionDims.begin(),
reductionDims.end());
SmallVector<int64_t, 4> parallelDims, parallelShapes;
int canonicalReductionDim = 1;
int canonicalParallelDim = 1;
for (int64_t i = 0; i < srcRank; i++) {
if (!reductionDimsSet.contains(i)) {
parallelDims.push_back(i);
parallelShapes.push_back(srcShape[i]);
canonicalParallelDim *= srcShape[i];
} else {
canonicalReductionDim *= srcShape[i];
}
}

// Fail if reduction dims are not either inner-most or outer-most
if (useInnerDimsForReduction &&
(parallelDims !=
llvm::to_vector<4>(llvm::seq<int64_t>(0, parallelDims.size()))))
return failure();

if (!useInnerDimsForReduction &&
(parallelDims ==
llvm::to_vector<4>(llvm::seq<int64_t>(0, parallelDims.size()))))
return failure();

// Creates shape cast for the inputs n_d -> 2d
int64_t outerDim =
useInnerDimsForReduction ? canonicalParallelDim : canonicalReductionDim;
int64_t innerDim =
useInnerDimsForReduction ? canonicalReductionDim : canonicalParallelDim;

auto castedType = VectorType::get(
ArrayRef<int64_t>{outerDim, innerDim},
multiReductionOp.getSourceVectorType().getElementType());
auto castedOp = rewriter.create<vector::ShapeCastOp>(
loc, castedType, multiReductionOp.source());

// Creates the canonical form of 2d vector.multi_reduction with inner/outer
// most dim as reduction.
SmallVector<bool, 2> mask{!useInnerDimsForReduction,
useInnerDimsForReduction};
auto newOp = rewriter.create<vector::MultiDimReductionOp>(
loc, castedOp.result(), mask, multiReductionOp.kind());

// Creates shape cast for the output 2d -> nd
VectorType outputCastedType = VectorType::get(
parallelShapes,
multiReductionOp.getSourceVectorType().getElementType());
Value castedOutputOp = rewriter.create<vector::ShapeCastOp>(
loc, outputCastedType, newOp.dest());

rewriter.replaceOp(multiReductionOp, castedOutputOp);
return success();
}

private:
const bool useInnerDimsForReduction;
};
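
Note on the pattern above: the n-D source is collapsed to 2-D by multiplying together all parallel sizes and all reduction sizes, then ordering the two products according to `useInnerDimsForReduction`. A minimal plain C++ sketch of that arithmetic follows. `Shape2D` and `collapseTo2D` are hypothetical names, and the sketch assumes the dimensions have already been grouped as the pattern requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: the canonical 2-D shape (outer x inner).
struct Shape2D {
  int64_t outer;
  int64_t inner;
};

// Multiply the parallel sizes and the reduction sizes separately, then place
// the reduction product innermost or outermost depending on the flag.
Shape2D collapseTo2D(const std::vector<int64_t> &shape,
                     const std::vector<bool> &reductionMask,
                     bool useInnerDimsForReduction) {
  assert(shape.size() == reductionMask.size() && "mask must match rank");
  int64_t parallel = 1;
  int64_t reduction = 1;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (reductionMask[i])
      reduction *= shape[i];
    else
      parallel *= shape[i];
  }
  if (useInnerDimsForReduction)
    return Shape2D{parallel, reduction};
  return Shape2D{reduction, parallel};
}
```

For a 2x3x4x5 source reduced over dimensions 2 and 3 with inner reduction, this gives 6x20, which agrees with the `vector<6x20xi32>` shape checked in vector-multi-reduction-lowering.mlir.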

// Unrolls vector.multi_reduction with outermost reductions
// and combines results
struct UnrollOuterMultiReduction
: public OpRewritePattern<vector::MultiDimReductionOp> {
using OpRewritePattern<vector::MultiDimReductionOp>::OpRewritePattern;

LogicalResult matchAndRewrite(vector::MultiDimReductionOp multiReductionOp,
PatternRewriter &rewriter) const override {
auto srcRank = multiReductionOp.getSourceVectorType().getRank();
if (srcRank != 2)
return failure();

if (multiReductionOp.getReductionMask()[1] ||
!multiReductionOp.getReductionMask()[0])
return failure();

auto loc = multiReductionOp.getLoc();
ArrayRef<int64_t> srcShape =
multiReductionOp.getSourceVectorType().getShape();

Type elementType = multiReductionOp.getDestVectorType().getElementType();
if (!elementType.isIntOrIndexOrFloat())
return failure();

Value condition;
Value result =
rewriter.create<vector::ExtractOp>(loc, multiReductionOp.source(), 0)
.getResult();
for (int64_t i = 1; i < srcShape[0]; i++) {
auto operand =
rewriter.create<vector::ExtractOp>(loc, multiReductionOp.source(), i);
switch (multiReductionOp.kind()) {
case vector::CombiningKind::ADD:
if (elementType.isIntOrIndex())
result = rewriter.create<AddIOp>(loc, operand, result);
else
result = rewriter.create<AddFOp>(loc, operand, result);
break;
case vector::CombiningKind::MUL:
if (elementType.isIntOrIndex())
result = rewriter.create<MulIOp>(loc, operand, result);
else
result = rewriter.create<MulFOp>(loc, operand, result);
break;
case vector::CombiningKind::MINUI:
result = rewriter.create<MinUIOp>(loc, operand, result);
break;
case vector::CombiningKind::MINSI:
result = rewriter.create<MinSIOp>(loc, operand, result);
break;
case vector::CombiningKind::MINF:
result = rewriter.create<MinFOp>(loc, operand, result);
break;
case vector::CombiningKind::MAXUI:
result = rewriter.create<MaxUIOp>(loc, operand, result);
break;
case vector::CombiningKind::MAXSI:
result = rewriter.create<MaxSIOp>(loc, operand, result);
break;
case vector::CombiningKind::MAXF:
result = rewriter.create<MaxFOp>(loc, operand, result);
break;
case vector::CombiningKind::AND:
result = rewriter.create<AndOp>(loc, operand, result);
break;
case vector::CombiningKind::OR:
result = rewriter.create<OrOp>(loc, operand, result);
break;
case vector::CombiningKind::XOR:
result = rewriter.create<XOrOp>(loc, operand, result);
break;
}
}

rewriter.replaceOp(multiReductionOp, result);
return success();
}
};

// Converts 2d vector.multi_reduction with inner most reduction dimension into a
// sequence of vector.reduction ops.
struct TwoDimMultiReductionToReduction
: public OpRewritePattern<vector::MultiDimReductionOp> {
using OpRewritePattern<vector::MultiDimReductionOp>::OpRewritePattern;

LogicalResult matchAndRewrite(vector::MultiDimReductionOp multiReductionOp,
PatternRewriter &rewriter) const override {
auto srcRank = multiReductionOp.getSourceVectorType().getRank();
if (srcRank != 2)
return failure();

if (multiReductionOp.getReductionMask()[0] ||
!multiReductionOp.getReductionMask()[1])
return failure();

auto loc = multiReductionOp.getLoc();

Value result =
multiReductionOp.getDestVectorType().getElementType().isIntOrIndex()
? rewriter.create<ConstantOp>(
loc, multiReductionOp.getDestVectorType(),
DenseElementsAttr::get(multiReductionOp.getDestVectorType(),
0))
: rewriter.create<ConstantOp>(
loc, multiReductionOp.getDestVectorType(),
DenseElementsAttr::get(multiReductionOp.getDestVectorType(),
0.0f));

int outerDim = multiReductionOp.getSourceVectorType().getShape()[0];

// TODO: Add vector::CombiningKind attribute instead of string to
// vector.reduction.
auto getKindStr = [](vector::CombiningKind kind) {
switch (kind) {
case vector::CombiningKind::ADD:
return "add";
case vector::CombiningKind::MUL:
return "mul";
case vector::CombiningKind::MINUI:
return "minui";
case vector::CombiningKind::MINSI:
return "minsi";
case vector::CombiningKind::MINF:
return "minf";
case vector::CombiningKind::MAXUI:
return "maxui";
case vector::CombiningKind::MAXSI:
return "maxsi";
case vector::CombiningKind::MAXF:
return "maxf";
case vector::CombiningKind::AND:
return "and";
case vector::CombiningKind::OR:
return "or";
case vector::CombiningKind::XOR:
return "xor";
}
llvm_unreachable("unknown combining kind");
};

for (int i = 0; i < outerDim; ++i) {
auto v = rewriter.create<vector::ExtractOp>(
loc, multiReductionOp.source(), ArrayRef<int64_t>{i});
auto reducedValue = rewriter.create<vector::ReductionOp>(
loc, multiReductionOp.getDestVectorType().getElementType(),
rewriter.getStringAttr(getKindStr(multiReductionOp.kind())), v,
ValueRange{});
result = rewriter.create<vector::InsertElementOp>(loc, reducedValue,
result, i);
}
rewriter.replaceOp(multiReductionOp, result);
return success();
}
};

void mlir::vector::populateVectorMaskMaterializationPatterns(		void mlir::vector::populateVectorMaskMaterializationPatterns(
RewritePatternSet &patterns, bool enableIndexOptimizations) {		RewritePatternSet &patterns, bool enableIndexOptimizations) {
patterns.add<VectorCreateMaskOpConversion,		patterns.add<VectorCreateMaskOpConversion,
MaterializeTransferMask<vector::TransferReadOp>,		MaterializeTransferMask<vector::TransferReadOp>,
MaterializeTransferMask<vector::TransferWriteOp>>(		MaterializeTransferMask<vector::TransferWriteOp>>(
patterns.getContext(), enableIndexOptimizations);		patterns.getContext(), enableIndexOptimizations);
}		}

void mlir::vector::populateVectorTransferLoweringPatterns(		void mlir::vector::populateVectorTransferLoweringPatterns(
RewritePatternSet &patterns, llvm::Optional<unsigned> maxTransferRank) {		RewritePatternSet &patterns, llvm::Optional<unsigned> maxTransferRank) {
patterns.add<TransferReadToVectorLoadLowering,		patterns.add<TransferReadToVectorLoadLowering,
TransferWriteToVectorStoreLowering>(patterns.getContext(),		TransferWriteToVectorStoreLowering>(patterns.getContext(),
maxTransferRank);		maxTransferRank);
patterns.add<VectorLoadToMemrefLoadLowering>(patterns.getContext());		patterns.add<VectorLoadToMemrefLoadLowering>(patterns.getContext());
}		}

void mlir::vector::populateVectorMultiReductionLoweringPatterns(
RewritePatternSet &patterns, bool useInnerDimsForReduction) {
patterns.add<InnerOuterDimReductionConversion, ReduceMultiDimReductionRank>(
patterns.getContext(), useInnerDimsForReduction);
if (useInnerDimsForReduction)
patterns.add<TwoDimMultiReductionToReduction>(patterns.getContext());
else
patterns.add<UnrollOuterMultiReduction>(patterns.getContext());
}

void mlir::vector::populateVectorUnrollPatterns(		void mlir::vector::populateVectorUnrollPatterns(
RewritePatternSet &patterns, const UnrollVectorOptions &options) {		RewritePatternSet &patterns, const UnrollVectorOptions &options) {
patterns.add<UnrollTransferReadPattern, UnrollTransferWritePattern,		patterns.add<UnrollTransferReadPattern, UnrollTransferWritePattern,
UnrollContractionPattern, UnrollElementwisePattern>(		UnrollContractionPattern, UnrollElementwisePattern>(
patterns.getContext(), options);		patterns.getContext(), options);
}		}

mlir/test/Dialect/Vector/canonicalize.mlir

	// CHECK: %[[r:.*]] = vector.transfer_write %[[v]], %[[t1]][%[[c4]], %[[c3]], %[[s]]] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<?x?x12xf32>			// CHECK: %[[r:.*]] = vector.transfer_write %[[v]], %[[t1]][%[[c4]], %[[c3]], %[[s]]] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<?x?x12xf32>
	// CHECK: return %[[r]]			// CHECK: return %[[r]]
	func @insert_slice_of_transfer_write_rank_extending(%t1 : tensor<?x?x12xf32>, %v : vector<5x6xf32>, %s : index, %t2 : tensor<5x6xf32>) -> tensor<?x?x12xf32> {			func @insert_slice_of_transfer_write_rank_extending(%t1 : tensor<?x?x12xf32>, %v : vector<5x6xf32>, %s : index, %t2 : tensor<5x6xf32>) -> tensor<?x?x12xf32> {
	%c0 = constant 0 : index			%c0 = constant 0 : index
	%0 = vector.transfer_write %v, %t2[%c0, %c0] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<5x6xf32>			%0 = vector.transfer_write %v, %t2[%c0, %c0] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<5x6xf32>
	%1 = tensor.insert_slice %0 into %t1[4, 3, %s] [1, 5, 6] [1, 1, 1] : tensor<5x6xf32> into tensor<?x?x12xf32>			%1 = tensor.insert_slice %0 into %t1[4, 3, %s] [1, 5, 6] [1, 1, 1] : tensor<5x6xf32> into tensor<?x?x12xf32>
	return %1 : tensor<?x?x12xf32>			return %1 : tensor<?x?x12xf32>
	}			}

				// -----

				// CHECK-LABEL: func @vector_multi_reduction_single_parallel(
				// CHECK-SAME: %[[v:.*]]: vector<2xf32>
				func @vector_multi_reduction_single_parallel(%arg0: vector<2xf32>) -> vector<2xf32> {
				%0 = vector.multi_reduction #vector.kind<mul>, %arg0 [] : vector<2xf32> to vector<2xf32>

				// CHECK: return %[[v]] : vector<2xf32>
				return %0 : vector<2xf32>
				}

mlir/test/Dialect/Vector/ops.mlir

func @extract_insert_map(%v: vector<32xf32>, %v2: vector<16x32xf32>,
// CHECK: %[[R:.*]] = vector.insert_map %[[V]], %{{.*}}[%{{.*}}] : vector<2xf32> into vector<32xf32>		// CHECK: %[[R:.*]] = vector.insert_map %[[V]], %{{.*}}[%{{.*}}] : vector<2xf32> into vector<32xf32>
%r = vector.insert_map %vd, %v[%id0] : vector<2xf32> into vector<32xf32>		%r = vector.insert_map %vd, %v[%id0] : vector<2xf32> into vector<32xf32>
// CHECK: %[[R1:.*]] = vector.insert_map %[[V1]], %{{.*}}[%{{.*}}, %{{.*}}] : vector<4x2xf32> into vector<16x32xf32>		// CHECK: %[[R1:.*]] = vector.insert_map %[[V1]], %{{.*}}[%{{.*}}, %{{.*}}] : vector<4x2xf32> into vector<16x32xf32>
%r2 = vector.insert_map %vd2, %v2[%id0, %id1] : vector<4x2xf32> into vector<16x32xf32>		%r2 = vector.insert_map %vd2, %v2[%id0, %id1] : vector<4x2xf32> into vector<16x32xf32>
// CHECK: return %[[R]], %[[R1]] : vector<32xf32>, vector<16x32xf32>		// CHECK: return %[[R]], %[[R1]] : vector<32xf32>, vector<16x32xf32>
return %r, %r2 : vector<32xf32>, vector<16x32xf32>		return %r, %r2 : vector<32xf32>, vector<16x32xf32>
}		}

		// CHECK-LABEL: @multi_reduction
		func @multi_reduction(%0: vector<4x8x16x32xf32>) -> f32 {
		%1 = vector.multi_reduction #vector.kind<add>, %0 [1, 3] :
		vector<4x8x16x32xf32> to vector<4x16xf32>
		%2 = vector.multi_reduction #vector.kind<add>, %1 [0, 1] :
		vector<4x16xf32> to f32
		return %2 : f32
		}

mlir/test/Dialect/Vector/vector-multi-reduction-lowering.mlir

	// CHECK: %[[V0:.+]] = vector.extract %[[INPUT]][0]			// CHECK: %[[V0:.+]] = vector.extract %[[INPUT]][0]
	// CHECK: %[[RV0:.+]] = vector.reduction "mul", %[[V0]] : vector<4xf32> into f32			// CHECK: %[[RV0:.+]] = vector.reduction "mul", %[[V0]] : vector<4xf32> into f32
	// CHECK: %[[RESULT_VEC_1:.+]] = vector.insertelement %[[RV0:.+]], %[[RESULT_VEC_0]][%[[C0]] : i32] : vector<2xf32>			// CHECK: %[[RESULT_VEC_1:.+]] = vector.insertelement %[[RV0:.+]], %[[RESULT_VEC_0]][%[[C0]] : i32] : vector<2xf32>
	// CHECK: %[[V1:.+]] = vector.extract %[[INPUT]][1]			// CHECK: %[[V1:.+]] = vector.extract %[[INPUT]][1]
	// CHECK: %[[RV1:.+]] = vector.reduction "mul", %[[V1]] : vector<4xf32> into f32			// CHECK: %[[RV1:.+]] = vector.reduction "mul", %[[V1]] : vector<4xf32> into f32
	// CHECK: %[[RESULT_VEC:.+]] = vector.insertelement %[[RV1:.+]], %[[RESULT_VEC_1]][%[[C1]] : i32] : vector<2xf32>			// CHECK: %[[RESULT_VEC:.+]] = vector.insertelement %[[RV1:.+]], %[[RESULT_VEC_1]][%[[C1]] : i32] : vector<2xf32>
	// CHECK: return %[[RESULT_VEC]]			// CHECK: return %[[RESULT_VEC]]

				func @vector_multi_reduction_to_scalar(%arg0: vector<2x4xf32>) -> f32 {
				%0 = vector.multi_reduction #vector.kind<mul>, %arg0 [0, 1] : vector<2x4xf32> to f32
				return %0 : f32
				}
				// CHECK-LABEL: func @vector_multi_reduction_to_scalar
				// CHECK-SAME: %[[INPUT:.+]]: vector<2x4xf32>
				// CHECK: %[[CASTED:.*]] = vector.shape_cast %[[INPUT]] : vector<2x4xf32> to vector<8xf32>
				// CHECK: %[[REDUCED:.*]] = vector.reduction "mul", %[[CASTED]] : vector<8xf32> into f32
				// CHECK: %[[INSERTED:.*]] = vector.insertelement %[[REDUCED]], {{.*}} : vector<1xf32>
				// CHECK: %[[RES:.*]] = vector.extract %[[INSERTED]][0] : vector<1xf32>
				// CHECK: return %[[RES]]

	func @vector_reduction_inner(%arg0: vector<2x3x4x5xi32>) -> vector<2x3xi32> {			func @vector_reduction_inner(%arg0: vector<2x3x4x5xi32>) -> vector<2x3xi32> {
	%0 = vector.multi_reduction #vector.kind<add>, %arg0 [2, 3] : vector<2x3x4x5xi32> to vector<2x3xi32>			%0 = vector.multi_reduction #vector.kind<add>, %arg0 [2, 3] : vector<2x3x4x5xi32> to vector<2x3xi32>
	return %0 : vector<2x3xi32>			return %0 : vector<2x3xi32>
	}			}
	// CHECK-LABEL: func @vector_reduction_inner			// CHECK-LABEL: func @vector_reduction_inner
	// CHECK-SAME: %[[INPUT:.+]]: vector<2x3x4x5xi32>			// CHECK-SAME: %[[INPUT:.+]]: vector<2x3x4x5xi32>
	// CHECK: %[[FLAT_RESULT_VEC_0:.+]] = constant dense<0> : vector<6xi32>			// CHECK: %[[FLAT_RESULT_VEC_0:.+]] = constant dense<0> : vector<6xi32>
	// CHECK-DAG: %[[C0:.+]] = constant 0 : i32			// CHECK-DAG: %[[C0:.+]] = constant 0 : i32
	// CHECK: %[[FLAT_RESULT_VEC_4:.+]] = vector.insertelement %[[V3R]], %[[FLAT_RESULT_VEC_3]][%[[C3]] : i32] : vector<6xi32>			// CHECK: %[[FLAT_RESULT_VEC_4:.+]] = vector.insertelement %[[V3R]], %[[FLAT_RESULT_VEC_3]][%[[C3]] : i32] : vector<6xi32>
	// CHECK: %[[V4:.+]] = vector.extract %[[RESHAPED_INPUT]][4] : vector<6x20xi32>			// CHECK: %[[V4:.+]] = vector.extract %[[RESHAPED_INPUT]][4] : vector<6x20xi32>
	// CHECK: %[[V4R:.+]] = vector.reduction "add", %[[V4]] : vector<20xi32> into i32			// CHECK: %[[V4R:.+]] = vector.reduction "add", %[[V4]] : vector<20xi32> into i32
	// CHECK: %[[FLAT_RESULT_VEC_5:.+]] = vector.insertelement %[[V4R]], %[[FLAT_RESULT_VEC_4]][%[[C4]] : i32] : vector<6xi32>			// CHECK: %[[FLAT_RESULT_VEC_5:.+]] = vector.insertelement %[[V4R]], %[[FLAT_RESULT_VEC_4]][%[[C4]] : i32] : vector<6xi32>
	/// CHECK: %[[V5:.+]] = vector.extract %[[RESHAPED_INPUT]][5] : vector<6x20xi32>			/// CHECK: %[[V5:.+]] = vector.extract %[[RESHAPED_INPUT]][5] : vector<6x20xi32>
	// CHECK: %[[V5R:.+]] = vector.reduction "add", %[[V5]] : vector<20xi32> into i32			// CHECK: %[[V5R:.+]] = vector.reduction "add", %[[V5]] : vector<20xi32> into i32
	// CHECK: %[[FLAT_RESULT_VEC:.+]] = vector.insertelement %[[V5R]], %[[FLAT_RESULT_VEC_5]][%[[C5]] : i32] : vector<6xi32>			// CHECK: %[[FLAT_RESULT_VEC:.+]] = vector.insertelement %[[V5R]], %[[FLAT_RESULT_VEC_5]][%[[C5]] : i32] : vector<6xi32>
	// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[FLAT_RESULT_VEC]] : vector<6xi32> to vector<2x3xi32>			// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[FLAT_RESULT_VEC]] : vector<6xi32> to vector<2x3xi32>
	// CHECK: return %[[RESULT]]			// CHECK: return %[[RESULT]]


	func @vector_multi_reduction_transposed(%arg0: vector<2x3x4x5xf32>) -> vector<2x5xf32> {			func @vector_multi_reduction_transposed(%arg0: vector<2x3x4x5xf32>) -> vector<2x5xf32> {
	%0 = vector.multi_reduction #vector.kind<add>, %arg0 [1, 2] : vector<2x3x4x5xf32> to vector<2x5xf32>			%0 = vector.multi_reduction #vector.kind<add>, %arg0 [1, 2] : vector<2x3x4x5xf32> to vector<2x5xf32>
	return %0 : vector<2x5xf32>			return %0 : vector<2x5xf32>
	}			}

	// CHECK-LABEL: func @vector_multi_reduction_transposed			// CHECK-LABEL: func @vector_multi_reduction_transposed
	// CHECK-SAME: %[[INPUT:.+]]: vector<2x3x4x5xf32>			// CHECK-SAME: %[[INPUT:.+]]: vector<2x3x4x5xf32>
	// CHECK: %[[TRANSPOSED_INPUT:.+]] = vector.transpose %[[INPUT]], [0, 3, 1, 2] : vector<2x3x4x5xf32> to vector<2x5x3x4xf32>			// CHECK: %[[TRANSPOSED_INPUT:.+]] = vector.transpose %[[INPUT]], [0, 3, 1, 2] : vector<2x3x4x5xf32> to vector<2x5x3x4xf32>
	// CHECK: vector.shape_cast %[[TRANSPOSED_INPUT]] : vector<2x5x3x4xf32> to vector<10x12xf32>			// CHECK: vector.shape_cast %[[TRANSPOSED_INPUT]] : vector<2x5x3x4xf32> to vector<10x12xf32>
	// CHECK: %[[RESULT:.+]] = vector.shape_cast %{{.*}} : vector<10xf32> to vector<2x5xf32>			// CHECK: %[[RESULT:.+]] = vector.shape_cast %{{.*}} : vector<10xf32> to vector<2x5xf32>
	// CHECK: return %[[RESULT]]			// CHECK: return %[[RESULT]]

	func @vector_multi_reduction_ordering(%arg0: vector<3x2x4xf32>) -> vector<2x4xf32> {			func @vector_multi_reduction_ordering(%arg0: vector<3x2x4xf32>) -> vector<2x4xf32> {
	%0 = vector.multi_reduction #vector.kind<mul>, %arg0 [0] : vector<3x2x4xf32> to vector<2x4xf32>			%0 = vector.multi_reduction #vector.kind<mul>, %arg0 [0] : vector<3x2x4xf32> to vector<2x4xf32>
	return %0 : vector<2x4xf32>			return %0 : vector<2x4xf32>
	}			}
	// CHECK-LABEL: func @vector_multi_reduction_ordering			// CHECK-LABEL: func @vector_multi_reduction_ordering
	// CHECK-SAME: %[[INPUT:.+]]: vector<3x2x4xf32>			// CHECK-SAME: %[[INPUT:.+]]: vector<3x2x4xf32>
	// CHECK: %[[RESULT_VEC_0:.+]] = constant dense<{{.*}}> : vector<8xf32>			// CHECK: %[[RESULT_VEC_0:.+]] = constant dense<{{.*}}> : vector<8xf32>

Diff 378963