This is an archive of the discontinued LLVM Phabricator instance.

Differential D137690

[mlir][Vector] Initial masking support in Linalg vectorizer
ClosedPublic

Authored by dcaballe on Nov 8 2022, 11:09 PM.

Details

Reviewers

rriddle
aartbik
nicolasvasilache
hanchung

Commits

rG72fd36448d7c: [mlir][Vector] Initial masking support in Linalg vectorizer

Summary

This patch introduces the initial bits to support vector masking
using the vector.mask operation. Vectorization changes should be
NFC for non-masked cases. We can't test masked cases directly until
we extend the Transform dialect to support masking.
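
To make the idea concrete, here is a minimal scalar sketch of what masking means for an elementwise operation whose size is only known at runtime. It is plain C++, not MLIR, and it is not how `vector.mask` is implemented; `makeMask` and `maskedAdd` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: lane i is active iff i < dynamicSize.
// This loosely models a mask for one dynamic dimension.
std::vector<bool> makeMask(std::size_t vectorSize, std::size_t dynamicSize) {
  std::vector<bool> mask(vectorSize);
  for (std::size_t i = 0; i < vectorSize; ++i)
    mask[i] = i < dynamicSize;
  return mask;
}

// Hypothetical masked elementwise add over one fixed-size "vector".
// Inactive lanes are neither read from the inputs nor written to the output.
void maskedAdd(const std::vector<float> &a, const std::vector<float> &b,
               std::vector<float> &out, const std::vector<bool> &mask) {
  assert(out.size() == mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i)
    if (mask[i])
      out[i] = a[i] + b[i];
}
```

With a vector size of 8 and a runtime size of 5, only the first five lanes are computed and the remaining three lanes of `out` keep their previous contents.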

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dcaballe created this revision.Nov 8 2022, 11:09 PM

Herald added a reviewer: rriddle.Nov 8 2022, 11:09 PM

Herald added a reviewer: aartbik.

Herald added a project: Restricted Project.

Herald added subscribers: Moerafaat, zero9178, bzcheeseman and 21 others.

dcaballe requested review of this revision.Nov 8 2022, 11:09 PM

Herald added a reviewer: nicolasvasilache.Nov 8 2022, 11:09 PM

Herald added a project: Restricted Project.Nov 8 2022, 11:09 PM

Herald added subscribers: pcwang-thead, limo1996, stephenneuendorffer, nicolasvasilache.

dcaballe added inline comments.Nov 8 2022, 11:12 PM

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1828–1830	Hey @nicolasvasilache, do you think you could help extending the Transform dialect so that we can provide the vector sizes for masked dims?

dcaballe planned changes to this revision.Nov 8 2022, 11:26 PM

Harbormaster completed remote builds in B196829: Diff 474162.Nov 8 2022, 11:39 PM

Fixing a couple of issues when vector sizes for masked dimensions are not provided

Harbormaster completed remote builds in B197329: Diff 474887.Nov 11 2022, 4:44 PM

nicolasvasilache requested changes to this revision.Nov 13 2022, 6:39 PM

nicolasvasilache added inline comments.

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1828–1830	Happy to! Could you temporarily hardcode some size in there and add some test IR so I can see what I should expect? This will likely require a new transform op that is not a blanket "vectorize the world" so that we can pass the information you want at a finer granularity. This will likely need some iteration to get to a reasonably scalable usage. Left some other review comments in the meantime.
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
73	super-nit: can we make the line spacing uniform between methods(here) and members(below)?
108	plz avoid unsigned everywhere, we know by now this is not meant for expressing >=0 but should really be used for bit-twiddling or when we really really need the extra bit.
108	Should this be a (better named) helper on the LinalgOp interface? This seems to reimplement common functionality (but maybe not exactly) and rely on deep internal Linalg assumptions (e.g. all possible ways of defining the extent of an iterator have to match and you can therefore take the first one). The name makes it hard to understand what it does and we should be doing any such manipulation in a very localized place in LinalgOp.
136	unsigned purge here and everywhere plz
153	Can we call these `staticUpperBounds` everywhere? And the other ones `dynamicUpperBounds` ? This seems easier to me to relate to what we're looking to do instead of `vecSizesForMaskedDims` and `extractDynamicVectorDimValues`.
157	How about early exit here ? if (!linalgOp.hasDynamicShape()) { canonicalVecShape = linalgOp.getStaticLoopRanges(); return success(); } I don't think you need the checks and debugs after that in the static case?
165	Seems fishy, what happens in this case ? I'd expect this to not fail gracefully .. Make it an assert and lift logic to the precondition to avoid this? Edit: ah scratch that, I see that this is just after the precondition, can we make it part of the precondition?
170	Can we sprinkle a few precompute prefixes in some of these APIs to make it clear what happens at init time?
177	LLVM_DEBUG(llvm::interleaveComma(canonicalVecShape, llvm::dbgs() << ...));
247	pass RewriterBase here and everywhere possible post https://reviews.llvm.org/D137922 plz
272	Plz use `updateRootInPlace` once RewriterBase is piped through.
856	nit: can we spell this as: // 3.a. Convert the indexing map for this input/output to a transfer read... ... /// 3.a.i For input reads we use the canonical vector shape. if (linalgOp.isDpsInput(opOperand)) ... } else { /// 3.a.ii For output reads (iteration-carried dependence, e.g., reductions) ... // 3.b. If masked, set in-bounds to true. ... // 3.c. Not all ops support 0-d vectors,
1083	Can we make this init the state as part of the precondition?
mlir/lib/IR/AffineMap.cpp
340 ↗	(On Diff #474887)	`Fails when called on a non-projected-permutation.` is misleading here. It expects a projected permutation otherwise it crashes. Failing would have returned llvm::None without crashing unless I am missing something?
343 ↗	(On Diff #474887)	Better name and doc please, this is much too confusing. In fact I think you can just do something like `llvm::find(map.getResults(), AffineDimExpr::get(input))` at the client and avoid adding more APIs to AffineMap

This revision now requires changes to proceed.Nov 13 2022, 6:39 PM

Addressed feedback
New changes around vectorization initState and canonical vector shape computation

Herald added subscribers: hanchung, ThomasRaoux, jsetoain.Nov 23 2022, 6:21 PM

dcaballe added inline comments.Nov 23 2022, 6:21 PM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
108	plz avoid unsigned everywhere, we know by now this is not meant for expressing >=0 but should really be used for bit-twiddling or when we really really need the extra bit.
Sorry, `dim` is a misnomer as it refers to the dimension position, not the size. It has to be `unsigned` as that's what is expected by `AffineMap::getXXXYYYPosition()`. Added `Pos` to a few names. Hopefully that's better. Moved it to the LinalgOp interface. I can't think of a better name for the utility... It's just mapping an iteration space dimension to a dimension of an operand... Any other suggestion? I'm happy to replace it with any other existing utility but I couldn't find any.
136	As explained before, renamed to indicate that it's the position, not the size.
153	I had already renamed this locally to `inputVectorSizes` and changed the meaning a bit. The input sizes are now taken into account to compute the canonical vector shape, and if they are also provided for static shapes they should match the size of the static shapes. We are passing them all now to simplify the client API, including the transform dialect, as it's easier to provide all the vector sizes than to filter out the static ones. Let me know if that works.
157	This is gone now. This code has changed a bit in the last version.
165	Let me know if it makes more sense in the new version, where the `inputVectorSizes`, if provided, should match the `linalgop.getNumLoops()`. Otherwise, this would be a bug.
170	Much better!
177	Ah! I didn't know this utility! It's been such a pain to always print SmallVector's... Thanks!
247	I think we are going in the opposite direction based on the review comments?
1083	Probably better to separate the concerns. I think even part of the precondition checks are reused outside of the vectorizer. A public interface was introduced recently.
mlir/lib/IR/AffineMap.cpp
343 ↗	(On Diff #474887)	I had renamed this like 10 times. It's a difficult name. Hopefully it's better now :).
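
For readers following the naming discussion, the utility described above ("mapping an iteration space dimension to a dimension of an operand") can be sketched in plain C++. This is a toy model under stated assumptions: `ToyIndexingMap` and `mapIterationDimToOperandDim` are hypothetical names, and an indexing map is reduced to a list of iteration-space dimension positions, which is far simpler than a real `AffineMap`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Simplified stand-in for an indexing map: results[i] is the iteration-space
// dimension used by operand dimension i. Real affine maps are more general.
struct ToyIndexingMap {
  std::vector<int> results;
  bool isProjectedPermutation = true;
};

// Returns {operand index, dimension position within that operand} for the
// first projected-permutation map that is defined on `dimPos`, or nullopt
// if no operand is defined on that dimension.
std::optional<std::pair<std::size_t, std::size_t>>
mapIterationDimToOperandDim(const std::vector<ToyIndexingMap> &maps,
                            int dimPos) {
  for (std::size_t operandIdx = 0; operandIdx < maps.size(); ++operandIdx) {
    const ToyIndexingMap &map = maps[operandIdx];
    if (!map.isProjectedPermutation)
      continue;
    for (std::size_t pos = 0; pos < map.results.size(); ++pos)
      if (map.results[pos] == dimPos)
        return std::make_pair(operandIdx, pos);
  }
  return std::nullopt;
}
```

For a matmul-like op with maps (m, k), (k, n), (m, n) over dimensions m=0, n=1, k=2, dimension n is first found in the second operand at position 1.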

Harbormaster completed remote builds in B199334: Diff 477667.Nov 23 2022, 7:16 PM

dcaballe planned changes to this revision.Nov 24 2022, 12:12 AM

nicolasvasilache added inline comments.Nov 29 2022, 6:54 AM

mlir/lib/IR/AffineMap.cpp
343 ↗	(On Diff #474887)	Wait .. what do I see just above .. literally the same functionality modulo an assert .. Can we just have a single Optional<int64_t> AffineMap::getResultPosition(AffineExpr e) const { for (int64_t i = 0, numResults = getNumResults(); i < numResults; i++) if (getResult(i) == e) return i; return llvm::None; } and let clients do the assertions they want ? It seems very counterproductive to have all these special case functions with slightly varying assertions and hard to grok names ..
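
The shape of the suggested helper, a lookup that returns an optional and leaves assertions to the caller, can be illustrated outside of MLIR. The sketch below is a toy version: results are plain integers standing in for `AffineExpr`, and `std::optional` stands in for `llvm::Optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model of the proposed lookup: return the position of `expr` among the
// map results, or nullopt if it is absent. No assertion is made here.
std::optional<std::int64_t> getResultPosition(const std::vector<int> &results,
                                              int expr) {
  for (std::int64_t i = 0, n = static_cast<std::int64_t>(results.size());
       i < n; ++i)
    if (results[static_cast<std::size_t>(i)] == expr)
      return i;
  return std::nullopt;
}
```

A client that requires the expression to be present can assert on `has_value()` itself, while a client that tolerates absence simply checks the optional.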

Addressed feedback + minor fixes.
Please ignore the AffineMap utility. It will be removed after rebasing on top of D138946.

Herald added a reviewer: hanchung. · View Herald TranscriptNov 30 2022, 5:50 PM

Harbormaster completed remote builds in B200399: Diff 479129.Nov 30 2022, 5:51 PM

nicolasvasilache mentioned this in D137922: [mlir][Linalg] NFC - Purge OpBuilder uses in favor of RewriterBase in places unrelated to op definitions.Dec 1 2022, 3:01 AM

I do not understand the implication of

// TODO: We mask the transfer.transfer_write here because this op is
// special-cased. A linalg.yield may produced multiple vector.transfer_write
// ops and can't be mapped using BlockAndValueMapping.
AffineMap opOperandMap = linalgOp.getMatchingIndexingMap(opOperand);
write = state.maskOperation(b, write, linalgOp, opOperandMap);

.. also I am not seeing any test changes, so it seems you are adding a lot of code that is not tested and not activated ?

mlir/include/mlir/IR/AffineMap.h
178 ↗	(On Diff #479129)	I think this goes away with the rebase, just flagging for removal so we don't forget.
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
68	Why not make this a ctor?
247	We cannot have code like this: Operation *maskOpTerminator = &maskOp.getMaskRegion().front().back(); for (auto &en : llvm::enumerate(opToMask->getResults())) en.value().replaceAllUsesExcept(maskOp.getResult(en.index()), maskOpTerminator); it must use a RewriterBase with `updateRootInPlace`
283	Hmm .. what's the contract between this `createRegionMask` lambda and the insertion points during `builder.create<vector::MaskOp>` ? I've seen too much ugly stuff re. insertion points leaking across function call boundaries. Let's add an OpBuilder::InsertionGuard at the top of this function.
299–300	this must use a RewriterBase with updateRootInPlace
544	nit: produce

This revision now requires changes to proceed.Dec 1 2022, 7:00 AM

Addressed feedback.

.. also I am not seeing any test changes, so it seems you are adding a lot of code that is not tested and not activated ?

Changes in the overall vectorization algorithm are tested with existing vectorization tests. This patch is NFC for those. Masking is not enabled if inputVectorSizes are not provided. If they are provided, only elementwise ops without reductions and fully dynamic shapes are vectorized.
I can't add unit tests until the new operation for masked vectorization is added to the transform dialect, as we no longer have the vectorizer testing pass. However, this PR has been extensively tested in IREE, both with and without masking, for even more cases than those currently supported.
Waiting on the new transform dialect op to land to add more tests.
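
The enabling rule described in this reply can be sketched as a small classifier. This is only an illustration of the rules as stated in this thread (masking is off when no sizes are given, the number of sizes must equal the number of loops, and sizes given for static dimensions should match them). The names `classify`, `Mode`, and the `kDynamic` sentinel are assumptions, the check on op kind (elementwise, no reductions) is not modeled, and the landed code may check different conditions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed sentinel marking a dynamic loop range in this toy model.
constexpr std::int64_t kDynamic = -1;

enum class Mode { Unmasked, Masked, Invalid };

// Decide how a (toy) op would be vectorized given its loop ranges and the
// user-provided vector sizes.
Mode classify(const std::vector<std::int64_t> &loopRanges,
              const std::vector<std::int64_t> &inputVectorSizes) {
  // No sizes provided: masking is not enabled.
  if (inputVectorSizes.empty())
    return Mode::Unmasked;
  // One vector size is expected per loop of the iteration space.
  if (inputVectorSizes.size() != loopRanges.size())
    return Mode::Invalid;
  // Sizes provided for static dimensions should match those dimensions.
  for (std::size_t i = 0; i < loopRanges.size(); ++i)
    if (loopRanges[i] != kDynamic && loopRanges[i] != inputVectorSizes[i])
      return Mode::Invalid;
  return Mode::Masked;
}
```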

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
68	Because it can fail, at least for now. This will change once we support more cases with masking. Then, we could assert and turn it into a constructor.
247	I can use `updateRootInPlace` when we have a rewriter here but I can't do any replacement method because `opToMask` is moved inside the mask region, not replaced.
283	Added guard. This is a simple lambda to create the op region. We follow the same approach for `scf.if` and other region ops. The only contract is that the region needs to have a `vector::YieldOp`, which is described in the `vector.mask` doc.
299–300	Added TODO until we have a rewriter.
544	I do not understand the implication of TODO: We mask the transfer.transfer_write here because this op is special-cased. A linalg.yield may produced multiple vector.transfer_write ops and can't be mapped using BlockAndValueMapping.
Good point. This is a comment for an old problem. I removed it and moved this code to `buildVectorWrite`.
mlir/lib/IR/AffineMap.cpp
343 ↗	(On Diff #474887)	Extracted this to https://reviews.llvm.org/D138946
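
The insertion-point concern raised above is the usual RAII pattern: save state on entry, restore it on exit. The sketch below shows the pattern with toy types in the spirit of `OpBuilder::InsertionGuard`; `ToyBuilder`, `ToyInsertionGuard`, and `buildRegion` are hypothetical, and the insertion point is reduced to an integer.

```cpp
#include <cassert>

// Toy builder: the insertion point is just an integer position.
struct ToyBuilder {
  int insertionPoint = 0;
};

// RAII guard: remembers the insertion point on construction and restores it
// on destruction, so changes do not leak out of the enclosing scope.
class ToyInsertionGuard {
public:
  explicit ToyInsertionGuard(ToyBuilder &b)
      : builder(b), saved(b.insertionPoint) {}
  ~ToyInsertionGuard() { builder.insertionPoint = saved; }
  ToyInsertionGuard(const ToyInsertionGuard &) = delete;
  ToyInsertionGuard &operator=(const ToyInsertionGuard &) = delete;

private:
  ToyBuilder &builder;
  int saved;
};

// Hypothetical helper that moves the insertion point while building a region.
void buildRegion(ToyBuilder &builder) {
  ToyInsertionGuard guard(builder);
  builder.insertionPoint = 42; // e.g. "inside the new region"
  // ... ops would be created here ...
} // guard restores the caller's insertion point here
```

Placing the guard at the top of the function means callers observe the same insertion point before and after the call, regardless of what the region-building callback does.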

Harbormaster completed remote builds in B200701: Diff 479525.Dec 2 2022, 1:40 AM

Added testing support to Transform dialect + tests

Harbormaster completed remote builds in B201270: Diff 480308.Dec 5 2022, 6:23 PM

Rebase + remove dead code (wrong rebase)

Harbormaster completed remote builds in B201299: Diff 480353.Dec 6 2022, 4:31 AM

Waiting on the new transform dialect op to land to add more tests.

Thanks for integrating it and adding tests, the testing part LGTM, still need to make another pass on the last version of the code.

Thanks @dcaballe !

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
1131	"definite failure"
1149	can you add a `TODO: applyToOne` plz ?
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
222	Use the `find` API and the iterator you get from it to avoid multi-lookups
263	nit: uppermension :) ?
491	can you add a TODO to tighten op semantics so that we don't mix inbounds and mask since this is well defined?
872	I'll need to revisit all this in light of the broadcast separation. From a cursory glance this looks reasonable, let's land and iterate.
1017	ok as a first approx.
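
The `find` suggestion above is a general map idiom: one lookup that yields an iterator, instead of a membership test followed by a second access. A minimal sketch with `std::unordered_map` follows; the patch itself presumably uses an LLVM map type, so the container and the function names here are assumptions for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Two lookups: count() hashes the key once, at() hashes it again.
int lookupTwice(const std::unordered_map<std::string, int> &m,
                const std::string &key, int fallback) {
  if (m.count(key))
    return m.at(key);
  return fallback;
}

// One lookup: find() returns an iterator that already points at the entry.
int lookupOnce(const std::unordered_map<std::string, int> &m,
               const std::string &key, int fallback) {
  auto it = m.find(key);
  return it != m.end() ? it->second : fallback;
}
```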

This revision is now accepted and ready to land.Dec 6 2022, 10:07 AM

gflegar added a subscriber: gflegar.Dec 9 2022, 12:33 AM

Thanks! I addressed comments. Landing now...

Closed by commit rG72fd36448d7c: [mlir][Vector] Initial masking support in Linalg vectorizer (authored by dcaballe).Dec 12 2022, 5:36 PM

This revision was automatically updated to reflect the committed changes.

dcaballe added a commit: rG72fd36448d7c: [mlir][Vector] Initial masking support in Linalg vectorizer.

@dcaballe, I was looking at this PR as I was doing some spelunking and I realize we are not testing the case of the SSA value as well as the error case when the SSA value is not a constant.

Can we please add the missing tests in a follow-up?
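
The two untested paths mentioned here (a size given as an SSA value, and the failure when that value is not a constant) can be pictured with a toy resolver. This is a sketch under assumptions, not the transform op's code: `ToyHandle`, `MixedSize`, and `resolveVectorSizes` are hypothetical, and a handle is reduced to an optional constant.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <variant>
#include <vector>

// Toy stand-in for a handle to a payload value: it either resolves to a
// known constant or it does not.
struct ToyHandle {
  std::optional<std::int64_t> constantValue;
};

// A vector size is either a static integer or a handle to a dynamic value.
using MixedSize = std::variant<std::int64_t, ToyHandle>;

// Resolve all sizes to integers. Returns nullopt (the error path) as soon as
// a dynamic size does not resolve to a constant.
std::optional<std::vector<std::int64_t>>
resolveVectorSizes(const std::vector<MixedSize> &mixed) {
  std::vector<std::int64_t> sizes;
  for (const MixedSize &size : mixed) {
    if (const auto *staticSize = std::get_if<std::int64_t>(&size)) {
      sizes.push_back(*staticSize);
      continue;
    }
    const ToyHandle &handle = std::get<ToyHandle>(size);
    if (!handle.constantValue)
      return std::nullopt;
    sizes.push_back(*handle.constantValue);
  }
  return sizes;
}
```

A follow-up test would exercise both branches: a handle that resolves to a constant, and one that does not.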

Herald added subscribers: wangpc, bviyer, awarzynski.Aug 10 2023, 5:15 AM

Revision Contents

Path

Size

mlir/

include/

mlir/

Dialect/

Linalg/

IR/

LinalgInterfaces.td

30 lines

TransformOps/

LinalgTransformOps.td

41 lines

Transforms/

Transforms.h

16 lines

Vector/

IR/

VectorOps.td

10 lines

Interfaces/

MaskableOpInterface.td

18 lines

Transforms/

Passes.h

6 lines

lib/

Dialect/

Linalg/

TransformOps/

LinalgTransformOps.cpp

83 lines

Transforms/

Vectorization.cpp

551 lines

Vector/

IR/

CMakeLists.txt

2 lines

VectorOps.cpp

32 lines

Transforms/

LowerVectorMask.cpp

18 lines

test/

Dialect/

Linalg/

vectorization.mlir

148 lines

utils/

bazel/

llvm-project-overlay/

mlir/

BUILD.bazel

1 line

Diff 482323

mlir/include/mlir/Dialect/Linalg/IR/LinalgInterfaces.td

Show First 20 Lines • Show All 585 Lines • ▼ Show 20 Lines	InterfaceMethod<
result.reserve($_op->getNumOperands());		result.reserve($_op->getNumOperands());
llvm::transform(		llvm::transform(
this->getOperation()->getOpOperands(),		this->getOperation()->getOpOperands(),
std::back_inserter(result),		std::back_inserter(result),
[](OpOperand &opOperand) { return &opOperand; });		[](OpOperand &opOperand) { return &opOperand; });
return result;		return result;
}]		}]
>,		>,
		InterfaceMethod<
		/*desc=*/[{
		Given a dimension of the iteration space of a Linalg operation, finds an
		operand in the operation that is defined on such dimension. Returns
		whether such operand was found or not. If found, also returns the
		operand value and the dimension position within the operand.
		}],
		/*retTy=*/"LogicalResult",
		/*methodName=*/"mapIterationSpaceDimToOperandDim",
		/*args=*/(ins "unsigned":$dimPos,
		"::mlir::Value &":$operand,
		"unsigned &":$operandDimPos),
		/*methodBody=*/"",
		/*defaultImplementation=*/[{
		// Retrieve the operand and its dimension position from the first
		// operand with a permutation map that is defined on such dimension.
		for (auto [i, idxMap] : llvm::enumerate($_op.getIndexingMapsArray())) {
		if (idxMap.isProjectedPermutation()) {
		if (auto mayOperandDim = idxMap.getResultPosition(
		getAffineDimExpr(dimPos, idxMap.getContext()))) {
		operand = $_op->getOperand(i);
		operandDimPos = *mayOperandDim;
		return success();
		}
		}
		}

		return failure();
		}]
		>,
//===------------------------------------------------------------------===//		//===------------------------------------------------------------------===//
// Linalg generalization hooks.		// Linalg generalization hooks.
//===------------------------------------------------------------------===//		//===------------------------------------------------------------------===//
InterfaceMethod<		InterfaceMethod<
/*desc=*/[{		/*desc=*/[{
Hook to provide a custom AffineMap used to compute all the operand		Hook to provide a custom AffineMap used to compute all the operand
subshapes given loop bounds. This is used to answer the question: "given		subshapes given loop bounds. This is used to answer the question: "given
an iteration space over the codomain, what are the subshapes of the		an iteration space over the codomain, what are the subshapes of the
▲ Show 20 Lines • Show All 241 Lines • Show Last 20 Lines

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td

Show First 20 Lines • Show All 1,109 Lines • ▼ Show 20 Lines	def VectorizeOp : Op<Transform_Dialect, "structured.vectorize",
let extraClassDeclaration = [{		let extraClassDeclaration = [{
::mlir::DiagnosedSilenceableFailure applyToOne(		::mlir::DiagnosedSilenceableFailure applyToOne(
::mlir::Operation *target,		::mlir::Operation *target,
::llvm::SmallVectorImpl<::mlir::Operation *> &results,		::llvm::SmallVectorImpl<::mlir::Operation *> &results,
::mlir::transform::TransformState &state);		::mlir::transform::TransformState &state);
}];		}];
}		}

		def MaskedVectorizeOp : Op<Transform_Dialect, "structured.masked_vectorize",
		[DeclareOpInterfaceMethods<MemoryEffectsOpInterface>,
		TransformOpInterface]> {
		let description = [{
		Vectorize the target ops, which must be Linalg ops, with masked vectors
		of the specified size.

		The vector sizes can be either static or dynamic (SSA values). In case of
		SSA values, the handle must be mapped to exactly one payload op with
		exactly one index-typed result.

		#### Return modes:

		This operation produces a definite failure if the dynamic vector sizes (SSA
		values) do not satisfy the constraints mentioned above. It produces a
		silenceable failure if at least one target op is not a Linalg op or fails to
		vectorize.
		}];

		let arguments = (ins PDL_Operation:$target,
		Variadic<PDL_Operation>:$vector_sizes,
		DefaultValuedOptionalAttr<DenseI64ArrayAttr, "{}">:
		$static_vector_sizes);
		let results = (outs);
		let assemblyFormat = [{
		$target
		`vector_sizes` custom<DynamicIndexList>($vector_sizes,
		$static_vector_sizes)
		attr-dict
		}];

		let extraClassDeclaration = [{
		// TODO: applyToOne.
		::mlir::DiagnosedSilenceableFailure apply(
		::mlir::transform::TransformResults &transformResults,
		::mlir::transform::TransformState &state);

		::llvm::SmallVector<::mlir::OpFoldResult> getMixedVectorSizes();
		}];
		}

#endif // LINALG_TRANSFORM_OPS		#endif // LINALG_TRANSFORM_OPS

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h

	Show First 20 Lines • Show All 338 Lines • ▼ Show 20 Lines
	/// 2. Take a full view on the buffer.			/// 2. Take a full view on the buffer.
	/// 3. Take a partial slice of the full view in step 2. and copy into it.			/// 3. Take a partial slice of the full view in step 2. and copy into it.
	///			///
	/// Return the modified linalg op (the modification happens in place) as well			/// Return the modified linalg op (the modification happens in place) as well
	/// as all the copy ops created.			/// as all the copy ops created.
	FailureOr<LinalgOp> promoteSubViews(OpBuilder &b, LinalgOp op,			FailureOr<LinalgOp> promoteSubViews(OpBuilder &b, LinalgOp op,
	const LinalgPromotionOptions &options);			const LinalgPromotionOptions &options);

	/// Emit a suitable vector form for a Linalg op with fully static shape.			/// Emit a suitable vector form for a Linalg op. If provided, `inputVectorSizes`
	LogicalResult vectorize(RewriterBase &builder, LinalgOp linalgOp,			/// are used to vectorize this operation. `inputVectorSizes` must match the rank
				/// of the iteration space of the operation and the sizes must be smaller than
				/// or equal to their counterpart iteration space sizes, if static.
				/// `inputVectorSizes` also allows the vectorization of operations with dynamic
				/// shapes.
				LogicalResult vectorize(RewriterBase &rewriter, LinalgOp linalgOp,
				ArrayRef<int64_t> inputVectorSizes = {},
	bool vectorizeNDExtract = false);			bool vectorizeNDExtract = false);

	/// Emit a suitable vector form for a Copy op with fully static shape.			/// Emit a suitable vector form for a Copy op with fully static shape.
	LogicalResult vectorizeCopy(RewriterBase &builder, memref::CopyOp copyOp);			LogicalResult vectorizeCopy(RewriterBase &builder, memref::CopyOp copyOp);

	/// Emit a loop nest of `scf.for` with the proper body for `linalgOp`.			/// Emit a loop nest of `scf.for` with the proper body for `linalgOp`.
	FailureOr<LinalgLoops> linalgOpToLoops(PatternRewriter &rewriter,			FailureOr<LinalgLoops> linalgOpToLoops(PatternRewriter &rewriter,
	LinalgOp linalgOp);			LinalgOp linalgOp);
	Show All 10 Lines
	// Preconditions that ensure the corresponding transformation succeeds and can			// Preconditions that ensure the corresponding transformation succeeds and can
	// be applied as a rewrite pattern.			// be applied as a rewrite pattern.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	/// Promote memref.subviews feeding linalg-on-buffers operations.			/// Promote memref.subviews feeding linalg-on-buffers operations.
	LogicalResult promoteSubviewsPrecondition(Operation *op,			LogicalResult promoteSubviewsPrecondition(Operation *op,
	LinalgPromotionOptions options);			LinalgPromotionOptions options);

	/// Return success if the operation can be vectorized.			/// Return success if the operation can be vectorized.
	LogicalResult vectorizeLinalgOpPrecondition(LinalgOp linalgOp,			LogicalResult
				vectorizeLinalgOpPrecondition(LinalgOp linalgOp,
				ArrayRef<int64_t> inputVectorSizes = {},
	bool vectorizeNDExtract = false);			bool vectorizeNDExtract = false);

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Transformations exposed as rewrite patterns.			// Transformations exposed as rewrite patterns.
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	using TileSizeComputationFunction =			using TileSizeComputationFunction =
	std::function<SmallVector<Value, 4>(OpBuilder &, Operation *)>;			std::function<SmallVector<Value, 4>(OpBuilder &, Operation *)>;

	▲ Show 20 Lines • Show All 710 Lines • Show Last 20 Lines

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td

Show First 20 Lines • Show All 444 Lines • ▼ Show 20 Lines	let extraClassDeclaration = [{
VectorType getVectorType() {		VectorType getVectorType() {
return getVector().getType().cast<VectorType>();		return getVector().getType().cast<VectorType>();
}		}

/// Return the dimensions of the result vector that were formerly ones in the		/// Return the dimensions of the result vector that were formerly ones in the
/// source tensor and thus correspond to "dim-1" broadcasting.		/// source tensor and thus correspond to "dim-1" broadcasting.
llvm::SetVector<int64_t> computeBroadcastedUnitDims();		llvm::SetVector<int64_t> computeBroadcastedUnitDims();

/// Broadcast `value` to a vector of `dstShape`, knowing that exactly the		/// Broadcast `value` to a vector of `dstShape`, knowing that exactly the
/// `broadcastedDims` dimensions in the dstShape are broadcasted.		/// `broadcastedDims` dimensions in the dstShape are broadcasted.
/// This requires (and asserts) that the broadcast is free of dim-1		/// This requires (and asserts) that the broadcast is free of dim-1
/// broadcasting.		/// broadcasting.
/// Since vector.broadcast only allows expanding leading dimensions, an extra		/// Since vector.broadcast only allows expanding leading dimensions, an extra
/// vector.transpose may be inserted to make the broadcast possible.		/// vector.transpose may be inserted to make the broadcast possible.
/// `value`, `dstShape` and `broadcastedDims` must be properly specified or		/// `value`, `dstShape` and `broadcastedDims` must be properly specified or
/// the helper will assert. This means:		/// the helper will assert. This means:
/// 1. `dstShape` must not be empty.		/// 1. `dstShape` must not be empty.
/// 2. `broadcastedDims` must be confined to [0 .. rank(value.getVectorType)]		/// 2. `broadcastedDims` must be confined to [0 .. rank(value.getVectorType)]
/// 2. `dstShape` trimmed of the dimensions specified in `broadcastedDims`		/// 2. `dstShape` trimmed of the dimensions specified in `broadcastedDims`
// must match the `value` shape.		// must match the `value` shape.
static Value createOrFoldBroadcastOp(		static Value createOrFoldBroadcastOp(
OpBuilder &b, Value value,		OpBuilder &b, Value value,
ArrayRef<int64_t> dstShape,		ArrayRef<int64_t> dstShape,
▲ Show 20 Lines • Show All 706 Lines • ▼ Show 20 Lines	let extraClassDeclaration = [{
}		}
}];		}];
let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
let hasFolder = 1;		let hasFolder = 1;
let hasVerifier = 1;		let hasVerifier = 1;
let assemblyFormat = "$vector attr-dict `:` type($vector) `to` type(results)";		let assemblyFormat = "$vector attr-dict `:` type($vector) `to` type(results)";
}		}

		// TODO: Tighten semantics so that masks and inbounds can't be used
		// simultaneously within the same transfer op.
def Vector_TransferReadOp :		def Vector_TransferReadOp :
Vector_Op<"transfer_read", [		Vector_Op<"transfer_read", [
DeclareOpInterfaceMethods<VectorTransferOpInterface>,		DeclareOpInterfaceMethods<VectorTransferOpInterface>,
DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>,		DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>,
DeclareOpInterfaceMethods<MaskableOpInterface>,		DeclareOpInterfaceMethods<MaskableOpInterface>,
DeclareOpInterfaceMethods<MemoryEffectsOpInterface>,		DeclareOpInterfaceMethods<MemoryEffectsOpInterface>,
AttrSizedOperandSegments		AttrSizedOperandSegments
]>,		]>,
▲ Show 20 Lines • Show All 199 Lines • ▼ Show 20 Lines	def Vector_TransferReadOp :
}];		}];

let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
let hasCustomAssemblyFormat = 1;		let hasCustomAssemblyFormat = 1;
let hasFolder = 1;		let hasFolder = 1;
let hasVerifier = 1;		let hasVerifier = 1;
}		}

		// TODO: Tighten semantics so that masks and inbounds can't be used
		// simultaneously within the same transfer op.
def Vector_TransferWriteOp :		def Vector_TransferWriteOp :
Vector_Op<"transfer_write", [		Vector_Op<"transfer_write", [
DeclareOpInterfaceMethods<VectorTransferOpInterface>,		DeclareOpInterfaceMethods<VectorTransferOpInterface>,
DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>,		DeclareOpInterfaceMethods<VectorUnrollOpInterface, ["getShapeForUnroll"]>,
DeclareOpInterfaceMethods<MaskableOpInterface>,		DeclareOpInterfaceMethods<MaskableOpInterface>,
DeclareOpInterfaceMethods<MemoryEffectsOpInterface>,		DeclareOpInterfaceMethods<MemoryEffectsOpInterface>,
AttrSizedOperandSegments,		AttrSizedOperandSegments,
DestinationStyleOpInterface		DestinationStyleOpInterface
▲ Show 20 Lines • Show All 1,429 Lines • Show Last 20 Lines

mlir/include/mlir/Dialect/Vector/Interfaces/MaskableOpInterface.td

Show All 25 Lines	let methods = [
InterfaceMethod<		InterfaceMethod<
/*desc=*/"Returns true if the operation is masked by a "		/*desc=*/"Returns true if the operation is masked by a "
"MaskingOpInterface.",		"MaskingOpInterface.",
/*retTy=*/"bool",		/*retTy=*/"bool",
/*methodName=*/"isMasked",		/*methodName=*/"isMasked",
/*args=*/(ins),		/*args=*/(ins),
/*methodBody=*/"",		/*methodBody=*/"",
/*defaultImplementation=*/[{		/*defaultImplementation=*/[{
return mlir::isa<mlir::vector::MaskingOpInterface>($_op->getParentOp());		mlir::Operation *parentOp = $_op->getParentOp();
		return parentOp &&
		mlir::isa<mlir::vector::MaskingOpInterface>(parentOp);
}]>,		}]>,
InterfaceMethod<		InterfaceMethod<
/*desc=*/"Returns the MaskingOpInterface masking this operation.",		/*desc=*/"Returns the MaskingOpInterface masking this operation.",
/*retTy=*/"mlir::vector::MaskingOpInterface",		/*retTy=*/"mlir::vector::MaskingOpInterface",
/*methodName=*/"getMaskingOp",		/*methodName=*/"getMaskingOp",
/*args=*/(ins),		/*args=*/(ins),
/*methodBody=*/"",		/*methodBody=*/"",
/*defaultImplementation=*/[{		/*defaultImplementation=*/[{
return mlir::cast<mlir::vector::MaskingOpInterface>(		return mlir::cast<mlir::vector::MaskingOpInterface>(
$_op->getParentOp());		$_op->getParentOp());
}]>,		}]>,
InterfaceMethod<		InterfaceMethod<
/*desc=*/"Returns true if the operation can have a passthru argument when"		/*desc=*/"Returns true if the operation can have a passthru argument when"
" masked.",		" masked.",
/*retTy=*/"bool",		/*retTy=*/"bool",
/*methodName=*/"supportsPassthru",		/*methodName=*/"supportsPassthru",
/*args=*/(ins),		/*args=*/(ins),
/*methodBody=*/"",		/*methodBody=*/"",
/*defaultImplementation=*/[{		/*defaultImplementation=*/[{
return false;		return false;
}]>,		}]>,
InterfaceMethod<		InterfaceMethod<
/*desc=*/"Returns the mask type expected by this operation. It requires "		/*desc=*/"Returns the mask type expected by this operation. Mostly used"
"the operation to be vectorized.",		" for verification purposes. It requires the operation to be "
/*retTy=*/"mlir::VectorType",		"vectorized.",
		/*retTy=*/"mlir::Type",
/*methodName=*/"getExpectedMaskType",		/*methodName=*/"getExpectedMaskType",
/*args=*/(ins),		/*args=*/(ins),
/*methodBody=*/"",		/*methodBody=*/"",
/*defaultImplementation=*/[{		/*defaultImplementation=*/"">,
// Default implementation is only aimed for operations that implement the
// `getVectorType()` method.
return $_op.getVectorType().cloneWith(/*shape=*/std::nullopt,
IntegerType::get($_op.getContext(), /*width=*/1));
}]>,
];		];
}		}

#endif // MLIR_DIALECT_VECTOR_INTERFACES_MASKABLEOPINTERFACE_TD		#endif // MLIR_DIALECT_VECTOR_INTERFACES_MASKABLEOPINTERFACE_TD

mlir/include/mlir/Dialect/Vector/Transforms/Passes.h

	Show All 16 Lines
	#include "mlir/Dialect/Vector/Transforms/Passes.h.inc"			#include "mlir/Dialect/Vector/Transforms/Passes.h.inc"

	/// Creates an instance of the `vector` dialect bufferization pass.			/// Creates an instance of the `vector` dialect bufferization pass.
	std::unique_ptr<Pass> createVectorBufferizePass();			std::unique_ptr<Pass> createVectorBufferizePass();

	/// Creates an instance of the `vector.mask` lowering pass.			/// Creates an instance of the `vector.mask` lowering pass.
	std::unique_ptr<Pass> createLowerVectorMaskPass();			std::unique_ptr<Pass> createLowerVectorMaskPass();

				/// Populates instances of `MaskOpRewritePattern` to lower masked operations
				/// with `vector.mask`. Patterns should rewrite the `vector.mask` operation and
				/// not its nested `MaskableOpInterface`.
				void populateVectorMaskLoweringPatternsForSideEffectingOps(
				RewritePatternSet &patterns);

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Registration			// Registration
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	/// Generate the code for registering passes.			/// Generate the code for registering passes.
	#define GEN_PASS_REGISTRATION			#define GEN_PASS_REGISTRATION
	#include "mlir/Dialect/Vector/Transforms/Passes.h.inc"			#include "mlir/Dialect/Vector/Transforms/Passes.h.inc"
	} // namespace vector			} // namespace vector
	} // namespace mlir			} // namespace mlir

	#endif // MLIR_DIALECT_VECTOR_TRANSFORMS_PASSES_H_			#endif // MLIR_DIALECT_VECTOR_TRANSFORMS_PASSES_H_

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp

Show All 15 Lines
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"		#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/Dialect/PDL/IR/PDL.h"		#include "mlir/Dialect/PDL/IR/PDL.h"
#include "mlir/Dialect/PDL/IR/PDLTypes.h"		#include "mlir/Dialect/PDL/IR/PDLTypes.h"
#include "mlir/Dialect/SCF/Transforms/TileUsingInterface.h"		#include "mlir/Dialect/SCF/Transforms/TileUsingInterface.h"
#include "mlir/Dialect/Transform/IR/TransformDialect.h"		#include "mlir/Dialect/Transform/IR/TransformDialect.h"
#include "mlir/Dialect/Transform/IR/TransformInterfaces.h"		#include "mlir/Dialect/Transform/IR/TransformInterfaces.h"
#include "mlir/Dialect/Transform/IR/TransformUtils.h"		#include "mlir/Dialect/Transform/IR/TransformUtils.h"
#include "mlir/IR/BuiltinTypes.h"		#include "mlir/IR/BuiltinTypes.h"
		#include "mlir/IR/Matchers.h"
#include "mlir/IR/OpDefinition.h"		#include "mlir/IR/OpDefinition.h"
#include "mlir/Interfaces/TilingInterface.h"		#include "mlir/Interfaces/TilingInterface.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"		#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
#include "llvm/ADT/StringSet.h"		#include "llvm/ADT/StringSet.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"

using namespace mlir;		using namespace mlir;
using namespace mlir::linalg;		using namespace mlir::linalg;
▲ Show 20 Lines • Show All 1,787 Lines • ▼ Show 20 Lines	struct VectorizationPattern : public RewritePattern {
explicit VectorizationPattern(MLIRContext *context,		explicit VectorizationPattern(MLIRContext *context,
bool vectorizeExtract = false)		bool vectorizeExtract = false)
: RewritePattern(MatchAnyOpTypeTag(), /*benefit=*/1, context),		: RewritePattern(MatchAnyOpTypeTag(), /*benefit=*/1, context),
vectorizeNDExtract(vectorizeExtract) {}		vectorizeNDExtract(vectorizeExtract) {}
LogicalResult matchAndRewrite(Operation *op,		LogicalResult matchAndRewrite(Operation *op,
PatternRewriter &rewriter) const override {		PatternRewriter &rewriter) const override {
LinalgOp linalgOp = dyn_cast<LinalgOp>(op);		LinalgOp linalgOp = dyn_cast<LinalgOp>(op);
if (!linalgOp)		if (!linalgOp)
return rewriter.notifyMatchFailure(op, "expected Linalg Op");		return rewriter.notifyMatchFailure(op, "expected Linalg Op");
return vectorize(rewriter, linalgOp, vectorizeNDExtract);		return vectorize(rewriter, linalgOp, /*inputVectorSizes=*/{},
		vectorizeNDExtract);
		dcaballe (Author, Done): Hey @nicolasvasilache, do you think you could help extend the Transform dialect so that we can provide the vector sizes for masked dims?
		nicolasvasilache (Done): Happy to! Could you temporarily hardcode some size in there and add some test IR so I can see what I should expect? This will likely require a new transform op that is not a blanket "vectorize the world" so that we can pass the information you want at a finer granularity. This will likely need some iteration to get to a reasonably scalable usage. Left some other review comments in the meantime.
}		}

private:		private:
/// Controls whether to vectorize `tensor.extract` when the input tensor is		/// Controls whether to vectorize `tensor.extract` when the input tensor is
/// rank >= 2.		/// rank >= 2.
bool vectorizeNDExtract = false;		bool vectorizeNDExtract = false;
};		};
} // namespace		} // namespace
Show All 32 Lines	transform::VectorizeOp::applyToOne(Operation *target,
if (failed(applyPatternsAndFoldGreedily(target, std::move(patterns))))		if (failed(applyPatternsAndFoldGreedily(target, std::move(patterns))))
return emitDefaultDefiniteFailure(target);		return emitDefaultDefiniteFailure(target);

results.push_back(target);		results.push_back(target);
return DiagnosedSilenceableFailure::success();		return DiagnosedSilenceableFailure::success();
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// MaskedVectorizeOp
		//===----------------------------------------------------------------------===//

		DiagnosedSilenceableFailure transform::MaskedVectorizeOp::apply(
		mlir::transform::TransformResults &transformResults,
		mlir::transform::TransformState &state) {
		IRRewriter rewriter(getContext());
		ArrayRef<Operation *> targets = state.getPayloadOps(getTarget());
		if (targets.empty())
		return DiagnosedSilenceableFailure::success();

		SmallVector<int64_t> vectorSizes;
		for (OpFoldResult sz : getMixedVectorSizes()) {
		if (sz.is<Attribute>()) {
		auto attr = sz.get<Attribute>();
		vectorSizes.push_back(attr.cast<IntegerAttr>().getInt());
		continue;
		}

		ArrayRef<Operation *> szPayloads = state.getPayloadOps(sz.get<Value>());
		if (szPayloads.size() != 1) {
		auto diag = this->emitOpError(
		"requires vector size handle that is mapped to 1 payload op");
		diag.attachNote(sz.get<Value>().getLoc())
		<< "mapped to " << szPayloads.size() << " payload ops";
		return DiagnosedSilenceableFailure::definiteFailure();
		}

		Operation *szPayloadOp = szPayloads[0];
		if (szPayloadOp->getNumResults() != 1 ||
		!szPayloadOp->getResult(0).getType().isIndex()) {
		auto diag = this->emitOpError(
		"requires vector size payload op with 1 index result");
		diag.attachNote(szPayloadOp->getLoc()) << "vector size payload op";
		return DiagnosedSilenceableFailure::definiteFailure();
		}

		IntegerAttr attr;
		if (!matchPattern(szPayloadOp->getResult(0), m_Constant(&attr))) {
		auto diag = this->emitOpError("requires constant vector size");
		diag.attachNote(szPayloadOp->getLoc()) << "vector size payload op";
		return DiagnosedSilenceableFailure::definiteFailure();
		}

		vectorSizes.push_back(attr.getInt());
		}

		// TODO: Check that the correct number of vectorSizes was provided.

		for (Operation *target : targets) {
		auto linalgOp = dyn_cast<LinalgOp>(target);
		if (!linalgOp) {
		Diagnostic diag(target->getLoc(), DiagnosticSeverity::Error);
		diag << "cannot vectorize non-Linalg op";
		return DiagnosedSilenceableFailure::silenceableFailure(std::move(diag));
		}

		if (failed(linalg::vectorize(rewriter, linalgOp, vectorSizes))) {
		Diagnostic diag(target->getLoc(), DiagnosticSeverity::Error);
		diag << "failed to vectorize op";
		return DiagnosedSilenceableFailure::silenceableFailure(std::move(diag));
		}
		}

		return DiagnosedSilenceableFailure::success();
		}

		void transform::MaskedVectorizeOp::getEffects(
		SmallVectorImpl<MemoryEffects::EffectInstance> &effects) {
		consumesHandle(getTarget(), effects);
		onlyReadsHandle(getVectorSizes(), effects);
		}

		SmallVector<OpFoldResult> MaskedVectorizeOp::getMixedVectorSizes() {
		OpBuilder b(getContext());
		return getMixedValues(getStaticVectorSizes(), getVectorSizes(), b);
		}
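
The mixed-size handling above can be illustrated outside MLIR. The following is a minimal standalone sketch, not the MLIR implementation: `Handle`, `MixedSize`, `kDynamic`, and `mixSizes` are hypothetical names invented for this example. It assumes the convention that a sentinel entry in the static list means "take the next entry from the dynamic list", which is how `getMixedVectorSizes()` appears to combine `getStaticVectorSizes()` and `getVectorSizes()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <variant>
#include <vector>

// Hypothetical stand-ins: `Handle` models an SSA value or transform handle,
// and `kDynamic` models the sentinel that marks a size supplied dynamically.
struct Handle {
  int id;
};
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();
using MixedSize = std::variant<int64_t, Handle>;

// Interleaves static sizes and dynamic handles into one list: each sentinel
// entry consumes the next handle, and every other entry stays a constant.
std::vector<MixedSize> mixSizes(const std::vector<int64_t> &staticSizes,
                                const std::vector<Handle> &dynamicSizes) {
  std::vector<MixedSize> mixed;
  std::size_t nextDynamic = 0;
  for (int64_t size : staticSizes) {
    if (size == kDynamic) {
      assert(nextDynamic < dynamicSizes.size() && "missing dynamic size");
      mixed.push_back(dynamicSizes[nextDynamic++]);
    } else {
      mixed.push_back(size);
    }
  }
  assert(nextDynamic == dynamicSizes.size() && "unused dynamic size");
  return mixed;
}
```

In the patch, each dynamic entry must additionally resolve to a single payload op with one constant `index` result; the sketch leaves that validation out.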

		//===----------------------------------------------------------------------===//
// Transform op registration		// Transform op registration
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

namespace {		namespace {
/// Registers new ops and declares PDL as dependent dialect since the		/// Registers new ops and declares PDL as dependent dialect since the
/// additional ops are using PDL types for operands and results.		/// additional ops are using PDL types for operands and results.
class LinalgTransformDialectExtension		class LinalgTransformDialectExtension
: public transform::TransformDialectExtension<		: public transform::TransformDialectExtension<
Show All 30 Lines

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

//===- Vectorization.cpp - Implementation of linalg Vectorization ---------===//		//===- Vectorization.cpp - Implementation of linalg Vectorization ---------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements the linalg dialect Vectorization transformations.		// This file implements the linalg dialect Vectorization transformations.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Analysis/SliceAnalysis.h"		#include "mlir/Analysis/SliceAnalysis.h"
#include "mlir/Dialect/Affine/Analysis/LoopAnalysis.h"
#include "mlir/Dialect/Affine/IR/AffineOps.h"		#include "mlir/Dialect/Affine/IR/AffineOps.h"
#include "mlir/Dialect/Arith/IR/Arith.h"		#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/Func/IR/FuncOps.h"		#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/Linalg/Analysis/DependenceAnalysis.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"		#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"		#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/Dialect/Linalg/Utils/Utils.h"		#include "mlir/Dialect/Linalg/Utils/Utils.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"		#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/Dialect/Utils/StructuredOpsUtils.h"		#include "mlir/Dialect/Utils/StructuredOpsUtils.h"
#include "mlir/Dialect/Vector/IR/VectorOps.h"		#include "mlir/Dialect/Vector/IR/VectorOps.h"
#include "mlir/Dialect/Vector/Transforms/VectorTransforms.h"		#include "mlir/Dialect/Vector/Interfaces/MaskableOpInterface.h"
#include "mlir/IR/AffineExpr.h"		#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/Matchers.h"
#include "mlir/IR/PatternMatch.h"		#include "mlir/IR/PatternMatch.h"
#include "mlir/Pass/Pass.h"
#include "mlir/Support/LLVM.h"		#include "mlir/Support/LLVM.h"
#include "mlir/Transforms/RegionUtils.h"		#include "mlir/Transforms/RegionUtils.h"
#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/Sequence.h"		#include "llvm/ADT/Sequence.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/TypeSwitch.h"		#include "llvm/ADT/TypeSwitch.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <type_traits>		#include <type_traits>

using namespace mlir;		using namespace mlir;
Show All 19 Lines	if (res) {
return WalkResult::interrupt();		return WalkResult::interrupt();
}		}
res = op;		res = op;
return WalkResult::advance();		return WalkResult::advance();
});		});
return res;		return res;
}		}

		/// Contains the vectorization state and related methods used across the
		/// vectorization process of a given operation.
		struct VectorizationState {
		VectorizationState(RewriterBase &rewriter) : rewriterGuard(rewriter) {}

		/// Initializes the vectorization state, including the computation of the
		nicolasvasilache (Not Done): Why not make this a ctor?
		dcaballe (Author, Done): Because it can fail, at least for now. This will change once we support more cases with masking. Then, we could assert and turn it into a constructor.
		/// canonical vector shape for vectorization.
		LogicalResult initState(RewriterBase &rewriter, LinalgOp linalgOp,
		ArrayRef<int64_t> inputVectorSizes);

		/// Returns the canonical vector shape used to vectorize the iteration space.
		nicolasvasilache (Done): super-nit: can we make the line spacing uniform between methods (here) and members (below)?
		ArrayRef<int64_t> getCanonicalVecShape() const { return canonicalVecShape; }

		/// Masks an operation with the canonical vector mask if the operation needs
		/// masking. Returns the masked operation or the original operation if masking
		/// is not needed. If provided, the canonical mask for this operation is
		/// permuted using `maybeMaskingMap`.
		Operation *maskOperation(RewriterBase &rewriter, Operation *opToMask,
		LinalgOp linalgOp,
		Optional<AffineMap> maybeMaskingMap = std::nullopt);

		private:
		/// Initializes the iteration space static sizes using the Linalg op
		/// information. This may become more complicated in the future.
		void initIterSpaceStaticSizes(LinalgOp linalgOp) {
		iterSpaceStaticSizes.append(linalgOp.getStaticLoopRanges());
		}

		/// Generates 'tensor.dim' operations for all the dynamic dimensions of the
		/// iteration space to be vectorized and store them in
		/// `iterSpaceDynamicSizes`.
		LogicalResult precomputeIterSpaceDynamicSizes(RewriterBase &rewriter,
		LinalgOp linalgOp);

		/// Create or retrieve an existing mask value to mask `opToMask` in the
		/// canonical vector iteration space. If `maybeMaskingMap` is provided, the
		/// mask is permuted using that permutation map. If a new mask is created, it
		/// will be cached for future users.
		Value getOrCreateMaskFor(RewriterBase &rewriter, Operation *opToMask,
		LinalgOp linalgOp,
		Optional<AffineMap> maybeMaskingMap);

		// Holds the compile-time static sizes of the iteration space to vectorize.
		// Dynamic dimensions are represented using ShapedType::kDynamicSize.
		SmallVector<int64_t> iterSpaceStaticSizes;

		nicolasvasilache (Not Done): plz avoid unsigned everywhere, we know by now this is not meant for expressing >=0 but should really be used for bit-twiddling or when we really really need the extra bit.
		dcaballe (Author, Done):
		> plz avoid unsigned everywhere, we know by now this is not meant for expressing >=0 but should really be used for bit-twiddling or when we really really need the extra bit.
		Sorry, `dim` is a misnomer as it refers to the dimension position, not the size. It has to be `unsigned` as that's what is expected by `AffineMap::getXXXYYYPosition()`. Added `Pos` to a few names. Hopefully that's better. Moved it to the LinalgOp interface. I can't think of a better name for the utility... It's just mapping an iteration space dimension to a dimension of an operand... Any other suggestion? I'm happy to replace it with any other existing utility but I couldn't find any.
		nicolasvasilache (Not Done): Should this be a (better named) helper on the LinalgOp interface? This seems to reimplement common functionality (but maybe not exactly) and rely on deep internal Linalg assumptions (e.g. all possible ways of defining the extent of an iterator have to match and you can therefore take the first one). The name makes it hard to understand what it does and we should be doing any such manipulation in a very localized place in LinalgOp.
		/// Holds the runtime sizes of the iteration spaces to vectorize. Static
		/// dimensions are represented with an empty value.
		SmallVector<Value> iterSpaceDynamicSizes;

		/// Holds the canonical vector shape used to vectorize the iteration space.
		SmallVector<int64_t> canonicalVecShape;

		/// Holds the active masks for permutations of the canonical vector iteration
		/// space.
		DenseMap<AffineMap, Value> activeMaskCache;

		/// Global vectorization guard for the incoming rewriter. It's initialized
		/// when the vectorization state is initialized.
		OpBuilder::InsertionGuard rewriterGuard;
		};

		/// Generates 'tensor.dim' operations for all the dynamic dimensions of the
		/// iteration space to be vectorized and store them in
		/// `iterSpaceDynamicSizes`.
		LogicalResult
		VectorizationState::precomputeIterSpaceDynamicSizes(RewriterBase &rewriter,
		LinalgOp linalgOp) {
		// TODO: Support 0-d vectors.
		for (int vecDim = 0, end = canonicalVecShape.size(); vecDim < end; ++vecDim) {
		if (!ShapedType::isDynamic(iterSpaceStaticSizes[vecDim])) {
		// Add an empty value for static dimensions.
		iterSpaceDynamicSizes.push_back(Value());
		continue;
		nicolasvasilache (Done): unsigned purge here and everywhere plz
		dcaballe (Author, Done): As explained before, renamed to indicate that it's the position, not the size.
		}

		// Find an operand defined on this dimension of the iteration space to
		// extract the runtime dimension size.
		Value operand;
		unsigned operandDimPos;
		if (failed(linalgOp.mapIterationSpaceDimToOperandDim(vecDim, operand,
		operandDimPos)))
		return failure();

		Value dynamicDim = linalgOp.hasTensorSemantics()
		? (Value)rewriter.create<tensor::DimOp>(
		linalgOp.getLoc(), operand, operandDimPos)
		: (Value)rewriter.create<memref::DimOp>(
		linalgOp.getLoc(), operand, operandDimPos);
		iterSpaceDynamicSizes.push_back(dynamicDim);
		}
		nicolasvasilache (Not Done): Can we call these `staticUpperBounds` everywhere? And the other ones `dynamicUpperBounds`? This seems easier to me to relate to what we're looking to do instead of `vecSizesForMaskedDims` and `extractDynamicVectorDimValues`.
		dcaballe (Author, Done): I had already renamed this locally to `inputVectorSizes` and changed the meaning a bit. The input sizes are now taken into account to compute the canonical vector shape and, if they are also provided for static shapes, they should match the size of the static shapes. We are passing them all now to simplify the client API, including the transform dialect, as it's easier to provide all the vector sizes than having to filter out the static ones. Let me know if that works.

		return success();
		}

		nicolasvasilache (Not Done): How about early exit here? `if (!linalgOp.hasDynamicShape()) { canonicalVecShape = linalgOp.getStaticLoopRanges(); return success(); }` I don't think you need the checks and debugs after that in the static case?
		dcaballe (Author, Done): This is gone now. This code has changed a bit in the last version.
		/// Initializes the vectorization state, including the computation of the
		/// canonical vector shape for vectorization.
		// TODO: Move this to the constructor when we can remove the failure cases.
		LogicalResult
		VectorizationState::initState(RewriterBase &rewriter, LinalgOp linalgOp,
		ArrayRef<int64_t> inputVectorSizes) {
		// Initialize the insertion point.
		rewriter.setInsertionPoint(linalgOp);
		nicolasvasilache (Not Done): Seems fishy, what happens in this case? I'd expect this to not fail gracefully... Make it an assert and lift logic to the precondition to avoid this? Edit: ah scratch that, I see that this is just after the precondition, can we make it part of the precondition?
		dcaballe (Author, Done): Let me know if it makes more sense in the new version, where the `inputVectorSizes`, if provided, should match the `linalgop.getNumLoops()`. Otherwise, this would be a bug.

		if (!inputVectorSizes.empty()) {
		// Get the canonical vector shape from the input vector sizes provided. This
		// path should be taken to vectorize code with dynamic shapes and when using
		// vector sizes greater than the iteration space sizes.
		nicolasvasilache (Done): Can we sprinkle a few precompute prefixes in some of these APIs to make it clear what happens at init time?
		dcaballe (Author, Done): Much better!
		canonicalVecShape.append(inputVectorSizes.begin(), inputVectorSizes.end());
		} else {
		// Compute the canonical vector shape from the operation shape. If there are
		// dynamic shapes, the operation won't be vectorized.
		canonicalVecShape = linalgOp.getStaticLoopRanges();
		}

		nicolasvasilache (Not Done): `LLVM_DEBUG(llvm::interleaveComma(canonicalVecShape, llvm::dbgs() << ...));`
		dcaballe (Author, Done): Ah! I didn't know this utility! It's been such a pain to always print SmallVectors... Thanks!
		LDBG("Canonical vector shape: ");
		LLVM_DEBUG(llvm::interleaveComma(canonicalVecShape, llvm::dbgs()));
		LLVM_DEBUG(llvm::dbgs() << "\n");

		// Initialize iteration space static sizes.
		initIterSpaceStaticSizes(linalgOp);

		// Extract and register the runtime value of any potential dynamic shape
		// needed to compute a mask during vectorization.
		if (failed(precomputeIterSpaceDynamicSizes(rewriter, linalgOp)))
		return failure();

		if (ShapedType::isDynamicShape(canonicalVecShape))
		return failure();
		return success();
		}
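
The shape selection in `initState` can be summarized with a small standalone sketch. It is a simplified model, not the patch's code: `computeCanonicalVecShape` and `kDynamic` are hypothetical names, plain `std::vector<int64_t>` stands in for the MLIR containers, and the `tensor.dim`/`memref.dim` precomputation is left out. It assumes only what the code above shows: user-provided vector sizes take priority, the static loop ranges are the fallback, and a shape that still contains a dynamic dimension is rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical sentinel standing in for a dynamic dimension size.
constexpr int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Returns the canonical vector shape, or std::nullopt when the operation
// cannot be vectorized because a vector dimension would be dynamic.
std::optional<std::vector<int64_t>>
computeCanonicalVecShape(const std::vector<int64_t> &staticLoopRanges,
                         const std::vector<int64_t> &inputVectorSizes) {
  // Explicit vector sizes win; otherwise fall back to the loop ranges.
  std::vector<int64_t> shape =
      inputVectorSizes.empty() ? staticLoopRanges : inputVectorSizes;
  for (int64_t dim : shape)
    if (dim == kDynamic)
      return std::nullopt;
  return shape;
}
```

Under this model, an op with a dynamic loop range is only vectorizable when the caller supplies explicit sizes, which is the case masking is meant to enable.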

		/// Create or retrieve an existing mask value to mask `opToMask` in the
		/// canonical vector iteration space. If `maybeMaskingMap` is provided, the
		/// mask is permuted using that permutation map. If a new mask is created, it
		/// will be cached for future users.
		Value VectorizationState::getOrCreateMaskFor(
		RewriterBase &rewriter, Operation *opToMask, LinalgOp linalgOp,
		Optional<AffineMap> maybeMaskingMap) {
		// No mask is needed if the operation is not maskable.
		auto maskableOp = dyn_cast<vector::MaskableOpInterface>(opToMask);
		if (!maskableOp)
		return Value();

		assert(!maskableOp.isMasked() &&
		"Masking an operation that is already masked");

		// If no masking map was provided, use an identity map with the loop dims.
		assert((!maybeMaskingMap || *maybeMaskingMap) &&
		"Unexpected null mask permutation map");
		AffineMap maskingMap =
		maybeMaskingMap ? *maybeMaskingMap
		: AffineMap::getMultiDimIdentityMap(
		linalgOp.getNumLoops(), rewriter.getContext());

		LDBG("Masking map: " << maskingMap << "\n");

		// Return the active mask for the masking map of this operation if it was
		// already created.
		auto activeMaskIt = activeMaskCache.find(maskingMap);
		nicolasvasilache (Done): Use the `find` API and the iterator you get from it to avoid multi-lookups.
		if (activeMaskIt != activeMaskCache.end()) {
		Value mask = activeMaskIt->second;
		LDBG("Reusing mask: " << mask << "\n");
		return mask;
		}

		// Compute permuted projection of the iteration space to be masked and the
		// corresponding mask shape. If the resulting iteration space dimensions are
		// static and identical to the mask shape, masking is not needed for this
		// operation.
		// TODO: Improve this check. Only projected permutation indexing maps are
		// supported.
		SmallVector<int64_t> permutedStaticSizes =
		applyPermutationMap(maskingMap, ArrayRef<int64_t>(iterSpaceStaticSizes));
		SmallVector<int64_t> maskShape =
		applyPermutationMap(maskingMap, ArrayRef<int64_t>(canonicalVecShape));
		LDBG("Mask shape: ");
		LLVM_DEBUG(llvm::interleaveComma(maskShape, llvm::dbgs()));
		LLVM_DEBUG(llvm::dbgs() << "\n");

		if (permutedStaticSizes == maskShape) {
		LDBG("Masking is not needed for masking map: " << maskingMap << "\n");
		activeMaskCache[maskingMap] = Value();
		return Value();
		}
		nicolasvasilache (Not Done): pass RewriterBase here and everywhere possible post https://reviews.llvm.org/D137922 plz
		dcaballe (Author, Done): I think we are going in the opposite direction based on the review comments?
		nicolasvasilache (Not Done): We cannot have code like this: `Operation *maskOpTerminator = &maskOp.getMaskRegion().front().back(); for (auto &en : llvm::enumerate(opToMask->getResults())) en.value().replaceAllUsesExcept(maskOp.getResult(en.index()), maskOpTerminator);` It must use a RewriterBase with `updateRootInPlace`.
		dcaballe (Author, Done): I can use `updateRootInPlace` when we have a rewriter here but I can't use any replacement method because `opToMask` is moved inside the mask region, not replaced.

		// Compute the mask upper bound values by combining the permuted iteration
		// space static sizes and the dynamic values.
		SmallVector<Value> permutedDynamicSizes =
		applyPermutationMap(maskingMap, ArrayRef<Value>(iterSpaceDynamicSizes));
		SmallVector<Value> upperBounds;
		for (auto [staticBound, dynBound] :
		llvm::zip(permutedStaticSizes, permutedDynamicSizes))
		upperBounds.push_back(ShapedType::isDynamic(staticBound)
		? dynBound
		: rewriter.create<arith::ConstantIndexOp>(
		linalgOp.getLoc(), staticBound));

		assert(!maskShape.empty() && !upperBounds.empty() &&
		"Masked 0-d vectors are not supported yet");

		nicolasvasilache (Done): nit: uppermension :) ?
		// Create the mask based on the dimension size values.
		auto maskType = VectorType::get(maskShape, rewriter.getI1Type());
		Value mask = rewriter.create<vector::CreateMaskOp>(linalgOp.getLoc(),
		maskType, upperBounds);
		LDBG("Creating new mask: " << mask << "\n");
		activeMaskCache[maskingMap] = mask;
		return mask;
		}
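
The caching logic of `getOrCreateMaskFor` can be modeled without MLIR. The sketch below is an assumption-laden simplification: `Permutation`, `Mask`, `MaskCache`, and `permute` are hypothetical names, a permutation is a plain list of loop-dimension positions instead of an `AffineMap`, and all iteration sizes are known integers rather than a mix of static sizes and runtime values. It shows the two ideas from the function above: a single `find` lookup whose iterator is reused, and caching the "no mask needed" outcome when the permuted iteration sizes equal the permuted vector shape.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical model: result i of the permutation reads loop dim perm[i].
using Permutation = std::vector<unsigned>;

// Hypothetical mask: a lane is active when its index is below the bound.
struct Mask {
  std::vector<int64_t> shape;
  std::vector<int64_t> upperBounds;
};

std::vector<int64_t> permute(const std::vector<int64_t> &values,
                             const Permutation &perm) {
  std::vector<int64_t> result;
  for (unsigned pos : perm) {
    assert(pos < values.size() && "permutation position out of range");
    result.push_back(values[pos]);
  }
  return result;
}

class MaskCache {
public:
  MaskCache(std::vector<int64_t> sizes, std::vector<int64_t> shape)
      : iterSizes(std::move(sizes)), vecShape(std::move(shape)) {}

  // Returns the mask for `perm`, or nullptr when masking is not needed.
  const Mask *getOrCreate(const Permutation &perm) {
    // One lookup: reuse the iterator instead of calling count() then at().
    auto it = cache.find(perm);
    if (it == cache.end()) {
      std::vector<int64_t> sizes = permute(iterSizes, perm);
      std::vector<int64_t> shape = permute(vecShape, perm);
      std::optional<Mask> mask;
      // Sizes that already fill the vector exactly need no mask; the empty
      // entry is cached too, so the comparison is not repeated.
      if (sizes != shape)
        mask = Mask{shape, sizes};
      it = cache.emplace(perm, std::move(mask)).first;
    }
    return it->second ? &*it->second : nullptr;
  }

private:
  std::vector<int64_t> iterSizes;
  std::vector<int64_t> vecShape;
  std::map<Permutation, std::optional<Mask>> cache;
};
```

With iteration sizes `{3, 8}` and vector shape `{4, 8}`, the identity permutation yields a mask with bounds `{3, 8}`, while a projection onto the second dimension alone needs none.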

		nicolasvasilache (Done): Plz use `updateRootInPlace` once RewriterBase is piped through.
		/// Masks an operation with the canonical vector mask if the operation needs
		/// masking. Returns the masked operation or the original operation if masking
		/// is not needed. If provided, the canonical mask for this operation is
		/// permuted using `maybeMaskingMap`.
		Operation *
		VectorizationState::maskOperation(RewriterBase &rewriter, Operation *opToMask,
		LinalgOp linalgOp,
		Optional<AffineMap> maybeMaskingMap) {
		LDBG("Trying to mask: " << *opToMask << "\n");

		// Create or retrieve mask for this operation.
		nicolasvasilache (Done): Hmm... what's the contract between this `createRegionMask` lambda and the insertion points during `builder.create<vector::MaskOp>`? I've seen too much ugly stuff re. insertion points leaking across function call boundaries. Let's add an OpBuilder::InsertionGuard at the top of this function.
		dcaballe (Author, Done): Added guard. This is a simple lambda to create the op region. We follow the same approach for `scf.if` and other region ops. The only contract is that the region needs to have a `vector::YieldOp`, which is described in the `vector.mask` doc.
		Value mask =
		getOrCreateMaskFor(rewriter, opToMask, linalgOp, maybeMaskingMap);

		if (!mask) {
		LDBG("No mask required\n");
		return opToMask;
		}

		// Wrap the operation with a new `vector.mask` and update D-U chain.
		assert(opToMask && "Expected a valid operation to mask");
		auto opResults = opToMask->getResultTypes();
		auto createRegionMask = [opToMask](OpBuilder &builder, Location loc) {
		Block *insBlock = builder.getInsertionBlock();
		// Create a block, put an op in that block. Look for a utility.
		// Maybe in conversion pattern rewriter. Way to avoid splice.
		// Set insertion point.
		insBlock->getOperations().splice(
		nicolasvasilache (Done): this must use a RewriterBase with updateRootInPlace
		dcaballe (Author, Done): Added TODO until we have a rewriter.
		insBlock->begin(), opToMask->getBlock()->getOperations(), opToMask);
		builder.create<vector::YieldOp>(loc, opToMask->getResults());
		};
		// TODO: Allow multiple results in vector.mask.
		auto maskOp =
		opResults.empty()
		? rewriter.create<vector::MaskOp>(opToMask->getLoc(), mask,
		createRegionMask)
		: rewriter.create<vector::MaskOp>(opToMask->getLoc(),
		opToMask->getResultTypes().front(),
		mask, createRegionMask);

		Operation *maskOpTerminator = &maskOp.getMaskRegion().front().back();

		for (auto [resIdx, resVal] : llvm::enumerate(opToMask->getResults()))
		rewriter.replaceAllUsesExcept(resVal, maskOp.getResult(resIdx),
		maskOpTerminator);

		LDBG("Masked operation: " << *maskOp << "\n");
		return maskOp;
		}
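
The insertion-point concern raised in the review of `maskOperation` can be illustrated with a small RAII guard. The sketch below is a toy model only: `ToyBlock`, `ToyBuilder`, `ToyInsertionGuard` and `buildMaskRegion` are hypothetical names invented for illustration and are not the MLIR API. It assumes a builder whose insertion point is just a pointer to the current block, and shows how a guard restores that pointer when the region-building helper returns.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy block: a flat list of op names. Not an mlir::Block.
struct ToyBlock {
  std::vector<std::string> ops;
};

// Toy builder: the "insertion point" is simply the current block.
struct ToyBuilder {
  ToyBlock *block = nullptr;

  void setInsertionBlock(ToyBlock *b) { block = b; }

  void create(const std::string &name) {
    assert(block && "no insertion block set");
    block->ops.push_back(name);
  }
};

// RAII guard: remembers the builder's block on construction and restores it
// on destruction, so a callee cannot leak a changed insertion point.
class ToyInsertionGuard {
public:
  explicit ToyInsertionGuard(ToyBuilder &b) : builder(b), saved(b.block) {}
  ~ToyInsertionGuard() { builder.block = saved; }
  ToyInsertionGuard(const ToyInsertionGuard &) = delete;
  ToyInsertionGuard &operator=(const ToyInsertionGuard &) = delete;

private:
  ToyBuilder &builder;
  ToyBlock *saved;
};

// Fills `region` with a masked op followed by its terminator. The guard at
// the top guarantees the caller's insertion point is unchanged afterwards.
void buildMaskRegion(ToyBuilder &builder, ToyBlock &region) {
  ToyInsertionGuard guard(builder);
  builder.setInsertionBlock(&region);
  builder.create("masked_op");
  builder.create("yield");
}
```

After `buildMaskRegion` returns, ops created by the caller land in the caller's original block again, which is the property the reviewer asked for.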

/// Given an indexing `map` coming from a LinalgOp indexing, restricted to a		/// Given an indexing `map` coming from a LinalgOp indexing, restricted to a
/// projectedPermutation, compress the unused dimensions to serve as a		/// projectedPermutation, compress the unused dimensions to serve as a
/// permutation_map for a vector transfer operation.		/// permutation_map for a vector transfer operation.
/// For example, given a linalg op such as:		/// For example, given a linalg op such as:
///		///
/// ```		/// ```
/// %0 = linalg.generic {		/// %0 = linalg.generic {
/// indexing_maps = affine_map<(d0, d1, d2, d3, d4) -> (d4, d0, d2)>,		/// indexing_maps = affine_map<(d0, d1, d2, d3, d4) -> (d4, d0, d2)>,
[123 lines not shown]
}		}

/// Build a vector.transfer_write of `value` into `outputOperand` at indices set		/// Build a vector.transfer_write of `value` into `outputOperand` at indices set
/// to all `0`; where `outputOperand` is an output operand of the LinalgOp		/// to all `0`; where `outputOperand` is an output operand of the LinalgOp
/// currently being vectorized. If `dest` has null rank, build an memref.store.		/// currently being vectorized. If `dest` has null rank, build an memref.store.
/// Return the produced value or null if no value is produced.		/// Return the produced value or null if no value is produced.
// Note: this is a true builder that notifies the OpBuilder listener.		// Note: this is a true builder that notifies the OpBuilder listener.
// TODO: Consider moving as a static helper on the ReduceOp.		// TODO: Consider moving as a static helper on the ReduceOp.
static Value buildVectorWrite(OpBuilder &b, Value value,		static Value buildVectorWrite(RewriterBase &rewriter, Value value,
OpOperand *outputOperand) {		OpOperand *outputOperand,
Operation *write;		VectorizationState &state) {
Location loc = value.getLoc();		Location loc = value.getLoc();
auto linalgOp = cast<LinalgOp>(outputOperand->getOwner());		auto linalgOp = cast<LinalgOp>(outputOperand->getOwner());
ArrayRef<int64_t> shape = linalgOp.getShape(outputOperand);		AffineMap opOperandMap = linalgOp.getMatchingIndexingMap(outputOperand);
auto vectorType = VectorType::get(		auto vectorType =
shape, getElementTypeOrSelf(outputOperand->get().getType()));		VectorType::get(opOperandMap.compose(state.getCanonicalVecShape()),
		getElementTypeOrSelf(outputOperand->get().getType()));

		Operation *write;
if (vectorType.getRank() > 0) {		if (vectorType.getRank() > 0) {
// 0-d case is still special: do not invert the reindexing map.		AffineMap writeMap = reindexIndexingMap(opOperandMap);
AffineMap map =
reindexIndexingMap(linalgOp.getMatchingIndexingMap(outputOperand));
SmallVector<int64_t> transposeShape =
applyPermutationMap(inversePermutation(map), vectorType.getShape());
assert(!transposeShape.empty() && "unexpected empty transpose shape");
vectorType = VectorType::get(transposeShape, vectorType.getElementType());
SmallVector<Value> indices(linalgOp.getRank(outputOperand),		SmallVector<Value> indices(linalgOp.getRank(outputOperand),
b.create<arith::ConstantIndexOp>(loc, 0));		rewriter.create<arith::ConstantIndexOp>(loc, 0));
value = broadcastIfNeeded(b, value, vectorType.getShape());		value = broadcastIfNeeded(rewriter, value, vectorType.getShape());
write = b.create<vector::TransferWriteOp>(		write = rewriter.create<vector::TransferWriteOp>(
loc, value, outputOperand->get(), indices, map);		loc, value, outputOperand->get(), indices, writeMap);
} else {		} else {
		// 0-d case is still special: do not invert the reindexing writeMap.
if (!value.getType().isa<VectorType>())		if (!value.getType().isa<VectorType>())
value = b.create<vector::BroadcastOp>(loc, vectorType, value);		value = rewriter.create<vector::BroadcastOp>(loc, vectorType, value);
assert(value.getType() == vectorType && "incorrect type");		assert(value.getType() == vectorType && "incorrect type");
write = b.create<vector::TransferWriteOp>(		write = rewriter.create<vector::TransferWriteOp>(
loc, value, outputOperand->get(), ValueRange{});		loc, value, outputOperand->get(), ValueRange{});
}		}
LDBG("vectorized op: " << *write);
		write = state.maskOperation(rewriter, write, linalgOp, opOperandMap);

		// If masked, set in-bounds to true. Masking guarantees that the access will
		nicolasvasilache [Done]: can you add a TODO to tighten op semantics so that we don't mix inbounds and mask since this is well defined?
		// be in-bounds.
		if (auto maskOp = dyn_cast<vector::MaskingOpInterface>(write)) {
		auto maskedWriteOp = cast<vector::TransferWriteOp>(maskOp.getMaskableOp());
		SmallVector<bool> inBounds(maskedWriteOp.getVectorType().getRank(), true);
		maskedWriteOp.setInBoundsAttr(rewriter.getBoolArrayAttr(inBounds));
		}

		LDBG("vectorized op: " << *write << "\n");
if (!write->getResults().empty())		if (!write->getResults().empty())
return write->getResult(0);		return write->getResult(0);
return Value();		return Value();
}		}

// Custom vectorization precondition function type. This is intented to be used		// Custom vectorization precondition function type. This is intented to be used
// with CustomVectorizationHook. Returns success if the corresponding custom		// with CustomVectorizationHook. Returns success if the corresponding custom
// hook can vectorize the op.		// hook can vectorize the op.
[10 lines not shown]
/// vector values are appended to `newResults`. Return		/// vector values are appended to `newResults`. Return
/// VectorizationStatus::NoReplace to signal the vectorization algorithm that it		/// VectorizationStatus::NoReplace to signal the vectorization algorithm that it
/// should not try to map produced operations and instead return the results		/// should not try to map produced operations and instead return the results
/// using the `newResults` vector making them available to the vectorization		/// using the `newResults` vector making them available to the vectorization
/// algorithm for RAUW. This function is meant to be used as a		/// algorithm for RAUW. This function is meant to be used as a
/// CustomVectorizationHook.		/// CustomVectorizationHook.
static VectorizationResult		static VectorizationResult
vectorizeLinalgYield(RewriterBase &rewriter, Operation *op,		vectorizeLinalgYield(RewriterBase &rewriter, Operation *op,
const BlockAndValueMapping &bvm, LinalgOp linalgOp,		const BlockAndValueMapping &bvm, VectorizationState &state,
SmallVectorImpl<Value> &newResults) {		LinalgOp linalgOp, SmallVectorImpl<Value> &newResults) {
auto yieldOp = dyn_cast<linalg::YieldOp>(op);		auto yieldOp = dyn_cast<linalg::YieldOp>(op);
if (!yieldOp)		if (!yieldOp)
return VectorizationResult{VectorizationStatus::Failure, nullptr};		return VectorizationResult{VectorizationStatus::Failure, nullptr};
for (const auto &outputs : llvm::enumerate(yieldOp.getValues())) {		for (const auto &output : llvm::enumerate(yieldOp.getValues())) {
// TODO: Scan for an opportunity for reuse.		// TODO: Scan for an opportunity for reuse.
// TODO: use a map.		// TODO: use a map.
Value vectorValue = bvm.lookup(outputs.value());		Value vectorValue = bvm.lookup(output.value());
Value newResult = buildVectorWrite(		Value newResult =
rewriter, vectorValue, linalgOp.getDpsInitOperand(outputs.index()));		buildVectorWrite(rewriter, vectorValue,
		linalgOp.getDpsInitOperand(output.index()), state);
if (newResult)		if (newResult)
newResults.push_back(newResult);		newResults.push_back(newResult);
}		}

return VectorizationResult{VectorizationStatus::NoReplace, nullptr};		return VectorizationResult{VectorizationStatus::NoReplace, nullptr};
}		}

		nicolasvasilache [Done]: nit: produce
		dcaballe (author) [Done]:
		> I do not understand the implication of
		>
		> TODO: We mask the transfer.transfer_write here because this op is special-cased. A linalg.yield may produced multiple vector.transfer_write // ops and can't be mapped using BlockAndValueMapping.
		Good point. This is a comment for an old problem. I removed it and moved this code to `buildVectorWrite`.
/// Helper function to vectorize the index operations of a `linalgOp`. Return		/// Helper function to vectorize the index operations of a `linalgOp`. Return
/// VectorizationStatus::NewOp to signal the vectorization algorithm that it		/// VectorizationStatus::NewOp to signal the vectorization algorithm that it
/// should map the produced operations. This function is meant to be used as a		/// should map the produced operations. This function is meant to be used as a
/// CustomVectorizationHook.		/// CustomVectorizationHook.
static VectorizationResult		static VectorizationResult
vectorizeLinalgIndex(RewriterBase &rewriter, Operation *op, LinalgOp linalgOp) {		vectorizeLinalgIndex(RewriterBase &rewriter, Operation *op, LinalgOp linalgOp) {
IndexOp indexOp = dyn_cast<linalg::IndexOp>(op);		IndexOp indexOp = dyn_cast<linalg::IndexOp>(op);
if (!indexOp)		if (!indexOp)
[172 lines not shown]
/// the `bvm` mapping. As a consequence, this function is meant to be called on		/// the `bvm` mapping. As a consequence, this function is meant to be called on
/// a topologically-sorted list of ops.		/// a topologically-sorted list of ops.
/// This function does not update `bvm` but returns a VectorizationStatus that		/// This function does not update `bvm` but returns a VectorizationStatus that
/// instructs the caller what `bvm` update needs to occur.		/// instructs the caller what `bvm` update needs to occur.
static VectorizationResult		static VectorizationResult
vectorizeOneOp(RewriterBase &rewriter, LinalgOp linalgOp, Operation *op,		vectorizeOneOp(RewriterBase &rewriter, LinalgOp linalgOp, Operation *op,
const BlockAndValueMapping &bvm,		const BlockAndValueMapping &bvm,
ArrayRef<CustomVectorizationHook> customVectorizationHooks) {		ArrayRef<CustomVectorizationHook> customVectorizationHooks) {
LDBG("vectorize op " << *op);		LDBG("vectorize op " << *op << "\n");

// 1. Try to apply any CustomVectorizationHook.		// 1. Try to apply any CustomVectorizationHook.
if (!customVectorizationHooks.empty()) {		if (!customVectorizationHooks.empty()) {
for (auto &customFunc : customVectorizationHooks) {		for (auto &customFunc : customVectorizationHooks) {
VectorizationResult result = customFunc(op, bvm);		VectorizationResult result = customFunc(op, bvm);
if (result.status == VectorizationStatus::Failure)		if (result.status == VectorizationStatus::Failure)
continue;		continue;
return result;		return result;
[80 lines not shown]
/// iteration space. This eager broadcasting is introduced in the		/// iteration space. This eager broadcasting is introduced in the
/// permutation_map of the vector.transfer_read operations. The eager		/// permutation_map of the vector.transfer_read operations. The eager
/// broadcasting makes it trivial to detrmine where broadcast, transposes and		/// broadcasting makes it trivial to detrmine where broadcast, transposes and
/// reductions should occur, without any bookkeeping. The tradeoff is that, in		/// reductions should occur, without any bookkeeping. The tradeoff is that, in
/// the absence of good canonicalizations, the amount of work increases.		/// the absence of good canonicalizations, the amount of work increases.
/// This is not deemed a problem as we expect canonicalizations and foldings to		/// This is not deemed a problem as we expect canonicalizations and foldings to
/// aggressively clean up the useless work.		/// aggressively clean up the useless work.
static LogicalResult		static LogicalResult
vectorizeAsLinalgGeneric(RewriterBase &rewriter, LinalgOp linalgOp,		vectorizeAsLinalgGeneric(RewriterBase &rewriter, VectorizationState &state,
		LinalgOp linalgOp,
SmallVectorImpl<Value> &newResults) {		SmallVectorImpl<Value> &newResults) {
		LDBG("Vectorizing operation as linalg generic\n");
Block *block = linalgOp.getBlock();		Block *block = linalgOp.getBlock();

// 2. Values defined above the region can only be broadcast for now. Make them		// 2. Values defined above the region can only be broadcast for now. Make them
// map to themselves.		// map to themselves.
BlockAndValueMapping bvm;		BlockAndValueMapping bvm;
SetVector<Value> valuesSet;		SetVector<Value> valuesSet;
mlir::getUsedValuesDefinedAbove(linalgOp->getRegion(0), valuesSet);		mlir::getUsedValuesDefinedAbove(linalgOp->getRegion(0), valuesSet);
bvm.map(valuesSet.getArrayRef(), valuesSet.getArrayRef());		bvm.map(valuesSet.getArrayRef(), valuesSet.getArrayRef());

if (linalgOp.getNumDpsInits() == 0)		if (linalgOp.getNumDpsInits() == 0)
return failure();		return failure();

// TODO: the common vector shape is equal to the static loop sizes only when
// all indexing maps are projected permutations. For convs and stencils the
// logic will need to evolve.
SmallVector<int64_t> commonVectorShape = linalgOp.computeStaticLoopSizes();

// 3. Turn all BBArgs into vector.transfer_read / load.		// 3. Turn all BBArgs into vector.transfer_read / load.
Location loc = linalgOp.getLoc();		Location loc = linalgOp.getLoc();
Value zero = rewriter.create<arith::ConstantIndexOp>(loc, 0);		Value zero = rewriter.create<arith::ConstantIndexOp>(loc, 0);
for (OpOperand *opOperand : linalgOp.getOpOperandsMatchingBBargs()) {		for (OpOperand *opOperand : linalgOp.getOpOperandsMatchingBBargs()) {
BlockArgument bbarg = linalgOp.getMatchingBlockArgument(opOperand);		BlockArgument bbarg = linalgOp.getMatchingBlockArgument(opOperand);
if (linalgOp.isScalar(opOperand)) {		if (linalgOp.isScalar(opOperand)) {
bvm.map(bbarg, opOperand->get());		bvm.map(bbarg, opOperand->get());
continue;		continue;
}		}
VectorType readType;
AffineMap map;		// 3.a. Convert the indexing map for this input/output to a transfer read
		nicolasvasilache [Done]: nit: can we spell this as:
			// 3.a. Convert the indexing map for this input/output to a transfer read...
			...
			/// 3.a.i For input reads we use the canonical vector shape.
			if (linalgOp.isDpsInput(opOperand))
			...
			} else {
			/// 3.a.ii For output reads (iteration-carried dependence, e.g., reductions)
			...
			// 3.b. If masked, set in-bounds to true.
			...
			// 3.c. Not all ops support 0-d vectors,
// TODO: can we keep this simplification?		// permutation map and masking map.
// if (linalgOp.getShape(&opOperand).empty()) {		AffineMap indexingMap = linalgOp.getMatchingIndexingMap(opOperand);
// readType = VectorType::get({}, bbarg.getType());
// } else {		// Remove zeros from indexing map to use it as masking map.
if (opOperand->getOperandNumber() < linalgOp.getNumDpsInputs()) {		SmallVector<int64_t> zeroPos;
map = inverseAndBroadcastProjectedPermutation(		auto results = indexingMap.getResults();
linalgOp.getMatchingIndexingMap(opOperand));		for (auto result : llvm::enumerate(results)) {
readType = VectorType::get(commonVectorShape,		if (result.value().isa<AffineConstantExpr>()) {
getElementTypeOrSelf(opOperand->get()));		zeroPos.push_back(result.index());
		}
		}
		AffineMap maskingMap = indexingMap.dropResults(zeroPos);

		AffineMap readMap;
		SmallVector<int64_t> readVecShape;
		if (linalgOp.isDpsInput(opOperand)) {
		nicolasvasilache [Not Done]: I'll need to revisit all this in light of the broadcast separation. From a cursory glance this looks reasonable, let's land and iterate.
		// 3.a.i. For input reads we use the canonical vector shape.
		readMap = inverseAndBroadcastProjectedPermutation(indexingMap);
		readVecShape = llvm::to_vector(state.getCanonicalVecShape());
} else {		} else {
map = inversePermutation(		// 3.a.ii. For output reads (iteration-carried dependence, e.g.,
reindexIndexingMap(linalgOp.getMatchingIndexingMap(opOperand)));		// reductions), the vector shape is computed by mapping the canonical
readType = VectorType::get(map.compose(linalgOp.getShape(opOperand)),		// vector shape to the output domain and back to the canonical domain.
getElementTypeOrSelf(opOperand->get()));		readMap = inversePermutation(reindexIndexingMap(indexingMap));
		readVecShape =
		readMap.compose(indexingMap.compose(state.getCanonicalVecShape()));
}		}
// }

auto shape = linalgOp.getShape(opOperand);		auto readType =
SmallVector<Value> indices(shape.size(), zero);		VectorType::get(readVecShape, getElementTypeOrSelf(opOperand->get()));
Value readValue = rewriter.create<vector::TransferReadOp>(		SmallVector<Value> indices(linalgOp.getShape(opOperand).size(), zero);
loc, readType, opOperand->get(), indices, map);
// Not all ops support 0-d vectors, extract the scalar for now.		Operation *read = rewriter.create<vector::TransferReadOp>(
		loc, readType, opOperand->get(), indices, readMap);
		read = state.maskOperation(rewriter, read, linalgOp, maskingMap);
		Value readValue = read->getResult(0);

		// 3.b. If masked, set in-bounds to true. Masking guarantees that the access
		// will be in-bounds.
		if (auto maskOp = dyn_cast<vector::MaskingOpInterface>(read)) {
		SmallVector<bool> inBounds(readType.getRank(), true);
		cast<vector::TransferReadOp>(maskOp.getMaskableOp())
		.setInBoundsAttr(rewriter.getBoolArrayAttr(inBounds));
		}

		// 3.c. Not all ops support 0-d vectors, extract the scalar for now.
// TODO: remove this.		// TODO: remove this.
if (readValue.getType().cast<VectorType>().getRank() == 0)		if (readValue.getType().cast<VectorType>().getRank() == 0)
readValue = rewriter.create<vector::ExtractElementOp>(loc, readValue);		readValue = rewriter.create<vector::ExtractElementOp>(loc, readValue);

LDBG("new vectorized bbarg(" << bbarg.getArgNumber() << "): " << readValue);		LDBG("New vectorized bbarg(" << bbarg.getArgNumber() << "): " << readValue
		<< "\n");
bvm.map(bbarg, readValue);		bvm.map(bbarg, readValue);
bvm.map(opOperand->get(), readValue);		bvm.map(opOperand->get(), readValue);
}		}

SmallVector<CustomVectorizationHook> hooks;		SmallVector<CustomVectorizationHook> hooks;
// 4a. Register CustomVectorizationHook for yieldOp.		// 4a. Register CustomVectorizationHook for yieldOp.
CustomVectorizationHook vectorizeYield =		CustomVectorizationHook vectorizeYield =
[&](Operation *op,		[&](Operation *op,
const BlockAndValueMapping &bvm) -> VectorizationResult {		const BlockAndValueMapping &bvm) -> VectorizationResult {
return vectorizeLinalgYield(rewriter, op, bvm, linalgOp, newResults);		return vectorizeLinalgYield(rewriter, op, bvm, state, linalgOp, newResults);
};		};
hooks.push_back(vectorizeYield);		hooks.push_back(vectorizeYield);

// 4b. Register CustomVectorizationHook for indexOp.		// 4b. Register CustomVectorizationHook for indexOp.
CustomVectorizationHook vectorizeIndex =		CustomVectorizationHook vectorizeIndex =
[&](Operation *op,		[&](Operation *op,
const BlockAndValueMapping &bvm) -> VectorizationResult {		const BlockAndValueMapping &bvm) -> VectorizationResult {
return vectorizeLinalgIndex(rewriter, op, linalgOp);		return vectorizeLinalgIndex(rewriter, op, linalgOp);
};		};
hooks.push_back(vectorizeIndex);		hooks.push_back(vectorizeIndex);

// 4c. Register CustomVectorizationHook for extractOp.		// 4c. Register CustomVectorizationHook for extractOp.
CustomVectorizationHook vectorizeExtract =		CustomVectorizationHook vectorizeExtract =
[&](Operation *op,		[&](Operation *op,
const BlockAndValueMapping &bvm) -> VectorizationResult {		const BlockAndValueMapping &bvm) -> VectorizationResult {
return vectorizeTensorExtract(rewriter, op, linalgOp, bvm);		return vectorizeTensorExtract(rewriter, op, linalgOp, bvm);
};		};
hooks.push_back(vectorizeExtract);		hooks.push_back(vectorizeExtract);

// 5. Iteratively call `vectorizeOneOp` to each op in the slice.		// 5. Iteratively call `vectorizeOneOp` to each op in the slice.
for (Operation &op : block->getOperations()) {		for (Operation &op : block->getOperations()) {
VectorizationResult result =		VectorizationResult result =
vectorizeOneOp(rewriter, linalgOp, &op, bvm, hooks);		vectorizeOneOp(rewriter, linalgOp, &op, bvm, hooks);
if (result.status == VectorizationStatus::Failure) {		if (result.status == VectorizationStatus::Failure) {
LDBG("failed to vectorize: " << op);		LDBG("failed to vectorize: " << op << "\n");
return failure();		return failure();
}		}
if (result.status == VectorizationStatus::NewOp) {		if (result.status == VectorizationStatus::NewOp) {
LDBG("new vector op: " << *result.newOp;);		Operation *maybeMaskedOp =
bvm.map(op.getResults(), result.newOp->getResults());		state.maskOperation(rewriter, result.newOp, linalgOp);
		LDBG("New vector op: " << *maybeMaskedOp << "\n");
		bvm.map(op.getResults(), maybeMaskedOp->getResults());
}		}
}		}

return success();		return success();
}		}
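
Step 3.a of `vectorizeAsLinalgGeneric` derives the masking map by dropping constant results from the operand's indexing map. A minimal standalone sketch of that filtering step follows. `ToyExpr` and `dropConstantResults` are hypothetical stand-ins invented here, not `AffineMap` or `AffineConstantExpr`; the sketch assumes a map result is either a loop dimension or a constant. Like the loop in the diff, it drops every constant result, not only zeros.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy affine-map result: either a loop dimension (d0, d1, ...) or a constant.
struct ToyExpr {
  bool isConstant;
  int64_t value; // Dimension position if !isConstant, else the constant.
};

// Returns the results of a toy indexing map with all constant results
// removed, keeping the remaining results in their original order.
std::vector<ToyExpr> dropConstantResults(const std::vector<ToyExpr> &results) {
  std::vector<ToyExpr> kept;
  for (const ToyExpr &expr : results) {
    if (!expr.isConstant)
      kept.push_back(expr);
  }
  return kept;
}
```

For a toy map `(d0, d1, d2) -> (d0, 0, d2)` this keeps `(d0, d2)`, so only dimensions that actually vary contribute to the mask.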

// TODO: probably need some extra checks for reduction followed by consumer		// TODO: probably need some extra checks for reduction followed by consumer
// ops that may not commute (e.g. linear reduction + non-linear instructions).		// ops that may not commute (e.g. linear reduction + non-linear instructions).
static LogicalResult reductionPreconditions(LinalgOp op) {		static LogicalResult reductionPreconditions(LinalgOp op) {
if (llvm::none_of(op.getIteratorTypesArray(), isReductionIterator)) {		if (llvm::none_of(op.getIteratorTypesArray(), isReductionIterator)) {
LDBG("reduction precondition failed: no reduction iterator");		LDBG("reduction precondition failed: no reduction iterator\n");
return failure();		return failure();
}		}
for (OpOperand *opOperand : op.getDpsInitOperands()) {		for (OpOperand *opOperand : op.getDpsInitOperands()) {
AffineMap indexingMap = op.getMatchingIndexingMap(opOperand);		AffineMap indexingMap = op.getMatchingIndexingMap(opOperand);
if (indexingMap.isPermutation())		if (indexingMap.isPermutation())
continue;		continue;

Operation *reduceOp = matchLinalgReduction(opOperand);		Operation *reduceOp = matchLinalgReduction(opOperand);
if (!reduceOp || !getCombinerOpKind(reduceOp)) {		if (!reduceOp || !getCombinerOpKind(reduceOp)) {
LDBG("reduction precondition failed: reduction detection failed");		LDBG("reduction precondition failed: reduction detection failed\n");
return failure();		return failure();
}		}
}		}
return success();		return success();
}		}

static LogicalResult vectorizeStaticLinalgOpPrecondition(		static LogicalResult vectorizeDynamicLinalgOpPrecondition(linalg::LinalgOp op) {
linalg::LinalgOp op,		// TODO: Masking only supports dynamic generic ops without reductions for now.
ArrayRef<CustomVectorizationPrecondition> customPreconditions,		if (!isElementwise(op) &&
		llvm::any_of(op.getIteratorTypesArray(), [](utils::IteratorType itType) {
		return itType != utils::IteratorType::parallel;
		}))
		return failure();

		// TODO: 0-d vectors are not supported yet.
		if (llvm::any_of(op.getIndexingMapsArray(), [](AffineMap map) {
		return map.isEmpty() || map.getResults().empty();
		}))
		return failure();

		LDBG("Dynamically-shaped op meets vectorization pre-conditions\n");
		return success();
		}

		LogicalResult
		mlir::linalg::vectorizeLinalgOpPrecondition(LinalgOp linalgOp,
		ArrayRef<int64_t> inputVectorSizes,
bool vectorizeNDExtract) {		bool vectorizeNDExtract) {
		// Check API contract for input vector sizes.
		if (!inputVectorSizes.empty()) {
		assert(inputVectorSizes.size() == linalgOp.getNumLoops() &&
		"Input vector sizes don't match the number of loops");
		assert(!ShapedType::isDynamicShape(inputVectorSizes) &&
		"Input vector sizes can't have dynamic dimensions");
		assert(llvm::all_of(
		llvm::zip(linalgOp.getStaticLoopRanges(), inputVectorSizes),
		[](std::tuple<int64_t, int64_t> sizePair) {
		int64_t staticSize = std::get<0>(sizePair);
		int64_t inputSize = std::get<1>(sizePair);
		return ShapedType::isDynamic(staticSize) ||
		staticSize <= inputSize;
		}) &&
		"Input vector sizes must be smaller or equal than iteration space "
		"static sizes");
		}

		nicolasvasilache [Not Done]: ok as a first approx.
		// TODO: Masking is only supported for dynamic shapes so input vector sizes
		// must be empty if the op is not dynamic.
		if (!linalgOp.hasDynamicShape() && !inputVectorSizes.empty())
		return failure();

		if (linalgOp.hasDynamicShape() &&
		failed(vectorizeDynamicLinalgOpPrecondition(linalgOp)))
		return failure();

		SmallVector<CustomVectorizationPrecondition> customPreconditions;

		// Register CustomVectorizationPrecondition for extractOp.
		customPreconditions.push_back(tensorExtractVectorizationPrecondition);

// All types in the body should be a supported element type for VectorType.		// All types in the body should be a supported element type for VectorType.
for (Operation &innerOp : op->getRegion(0).front()) {		for (Operation &innerOp : linalgOp->getRegion(0).front()) {
// Check if any custom hook can vectorize the inner op.		// Check if any custom hook can vectorize the inner op.
if (llvm::any_of(		if (llvm::any_of(
customPreconditions,		customPreconditions,
[&](const CustomVectorizationPrecondition &customPrecondition) {		[&](const CustomVectorizationPrecondition &customPrecondition) {
return succeeded(		return succeeded(
customPrecondition(&innerOp, vectorizeNDExtract));		customPrecondition(&innerOp, vectorizeNDExtract));
})) {		})) {
continue;		continue;
}		}
if (llvm::any_of(innerOp.getOperandTypes(), [](Type type) {		if (llvm::any_of(innerOp.getOperandTypes(), [](Type type) {
return !VectorType::isValidElementType(type);		return !VectorType::isValidElementType(type);
})) {		})) {
return failure();		return failure();
}		}
if (llvm::any_of(innerOp.getResultTypes(), [](Type type) {		if (llvm::any_of(innerOp.getResultTypes(), [](Type type) {
return !VectorType::isValidElementType(type);		return !VectorType::isValidElementType(type);
})) {		})) {
return failure();		return failure();
}		}
}		}
if (isElementwise(op))		if (isElementwise(linalgOp))
return success();		return success();
// TODO: isaConvolutionOpInterface that can also infer from generic features.		// TODO: isaConvolutionOpInterface that can also infer from generic features.
// But we will still need stride/dilation attributes that will be annoying to		// But we will still need stride/dilation attributes that will be annoying to
// reverse-engineer...		// reverse-engineer...
if (isa<ConvolutionOpInterface>(op.getOperation()))		if (isa<ConvolutionOpInterface>(linalgOp.getOperation()))
return success();		return success();
// TODO: the common vector shape is equal to the static loop sizes only when		// TODO: the common vector shape is equal to the static loop sizes only when
// all indexing maps are projected permutations. For convs and stencils the		// all indexing maps are projected permutations. For convs and stencils the
// logic will need to evolve.		// logic will need to evolve.
if (!allIndexingsAreProjectedPermutation(op)) {		if (!allIndexingsAreProjectedPermutation(linalgOp)) {
LDBG("precondition failed: not projected permutations");		LDBG("precondition failed: not projected permutations\n");
return failure();		return failure();
}		}
if (failed(reductionPreconditions(op))) {		if (failed(reductionPreconditions(linalgOp))) {
LDBG("precondition failed: reduction preconditions");		LDBG("precondition failed: reduction preconditions\n");
return failure();		return failure();
}		}
return success();		return success();
}		}
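
The API contract asserted at the top of `vectorizeLinalgOpPrecondition` can be restated as a standalone predicate. The sketch below is an illustration under assumptions: `kToyDynamic` and `inputVectorSizesAreValid` are hypothetical names, and the sentinel value is not MLIR's dynamic-size marker. It follows the condition as written in the diff (`staticSize <= inputSize`, i.e. each input vector size must cover the static loop range), even though the accompanying assertion message is worded the other way around.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sentinel marking a dynamic size in this sketch only.
constexpr int64_t kToyDynamic = std::numeric_limits<int64_t>::min();

// Returns true if `inputVectorSizes` satisfies the three checks asserted in
// the diff: same rank as the iteration space, no dynamic entries, and every
// static loop range no larger than its input vector size.
bool inputVectorSizesAreValid(const std::vector<int64_t> &staticLoopRanges,
                              const std::vector<int64_t> &inputVectorSizes) {
  // Empty input sizes mean "no masking requested"; nothing to check.
  if (inputVectorSizes.empty())
    return true;
  if (inputVectorSizes.size() != staticLoopRanges.size())
    return false;
  for (std::size_t i = 0; i < inputVectorSizes.size(); ++i) {
    int64_t inputSize = inputVectorSizes[i];
    int64_t staticSize = staticLoopRanges[i];
    if (inputSize == kToyDynamic)
      return false;
    // Dynamic loop ranges accept any static input size.
    if (staticSize != kToyDynamic && staticSize > inputSize)
      return false;
  }
  return true;
}
```

For example, loop ranges `{4, dynamic}` accept input sizes `{8, 16}`, while loop ranges `{4, 8}` reject `{2, 8}` because the first vector size is smaller than the static loop range.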

LogicalResult		/// Emit a suitable vector form for a Linalg op. If provided, `inputVectorSizes`
mlir::linalg::vectorizeLinalgOpPrecondition(LinalgOp linalgOp,		/// are used to vectorize this operation. `inputVectorSizes` must match the rank
		/// of the iteration space of the operation and the sizes must be smaller or
		/// equal than their counterpart interation space sizes, if static.
		/// `inputVectorShapes` also allows the vectorization of operations with dynamic
		/// shapes.
		LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter, LinalgOp linalgOp,
		ArrayRef<int64_t> inputVectorSizes,
bool vectorizeNDExtract) {		bool vectorizeNDExtract) {
		nicolasvasilache [Not Done]: Can we make this init the state as part of the precondition?
		dcaballe (author) [Done]: Probably better to separate the concerns. I think even part of the precondition checks are reused outside of the vectorizer. A public interface was introduced recently.
// All types must be static shape to go to vector.		LDBG("Attempting to vectorize:\n" << linalgOp << "\n");
if (linalgOp.hasDynamicShape()) {		LDBG("Input vector sizes: ");
LDBG("precondition failed: dynamic shape");		LLVM_DEBUG(llvm::interleaveComma(inputVectorSizes, llvm::dbgs()));
		LLVM_DEBUG(llvm::dbgs() << "\n");

		if (failed(vectorizeLinalgOpPrecondition(linalgOp, inputVectorSizes,
		vectorizeNDExtract)))
return failure();		return failure();
}

SmallVector<CustomVectorizationPrecondition> customPreconditions;

// Register CustomVectorizationPrecondition for extractOp.
customPreconditions.push_back(tensorExtractVectorizationPrecondition);

return vectorizeStaticLinalgOpPrecondition(linalgOp, customPreconditions,		// Initialize vectorization state.
vectorizeNDExtract);		VectorizationState state(rewriter);
}		if (failed(state.initState(rewriter, linalgOp, inputVectorSizes))) {
		LDBG("Vectorization state couldn't be initialized\n");
LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter, LinalgOp linalgOp,
bool vectorizeNDExtract) {
if (failed(vectorizeLinalgOpPrecondition(linalgOp, vectorizeNDExtract)))
return failure();		return failure();
		}

SmallVector<Value> results;		SmallVector<Value> results;
// TODO: isaConvolutionOpInterface that can also infer from generic		// TODO: isaConvolutionOpInterface that can also infer from generic
// features. Will require stride/dilation attributes inference.		// features. Will require stride/dilation attributes inference.
FailureOr<Operation *> convOr = vectorizeConvolution(rewriter, linalgOp);		FailureOr<Operation *> convOr = vectorizeConvolution(rewriter, linalgOp);
if (succeeded(convOr)) {		if (succeeded(convOr)) {
llvm::append_range(results, (*convOr)->getResults());		llvm::append_range(results, (*convOr)->getResults());
} else {		} else {
if (failed(vectorizeLinalgOpPrecondition(linalgOp, vectorizeNDExtract)))		if (failed(vectorizeLinalgOpPrecondition(linalgOp, inputVectorSizes,
		vectorizeNDExtract)))
return failure();		return failure();
LDBG("Vectorize generic by broadcasting to a common shape: " << linalgOp);		LDBG("Vectorize generic by broadcasting to the canonical vector shape\n");
if (failed(vectorizeAsLinalgGeneric(rewriter, linalgOp, results)))		// TODO: 'vectorize' takes in a 'RewriterBase' which is up-casted to
		// 'OpBuilder' when it is passed over to some methods like
		// 'vectorizeAsLinalgGeneric'. This is highly problematic: if we erase an op
		// within these methods, the actual rewriter won't be notified and we will
		// end up with read-after-free issues!
		if (failed(vectorizeAsLinalgGeneric(rewriter, state, linalgOp, results)))
return failure();		return failure();
}		}

if (!results.empty())		if (!results.empty())
rewriter.replaceOp(linalgOp, results);		rewriter.replaceOp(linalgOp, results);
else		else
rewriter.eraseOp(linalgOp);		rewriter.eraseOp(linalgOp);

[...]
/// Check whether there is any interleaved use of any `values` between		/// Check whether there is any interleaved use of any `values` between
/// `firstOp` and `secondOp`. Conservatively return `true` if any op or value		/// `firstOp` and `secondOp`. Conservatively return `true` if any op or value
/// is in a different block.		/// is in a different block.
static bool mayExistInterleavedUses(Operation *firstOp, Operation *secondOp,		static bool mayExistInterleavedUses(Operation *firstOp, Operation *secondOp,
ValueRange values) {		ValueRange values) {
if (firstOp->getBlock() != secondOp->getBlock() ||		if (firstOp->getBlock() != secondOp->getBlock() ||
!firstOp->isBeforeInBlock(secondOp)) {		!firstOp->isBeforeInBlock(secondOp)) {
LDBG("interleavedUses precondition failed, firstOp: "		LDBG("interleavedUses precondition failed, firstOp: "
<< *firstOp << ", second op: " << *secondOp);		<< *firstOp << ", second op: " << *secondOp << "\n");
return true;		return true;
}		}
for (auto v : values) {		for (auto v : values) {
for (auto &u : v.getUses()) {		for (auto &u : v.getUses()) {
Operation *owner = u.getOwner();		Operation *owner = u.getOwner();
if (owner == firstOp || owner == secondOp)		if (owner == firstOp || owner == secondOp)
continue;		continue;
// TODO: this is too conservative, use dominance info in the future.		// TODO: this is too conservative, use dominance info in the future.
if (owner->getBlock() == firstOp->getBlock() &&		if (owner->getBlock() == firstOp->getBlock() &&
(owner->isBeforeInBlock(firstOp) || secondOp->isBeforeInBlock(owner)))		(owner->isBeforeInBlock(firstOp) || secondOp->isBeforeInBlock(owner)))
continue;		continue;
LDBG(" found interleaved op " << *owner << ", firstOp: " << *firstOp		LDBG(" found interleaved op " << *owner << ", firstOp: " << *firstOp
<< ", second op: " << *secondOp);		<< ", second op: " << *secondOp << "\n");
return true;		return true;
}		}
}		}
return false;		return false;
}		}

/// Return the unique subview use of `v` if it is indeed unique, null		/// Return the unique subview use of `v` if it is indeed unique, null
/// otherwise.		/// otherwise.

mlir/lib/Dialect/Vector/IR/CMakeLists.txt

	add_mlir_dialect_library(MLIRVectorDialect			add_mlir_dialect_library(MLIRVectorDialect
	VectorOps.cpp			VectorOps.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Vector/IR			${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Vector/IR

	DEPENDS			DEPENDS
				MLIRMaskableOpInterfaceIncGen
				MLIRMaskingOpInterfaceIncGen
	MLIRVectorOpsIncGen			MLIRVectorOpsIncGen
	MLIRVectorOpsEnumsIncGen			MLIRVectorOpsEnumsIncGen

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
	MLIRArithDialect			MLIRArithDialect
	MLIRControlFlowInterfaces			MLIRControlFlowInterfaces
	MLIRDataLayoutInterfaces			MLIRDataLayoutInterfaces
	MLIRDestinationStyleOpInterface			MLIRDestinationStyleOpInterface

mlir/lib/Dialect/Vector/IR/VectorOps.cpp

[...]	void ReductionOp::print(OpAsmPrinter &p) {
p << " ";		p << " ";
getKindAttr().print(p);		getKindAttr().print(p);
p << ", " << getVector();		p << ", " << getVector();
if (getAcc())		if (getAcc())
p << ", " << getAcc();		p << ", " << getAcc();
p << " : " << getVector().getType() << " into " << getDest().getType();		p << " : " << getVector().getType() << " into " << getDest().getType();
}		}

		// MaskableOpInterface methods.

		/// Returns the mask type expected by this operation.
		Type ReductionOp::getExpectedMaskType() {
		auto vecType = getVectorType();
		return vecType.cloneWith(std::nullopt,
		IntegerType::get(vecType.getContext(), /*width=*/1));
		}
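
The mask type rule in `ReductionOp::getExpectedMaskType()` above can be illustrated outside of MLIR. The following is a minimal sketch, assuming a toy `ToyVectorType` struct and a hypothetical helper `expectedReductionMaskType`; neither is part of the patch or of the MLIR API. It only shows the idea that the mask keeps the shape of the vector operand and uses `i1` as its element type.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a vector type: a static shape plus an element type name.
// This is NOT the MLIR VectorType class; it exists only for illustration.
struct ToyVectorType {
  std::vector<int64_t> shape;
  std::string elementType;
};

// Hypothetical helper mirroring the idea of the hook above: keep the shape of
// the vector operand and swap the element type for a 1-bit integer.
ToyVectorType expectedReductionMaskType(const ToyVectorType &vecType) {
  assert(!vecType.shape.empty() && "expected at least one vector dimension");
  return ToyVectorType{vecType.shape, "i1"};
}
```

Under these assumptions, a `vector<16xf32>` operand would pair with a `vector<16xi1>` mask.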

Value mlir::vector::getVectorReductionOp(arith::AtomicRMWKind op,		Value mlir::vector::getVectorReductionOp(arith::AtomicRMWKind op,
OpBuilder &builder, Location loc,		OpBuilder &builder, Location loc,
Value vector) {		Value vector) {
switch (op) {		switch (op) {
case arith::AtomicRMWKind::addf:		case arith::AtomicRMWKind::addf:
case arith::AtomicRMWKind::addi:		case arith::AtomicRMWKind::addi:
return builder.create<vector::ReductionOp>(vector.getLoc(),		return builder.create<vector::ReductionOp>(vector.getLoc(),
CombiningKind::ADD, vector);		CombiningKind::ADD, vector);
[...]	if (paddingType != sourceElementType)
return emitOpError(		return emitOpError(
"requires formal padding and source of the same elemental type");		"requires formal padding and source of the same elemental type");
}		}

return verifyPermutationMap(permutationMap,		return verifyPermutationMap(permutationMap,
[&](Twine t) { return emitOpError(t); });		[&](Twine t) { return emitOpError(t); });
}		}

		// MaskableOpInterface methods.

		/// Returns the mask type expected by this operation. Mostly used for
		/// verification purposes. It requires the operation to be vectorized.
		Type TransferReadOp::getExpectedMaskType() {
		return inferTransferReadMaskType(getVectorType(), getPermutationMap());
		}

template <typename TransferOp>		template <typename TransferOp>
static bool isInBounds(TransferOp op, int64_t resultIdx, int64_t indicesIdx) {		static bool isInBounds(TransferOp op, int64_t resultIdx, int64_t indicesIdx) {
// TODO: support more aggressive createOrFold on:		// TODO: support more aggressive createOrFold on:
// `op.indices()[indicesIdx] + vectorType < dim(op.source(), indicesIdx)`		// `op.indices()[indicesIdx] + vectorType < dim(op.source(), indicesIdx)`
if (op.getShapedType().isDynamicDim(indicesIdx))		if (op.getShapedType().isDynamicDim(indicesIdx))
return false;		return false;
Value index = op.getIndices()[indicesIdx];		Value index = op.getIndices()[indicesIdx];
auto cstOp = index.getDefiningOp<arith::ConstantIndexOp>();		auto cstOp = index.getDefiningOp<arith::ConstantIndexOp>();
[...]	if (failed(verifyTransferOp(cast<VectorTransferOpInterface>(getOperation()),
inferredMaskType, permutationMap,		inferredMaskType, permutationMap,
getInBounds() ? *getInBounds() : ArrayAttr())))		getInBounds() ? *getInBounds() : ArrayAttr())))
return failure();		return failure();

return verifyPermutationMap(permutationMap,		return verifyPermutationMap(permutationMap,
[&](Twine t) { return emitOpError(t); });		[&](Twine t) { return emitOpError(t); });
}		}

		// MaskableOpInterface methods.

		/// Returns the mask type expected by this operation. Mostly used for
		/// verification purposes.
		Type TransferWriteOp::getExpectedMaskType() {
		return inferTransferWriteMaskType(getVectorType(), getPermutationMap());
		}

/// Fold:		/// Fold:
/// ```		/// ```
/// %t1 = ...		/// %t1 = ...
/// %v = vector.transfer_read %t0[%c0...], {in_bounds = [true...]} :		/// %v = vector.transfer_read %t0[%c0...], {in_bounds = [true...]} :
/// tensor<static_sizesxf32>, vector<static_sizesxf32>		/// tensor<static_sizesxf32>, vector<static_sizesxf32>
/// %t2 = vector.transfer_write %v, %t1[%c0...] {in_bounds = [true...]} :		/// %t2 = vector.transfer_write %v, %t1[%c0...] {in_bounds = [true...]} :
/// vector<static_sizesxf32>, tensor<static_sizesxf32>		/// vector<static_sizesxf32>, tensor<static_sizesxf32>
/// ```		/// ```
[...]	if (maskableOp->getNumResults() != getNumResults())
return emitOpError("expects number of results to match maskable operation "		return emitOpError("expects number of results to match maskable operation "
"number of results");		"number of results");

if (!llvm::equal(maskableOp->getResultTypes(), getResultTypes()))		if (!llvm::equal(maskableOp->getResultTypes(), getResultTypes()))
return emitOpError(		return emitOpError(
"expects result type to match maskable operation result type");		"expects result type to match maskable operation result type");

// Mask checks.		// Mask checks.
if (getMask().getType() != maskableOp.getExpectedMaskType())		Type expectedMaskType = maskableOp.getExpectedMaskType();
return emitOpError("expects a ") << maskableOp.getExpectedMaskType()		if (getMask().getType() != expectedMaskType)
<< " mask for the maskable operation";		return emitOpError("expects a ")
		<< expectedMaskType << " mask for the maskable operation";

// Passthru checks.		// Passthru checks.
Value passthru = getPassthru();		Value passthru = getPassthru();
if (passthru) {		if (passthru) {
if (!maskableOp.supportsPassthru())		if (!maskableOp.supportsPassthru())
return emitOpError(		return emitOpError(
"doesn't expect a passthru argument for this maskable operation");		"doesn't expect a passthru argument for this maskable operation");


mlir/lib/Dialect/Vector/Transforms/LowerVectorMask.cpp

[...]	matchAndRewriteMaskableOp(TransferWriteOp writeOp,
rewriter.replaceOpWithNewOp<TransferWriteOp>(		rewriter.replaceOpWithNewOp<TransferWriteOp>(
maskingOp.getOperation(), resultType, writeOp.getVector(),		maskingOp.getOperation(), resultType, writeOp.getVector(),
writeOp.getSource(), writeOp.getIndices(), writeOp.getPermutationMap(),		writeOp.getSource(), writeOp.getIndices(), writeOp.getPermutationMap(),
maskingOp.getMask(), writeOp.getInBounds().value_or(ArrayAttr()));		maskingOp.getMask(), writeOp.getInBounds().value_or(ArrayAttr()));
return success();		return success();
}		}
};		};

/// Populates instances of `MaskOpRewritePattern` to lower masked operations
/// with `vector.mask`. Patterns should rewrite the `vector.mask` operation and
/// not its nested `MaskableOpInterface`.
void populateVectorMaskLoweringPatternsForSideEffectingOps(
RewritePatternSet &patterns) {
patterns.add<MaskedTransferReadOpPattern, MaskedTransferWriteOpPattern>(
patterns.getContext());
}

struct LowerVectorMaskPass		struct LowerVectorMaskPass
: public vector::impl::LowerVectorMaskPassBase<LowerVectorMaskPass> {		: public vector::impl::LowerVectorMaskPassBase<LowerVectorMaskPass> {
using Base::Base;		using Base::Base;

void runOnOperation() override {		void runOnOperation() override {
Operation *op = getOperation();		Operation *op = getOperation();
MLIRContext *context = op->getContext();		MLIRContext *context = op->getContext();

RewritePatternSet loweringPatterns(context);		RewritePatternSet loweringPatterns(context);
populateVectorMaskLoweringPatternsForSideEffectingOps(loweringPatterns);		populateVectorMaskLoweringPatternsForSideEffectingOps(loweringPatterns);

if (failed(applyPatternsAndFoldGreedily(op->getRegions(),		if (failed(applyPatternsAndFoldGreedily(op->getRegions(),
std::move(loweringPatterns))))		std::move(loweringPatterns))))
signalPassFailure();		signalPassFailure();
}		}

void getDependentDialects(DialectRegistry &registry) const override {		void getDependentDialects(DialectRegistry &registry) const override {
registry.insert<vector::VectorDialect>();		registry.insert<vector::VectorDialect>();
}		}
};		};

} // namespace		} // namespace

		/// Populates instances of `MaskOpRewritePattern` to lower masked operations
		/// with `vector.mask`. Patterns should rewrite the `vector.mask` operation and
		/// not its nested `MaskableOpInterface`.
		void vector::populateVectorMaskLoweringPatternsForSideEffectingOps(
		RewritePatternSet &patterns) {
		patterns.add<MaskedTransferReadOpPattern, MaskedTransferWriteOpPattern>(
		patterns.getContext());
		}

std::unique_ptr<Pass> mlir::vector::createLowerVectorMaskPass() {		std::unique_ptr<Pass> mlir::vector::createLowerVectorMaskPass() {
return std::make_unique<LowerVectorMaskPass>();		return std::make_unique<LowerVectorMaskPass>();
}		}

mlir/test/Dialect/Linalg/vectorization.mlir

// RUN: mlir-opt %s -test-transform-dialect-interpreter -split-input-file | FileCheck %s		// RUN: mlir-opt %s -test-transform-dialect-interpreter -split-input-file | FileCheck %s

// -----

// CHECK-LABEL: contraction_dot		// CHECK-LABEL: contraction_dot
func.func @contraction_dot(%A: memref<1584xf32>, %B: memref<1584xf32>, %C: memref<f32>) {		func.func @contraction_dot(%A: memref<1584xf32>, %B: memref<1584xf32>, %C: memref<f32>) {

// CHECK: arith.mulf %{{.*}}, %{{.*}} : vector<1584xf32>		// CHECK: arith.mulf %{{.*}}, %{{.*}} : vector<1584xf32>
// CHECK: vector.multi_reduction <add>, %{{.*}}, {{.*}} [0] : vector<1584xf32> to f32		// CHECK: vector.multi_reduction <add>, %{{.*}}, {{.*}} [0] : vector<1584xf32> to f32
linalg.dot ins(%A, %B: memref<1584xf32>, memref<1584xf32>)		linalg.dot ins(%A, %B: memref<1584xf32>, memref<1584xf32>)
outs(%C: memref<f32>)		outs(%C: memref<f32>)
return		return
[...]	indexing_maps = [
affine_map<(m, n, k) -> (k, n)>,		affine_map<(m, n, k) -> (k, n)>,
affine_map<(m, n, k) -> (n, m)>		affine_map<(m, n, k) -> (n, m)>
],		],
iterator_types = ["parallel", "parallel", "reduction"]		iterator_types = ["parallel", "parallel", "reduction"]
}		}

// CHECK-LABEL: func @generic_output_transpose		// CHECK-LABEL: func @generic_output_transpose
func.func @generic_output_transpose(%A: memref<8x16xf32>, %B: memref<16x32xf32>,		func.func @generic_output_transpose(%A: memref<8x16xf32>, %B: memref<16x32xf32>,
%C: memref<32x8xf32>) {		%C: memref<32x8xf32>) {
// CHECK: vector.transfer_read %{{.*}} : memref<8x16xf32>, vector<8x32x16xf32>		// CHECK: vector.transfer_read %{{.*}} : memref<8x16xf32>, vector<8x32x16xf32>
// CHECK: vector.transfer_read %{{.*}} : memref<16x32xf32>, vector<8x32x16xf32>		// CHECK: vector.transfer_read %{{.*}} : memref<16x32xf32>, vector<8x32x16xf32>
// CHECK: %[[ACC:.*]] = vector.transfer_read %{{.*}} : memref<32x8xf32>, vector<8x32xf32>		// CHECK: %[[ACC:.*]] = vector.transfer_read %{{.*}} : memref<32x8xf32>, vector<8x32xf32>
// CHECK: %[[MUL:.*]] = arith.mulf %{{.*}}, %{{.*}} : vector<8x32x16xf32>		// CHECK: %[[MUL:.*]] = arith.mulf %{{.*}}, %{{.*}} : vector<8x32x16xf32>
// CHECK: %[[R:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[ACC]] [2] : vector<8x32x16xf32> to vector<8x32xf32>		// CHECK: %[[R:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[ACC]] [2] : vector<8x32x16xf32> to vector<8x32xf32>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}} : vector<8x32xf32>, memref<32x8xf32>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}} : vector<8x32xf32>, memref<32x8xf32>
linalg.generic #matmul_transpose_out_trait		linalg.generic #matmul_transpose_out_trait
ins(%A, %B : memref<8x16xf32>, memref<16x32xf32>)		ins(%A, %B : memref<8x16xf32>, memref<16x32xf32>)
[...]
// CHECK-SAME: : vector<16x32x64xf32> to vector<16x64xf32>		// CHECK-SAME: : vector<16x32x64xf32> to vector<16x64xf32>

transform.sequence failures(propagate) {		transform.sequence failures(propagate) {
^bb1(%arg1: !pdl.operation):		^bb1(%arg1: !pdl.operation):
%0 = transform.structured.match ops{["linalg.reduce"]} in %arg1		%0 = transform.structured.match ops{["linalg.reduce"]} in %arg1
%1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation		%1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation
%2 = transform.structured.vectorize %1		%2 = transform.structured.vectorize %1
}		}

		// -----

		func.func @vectorize_dynamic_identity(%arg0: tensor<?xf32>,
		%arg1: tensor<?xf32>,
		%arg2: tensor<?xf32>) -> tensor<?xf32> {
		%0 = linalg.generic { indexing_maps = [affine_map<(d0) -> (d0)>,
		affine_map<(d0) -> (d0)>,
		affine_map<(d0) -> (d0)>],
		iterator_types = ["parallel"] }
		ins(%arg0, %arg1 : tensor<?xf32>, tensor<?xf32>)
		outs(%arg2 : tensor<?xf32>) {
		^bb(%in0: f32, %in1: f32, %out: f32) :
		%0 = arith.addf %in0, %in1 : f32
		linalg.yield %0 : f32
		} -> tensor<?xf32>
		return %0 : tensor<?xf32>
		}

		// CHECK-LABEL: @vectorize_dynamic_identity
		// CHECK: %[[VAL_3:.*]] = arith.constant 0 : index
		// CHECK: %[[VAL_4:.*]] = tensor.dim %{{.*}}, %[[VAL_3]] : tensor<?xf32>
		// CHECK: %[[VAL_7:.*]] = vector.create_mask %[[VAL_4]] : vector<4xi1>
		// CHECK: %[[VAL_8:.*]] = vector.mask %[[VAL_7]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>
		// CHECK: %[[VAL_10:.*]] = vector.mask %[[VAL_7]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>
		// CHECK: %[[VAL_12:.*]] = vector.mask %[[VAL_7]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>
		// CHECK: %[[VAL_13:.*]] = arith.addf %[[VAL_8]], %[[VAL_10]] : vector<4xf32>
		// CHECK: %[[VAL_14:.*]] = vector.mask %[[VAL_7]] { vector.transfer_write %{{.*}} {in_bounds = [true]} : vector<4xf32>, tensor<?xf32> } : vector<4xi1> -> tensor<?xf32>

		transform.sequence failures(propagate) {
		^bb1(%arg1: !pdl.operation):
		%0 = transform.structured.match ops{["linalg.generic"]} in %arg1
		transform.structured.masked_vectorize %0 vector_sizes [4]
		}
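
The `@vectorize_dynamic_identity` test above builds one mask from the dynamic dimension and reuses it for every masked read and write. As a rough illustration of what that masking means, here is a minimal plain C++ sketch. All function names (`createMask`, `maskedRead`, `maskedWrite`, `maskedAdd`) are hypothetical and are not MLIR APIs. The sketch assumes the dynamic size fits in a single vector and that disabled lanes of a masked read take the padding value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lane i is enabled iff i < dynamicSize; the count is clamped to [0, vectorSize].
std::vector<bool> createMask(int64_t dynamicSize, int64_t vectorSize) {
  assert(vectorSize >= 0 && "vector size must be non-negative");
  int64_t enabled =
      std::max<int64_t>(0, std::min<int64_t>(dynamicSize, vectorSize));
  std::vector<bool> mask(static_cast<std::size_t>(vectorSize), false);
  std::fill(mask.begin(), mask.begin() + enabled, true);
  return mask;
}

// Masked read: enabled lanes load from `src`, disabled lanes get `padding`.
std::vector<float> maskedRead(const std::vector<float> &src,
                              const std::vector<bool> &mask, float padding) {
  std::vector<float> result(mask.size(), padding);
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i]) {
      assert(i < src.size() && "enabled lane must be in bounds");
      result[i] = src[i];
    }
  }
  return result;
}

// Masked write: only enabled lanes are stored into `dst`.
void maskedWrite(const std::vector<float> &value, const std::vector<bool> &mask,
                 std::vector<float> &dst) {
  assert(value.size() == mask.size());
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i]) {
      assert(i < dst.size() && "enabled lane must be in bounds");
      dst[i] = value[i];
    }
  }
}

// Element-wise add of two dynamically sized inputs using one fixed-width
// vector of `vectorSize` lanes (no loop over multiple vectors in this sketch).
std::vector<float> maskedAdd(const std::vector<float> &a,
                             const std::vector<float> &b,
                             std::vector<float> out, int64_t vectorSize) {
  std::vector<bool> mask =
      createMask(static_cast<int64_t>(a.size()), vectorSize);
  std::vector<float> va = maskedRead(a, mask, 0.0f);
  std::vector<float> vb = maskedRead(b, mask, 0.0f);
  std::vector<float> sum(mask.size());
  for (std::size_t i = 0; i < sum.size(); ++i)
    sum[i] = va[i] + vb[i];
  maskedWrite(sum, mask, out);
  return out;
}
```

With a dynamic size of 3 and a vector size of 4, the fourth lane is disabled, so it is neither loaded from nor stored to memory.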

		// -----

		func.func @vectorize_dynamic_1d_broadcast(%arg0: tensor<?xf32>,
		%arg1: tensor<?xf32>,
		%arg2: tensor<?xf32>) -> tensor<?xf32> {
		%0 = linalg.generic { indexing_maps = [affine_map<(d0) -> (0)>,
		affine_map<(d0) -> (d0)>,
		affine_map<(d0) -> (d0)>],
		iterator_types = ["parallel"] }
		ins(%arg0, %arg1 : tensor<?xf32>, tensor<?xf32>)
		outs(%arg2 : tensor<?xf32>) {
		^bb(%in0: f32, %in1: f32, %out: f32) :
		%0 = arith.addf %in0, %in1 : f32
		linalg.yield %0 : f32
		} -> tensor<?xf32>
		return %0 : tensor<?xf32>
		}

		// CHECK-LABEL: @vectorize_dynamic_1d_broadcast
		// CHECK: %[[VAL_3:.*]] = arith.constant 0 : index
		// CHECK: %[[VAL_4:.*]] = tensor.dim %{{.*}}, %[[VAL_3]] : tensor<?xf32>
		// CHECK: %[[VAL_7:.*]] = vector.transfer_read %{{.*}} {permutation_map = #{{.*}}} : tensor<?xf32>, vector<4xf32>
		// CHECK: %[[VAL_9:.*]] = vector.create_mask %[[VAL_4]] : vector<4xi1>
		// CHECK: %[[VAL_10:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>
		// CHECK: %[[VAL_12:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>
		// CHECK: %[[VAL_13:.*]] = arith.addf %[[VAL_7]], %[[VAL_10]] : vector<4xf32>
		// CHECK: %[[VAL_14:.*]] = vector.mask %{{.*}} { vector.transfer_write %[[VAL_13]], {{.*}} {in_bounds = [true]} : vector<4xf32>, tensor<?xf32> } : vector<4xi1> -> tensor<?xf32>

		transform.sequence failures(propagate) {
		^bb1(%arg1: !pdl.operation):
		%0 = transform.structured.match ops{["linalg.generic"]} in %arg1
		transform.structured.masked_vectorize %0 vector_sizes [4]
		}

		// -----

		func.func @vectorize_dynamic_2d_transpose(%arg0: tensor<?x?xf32>,
		%arg1: tensor<?x?xf32>,
		%arg2: tensor<?x?xf32>) -> tensor<?x?xf32> {
		%0 = linalg.generic { indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>,
		affine_map<(d0, d1) -> (d0, d1)>,
		affine_map<(d0, d1) -> (d0, d1)>],
		iterator_types = ["parallel", "parallel"] }
		ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>)
		outs(%arg2 : tensor<?x?xf32>) {
		^bb(%in0: f32, %in1: f32, %out: f32) :
		%0 = arith.addf %in0, %in1 : f32
		linalg.yield %0 : f32
		} -> tensor<?x?xf32>
		return %0 : tensor<?x?xf32>
		}

		// CHECK-LABEL: @vectorize_dynamic_2d_transpose
		// CHECK: %[[VAL_3:.*]] = arith.constant 1 : index
		// CHECK: %[[VAL_4:.*]] = tensor.dim %{{.*}}, %[[VAL_3]] : tensor<?x?xf32>
		// CHECK: %[[VAL_5:.*]] = arith.constant 0 : index
		// CHECK: %[[VAL_6:.*]] = tensor.dim %{{.*}}, %[[VAL_5]] : tensor<?x?xf32>
		// CHECK: %[[VAL_9:.*]] = vector.create_mask %[[VAL_6]], %[[VAL_4]] : vector<8x4xi1>
		// CHECK: %[[VAL_10:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true, true], permutation_map = #{{.*}}} : tensor<?x?xf32>, vector<4x8xf32> } : vector<8x4xi1> -> vector<4x8xf32>
		// CHECK: %[[VAL_12:.*]] = vector.create_mask %[[VAL_4]], %[[VAL_6]] : vector<4x8xi1>
		// CHECK: %[[VAL_13:.*]] = vector.mask %[[VAL_12]] { vector.transfer_read %{{.*}} {in_bounds = [true, true]} : tensor<?x?xf32>, vector<4x8xf32> } : vector<4x8xi1> -> vector<4x8xf32>
		// CHECK: %[[VAL_14:.*]] = arith.constant 0.000000e+00 : f32
		// CHECK: %[[VAL_15:.*]] = vector.mask %[[VAL_12]] { vector.transfer_read %{{.*}} {in_bounds = [true, true]} : tensor<?x?xf32>, vector<4x8xf32> } : vector<4x8xi1> -> vector<4x8xf32>
		// CHECK: %[[VAL_16:.*]] = arith.addf %[[VAL_10]], %[[VAL_13]] : vector<4x8xf32>
		// CHECK: %[[VAL_17:.*]] = vector.mask %[[VAL_12]] { vector.transfer_write %[[VAL_16]], %{{.*}} {in_bounds = [true, true]} : vector<4x8xf32>, tensor<?x?xf32> } : vector<4x8xi1> -> tensor<?x?xf32>

		transform.sequence failures(propagate) {
		^bb1(%arg1: !pdl.operation):
		%0 = transform.structured.match ops{["linalg.generic"]} in %arg1
		transform.structured.masked_vectorize %0 vector_sizes [4, 8]
		}

		// -----

		func.func @vectorize_dynamic_generic_2d_broadcast(%arg0: tensor<?x?xf32>,
		%arg1: tensor<?x?xf32>,
		%arg2: tensor<?x?xf32>) -> tensor<?x?xf32> {
		%0 = linalg.generic { indexing_maps = [affine_map<(d0, d1) -> (0, d1)>,
		affine_map<(d0, d1) -> (d0, d1)>,
		affine_map<(d0, d1) -> (d0, d1)>],
		iterator_types = ["parallel", "parallel"] }
		ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>)
		outs(%arg2 : tensor<?x?xf32>) {
		^bb(%in0: f32, %in1: f32, %out: f32) :
		%0 = arith.addf %in0, %in1 : f32
		linalg.yield %0 : f32
		} -> tensor<?x?xf32>
		return %0 : tensor<?x?xf32>
		}

		// CHECK-LABEL: @vectorize_dynamic_generic_2d_broadcast
		// CHECK: %[[VAL_3:.*]] = arith.constant 0 : index
		// CHECK: %[[VAL_4:.*]] = tensor.dim %{{.*}}, %[[VAL_3]] : tensor<?x?xf32>
		// CHECK: %[[VAL_5:.*]] = arith.constant 1 : index
		// CHECK: %[[VAL_6:.*]] = tensor.dim %{{.*}}, %[[VAL_5]] : tensor<?x?xf32>
		// CHECK: %[[VAL_9:.*]] = vector.create_mask %[[VAL_6]] : vector<8xi1>
		// CHECK: %[[VAL_10:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true, true], permutation_map = #{{.*}}} : tensor<?x?xf32>, vector<4x8xf32> } : vector<8xi1> -> vector<4x8xf32>
		// CHECK: %[[VAL_12:.*]] = vector.create_mask %[[VAL_4]], %[[VAL_6]] : vector<4x8xi1>
		// CHECK: %[[VAL_13:.*]] = vector.mask %[[VAL_12]] { vector.transfer_read %{{.*}} {in_bounds = [true, true]} : tensor<?x?xf32>, vector<4x8xf32> } : vector<4x8xi1> -> vector<4x8xf32>
		// CHECK: %[[VAL_15:.*]] = vector.mask %[[VAL_12]] { vector.transfer_read %{{.*}} {in_bounds = [true, true]} : tensor<?x?xf32>, vector<4x8xf32> } : vector<4x8xi1> -> vector<4x8xf32>
		// CHECK: %[[VAL_16:.*]] = arith.addf %[[VAL_10]], %[[VAL_13]] : vector<4x8xf32>
		// CHECK: %[[VAL_18:.*]] = vector.mask %[[VAL_12]] { vector.transfer_write %{{.*}} {in_bounds = [true, true]} : vector<4x8xf32>, tensor<?x?xf32> } : vector<4x8xi1> -> tensor<?x?xf32>

		transform.sequence failures(propagate) {
		^bb1(%arg1: !pdl.operation):
		%0 = transform.structured.match ops{["linalg.generic"]} in %arg1
		transform.structured.masked_vectorize %0 vector_sizes [4, 8]
		}
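
In the transpose and broadcast tests above, the mask type of a masked `vector.transfer_read` differs from the type of the vector it produces: `vector<8x4xi1>` for a transposed `vector<4x8xf32>`, and `vector<8xi1>` when the first vector dimension is a broadcast. The following is a minimal sketch of a rule that reproduces those shapes. It is an illustration only and is not the implementation of `inferTransferReadMaskType`. The function `inferToyMaskShape`, the `kBroadcast` marker, and the encoding of the permutation map as a list of source dimension indices are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marks a permutation-map result that is the constant 0 (a broadcast).
constexpr int kBroadcast = -1;

// results[r] is the source (tensor) dimension that feeds vector dimension r,
// or kBroadcast. The toy rule: walk the source dimensions in order and, for
// each one that feeds a vector dimension, take that vector dimension's size.
// Source dimensions that feed no vector dimension contribute nothing.
std::vector<int64_t> inferToyMaskShape(const std::vector<int64_t> &vectorShape,
                                       const std::vector<int> &results,
                                       int numSourceDims) {
  assert(vectorShape.size() == results.size() &&
         "one map result per vector dimension");
  std::vector<int64_t> maskShape;
  for (int dim = 0; dim < numSourceDims; ++dim) {
    for (std::size_t r = 0; r < results.size(); ++r) {
      if (results[r] == dim) {
        maskShape.push_back(vectorShape[r]);
        break;
      }
    }
  }
  return maskShape;
}
```

Under this encoding, the map `(d0, d1) -> (d1, d0)` is `{1, 0}` and `(d0, d1) -> (0, d1)` is `{kBroadcast, 1}`. For a `4x8` vector these give mask shapes `8x4` and `8`, matching the CHECK lines in the two tests.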

utils/bazel/llvm-project-overlay/mlir/BUILD.bazel

[...]	deps = [
":FuncDialect",		":FuncDialect",
":FuncTransforms",		":FuncTransforms",
":IR",		":IR",
":LinalgAnalysis",		":LinalgAnalysis",
":LinalgDialect",		":LinalgDialect",
":LinalgPassIncGen",		":LinalgPassIncGen",
":LinalgStructuredOpsIncGen",		":LinalgStructuredOpsIncGen",
":LinalgUtils",		":LinalgUtils",
		":MaskableOpInterface",
":MathDialect",		":MathDialect",
":MemRefDialect",		":MemRefDialect",
":MemRefTransforms",		":MemRefTransforms",
":Pass",		":Pass",
":SCFDialect",		":SCFDialect",
":SCFTransforms",		":SCFTransforms",
":SCFUtils",		":SCFUtils",
":SparseTensorDialect",		":SparseTensorDialect",

Diff 482323