This is an archive of the discontinued LLVM Phabricator instance.


Differential D137690

[mlir][Vector] Initial masking support in Linalg vectorizer
ClosedPublic

Authored by dcaballe on Nov 8 2022, 11:09 PM.


Details

Reviewers

rriddle
aartbik
nicolasvasilache
hanchung

Commits

rG72fd36448d7c: [mlir][Vector] Initial masking support in Linalg vectorizer

Summary

This patch introduces the initial bits to support vector masking
using the vector.mask operation. Vectorization changes should be
NFC for non-masked cases. We can't test masked cases directly until
we extend the Transform dialect to support masking.
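As a rough illustration of what masking means here (a plain C++ sketch, not the patch's implementation and not `vector.mask` itself): a loop with a dynamic trip count is processed in fixed-size chunks, and a per-lane mask turns off the lanes that fall past the dynamic size. `kVecSize` and `maskedAdd` are hypothetical names introduced only for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed vector width, standing in for a user-provided vector size.
constexpr std::size_t kVecSize = 4;

// Adds `a` and `b` elementwise, one kVecSize-wide chunk at a time. Lanes whose
// index falls past the dynamic size are masked off and never read or written.
std::vector<int> maskedAdd(const std::vector<int> &a,
                           const std::vector<int> &b) {
  assert(a.size() == b.size() && "operands must have the same dynamic size");
  const std::size_t n = a.size();
  std::vector<int> out(n, 0);
  for (std::size_t base = 0; base < n; base += kVecSize) {
    // Build the mask for this chunk: lane i is active iff base + i < n.
    std::array<bool, kVecSize> mask{};
    for (std::size_t lane = 0; lane < kVecSize; ++lane)
      mask[lane] = base + lane < n;
    // Execute the "vector" operation only on the active lanes.
    for (std::size_t lane = 0; lane < kVecSize; ++lane)
      if (mask[lane])
        out[base + lane] = a[base + lane] + b[base + lane];
  }
  return out;
}
```

With five elements and a width of four, the second chunk runs with only its first lane active, which is the situation the mask is meant to handle.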

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dcaballe created this revision. Nov 8 2022, 11:09 PM

Herald added a reviewer: rriddle. Nov 8 2022, 11:09 PM

Herald added a reviewer: aartbik.

Herald added a project: Restricted Project.

Herald added subscribers: Moerafaat, zero9178, bzcheeseman and 21 others.

dcaballe requested review of this revision. Nov 8 2022, 11:09 PM

Herald added a reviewer: nicolasvasilache. Nov 8 2022, 11:09 PM

Herald added a project: Restricted Project. Nov 8 2022, 11:09 PM

Herald added subscribers: pcwang-thead, limo1996, stephenneuendorffer, nicolasvasilache.

dcaballe added inline comments. Nov 8 2022, 11:12 PM

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1704	Hey @nicolasvasilache, do you think you could help extending the Transform dialect so that we can provide the vector sizes for masked dims?

dcaballe planned changes to this revision. Nov 8 2022, 11:26 PM

Harbormaster completed remote builds in B196829: Diff 474162. Nov 8 2022, 11:39 PM

Fixing a couple of issues when vector sizes for masked dimensions are not provided

Harbormaster completed remote builds in B197329: Diff 474887. Nov 11 2022, 4:44 PM

nicolasvasilache requested changes to this revision. Nov 13 2022, 6:39 PM

nicolasvasilache added inline comments.

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1704	Happy to! Could you temporarily hardcode some size in there and add some test IR so I can see what I should expect? This will likely require a new transform op that is not a blanket "vectorize the world" so that we can pass the information you want at a finer granularity. This will likely need some iteration to get to a reasonably scalable usage. Left some other review comments in the meantime.
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
80	super-nit: can we make the line spacing uniform between methods(here) and members(below)?
115	plz avoid unsigned everywhere, we know by now this is not meant for expressing >=0 but should really be used for bit-twiddling or when we really really need the extra bit.
115	Should this be a (better named) helper on the LinalgOp interface? This seems to reimplement common functionality (but maybe not exactly) and rely on deep internal Linalg assumptions (e.g. all possible ways of defining the extent of an iterator have to match and you can therefore take the first one). The name makes it hard to understand what it does and we should be doing any such manipulation in a very localized place in LinalgOp.
143	unsigned purge here and everywhere plz
160	Can we call these `staticUpperBounds` everywhere? And the other ones `dynamicUpperBounds` ? This seems easier to me to relate to what we're looking to do instead of `vecSizesForMaskedDims` and `extractDynamicVectorDimValues`.
164	How about early exit here? `if (!linalgOp.hasDynamicShape()) { canonicalVecShape = linalgOp.getStaticLoopRanges(); return success(); }` I don't think you need the checks and debugs after that in the static case?
172	Seems fishy, what happens in this case? I'd expect this to not fail gracefully .. Make it an assert and lift logic to the precondition to avoid this? Edit: ah scratch that, I see that this is just after the precondition, can we make it part of the precondition?
177	Can we sprinkle a few precompute prefixes in some of these APIs to make it clear what happens at init time?
184	LLVM_DEBUG(llvm::interleaveComma(canonicalVecShape, llvm::dbgs() << ...));
254	pass RewriterBase here and everywhere possible post https://reviews.llvm.org/D137922 plz
279	Plz use `updateRootInPlace` once RewriterBase is piped through.
744	nit: can we spell this as: // 3.a. Convert the indexing map for this input/output to a transfer read... ... /// 3.a.i For input reads we use the canonical vector shape. if (linalgOp.isDpsInput(opOperand)) ... } else { /// 3.a.ii For output reads (iteration-carried dependence, e.g., reductions) ... // 3.b. If masked, set in-bounds to true. ... // 3.c. Not all ops support 0-d vectors,
929	Can we make this init the state as part of the precondition?
mlir/lib/IR/AffineMap.cpp
340	`Fails when called on a non-projected-permutation.` is misleading here. It expects a projected permutation otherwise it crashes. Failing would have returned llvm::None without crashing unless I am missing something?
343	Better name and doc please, this is much too confusing. In fact I think you can just do something like `llvm::find(map.getResults(), AffineDimExpr::get(input))` at the client and avoid adding more APIs to AffineMap

This revision now requires changes to proceed. Nov 13 2022, 6:39 PM

Addressed feedback
New changes around vectorization initState and canonical vector shape computation

Herald added subscribers: hanchung, ThomasRaoux, jsetoain. Nov 23 2022, 6:21 PM

dcaballe added inline comments. Nov 23 2022, 6:21 PM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
115	> plz avoid unsigned everywhere, we know by now this is not meant for expressing >=0 but should really be used for bit-twiddling or when we really really need the extra bit.
	Sorry, `dim` is a misnomer as it refers to the dimension position, not the size. It has to be `unsigned` as that's what is expected by `AffineMap::getXXXYYYPosition()`. Added `Pos` to a few names. Hopefully that's better. Moved it to the LinalgOp interface. I can't think of a better name for the utility... It's just mapping an iteration space dimension to a dimension of an operand... Any other suggestion? I'm happy to replace it with any other existing utility but I couldn't find any.
143	As explained before, renamed to indicate that it's the position, not the size.
160	I had already renamed this locally to `inputVectorSizes` and changed the meaning a bit. The input sizes are now taken into account to compute the canonical vector shape and, if they are also provided for static shapes, they should match the size of the static shapes. We are passing them all now to simplify the client API, including the transform dialect, as it's easier to provide all the vector sizes than having to filter out the static ones. Let me know if that works.
164	This is gone now. This code has changed a bit in the last version.
172	Let me know if it makes more sense in the new version, where the `inputVectorSizes`, if provided, should match the `linalgop.getNumLoops()`. Otherwise, this would be a bug.
177	Much better!
184	Ah! I didn't know this utility! It's been such a pain to always print SmallVector's... Thanks!
254	I think we are going in the opposite direction based on the review comments?
929	Probably better to separate the concerns. I think even part of the precondition checks are reused outside of the vectorizer. A public interface was introduced recently.
mlir/lib/IR/AffineMap.cpp
343	I had renamed this like 10 times. It's a difficult name. Hopefully it's better now :).
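The `inputVectorSizes` contract described in the replies above (lines 160 and 172) can be sketched in standalone C++. This is only a reading of the comments, not the code in the patch: `kDynamic`, `computeCanonicalVecShape`, and the choice to return a failure on a static-size mismatch are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical marker for a dynamic loop range in this sketch.
constexpr int64_t kDynamic = -1;

// Returns the canonical vector shape, or std::nullopt if the op cannot be
// vectorized under the rules assumed in this sketch.
std::optional<std::vector<int64_t>>
computeCanonicalVecShape(const std::vector<int64_t> &staticLoopRanges,
                         const std::vector<int64_t> &inputVectorSizes) {
  // No sizes provided: masking is off, so every loop range must be static.
  if (inputVectorSizes.empty()) {
    for (int64_t range : staticLoopRanges)
      if (range == kDynamic)
        return std::nullopt;
    return staticLoopRanges;
  }
  // Sizes provided: there must be one per loop, otherwise it is a caller bug.
  assert(inputVectorSizes.size() == staticLoopRanges.size() &&
         "input vector sizes don't match the number of loops");
  // A size given for a static dimension has to agree with the static range.
  for (std::size_t i = 0; i < staticLoopRanges.size(); ++i)
    if (staticLoopRanges[i] != kDynamic &&
        staticLoopRanges[i] != inputVectorSizes[i])
      return std::nullopt;
  return inputVectorSizes;
}
```

Passing every size, including those of static dimensions, keeps the caller from having to filter, which is the trade-off the reply on line 160 argues for.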

Harbormaster completed remote builds in B199334: Diff 477667. Nov 23 2022, 7:16 PM

dcaballe planned changes to this revision. Nov 24 2022, 12:12 AM

nicolasvasilache added inline comments. Nov 29 2022, 6:54 AM

mlir/lib/IR/AffineMap.cpp
343	Wait .. what do I see just above .. literally the same functionality modulo an assert .. Can we just have a single `Optional<int64_t> AffineMap::getResultPosition(AffineExpr e) const { for (int64_t i = 0, numResults = getNumResults(); i < numResults; i++) if (getResult(i) == e) return i; return llvm::None; }` and let clients do the assertions they want? It seems very counterproductive to have all these special case functions with slightly varying assertions and hard to grok names ..
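The design suggested in this comment, one lookup that returns an optional position with any assertion left to the caller, can be tried out with stand-in types. In this sketch `ToyMap` and `getRequiredResultPosition` are hypothetical, results are modeled as plain integers instead of `AffineExpr`, and `std::optional` replaces `llvm::Optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for an affine map: each result is just an int id.
struct ToyMap {
  std::vector<int> results;

  // Returns the position of the first result equal to `expr`, or std::nullopt
  // if no result matches. The lookup itself never asserts.
  std::optional<int64_t> getResultPosition(int expr) const {
    for (int64_t i = 0, numResults = results.size(); i < numResults; ++i)
      if (results[i] == expr)
        return i;
    return std::nullopt;
  }
};

// A client that knows `expr` must be present adds its own assertion.
int64_t getRequiredResultPosition(const ToyMap &map, int expr) {
  std::optional<int64_t> pos = map.getResultPosition(expr);
  assert(pos && "expected `expr` to be a result of the map");
  return *pos;
}
```

Callers that can tolerate a missing result test the optional directly, while callers with stronger invariants wrap the lookup as `getRequiredResultPosition` does.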

Addressed feedback + minor fixes.
Please ignore the AffineMap utility. It will be removed after rebasing on top of D138946.

Herald added a reviewer: hanchung. Nov 30 2022, 5:50 PM

Harbormaster completed remote builds in B200399: Diff 479129. Nov 30 2022, 5:51 PM

nicolasvasilache mentioned this in D137922: [mlir][Linalg] NFC - Purge OpBuilder uses in favor of RewriterBase in places unrelated to op definitions. Dec 1 2022, 3:01 AM

I do not understand the implication of

// TODO: We mask the transfer.transfer_write here because this op is
// special-cased. A linalg.yield may produced multiple vector.transfer_write
// ops and can't be mapped using BlockAndValueMapping.
AffineMap opOperandMap = linalgOp.getMatchingIndexingMap(opOperand);
write = state.maskOperation(b, write, linalgOp, opOperandMap);

.. also I am not seeing any test changes, so it seems you are adding a lot of code that is not tested and not activated ?

mlir/include/mlir/IR/AffineMap.h
177	I think this goes away with the rebase, just flagging for removal so we don't forget.
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
75	Why not make this a ctor?
254	We cannot have code like this: `Operation *maskOpTerminator = &maskOp.getMaskRegion().front().back(); for (auto &en : llvm::enumerate(opToMask->getResults())) en.value().replaceAllUsesExcept(maskOp.getResult(en.index()), maskOpTerminator);` it must use a RewriterBase with `updateRootInPlace`
290	Hmm .. what's the contract between this `createRegionMask` lambda and the insertion points during `builder.create<vector::MaskOp>` ? I've seen too much ugly stuff re. insertion points leaking across function call boundaries. Let's add an OpBuilder::InsertionGuard at the top of this function.
306–307	this must use a RewriterBase with updateRootInPlace
472	nit: produce

This revision now requires changes to proceed. Dec 1 2022, 7:00 AM

Addressed feedback.

> .. also I am not seeing any test changes, so it seems you are adding a lot of code that is not tested and not activated ?

Changes in the overall vectorization algorithm are tested with existing vectorization tests. This patch is NFC for those. Masking is not enabled if inputVectorSizes are not provided. If they are provided, only elementwise ops without reductions and fully dynamic shapes are vectorized.
I can't add unit tests until the new operation for masked vectorization is added to the transform dialect, as we no longer have the vectorizer testing pass. However, this PR has been extensively tested in IREE, both with and without masking, for even more cases than those currently supported.
Waiting on the new transform dialect op to land to add more tests.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
75	Because it can fail, at least for now. This will change once we support more cases with masking. Then, we could assert and turn it into a constructor.
254	I can use `updateRootInPlace` when we have a rewriter here but I can't do any replacement method because `opToMask` is moved inside the mask region, not replaced.
290	Added guard. This is a simple lambda to create the op region. We follow the same approach for `scf.if` and other region ops. The only contract is that the region needs to have a `vector::YieldOp`, which is described in the `vector.mask` doc.
306–307	Added TODO until we have a rewriter.
472	> I do not understand the implication of TODO: We mask the transfer.transfer_write here because this op is special-cased. A linalg.yield may produced multiple vector.transfer_write ops and can't be mapped using BlockAndValueMapping.
	Good point. This is a comment for an old problem. I removed it and moved this code to `buildVectorWrite`.
mlir/lib/IR/AffineMap.cpp
343	Extracted this to https://reviews.llvm.org/D138946
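On the insertion-point question (line 290 above): `OpBuilder::InsertionGuard` is an RAII helper that saves the builder's insertion point when constructed and restores it when destroyed. The mechanism can be sketched without MLIR. `ToyBuilder`, `ToyInsertionGuard`, and `buildInRegion` below are hypothetical stand-ins, with the insertion point reduced to an integer.

```cpp
#include <cassert>

// Hypothetical stand-in for a builder: the insertion point is just an index.
struct ToyBuilder {
  int insertionPoint = 0;
};

// RAII guard in the spirit of OpBuilder::InsertionGuard: remembers the
// insertion point on construction and restores it on destruction.
class ToyInsertionGuard {
public:
  explicit ToyInsertionGuard(ToyBuilder &builder)
      : builder(builder), saved(builder.insertionPoint) {}
  ~ToyInsertionGuard() { builder.insertionPoint = saved; }
  ToyInsertionGuard(const ToyInsertionGuard &) = delete;
  ToyInsertionGuard &operator=(const ToyInsertionGuard &) = delete;

private:
  ToyBuilder &builder;
  int saved;
};

// Moves the insertion point to build inside a nested region. The guard at the
// top ensures the caller's insertion point is untouched after the call.
int buildInRegion(ToyBuilder &builder, int regionStart) {
  assert(regionStart >= 0 && "insertion point must be non-negative");
  ToyInsertionGuard guard(builder);
  builder.insertionPoint = regionStart;
  // ... ops would be created at the new insertion point here ...
  return builder.insertionPoint; // Where the ops were created.
}
```

Placing the guard at the top of the function means every exit path restores the caller's insertion point, so nothing leaks across the call boundary.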

Harbormaster completed remote builds in B200701: Diff 479525. Dec 2 2022, 1:40 AM

Added testing support to Transform dialect + tests

Harbormaster completed remote builds in B201270: Diff 480308. Dec 5 2022, 6:23 PM

Rebase + remove dead code (wrong rebase)

Harbormaster completed remote builds in B201299: Diff 480353. Dec 6 2022, 4:31 AM

> Waiting on the new transform dialect op to land to add more tests.

Thanks for integrating it and adding tests, the testing part LGTM, still need to make another pass on the last version of the code.

Thanks @dcaballe !

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
1120 ↗	(On Diff #480353)	"definite failure"
1138 ↗	(On Diff #480353)	can you add a `TODO: applyToOne` plz ?
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
229	Use the `find` API and the iterator you get from it to avoid multi-lookups
270	nit: uppermension :) ?
436	can you add a TODO to tighten op semantics so that we don't mix inbounds and mask since this is well defined?
760	I'll need to revisit all this in light of the broadcast separation. From a cursory glance this looks reasonable, let's land and iterate.
891	ok as a first approx.
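The `find` suggestion (line 229 above) is about doing one hash lookup instead of two. A minimal sketch with `std::unordered_map` standing in for the `DenseMap` cache follows. `getOrCreateMaskId`, the string keys, and the integer "mask ids" are hypothetical and only illustrate the lookup pattern.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Returns the cached id for `key`, creating and caching a new one on a miss.
int getOrCreateMaskId(std::unordered_map<std::string, int> &cache,
                      const std::string &key) {
  // Single lookup: keep the iterator instead of calling count() and then
  // operator[], which would hash and probe the key twice.
  auto it = cache.find(key);
  if (it != cache.end())
    return it->second;
  // Miss: create a new id (here simply the next index) and cache it.
  int newId = static_cast<int>(cache.size());
  bool inserted = cache.emplace(key, newId).second;
  assert(inserted && "key was just checked to be absent");
  (void)inserted; // Avoids an unused-variable warning when NDEBUG is defined.
  return newId;
}
```

The hit path returns straight from the iterator, so a cached entry costs exactly one lookup.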

This revision is now accepted and ready to land. Dec 6 2022, 10:07 AM

gflegar added a subscriber: gflegar. Dec 9 2022, 12:33 AM

Thanks! I addressed comments. Landing now...

Closed by commit rG72fd36448d7c: [mlir][Vector] Initial masking support in Linalg vectorizer (authored by dcaballe). Dec 12 2022, 5:36 PM

This revision was automatically updated to reflect the committed changes.

dcaballe added a commit: rG72fd36448d7c: [mlir][Vector] Initial masking support in Linalg vectorizer.

@dcaballe, I was looking at this PR as I was doing some spelunking and I realize we are not testing the case of the SSA value as well as the error case when the SSA value is not a constant.

Can we please add the missing tests in a followup?

Herald added subscribers: wangpc, bviyer, awarzynski. Aug 10 2023, 5:15 AM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h	5 lines
mlir/include/mlir/IR/AffineMap.h	6 lines
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp	1 line
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp	432 lines
mlir/lib/IR/AffineMap.cpp	12 lines
mlir/test/Dialect/Linalg/vectorization.mlir	4 lines

Diff 474162

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h

	/// 2. Take a full view on the buffer.			/// 2. Take a full view on the buffer.
	/// 3. Take a partial slice of the full view in step 2. and copy into it.			/// 3. Take a partial slice of the full view in step 2. and copy into it.
	///			///
	/// Return the modified linalg op (the modification happens in place) as well			/// Return the modified linalg op (the modification happens in place) as well
	/// as all the copy ops created.			/// as all the copy ops created.
	FailureOr<LinalgOp> promoteSubViews(OpBuilder &b, LinalgOp op,			FailureOr<LinalgOp> promoteSubViews(OpBuilder &b, LinalgOp op,
	const LinalgPromotionOptions &options);			const LinalgPromotionOptions &options);

	/// Emit a suitable vector form for a Linalg op with fully static shape.			/// Emit a suitable vector form for a Linalg op.
	LogicalResult vectorize(RewriterBase &builder, LinalgOp linalgOp);			LogicalResult vectorize(RewriterBase &rewriter, LinalgOp linalgOp,
				ArrayRef<int64_t> vecSizesForMaskedDims = {});

	/// Emit a suitable vector form for a Copy op with fully static shape.			/// Emit a suitable vector form for a Copy op with fully static shape.
	LogicalResult vectorizeCopy(RewriterBase &builder, memref::CopyOp copyOp);			LogicalResult vectorizeCopy(RewriterBase &builder, memref::CopyOp copyOp);

	/// Emit a loop nest of `scf.for` with the proper body for `linalgOp`.			/// Emit a loop nest of `scf.for` with the proper body for `linalgOp`.
	FailureOr<LinalgLoops> linalgOpToLoops(PatternRewriter &rewriter,			FailureOr<LinalgLoops> linalgOpToLoops(PatternRewriter &rewriter,
	LinalgOp linalgOp);			LinalgOp linalgOp);


mlir/include/mlir/IR/AffineMap.h

public:
/// Extracts the position of the dimensional expression at the given result,		/// Extracts the position of the dimensional expression at the given result,
/// when the caller knows it is safe to do so.		/// when the caller knows it is safe to do so.
unsigned getDimPosition(unsigned idx) const;		unsigned getDimPosition(unsigned idx) const;

/// Extracts the permuted position where given input index resides.		/// Extracts the permuted position where given input index resides.
/// Fails when called on a non-permutation.		/// Fails when called on a non-permutation.
unsigned getPermutedPosition(unsigned input) const;		unsigned getPermutedPosition(unsigned input) const;

		/// Extracts the permuted position where the given input index resides.
		/// Returns `llvm::None` if the input index is projected. Fails when called on
		/// a non-projected-permutation.
		Optional<unsigned>
		getProjectedPermutationPermutedPosition(unsigned input) const;

/// Return true if any affine expression involves AffineDimExpr `position`.		/// Return true if any affine expression involves AffineDimExpr `position`.
bool isFunctionOfDim(unsigned position) const {		bool isFunctionOfDim(unsigned position) const {
return llvm::any_of(getResults(), [&](AffineExpr e) {		return llvm::any_of(getResults(), [&](AffineExpr e) {
return e.isFunctionOfDim(position);		return e.isFunctionOfDim(position);
});		});
}		}

/// Return true if any affine expression involves AffineSymbolExpr `position`.		/// Return true if any affine expression involves AffineSymbolExpr `position`.

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp

	struct VectorizationPattern : public RewritePattern {			struct VectorizationPattern : public RewritePattern {
	explicit VectorizationPattern(MLIRContext *context)			explicit VectorizationPattern(MLIRContext *context)
	: RewritePattern(MatchAnyOpTypeTag(), /*benefit=*/1, context) {}			: RewritePattern(MatchAnyOpTypeTag(), /*benefit=*/1, context) {}
	LogicalResult matchAndRewrite(Operation *op,			LogicalResult matchAndRewrite(Operation *op,
	PatternRewriter &rewriter) const override {			PatternRewriter &rewriter) const override {
	LinalgOp linalgOp = dyn_cast<LinalgOp>(op);			LinalgOp linalgOp = dyn_cast<LinalgOp>(op);
	if (!linalgOp)			if (!linalgOp)
	return failure();			return failure();
				// TODO: Pass vector sizes for masked dims.
	return vectorize(rewriter, linalgOp);			return vectorize(rewriter, linalgOp);
	}			}
	};			};
	} // namespace			} // namespace

	DiagnosedSilenceableFailure			DiagnosedSilenceableFailure
	transform::VectorizeOp::applyToOne(Operation *target,			transform::VectorizeOp::applyToOne(Operation *target,
	SmallVectorImpl<Operation *> &results,			SmallVectorImpl<Operation *> &results,
	transform::TransformState &state) {			transform::TransformState &state) {

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

#include "mlir/Dialect/Func/IR/FuncOps.h"		#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/Dialect/Linalg/Analysis/DependenceAnalysis.h"		#include "mlir/Dialect/Linalg/Analysis/DependenceAnalysis.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"		#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"		#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/Dialect/Linalg/Utils/Utils.h"		#include "mlir/Dialect/Linalg/Utils/Utils.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"		#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/Dialect/Utils/StructuredOpsUtils.h"		#include "mlir/Dialect/Utils/StructuredOpsUtils.h"
#include "mlir/Dialect/Vector/IR/VectorOps.h"		#include "mlir/Dialect/Vector/IR/VectorOps.h"
		#include "mlir/Dialect/Vector/Interfaces/MaskableOpInterface.h"
#include "mlir/Dialect/Vector/Transforms/VectorTransforms.h"		#include "mlir/Dialect/Vector/Transforms/VectorTransforms.h"
#include "mlir/IR/AffineExpr.h"		#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/Matchers.h"		#include "mlir/IR/Matchers.h"
#include "mlir/IR/PatternMatch.h"		#include "mlir/IR/PatternMatch.h"
#include "mlir/Pass/Pass.h"		#include "mlir/Pass/Pass.h"
#include "mlir/Support/LLVM.h"		#include "mlir/Support/LLVM.h"
		#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
#include "mlir/Transforms/RegionUtils.h"		#include "mlir/Transforms/RegionUtils.h"
#include "llvm/ADT/ScopeExit.h"		#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/Sequence.h"		#include "llvm/ADT/Sequence.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/TypeSwitch.h"		#include "llvm/ADT/TypeSwitch.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <type_traits>		#include <type_traits>
if (res) {
return WalkResult::interrupt();		return WalkResult::interrupt();
}		}
res = op;		res = op;
return WalkResult::advance();		return WalkResult::advance();
});		});
return res;		return res;
}		}

		/// Contains the vectorization state and related methods used across the
		/// vectorization process of a given operation.
		struct VectorizationState {
		/// Initializes the vectorization state, including the computation of the
		/// canonical vector shape for vectorization.
		LogicalResult initState(OpBuilder &builder, LinalgOp linalgOp,
		ArrayRef<int64_t> vecSizesForMaskedDims);

		/// Returns the canonical vector shape used to vectorize the iteration space.
		ArrayRef<int64_t> getCanonicalVecShape() const { return canonicalVecShape; }
		/// Masks an operation with the canonical vector mask if the operation needs
		/// masking. Returns the masked operation or the original operation if masking
		/// is not needed. If provided, the canonical mask for this operation is
		/// permuted using `maybeMaskPermMap`.
		Operation *maskOperation(OpBuilder &builder, Operation *opToMask,
		LinalgOp linalgOp,
		Optional<AffineMap> maybeMaskPermMap = llvm::None);

		/// Holds the active masks for permutations of the canonical vector iteration
		/// space.
		DenseMap<AffineMap, Value> activeMaskCache;
		/// Holds the values of the sizes of the masked dimensions.
		SmallVector<Value> maskedDimSizeValues;

		private:
		/// Generates 'tensor.dim' operations for all the dynamic dimensions of the
		/// given operation to be vectorized and store them in `maskedDimSizeValues`.
		LogicalResult extractDynamicVectorDimValues(OpBuilder &builder,
		LinalgOp linalgOp);

		/// Create or retrieve an existing mask value to mask `opToMask` in the
		/// canonical vector iteration space. If `maybeMaskPermMap` is provided, the
		/// mask is permuted using that permutation map. If a new mask is created, it
		/// will be cached for future users.
		Value getOrCreateMaskFor(OpBuilder &builder, Operation *opToMask,
		LinalgOp linalgOp,
		Optional<AffineMap> maybeMaskPermMap);

		/// Holds the canonical vector shape used to vectorize the iteration space.
		SmallVector<int64_t> canonicalVecShape;
		};

		/// Given a dimension of the iteration space of an operation, finds an operand
		/// in the operation that is defined on such dimension. Returns the operand and
		/// the operand dimension.
		static void mapIterationSpaceDimToOperandDim(unsigned dim, LinalgOp linalgOp,
		Value &operand,
		unsigned &operandDim) {
		// Retrieve the operand and its dimension from the first operand with a
		// permutation map that is defined on such dimension.
		for (auto &en : llvm::enumerate(linalgOp.getIndexingMapsArray())) {
		AffineMap idxMap = en.value();
		if (idxMap.isProjectedPermutation()) {
		auto mayOperandDim = idxMap.getProjectedPermutationPermutedPosition(dim);
		if (mayOperandDim) {
		operand = linalgOp->getOperand(en.index());
		operandDim = *mayOperandDim;
		return;
		}
		}
		}

		llvm_unreachable("Unsupported linalg op");
		}

		/// Generates 'tensor.dim' operations for all the dynamic dimensions of the
		/// given operation to be vectorized and store them in `maskedDimSizeValues`.
		LogicalResult
		VectorizationState::extractDynamicVectorDimValues(OpBuilder &builder,
		LinalgOp linalgOp) {
		assert(!canonicalVecShape.empty() && "The canonical vector shape is empty");
		for (int vecDim = 0, end = canonicalVecShape.size(); vecDim < end; ++vecDim) {
		Value operand;
		unsigned operandDim = std::numeric_limits<unsigned>::max();
		mapIterationSpaceDimToOperandDim(vecDim, linalgOp, operand, operandDim);
		assert(operand && operandDim != std::numeric_limits<unsigned>::max() &&
		"Masked dimension mapping didn't happen");

		auto dynDim =
		builder.create<tensor::DimOp>(linalgOp.getLoc(), operand, operandDim);
		maskedDimSizeValues.push_back(dynDim);
		}

		return success();
		}

		/// Initializes the vectorization state, including the computation of the
		/// canonical vector shape for vectorization.
		LogicalResult
		VectorizationState::initState(OpBuilder &builder, LinalgOp linalgOp,
		ArrayRef<int64_t> vecSizesForMaskedDims) {
		assert((vecSizesForMaskedDims.empty() ||
		vecSizesForMaskedDims.size() == linalgOp.getNumLoops()) &&
		"Sizes for masked dims don't match iteration space dims");

		if (linalgOp.hasDynamicShape()) {
		// TODO: Only support fully dynamic ops for now.
		if (!llvm::all_of(vecSizesForMaskedDims, ShapedType::isDynamic))
		return failure();

		canonicalVecShape.append(vecSizesForMaskedDims.begin(),
		vecSizesForMaskedDims.end());
		return extractDynamicVectorDimValues(builder, linalgOp);
		nicolasvasilache (Not Done): Seems fishy, what happens in this case? I'd expect this not to fail gracefully. Make it an assert and lift the logic to the precondition to avoid this? Edit: ah, scratch that, I see that this is just after the precondition; can we make it part of the precondition?
		dcaballe (Author, Done): Let me know if it makes more sense in the new version, where the `inputVectorSizes`, if provided, should match `linalgOp.getNumLoops()`. Otherwise, this would be a bug.
		} else {
		canonicalVecShape = linalgOp.getStaticLoopRanges();
		}

		return success();
		nicolasvasilache (Done): Can we sprinkle a few `precompute` prefixes in some of these APIs to make it clear what happens at init time?
		dcaballe (Author, Done): Much better!
		}

		/// Create or retrieve an existing mask value to mask `opToMask` in the
		/// canonical vector iteration space. If `maybeMaskPermMap` is provided, the
		/// mask is permuted using that permutation map. If a new mask is created, it
		/// will be cached for future users.
		Value VectorizationState::getOrCreateMaskFor(
		nicolasvasilache (Not Done): `LLVM_DEBUG(llvm::interleaveComma(canonicalVecShape, llvm::dbgs() << ...));`
		dcaballe (Author, Done): Ah! I didn't know this utility! It's been such a pain to always print SmallVectors... Thanks!
		OpBuilder &builder, Operation *opToMask, LinalgOp linalgOp,
		Optional<AffineMap> maybeMaskPermMap) {
		// No mask is needed if no masked dim sizes provided.
		if (maskedDimSizeValues.empty())
		return Value();

		// No mask is needed if the operation is not maskable.
		auto maskableOp = dyn_cast<vector::MaskableOpInterface>(opToMask);
		if (!maskableOp)
		return Value();

		assert(!maskableOp.isMasked() &&
		"Masking an operation that is already masked");

		// If no mask permutation map was provided, use an identity map with the loop
		// dims.
		AffineMap maskPermMap =
		maybeMaskPermMap ? *maybeMaskPermMap
		: AffineMap::getMultiDimIdentityMap(
		linalgOp.getNumLoops(), builder.getContext());

		// Return active mask for the indexing map of this operand if it was already
		// created.
		if (activeMaskCache.count(maskPermMap)) {
		Value mask = activeMaskCache[maskPermMap];
		LDBG("Reusing mask: " << mask << "\n");
		return mask;
		}

		// Permute the dimension values and vector sizes so that they align with the
		// dimension order of the mask.
		SmallVector<Value> permVecDimValues =
		applyPermutationMap(maskPermMap, ArrayRef<Value>(maskedDimSizeValues));
		SmallVector<int64_t> permVecShape =
		applyPermutationMap(maskPermMap, ArrayRef<int64_t>(canonicalVecShape));

		// Create the mask based on the runtime value of the dimensions to be
		// vectorized.
		auto maskType = VectorType::get(permVecShape, builder.getI1Type());
		Value mask = builder.create<vector::CreateMaskOp>(linalgOp.getLoc(), maskType,
		permVecDimValues);
		LDBG("Creating new mask: " << mask << "\n");
		activeMaskCache[maskPermMap] = mask;
		return mask;
		}
		nicolasvasilache (Done): Use the `find` API and the iterator you get from it to avoid multiple lookups.
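The lookup pattern suggested in this comment can be sketched without any MLIR types. The following is only an illustration under stated assumptions: `MaskKey`, `MaskValue` and `getOrCreate` are hypothetical names, a string stands in for the permutation map, and an integer stands in for the cached mask value. It is not the code that landed.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins: a string key plays the role of the permutation map
// and an int plays the role of the cached mask value.
using MaskKey = std::string;
using MaskValue = int;

// Returns the cached value for `key`, creating it with `create` on a miss.
// The hit path does a single `find` instead of `count` followed by
// `operator[]`, which would search the container twice.
MaskValue getOrCreate(std::map<MaskKey, MaskValue> &cache, const MaskKey &key,
                      const std::function<MaskValue()> &create) {
  assert(create && "expected a callable that builds the value");
  auto it = cache.find(key);
  if (it != cache.end())
    return it->second; // Hit: reuse the iterator, no second lookup.
  MaskValue value = create();
  cache.emplace(key, value);
  return value;
}
```

Calling `getOrCreate` twice with the same key invokes `create` only once; the second call returns the cached value through the iterator.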

		/// Masks an operation with the canonical vector mask if the operation needs
		/// masking. Returns the masked operation or the original operation if masking
		/// is not needed. If provided, the canonical mask for this operation is
		/// permuted using `maybeMaskPermMap`.
		Operation *
		VectorizationState::maskOperation(OpBuilder &builder, Operation *opToMask,
		LinalgOp linalgOp,
		Optional<AffineMap> maybeMaskPermMap) {
		// Create or retrieve mask for this operation.
		Value mask =
		getOrCreateMaskFor(builder, opToMask, linalgOp, maybeMaskPermMap);

		if (!mask) {
		LDBG("No mask required for: " << *opToMask << "\n");
		return opToMask;
		}

		// Wrap `opToMask` with a new `vector.mask` and update D-U chain.
		assert(opToMask && "Expected a valid operation to mask");
		auto maskOp = builder.create<vector::MaskOp>(
		opToMask->getLoc(), opToMask->getResultTypes().front(), mask,
		[opToMask](OpBuilder &builder, Location loc) {
		Block *insBlock = builder.getInsertionBlock();
		insBlock->getOperations().splice(
		nicolasvasilache (Not Done): Please pass RewriterBase here and everywhere possible after https://reviews.llvm.org/D137922.
		dcaballe (Author, Done): I think we are going in the opposite direction based on the review comments?
		nicolasvasilache (Not Done): We cannot have code like this: `Operation *maskOpTerminator = &maskOp.getMaskRegion().front().back(); for (auto &en : llvm::enumerate(opToMask->getResults())) en.value().replaceAllUsesExcept(maskOp.getResult(en.index()), maskOpTerminator);` It must use a RewriterBase with `updateRootInPlace`.
		dcaballe (Author, Done): I can use `updateRootInPlace` when we have a rewriter here, but I can't use any replacement method because `opToMask` is moved inside the mask region, not replaced.
		insBlock->begin(), opToMask->getBlock()->getOperations(), opToMask);
		builder.create<vector::YieldOp>(loc, opToMask->getResults());
		});

		Operation *maskOpTerminator = &maskOp.getMaskRegion().front().back();
		for (auto &en : llvm::enumerate(opToMask->getResults()))
		en.value().replaceAllUsesExcept(maskOp.getResult(en.index()),
		maskOpTerminator);

		LDBG("Masked operation: " << *maskOp << "\n");
		return maskOp;
		}
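To make the effect of masking concrete, here is a small model of a 1-D mask built from a runtime dimension size and of a write guarded by that mask. This is a sketch, not MLIR code: `createMask1D` and `maskedWrite` are hypothetical helpers, and the model assumes that lane `i` is active exactly when `i` is smaller than the runtime size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models a 1-D mask: lane i is active iff i < dimSize (bounded by vecSize).
std::vector<bool> createMask1D(int64_t vecSize, int64_t dimSize) {
  assert(vecSize >= 0 && "vector size must be non-negative");
  std::vector<bool> mask(static_cast<std::size_t>(vecSize), false);
  for (int64_t i = 0; i < vecSize && i < dimSize; ++i)
    mask[static_cast<std::size_t>(i)] = true;
  return mask;
}

// Models a masked write of `value` into `dest` starting at index 0. Only
// active lanes are stored, so inactive lanes never touch memory.
void maskedWrite(const std::vector<float> &value,
                 const std::vector<bool> &mask, std::vector<float> &dest) {
  assert(value.size() == mask.size() && "mask must match the vector size");
  for (std::size_t i = 0; i < value.size(); ++i) {
    if (!mask[i])
      continue;
    assert(i < dest.size() && "active lane must be in bounds");
    dest[i] = value[i];
  }
}
```

With a vector size of 4 and a runtime size of 3, the last lane is inactive, so a 4-lane write into a 3-element buffer stays in bounds. This is the reasoning behind setting the in-bounds attribute when a mask is present.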

/// Given an indexing `map` coming from a LinalgOp indexing, restricted to a		/// Given an indexing `map` coming from a LinalgOp indexing, restricted to a
/// projectedPermutation, compress the unused dimensions to serve as a		/// projectedPermutation, compress the unused dimensions to serve as a
/// permutation_map for a vector transfer operation.		/// permutation_map for a vector transfer operation.
		nicolasvasilache (Done): nit: uppermension :) ?
/// For example, given a linalg op such as:		/// For example, given a linalg op such as:
///		///
/// ```		/// ```
/// %0 = linalg.generic {		/// %0 = linalg.generic {
/// indexing_maps = affine_map<(d0, d1, d2, d3, d4) -> (d4, d0, d2)>,		/// indexing_maps = affine_map<(d0, d1, d2, d3, d4) -> (d4, d0, d2)>,
/// indexing_maps = affine_map<(d0, d1, d2, d3, d4) -> (d1, d3)>		/// indexing_maps = affine_map<(d0, d1, d2, d3, d4) -> (d1, d3)>
/// }		/// }
/// ins(%0 : tensor<2x3x4xf32>)		/// ins(%0 : tensor<2x3x4xf32>)
/// outs(%1 : tensor<5x6xf32>)		/// outs(%1 : tensor<5x6xf32>)
		nicolasvasilache (Done): Please use `updateRootInPlace` once RewriterBase is piped through.
/// ```		/// ```
///		///
/// the iteration domain size of the linalg op is 3x5x4x6x2. The first affine		/// the iteration domain size of the linalg op is 3x5x4x6x2. The first affine
/// map is reindexed to `affine_map<(d0, d1, d2) -> (d2, d0, d1)>`, the second		/// map is reindexed to `affine_map<(d0, d1, d2) -> (d2, d0, d1)>`, the second
/// affine map is reindexed to `affine_map<(d0, d1) -> (d0, d1)>`.		/// affine map is reindexed to `affine_map<(d0, d1) -> (d0, d1)>`.
static AffineMap reindexIndexingMap(AffineMap map) {		static AffineMap reindexIndexingMap(AffineMap map) {
assert(map.isProjectedPermutation(/*allowZeroInResults=*/true) &&		assert(map.isProjectedPermutation(/*allowZeroInResults=*/true) &&
"expected projected permutation");		"expected projected permutation");
auto res = compressUnusedDims(map);		auto res = compressUnusedDims(map);
assert(res.getNumDims() == res.getNumResults() &&		assert(res.getNumDims() == res.getNumResults() &&
"expected reindexed map with same number of dims and results");		"expected reindexed map with same number of dims and results");
		nicolasvasilache (Done): Hmm, what's the contract between this `createRegionMask` lambda and the insertion points during `builder.create<vector::MaskOp>`? I've seen too much ugly stuff regarding insertion points leaking across function call boundaries. Let's add an `OpBuilder::InsertionGuard` at the top of this function.
		dcaballe (Author, Done): Added guard. This is a simple lambda to create the op region. We follow the same approach for `scf.if` and other region ops. The only contract is that the region needs to have a `vector::YieldOp`, which is described in the `vector.mask` doc.
return res;		return res;
}		}
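The reindexing described in the comment of `reindexIndexingMap` can be reproduced with a toy representation of a projected permutation. In this sketch, which assumes nothing about the real `AffineMap` API, a map is a dimension count plus the list of dimension indices it returns; `ProjectedPerm` and `compressUnusedDimsSketch` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Toy projected permutation: `results[i]` is the index of the dimension
// returned in position i, e.g. (d0, ..., d4) -> (d4, d0, d2) is {5, {4, 0, 2}}.
struct ProjectedPerm {
  unsigned numDims;
  std::vector<unsigned> results;
};

// Drops dimensions that no result uses and renumbers the remaining ones in
// increasing order, keeping the order of the results unchanged.
ProjectedPerm compressUnusedDimsSketch(const ProjectedPerm &map) {
  std::vector<bool> used(map.numDims, false);
  for (unsigned d : map.results) {
    assert(d < map.numDims && "result refers to a non-existent dimension");
    used[d] = true;
  }
  std::vector<unsigned> newIndex(map.numDims, 0);
  unsigned next = 0;
  for (unsigned d = 0; d < map.numDims; ++d)
    if (used[d])
      newIndex[d] = next++;
  ProjectedPerm out{next, {}};
  for (unsigned d : map.results)
    out.results.push_back(newIndex[d]);
  return out;
}
```

Applied to the two maps in the comment, `{5, {4, 0, 2}}` becomes `{3, {2, 0, 1}}` and `{5, {1, 3}}` becomes `{2, {0, 1}}`, matching the reindexed maps stated there.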

/// Helper enum to represent conv1d input traversal order.		/// Helper enum to represent conv1d input traversal order.
enum class Conv1DOpOrder {		enum class Conv1DOpOrder {
Ncw, // Corresponds to operation that traverses the input in (n, c, w) order.		Ncw, // Corresponds to operation that traverses the input in (n, c, w) order.
Nwc // Corresponds to operation that traverses the input in (n, w, c) order.		Nwc // Corresponds to operation that traverses the input in (n, w, c) order.
};		};

/// Helper data structure to represent the result of vectorization.		/// Helper data structure to represent the result of vectorization.
/// In certain specific cases, like terminators, we do not want to propagate/		/// In certain specific cases, like terminators, we do not want to propagate/
enum VectorizationStatus {		enum VectorizationStatus {
/// Op failed to vectorize.		/// Op failed to vectorize.
Failure = 0,		Failure = 0,
/// Op vectorized and custom function took care of replacement logic		/// Op vectorized and custom function took care of replacement logic
NoReplace,		NoReplace,
/// Op vectorized into a new Op whose results will replace original Op's		/// Op vectorized into a new Op whose results will replace original Op's
		nicolasvasilache (Done): This must use a RewriterBase with `updateRootInPlace`.
		dcaballe (Author, Done): Added TODO until we have a rewriter.
/// results.		/// results.
NewOp		NewOp
// TODO: support values if Op vectorized to Many-Ops whose results we need to		// TODO: support values if Op vectorized to Many-Ops whose results we need to
// aggregate for replacement.		// aggregate for replacement.
};		};
struct VectorizationResult {		struct VectorizationResult {
/// Return status from vectorizing the current op.		/// Return status from vectorizing the current op.
enum VectorizationStatus status = VectorizationStatus::Failure;		enum VectorizationStatus status = VectorizationStatus::Failure;
[77 lines not shown]	static SmallVector<bool> getReductionMask(LinalgOp linalgOp) {
return llvm::to_vector(		return llvm::to_vector(
llvm::map_range(linalgOp.getIteratorTypesArray(), isReductionIterator));		llvm::map_range(linalgOp.getIteratorTypesArray(), isReductionIterator));
}		}
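As a standalone illustration of what `getReductionMask` computes, the sketch below maps a list of iterator kinds to a list of booleans. The enum `IteratorKind` and the function `reductionMaskSketch` are hypothetical stand-ins for the Linalg iterator types and helper.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the Linalg iterator types.
enum class IteratorKind { Parallel, Reduction };

// Returns one flag per loop: true for reduction loops, false otherwise.
std::vector<bool> reductionMaskSketch(const std::vector<IteratorKind> &kinds) {
  std::vector<bool> mask(kinds.size());
  std::transform(kinds.begin(), kinds.end(), mask.begin(),
                 [](IteratorKind k) { return k == IteratorKind::Reduction; });
  assert(mask.size() == kinds.size() && "one flag per iterator");
  return mask;
}
```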

/// Build a vector.transfer_write of `value` into `outputOperand` at indices set		/// Build a vector.transfer_write of `value` into `outputOperand` at indices set
/// to all `0`; where `outputOperand` is an output operand of the LinalgOp		/// to all `0`; where `outputOperand` is an output operand of the LinalgOp
/// currently being vectorized. If `dest` has null rank, build an memref.store.		/// currently being vectorized. If `dest` has null rank, build an memref.store.
/// Return the produced value or null if no value is produced.		/// Return the produced value or null if no value is produced.
static Value buildVectorWrite(OpBuilder &b, Value value,		static Operation *buildVectorWrite(OpBuilder &b, Value value,
OpOperand *outputOperand) {		OpOperand *outputOperand,
		VectorizationState &state) {
Operation *write;		Operation *write;
Location loc = value.getLoc();		Location loc = value.getLoc();
auto linalgOp = cast<LinalgOp>(outputOperand->getOwner());		auto linalgOp = cast<LinalgOp>(outputOperand->getOwner());
ArrayRef<int64_t> shape = linalgOp.getShape(outputOperand);		AffineMap opOperandMap = linalgOp.getMatchingIndexingMap(outputOperand);
auto vectorType = VectorType::get(		auto vectorType =
shape, getElementTypeOrSelf(outputOperand->get().getType()));		VectorType::get(opOperandMap.compose(state.getCanonicalVecShape()),
		getElementTypeOrSelf(outputOperand->get().getType()));

if (vectorType.getRank() > 0) {		if (vectorType.getRank() > 0) {
// 0-d case is still special: do not invert the reindexing map.		AffineMap writeMap = reindexIndexingMap(opOperandMap);
AffineMap map =
reindexIndexingMap(linalgOp.getMatchingIndexingMap(outputOperand));
SmallVector<int64_t> transposeShape =
applyPermutationMap(inversePermutation(map), vectorType.getShape());
assert(!transposeShape.empty() && "unexpected empty transpose shape");
vectorType = VectorType::get(transposeShape, vectorType.getElementType());
SmallVector<Value> indices(linalgOp.getRank(outputOperand),		SmallVector<Value> indices(linalgOp.getRank(outputOperand),
b.create<arith::ConstantIndexOp>(loc, 0));		b.create<arith::ConstantIndexOp>(loc, 0));
value = broadcastIfNeeded(b, value, vectorType.getShape());		value = broadcastIfNeeded(b, value, vectorType.getShape());
		// If masked, set in-bounds to true. Masking guarantees that the access will
		// be in-bounds.
		SmallVector<bool> inBounds;
		if (!state.maskedDimSizeValues.empty())
		inBounds.append(vectorType.getRank(), true);

write = b.create<vector::TransferWriteOp>(loc, value, outputOperand->get(),		write = b.create<vector::TransferWriteOp>(loc, value, outputOperand->get(),
indices, map);		indices, writeMap,
		ArrayRef<bool>(inBounds));
} else {		} else {
		// 0-d case is still special: do not invert the reindexing writeMap.
if (!value.getType().isa<VectorType>())		if (!value.getType().isa<VectorType>())
value = b.create<vector::BroadcastOp>(loc, vectorType, value);		value = b.create<vector::BroadcastOp>(loc, vectorType, value);
assert(value.getType() == vectorType && "incorrect type");		assert(value.getType() == vectorType && "incorrect type");
write = b.create<vector::TransferWriteOp>(loc, value, outputOperand->get(),		write = b.create<vector::TransferWriteOp>(loc, value, outputOperand->get(),
ValueRange{});		ValueRange{});
}		}
LDBG("vectorized op: " << *write);		LDBG("vectorized op: " << *write << "\n");
if (!write->getResults().empty())		return write;
return write->getResult(0);
return Value();
}		}
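In the new version of `buildVectorWrite`, the vector type is derived by applying the operand's indexing map to the canonical vector shape. For a projected permutation this amounts to picking one size per map result. The sketch below shows that selection with plain vectors; `projectShape` is a hypothetical helper, and the model covers only projected permutations without constant results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Applies a projected permutation, given as the list of dimension indices it
// returns, to a shape over the iteration space: out[i] = shape[results[i]].
std::vector<int64_t> projectShape(const std::vector<unsigned> &results,
                                  const std::vector<int64_t> &shape) {
  std::vector<int64_t> out;
  out.reserve(results.size());
  for (unsigned d : results) {
    assert(static_cast<std::size_t>(d) < shape.size() &&
           "result refers to a dimension outside the iteration space");
    out.push_back(shape[d]);
  }
  return out;
}
```

Using the iteration domain 3x5x4x6x2 from the `reindexIndexingMap` comment, the map `(d4, d0, d2)` selects 2x3x4 and the map `(d1, d3)` selects 5x6, which are the operand shapes in that example.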
		nicolasvasilache (Done): Can you add a TODO to tighten op semantics so that we don't mix inbounds and mask, since this is well defined?

// Custom vectorization precondition function type. This is intended to be used		// Custom vectorization precondition function type. This is intended to be used
// with CustomVectorizationHook. Returns success if the corresponding custom		// with CustomVectorizationHook. Returns success if the corresponding custom
// hook can vectorize the op.		// hook can vectorize the op.
using CustomVectorizationPrecondition =		using CustomVectorizationPrecondition =
std::function<LogicalResult(Operation *)>;		std::function<LogicalResult(Operation *)>;

// Custom vectorization function type. Produce a vector form of Operation*		// Custom vectorization function type. Produce a vector form of Operation*
// assuming all its vectorized operands are already in the BlockAndValueMapping.		// assuming all its vectorized operands are already in the BlockAndValueMapping.
// Return nullptr if the Operation cannot be vectorized.		// Return nullptr if the Operation cannot be vectorized.
using CustomVectorizationHook = std::function<VectorizationResult(		using CustomVectorizationHook = std::function<VectorizationResult(
Operation *, const BlockAndValueMapping &)>;		Operation *, const BlockAndValueMapping &)>;
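The way a list of such hooks is consulted can be sketched with `std::function` alone: each hook is tried in order, a `Failure` result moves on to the next hook, and the first other result is returned. The types below (`Status`, `Result`, `Hook`) and the function `runHooks` are simplified, hypothetical stand-ins; an operation is represented by its name only.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Simplified stand-ins for VectorizationStatus and VectorizationResult.
enum class Status { Failure, NoReplace, NewOp };
struct Result {
  Status status = Status::Failure;
  std::string newOp; // Name of the produced op, empty if none.
};

// A hook inspects an op (here just its name) and reports what it did.
using Hook = std::function<Result(const std::string &)>;

// Tries each hook in order and returns the first result that is not Failure.
// Returns a Failure result if no hook handles the op.
Result runHooks(const std::vector<Hook> &hooks, const std::string &op) {
  assert(!op.empty() && "expected an op name");
  for (const Hook &hook : hooks) {
    Result result = hook(op);
    if (result.status == Status::Failure)
      continue;
    return result;
  }
  return Result{};
}
```

A hook that does not recognize the op simply returns the default `Result`, so hooks can be registered independently of each other.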

/// Helper function to vectorize the terminator of a `linalgOp`. New result		/// Helper function to vectorize the terminator of a `linalgOp`. New result
/// vector values are appended to `newResults`. Return		/// vector values are appended to `newResults`. Return
/// VectorizationStatus::NoReplace to signal the vectorization algorithm that it		/// VectorizationStatus::NoReplace to signal the vectorization algorithm that it
/// should not try to map produced operations and instead return the results		/// should not try to map produced operations and instead return the results
/// using the `newResults` vector making them available to the		/// using the `newResults` vector making them available to the
/// vectorization algorithm for RAUW. This function is meant to be used as a		/// vectorization algorithm for RAUW. This function is meant to be used as a
/// CustomVectorizationHook.		/// CustomVectorizationHook.
static VectorizationResult		static VectorizationResult
vectorizeLinalgYield(OpBuilder &b, Operation *op,		vectorizeLinalgYield(OpBuilder &b, Operation *op,
const BlockAndValueMapping &bvm, LinalgOp linalgOp,		const BlockAndValueMapping &bvm, VectorizationState &state,
SmallVectorImpl<Value> &newResults) {		LinalgOp linalgOp, SmallVectorImpl<Value> &newResults) {
auto yieldOp = dyn_cast<linalg::YieldOp>(op);		auto yieldOp = dyn_cast<linalg::YieldOp>(op);
if (!yieldOp)		if (!yieldOp)
return VectorizationResult{VectorizationStatus::Failure, nullptr};		return VectorizationResult{VectorizationStatus::Failure, nullptr};
for (const auto &outputs : llvm::enumerate(yieldOp.getValues())) {		for (const auto &output : llvm::enumerate(yieldOp.getValues())) {
// TODO: Scan for an opportunity for reuse.		// TODO: Scan for an opportunity for reuse.
// TODO: use a map.		// TODO: use a map.
Value vectorValue = bvm.lookup(outputs.value());		Value vectorValue = bvm.lookup(output.value());
Value newResult = buildVectorWrite(		OpOperand *opOperand = linalgOp.getDpsInitOperand(output.index());
b, vectorValue, linalgOp.getDpsInitOperand(outputs.index()));		Operation *write = buildVectorWrite(
if (newResult)		b, vectorValue, linalgOp.getDpsInitOperand(output.index()), state);
newResults.push_back(newResult);		// TODO: We mask the transfer.transfer_write here because this op is
		// special-cased. A linalg.yield may produced multiple vector.transfer_write
		nicolasvasilache (Done): nit: produce
		dcaballe (Author, Done): > I do not understand the implication of "TODO: We mask the transfer.transfer_write here because this op is special-cased. A linalg.yield may produced multiple vector.transfer_write ops and can't be mapped using BlockAndValueMapping." Good point. This is a comment for an old problem. I removed it and moved this code to `buildVectorWrite`.
		// ops and can't be mapped using BlockAndValueMapping.
		AffineMap opOperandMap = linalgOp.getMatchingIndexingMap(opOperand);
		write = state.maskOperation(b, write, linalgOp, opOperandMap);
		newResults.append(write->result_begin(), write->result_end());
}		}

return VectorizationResult{VectorizationStatus::NoReplace, nullptr};		return VectorizationResult{VectorizationStatus::NoReplace, nullptr};
}		}

/// Helper function to vectorize the index operations of a `linalgOp`. Return		/// Helper function to vectorize the index operations of a `linalgOp`. Return
/// VectorizationStatus::NewOp to signal the vectorization algorithm that it		/// VectorizationStatus::NewOp to signal the vectorization algorithm that it
/// should map the produced operations. This function is meant to be used as a		/// should map the produced operations. This function is meant to be used as a
/// CustomVectorizationHook.		/// CustomVectorizationHook.
static VectorizationResult vectorizeLinalgIndex(OpBuilder &b, Operation *op,		static VectorizationResult vectorizeLinalgIndex(OpBuilder &b, Operation *op,
[127 lines not shown]
/// the `bvm` mapping. As a consequence, this function is meant to be called on		/// the `bvm` mapping. As a consequence, this function is meant to be called on
/// a topologically-sorted list of ops.		/// a topologically-sorted list of ops.
/// This function does not update `bvm` but returns a VectorizationStatus that		/// This function does not update `bvm` but returns a VectorizationStatus that
/// instructs the caller what `bvm` update needs to occur.		/// instructs the caller what `bvm` update needs to occur.
static VectorizationResult		static VectorizationResult
vectorizeOneOp(OpBuilder &b, LinalgOp linalgOp, Operation *op,		vectorizeOneOp(OpBuilder &b, LinalgOp linalgOp, Operation *op,
const BlockAndValueMapping &bvm,		const BlockAndValueMapping &bvm,
ArrayRef<CustomVectorizationHook> customVectorizationHooks) {		ArrayRef<CustomVectorizationHook> customVectorizationHooks) {
LDBG("vectorize op " << *op);		LDBG("vectorize op " << *op << "\n");

// 1. Try to apply any CustomVectorizationHook.		// 1. Try to apply any CustomVectorizationHook.
if (!customVectorizationHooks.empty()) {		if (!customVectorizationHooks.empty()) {
for (auto &customFunc : customVectorizationHooks) {		for (auto &customFunc : customVectorizationHooks) {
VectorizationResult result = customFunc(op, bvm);		VectorizationResult result = customFunc(op, bvm);
if (result.status == VectorizationStatus::Failure)		if (result.status == VectorizationStatus::Failure)
continue;		continue;
return result;		return result;
[79 lines not shown]
/// iteration space. This eager broadcasting is introduced in the		/// iteration space. This eager broadcasting is introduced in the
/// permutation_map of the vector.transfer_read operations. The eager		/// permutation_map of the vector.transfer_read operations. The eager
/// broadcasting makes it trivial to determine where broadcast, transposes and		/// broadcasting makes it trivial to determine where broadcast, transposes and
/// reductions should occur, without any bookkeeping. The tradeoff is that, in		/// reductions should occur, without any bookkeeping. The tradeoff is that, in
/// the absence of good canonicalizations, the amount of work increases.		/// the absence of good canonicalizations, the amount of work increases.
/// This is not deemed a problem as we expect canonicalizations and foldings to		/// This is not deemed a problem as we expect canonicalizations and foldings to
/// aggressively clean up the useless work.		/// aggressively clean up the useless work.
static LogicalResult		static LogicalResult
vectorizeAsLinalgGeneric(OpBuilder &b, LinalgOp linalgOp,		vectorizeAsLinalgGeneric(OpBuilder &b, VectorizationState &state,
		LinalgOp linalgOp,
SmallVectorImpl<Value> &newResults) {		SmallVectorImpl<Value> &newResults) {
		LDBG("Vectorizing operation as linalg generic\n");
Block *block = linalgOp.getBlock();		Block *block = linalgOp.getBlock();

// 2. Values defined above the region can only be broadcast for now. Make them		// 2. Values defined above the region can only be broadcast for now. Make them
// map to themselves.		// map to themselves.
BlockAndValueMapping bvm;		BlockAndValueMapping bvm;
SetVector<Value> valuesSet;		SetVector<Value> valuesSet;
mlir::getUsedValuesDefinedAbove(linalgOp->getRegion(0), valuesSet);		mlir::getUsedValuesDefinedAbove(linalgOp->getRegion(0), valuesSet);
bvm.map(valuesSet.getArrayRef(), valuesSet.getArrayRef());		bvm.map(valuesSet.getArrayRef(), valuesSet.getArrayRef());

if (linalgOp.getNumDpsInits() == 0)		if (linalgOp.getNumDpsInits() == 0)
return failure();		return failure();

// TODO: the common vector shape is equal to the static loop sizes only when
// all indexing maps are projected permutations. For convs and stencils the
// logic will need to evolve.
SmallVector<int64_t> commonVectorShape = linalgOp.computeStaticLoopSizes();

// 3. Turn all BBArgs into vector.transfer_read / load.		// 3. Turn all BBArgs into vector.transfer_read / load.
Location loc = linalgOp.getLoc();		Location loc = linalgOp.getLoc();
Value zero = b.create<arith::ConstantIndexOp>(loc, 0);		Value zero = b.create<arith::ConstantIndexOp>(loc, 0);
for (OpOperand *opOperand : linalgOp.getOpOperandsMatchingBBargs()) {		for (OpOperand *opOperand : linalgOp.getOpOperandsMatchingBBargs()) {
BlockArgument bbarg = linalgOp.getMatchingBlockArgument(opOperand);		BlockArgument bbarg = linalgOp.getMatchingBlockArgument(opOperand);
if (linalgOp.isScalar(opOperand)) {		if (linalgOp.isScalar(opOperand)) {
bvm.map(bbarg, opOperand->get());		bvm.map(bbarg, opOperand->get());
continue;		continue;
}		}
VectorType readType;
AffineMap map;		// Convert the indexing map for this input/output to a transfer read
		nicolasvasilache (Done): nit: can we spell this as: `// 3.a. Convert the indexing map for this input/output to a transfer read...`, then `// 3.a.i. For input reads we use the canonical vector shape.` before `if (linalgOp.isDpsInput(opOperand))`, `// 3.a.ii. For output reads (iteration-carried dependence, e.g., reductions)...` in the `else` branch, `// 3.b. If masked, set in-bounds to true.`, and `// 3.c. Not all ops support 0-d vectors, ...`?
// TODO: can we keep this simplification?		// permutation map. For input reads we use the canonical vector shape. For
// if (linalgOp.getShape(&opOperand).empty()) {		// output reads (iteration-carried dependence, e.g., reductions), the vector
// readType = VectorType::get({}, bbarg.getType());		// shape is computed by mapping the canonical vector shape to the output
// } else {		// domain and back to the canonical domain.
if (opOperand->getOperandNumber() < linalgOp.getNumDpsInputs()) {		AffineMap indexingMap = linalgOp.getMatchingIndexingMap(opOperand);
map = inverseAndBroadcastProjectedPermutation(		AffineMap readMap;
linalgOp.getMatchingIndexingMap(opOperand));		ArrayRef<int64_t> readVecShape;
readType = VectorType::get(commonVectorShape,		if (linalgOp.isDpsInput(opOperand)) {
getElementTypeOrSelf(opOperand->get()));		readMap = inverseAndBroadcastProjectedPermutation(indexingMap);
		readVecShape = state.getCanonicalVecShape();
} else {		} else {
map = inversePermutation(		readMap = inversePermutation(reindexIndexingMap(indexingMap));
reindexIndexingMap(linalgOp.getMatchingIndexingMap(opOperand)));		readVecShape =
readType = VectorType::get(map.compose(linalgOp.getShape(opOperand)),		readMap.compose(indexingMap.compose(state.getCanonicalVecShape()));
getElementTypeOrSelf(opOperand->get()));		}
}
		nicolasvasilache (Not Done): I'll need to revisit all this in light of the broadcast separation. From a cursory glance this looks reasonable, let's land and iterate.
// }		auto readType =
		VectorType::get(readVecShape, getElementTypeOrSelf(opOperand->get()));
auto shape = linalgOp.getShape(opOperand);		SmallVector<Value> indices(linalgOp.getShape(opOperand).size(), zero);
SmallVector<Value> indices(shape.size(), zero);
Value readValue = b.create<vector::TransferReadOp>(		// If masked, set in-bounds to true. Masking guarantees that the access will
loc, readType, opOperand->get(), indices, map);		// be in-bounds.
		SmallVector<bool> inBounds;
		if (!state.maskedDimSizeValues.empty())
		inBounds.append(readType.getRank(), true);

		Operation *read = b.create<vector::TransferReadOp>(
		loc, readType, opOperand->get(), indices, readMap,
		ArrayRef<bool>(inBounds));
		read = state.maskOperation(b, read, linalgOp, indexingMap);
		Value readValue = read->getResult(0);

// Not all ops support 0-d vectors, extract the scalar for now.		// Not all ops support 0-d vectors, extract the scalar for now.
// TODO: remove this.		// TODO: remove this.
if (readValue.getType().cast<VectorType>().getRank() == 0)		if (readValue.getType().cast<VectorType>().getRank() == 0)
readValue = b.create<vector::ExtractElementOp>(loc, readValue);		readValue = b.create<vector::ExtractElementOp>(loc, readValue);

LDBG("new vectorized bbarg(" << bbarg.getArgNumber() << "): " << readValue);		LDBG("New vectorized bbarg(" << bbarg.getArgNumber() << "): " << readValue
		<< "\n");
bvm.map(bbarg, readValue);		bvm.map(bbarg, readValue);
bvm.map(opOperand->get(), readValue);		bvm.map(opOperand->get(), readValue);
}		}

SmallVector<CustomVectorizationHook> hooks;		SmallVector<CustomVectorizationHook> hooks;
// 4a. Register CustomVectorizationHook for yieldOp.		// 4a. Register CustomVectorizationHook for yieldOp.
CustomVectorizationHook vectorizeYield =		CustomVectorizationHook vectorizeYield =
[&](Operation *op,		[&](Operation *op,
const BlockAndValueMapping &bvm) -> VectorizationResult {		const BlockAndValueMapping &bvm) -> VectorizationResult {
return vectorizeLinalgYield(b, op, bvm, linalgOp, newResults);		return vectorizeLinalgYield(b, op, bvm, state, linalgOp, newResults);
};		};
hooks.push_back(vectorizeYield);		hooks.push_back(vectorizeYield);

// 4b. Register CustomVectorizationHook for indexOp.		// 4b. Register CustomVectorizationHook for indexOp.
CustomVectorizationHook vectorizeIndex =		CustomVectorizationHook vectorizeIndex =
[&](Operation *op,		[&](Operation *op,
const BlockAndValueMapping &bvm) -> VectorizationResult {		const BlockAndValueMapping &bvm) -> VectorizationResult {
return vectorizeLinalgIndex(b, op, linalgOp);		return vectorizeLinalgIndex(b, op, linalgOp);
};		};
hooks.push_back(vectorizeIndex);		hooks.push_back(vectorizeIndex);

// 4c. Register CustomVectorizationHook for extractOp.		// 4c. Register CustomVectorizationHook for extractOp.
CustomVectorizationHook vectorizeExtract =		CustomVectorizationHook vectorizeExtract =
[&](Operation *op,		[&](Operation *op,
const BlockAndValueMapping &bvm) -> VectorizationResult {		const BlockAndValueMapping &bvm) -> VectorizationResult {
return vectorizeTensorExtract(b, op, linalgOp, bvm);		return vectorizeTensorExtract(b, op, linalgOp, bvm);
};		};
hooks.push_back(vectorizeExtract);		hooks.push_back(vectorizeExtract);

// 5. Iteratively call `vectorizeOneOp` to each op in the slice.		// 5. Iteratively call `vectorizeOneOp` to each op in the slice.
for (Operation &op : block->getOperations()) {		for (Operation &op : block->getOperations()) {
VectorizationResult result = vectorizeOneOp(b, linalgOp, &op, bvm, hooks);		VectorizationResult result = vectorizeOneOp(b, linalgOp, &op, bvm, hooks);
if (result.status == VectorizationStatus::Failure) {		if (result.status == VectorizationStatus::Failure) {
LDBG("failed to vectorize: " << op);		LDBG("failed to vectorize: " << op << "\n");
return failure();		return failure();
}		}
if (result.status == VectorizationStatus::NewOp) {		if (result.status == VectorizationStatus::NewOp) {
LDBG("new vector op: " << *result.newOp;);		Operation *maybeMaskedOp = state.maskOperation(b, result.newOp, linalgOp);
bvm.map(op.getResults(), result.newOp->getResults());		LDBG("New vector op: " << *maybeMaskedOp << "\n");
		bvm.map(op.getResults(), maybeMaskedOp->getResults());
}		}
}		}

return success();		return success();
}		}

// TODO: probably need some extra checks for reduction followed by consumer		// TODO: probably need some extra checks for reduction followed by consumer
// ops that may not commute (e.g. linear reduction + non-linear instructions).		// ops that may not commute (e.g. linear reduction + non-linear instructions).
static LogicalResult reductionPreconditions(LinalgOp op) {		static LogicalResult reductionPreconditions(LinalgOp op) {
if (llvm::none_of(op.getIteratorTypesArray(), isReductionIterator)) {		if (llvm::none_of(op.getIteratorTypesArray(), isReductionIterator)) {
LDBG("reduction precondition failed: no reduction iterator");		LDBG("reduction precondition failed: no reduction iterator\n");
return failure();		return failure();
}		}
for (OpOperand *opOperand : op.getDpsInitOperands()) {		for (OpOperand *opOperand : op.getDpsInitOperands()) {
AffineMap indexingMap = op.getMatchingIndexingMap(opOperand);		AffineMap indexingMap = op.getMatchingIndexingMap(opOperand);
if (indexingMap.isPermutation())		if (indexingMap.isPermutation())
continue;		continue;

Operation *reduceOp = matchLinalgReduction(opOperand);		Operation *reduceOp = matchLinalgReduction(opOperand);
if (!reduceOp || !getCombinerOpKind(reduceOp)) {		if (!reduceOp || !getCombinerOpKind(reduceOp)) {
LDBG("reduction precondition failed: reduction detection failed");		LDBG("reduction precondition failed: reduction detection failed\n");
return failure();		return failure();
}		}
}		}
return success();		return success();
}		}

static LogicalResult vectorizeStaticLinalgOpPrecondition(		static LogicalResult vectorizeDynamicLinalgOpPrecondition(linalg::LinalgOp op) {
linalg::LinalgOp op,		// TODO: Only dynamic generic ops are supported for now.
ArrayRef<CustomVectorizationPrecondition> customPreconditions) {		if (!isa<GenericOp>(op) && !isa<MatmulOp>(op))
		return failure();

		// TODO: Only ops with fully dynamic tensors are supported for now.
		if (llvm::any_of(op.getOperation()->getOpOperands(),
		[](OpOperand &opOperand) {
		TensorType operandType =
		opOperand.get().getType().dyn_cast<TensorType>();
		return !operandType || operandType.hasStaticShape();
		}))
		return failure();

		LDBG("Dynamically-shaped op meets vectorization pre-conditions\n");
		return success();
		}

		LogicalResult mlir::linalg::vectorizeLinalgOpPrecondition(LinalgOp linalgOp) {
		if (linalgOp.hasDynamicShape() &&
		failed(vectorizeDynamicLinalgOpPrecondition(linalgOp)))
		return failure();

		SmallVector<CustomVectorizationPrecondition> customPreconditions;

		// Register CustomVectorizationPrecondition for extractOp.
		customPreconditions.push_back(tensorExtractVectorizationPrecondition);

// All types in the body should be a supported element type for VectorType.		// All types in the body should be a supported element type for VectorType.
for (Operation &innerOp : op->getRegion(0).front()) {		for (Operation &innerOp : linalgOp->getRegion(0).front()) {
// Check if any custom hook can vectorize the inner op.		// Check if any custom hook can vectorize the inner op.
if (llvm::any_of(		if (llvm::any_of(
customPreconditions,		customPreconditions,
[&](const CustomVectorizationPrecondition &customPrecondition) {		[&](const CustomVectorizationPrecondition &customPrecondition) {
return succeeded(customPrecondition(&innerOp));		return succeeded(customPrecondition(&innerOp));
})) {		})) {
continue;		continue;
}		}
if (llvm::any_of(innerOp.getOperandTypes(), [](Type type) {		if (llvm::any_of(innerOp.getOperandTypes(), [](Type type) {
return !VectorType::isValidElementType(type);		return !VectorType::isValidElementType(type);
})) {		})) {
		nicolasvasilache (not done): OK as a first approximation.
return failure();		return failure();
}		}
if (llvm::any_of(innerOp.getResultTypes(), [](Type type) {		if (llvm::any_of(innerOp.getResultTypes(), [](Type type) {
return !VectorType::isValidElementType(type);		return !VectorType::isValidElementType(type);
})) {		})) {
return failure();		return failure();
}		}
}		}
if (isElementwise(op))		if (isElementwise(linalgOp))
return success();		return success();
// TODO: isaConvolutionOpInterface that can also infer from generic features.		// TODO: isaConvolutionOpInterface that can also infer from generic features.
// But we will still need stride/dilation attributes that will be annoying to		// But we will still need stride/dilation attributes that will be annoying to
// reverse-engineer...		// reverse-engineer...
if (isa<ConvolutionOpInterface>(op.getOperation()))		if (isa<ConvolutionOpInterface>(linalgOp.getOperation()))
return success();		return success();
// TODO: the common vector shape is equal to the static loop sizes only when		// TODO: the common vector shape is equal to the static loop sizes only when
// all indexing maps are projected permutations. For convs and stencils the		// all indexing maps are projected permutations. For convs and stencils the
// logic will need to evolve.		// logic will need to evolve.
if (!allIndexingsAreProjectedPermutation(op)) {		if (!allIndexingsAreProjectedPermutation(linalgOp)) {
LDBG("precondition failed: not projected permutations");		LDBG("precondition failed: not projected permutations\n");
return failure();		return failure();
}		}
if (failed(reductionPreconditions(op))) {		if (failed(reductionPreconditions(linalgOp))) {
LDBG("precondition failed: reduction preconditions");		LDBG("precondition failed: reduction preconditions\n");
return failure();		return failure();
}		}
return success();		return success();
}		}

LogicalResult mlir::linalg::vectorizeLinalgOpPrecondition(LinalgOp linalgOp) {		LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter, LinalgOp linalgOp,
// All types must be static shape to go to vector.		ArrayRef<int64_t> vecSizesForMaskedDims) {
if (linalgOp.hasDynamicShape()) {		LDBG("Attempting to vectorize:\n" << linalgOp << "\n");
LDBG("precondition failed: dynamic shape");		if (failed(vectorizeLinalgOpPrecondition(linalgOp)))
return failure();		return failure();
}

SmallVector<CustomVectorizationPrecondition> customPreconditions;		// Initialize vectorization state.
		VectorizationState state;
// Register CustomVectorizationPrecondition for extractOp.		if (failed(state.initState(rewriter, linalgOp, vecSizesForMaskedDims))) {
		nicolasvasilache (not done): Can we make this init the state as part of the precondition?
		dcaballe (author, done): Probably better to separate the concerns. I think even part of the precondition checks are reused outside of the vectorizer. A public interface was introduced recently.
customPreconditions.push_back(tensorExtractVectorizationPrecondition);		LDBG("Vectorization state couldn't be initialized\n");

return vectorizeStaticLinalgOpPrecondition(linalgOp, customPreconditions);
}

LogicalResult mlir::linalg::vectorize(RewriterBase &rewriter,
LinalgOp linalgOp) {
if (failed(vectorizeLinalgOpPrecondition(linalgOp)))
return failure();		return failure();
		}

SmallVector<Value> results;		SmallVector<Value> results;
// TODO: isaConvolutionOpInterface that can also infer from generic		// TODO: isaConvolutionOpInterface that can also infer from generic
// features. Will require stride/dilation attributes inference.		// features. Will require stride/dilation attributes inference.
FailureOr<Operation *> convOr = vectorizeConvolution(rewriter, linalgOp);		FailureOr<Operation *> convOr = vectorizeConvolution(rewriter, linalgOp);
if (succeeded(convOr)) {		if (succeeded(convOr)) {
llvm::append_range(results, (*convOr)->getResults());		llvm::append_range(results, (*convOr)->getResults());
} else {		} else {
if (failed(vectorizeLinalgOpPrecondition(linalgOp)))		if (failed(vectorizeLinalgOpPrecondition(linalgOp)))
return failure();		return failure();
LDBG("Vectorize generic by broadcasting to a common shape: " << linalgOp);		LDBG("Vectorize generic by broadcasting to a common shape: \n"
if (failed(vectorizeAsLinalgGeneric(rewriter, linalgOp, results)))		<< linalgOp << "\n");
		// TODO: 'vectorize' takes in a 'RewriterBase' which is up-casted to
		// 'OpBuilder' when it is passed over to some methods like
		// 'vectorizeAsLinalgGeneric'. This is highly problematic: if we erase an op
		// within these methods, the actual rewriter won't be notified and we will
		// end up with read-after-free issues!
		if (failed(vectorizeAsLinalgGeneric(rewriter, state, linalgOp, results)))
return failure();		return failure();
}		}

if (!results.empty())		if (!results.empty())
rewriter.replaceOp(linalgOp, results);		rewriter.replaceOp(linalgOp, results);
else		else
rewriter.eraseOp(linalgOp);		rewriter.eraseOp(linalgOp);

[... 479 lines not shown ...]
/// Check whether there is any interleaved use of any `values` between		/// Check whether there is any interleaved use of any `values` between
/// `firstOp` and `secondOp`. Conservatively return `true` if any op or value		/// `firstOp` and `secondOp`. Conservatively return `true` if any op or value
/// is in a different block.		/// is in a different block.
static bool mayExistInterleavedUses(Operation *firstOp, Operation *secondOp,		static bool mayExistInterleavedUses(Operation *firstOp, Operation *secondOp,
ValueRange values) {		ValueRange values) {
if (firstOp->getBlock() != secondOp->getBlock() ||		if (firstOp->getBlock() != secondOp->getBlock() ||
!firstOp->isBeforeInBlock(secondOp)) {		!firstOp->isBeforeInBlock(secondOp)) {
LDBG("interleavedUses precondition failed, firstOp: "		LDBG("interleavedUses precondition failed, firstOp: "
<< *firstOp << ", second op: " << *secondOp);		<< *firstOp << ", second op: " << *secondOp << "\n");
return true;		return true;
}		}
for (auto v : values) {		for (auto v : values) {
for (auto &u : v.getUses()) {		for (auto &u : v.getUses()) {
Operation *owner = u.getOwner();		Operation *owner = u.getOwner();
if (owner == firstOp || owner == secondOp)		if (owner == firstOp || owner == secondOp)
continue;		continue;
// TODO: this is too conservative, use dominance info in the future.		// TODO: this is too conservative, use dominance info in the future.
if (owner->getBlock() == firstOp->getBlock() &&		if (owner->getBlock() == firstOp->getBlock() &&
(owner->isBeforeInBlock(firstOp) || secondOp->isBeforeInBlock(owner)))		(owner->isBeforeInBlock(firstOp) || secondOp->isBeforeInBlock(owner)))
continue;		continue;
LDBG(" found interleaved op " << *owner << ", firstOp: " << *firstOp		LDBG(" found interleaved op " << *owner << ", firstOp: " << *firstOp
<< ", second op: " << *secondOp);		<< ", second op: " << *secondOp << "\n");
return true;		return true;
}		}
}		}
return false;		return false;
}		}
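
Note on the check above: a minimal standalone sketch of the same conservative logic, with MLIR operations replaced by a hypothetical `ToyOp` (a block id plus a position in that block). `ToyOp` and `sameOp` are illustrative names, not MLIR types or helpers; the real function works on `Operation *` and `ValueRange` and walks the uses of each value.

```cpp
#include <vector>

// Hypothetical stand-in for an operation: which block it lives in and
// where it sits inside that block.
struct ToyOp {
  int block;
  int position;
};

static bool sameOp(const ToyOp &a, const ToyOp &b) {
  return a.block == b.block && a.position == b.position;
}

// Conservatively returns true when a user of the tracked values may sit
// between `first` and `second`, or when anything is in a different block.
bool mayExistInterleavedUses(const ToyOp &first, const ToyOp &second,
                             const std::vector<ToyOp> &users) {
  // Both ops must share a block and `first` must come strictly first.
  if (first.block != second.block || first.position >= second.position)
    return true;
  for (const ToyOp &user : users) {
    if (sameOp(user, first) || sameOp(user, second))
      continue;
    // Users in the same block that come before `first` or after `second`
    // are harmless; anything else is treated as interleaved.
    if (user.block == first.block &&
        (user.position < first.position || user.position > second.position))
      continue;
    return true;
  }
  return false;
}
```

A user in another block is reported as interleaved even when it could never execute between the two ops, which is the over-conservatism the TODO about dominance info refers to.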

/// Return the unique subview use of `v` if it is indeed unique, null		/// Return the unique subview use of `v` if it is indeed unique, null
/// otherwise.		/// otherwise.
[... 19 lines not shown ...]
	if (xferOp.getMask())
return failure();		return failure();

// Transfer into `view`.		// Transfer into `view`.
Value viewOrAlloc = xferOp.getSource();		Value viewOrAlloc = xferOp.getSource();
if (!viewOrAlloc.getDefiningOp<memref::ViewOp>() &&		if (!viewOrAlloc.getDefiningOp<memref::ViewOp>() &&
!viewOrAlloc.getDefiningOp<memref::AllocOp>())		!viewOrAlloc.getDefiningOp<memref::AllocOp>())
return failure();		return failure();

LDBG(viewOrAlloc);		LDBG(viewOrAlloc << "\n");

// Ensure there is exactly one subview of `viewOrAlloc` defining `subView`.		// Ensure there is exactly one subview of `viewOrAlloc` defining `subView`.
memref::SubViewOp subViewOp = getSubViewUseIfUnique(viewOrAlloc);		memref::SubViewOp subViewOp = getSubViewUseIfUnique(viewOrAlloc);
if (!subViewOp)		if (!subViewOp)
return failure();		return failure();
Value subView = subViewOp.getResult();		Value subView = subViewOp.getResult();
LDBG("with subView " << subView);		LDBG("with subView " << subView << "\n");

// Find the copy into `subView` without interleaved uses.		// Find the copy into `subView` without interleaved uses.
memref::CopyOp copyOp;		memref::CopyOp copyOp;
for (auto &u : subView.getUses()) {		for (auto &u : subView.getUses()) {
if (auto newCopyOp = dyn_cast<memref::CopyOp>(u.getOwner())) {		if (auto newCopyOp = dyn_cast<memref::CopyOp>(u.getOwner())) {
assert(newCopyOp.getTarget().getType().isa<MemRefType>());		assert(newCopyOp.getTarget().getType().isa<MemRefType>());
if (newCopyOp.getTarget() != subView)		if (newCopyOp.getTarget() != subView)
continue;		continue;
LDBG("copy candidate " << *newCopyOp);		LDBG("copy candidate " << *newCopyOp << "\n");
if (mayExistInterleavedUses(newCopyOp, xferOp, {viewOrAlloc, subView}))		if (mayExistInterleavedUses(newCopyOp, xferOp, {viewOrAlloc, subView}))
continue;		continue;
copyOp = newCopyOp;		copyOp = newCopyOp;
break;		break;
}		}
}		}
if (!copyOp)		if (!copyOp)
return failure();		return failure();
LDBG("with copy " << *copyOp);		LDBG("with copy " << *copyOp << "\n");

// Find the fill into `viewOrAlloc` without interleaved uses before the		// Find the fill into `viewOrAlloc` without interleaved uses before the
// copy.		// copy.
FillOp maybeFillOp;		FillOp maybeFillOp;
for (auto &u : viewOrAlloc.getUses()) {		for (auto &u : viewOrAlloc.getUses()) {
if (auto newFillOp = dyn_cast<FillOp>(u.getOwner())) {		if (auto newFillOp = dyn_cast<FillOp>(u.getOwner())) {
assert(newFillOp.output().getType().isa<MemRefType>());		assert(newFillOp.output().getType().isa<MemRefType>());
if (newFillOp.output() != viewOrAlloc)		if (newFillOp.output() != viewOrAlloc)
continue;		continue;
LDBG("fill candidate " << *newFillOp);		LDBG("fill candidate " << *newFillOp << "\n");
if (mayExistInterleavedUses(newFillOp, copyOp, {viewOrAlloc, subView}))		if (mayExistInterleavedUses(newFillOp, copyOp, {viewOrAlloc, subView}))
continue;		continue;
maybeFillOp = newFillOp;		maybeFillOp = newFillOp;
break;		break;
}		}
}		}
// Ensure padding matches.		// Ensure padding matches.
if (maybeFillOp && xferOp.getPadding() != maybeFillOp.value())		if (maybeFillOp && xferOp.getPadding() != maybeFillOp.value())
return failure();		return failure();
if (maybeFillOp)		if (maybeFillOp)
LDBG("with maybeFillOp " << *maybeFillOp);		LDBG("with maybeFillOp " << *maybeFillOp << "\n");

// `in` is the subview that memref.copy reads. Replace it.		// `in` is the subview that memref.copy reads. Replace it.
Value in = copyOp.getSource();		Value in = copyOp.getSource();

// memref.copy + linalg.fill can be used to create a padded local buffer.		// memref.copy + linalg.fill can be used to create a padded local buffer.
// The `masked` attribute is only valid on this padded buffer.		// The `masked` attribute is only valid on this padded buffer.
// When forwarding to vector.transfer_read, the attribute must be reset		// When forwarding to vector.transfer_read, the attribute must be reset
// conservatively.		// conservatively.
[... 612 lines not shown ...]

mlir/lib/IR/AffineMap.cpp

[... 330 lines not shown ...]
	unsigned AffineMap::getPermutedPosition(unsigned input) const {			unsigned AffineMap::getPermutedPosition(unsigned input) const {
	assert(isPermutation() && "invalid permutation request");			assert(isPermutation() && "invalid permutation request");
	for (unsigned i = 0, numResults = getNumResults(); i < numResults; i++)			for (unsigned i = 0, numResults = getNumResults(); i < numResults; i++)
	if (getDimPosition(i) == input)			if (getDimPosition(i) == input)
	return i;			return i;
	llvm_unreachable("incorrect permutation request");			llvm_unreachable("incorrect permutation request");
	}			}

				/// Extracts the permuted position where the given input index resides.
				/// Returns `llvm::None` if the input index is projected. Fails when called on
				nicolasvasilache (done): `Fails when called on a non-projected-permutation.` is misleading here. It expects a projected permutation; otherwise it crashes. Failing would have returned llvm::None without crashing, unless I am missing something?
				/// a non-projected-permutation.
				Optional<unsigned>
				AffineMap::getProjectedPermutationPermutedPosition(unsigned input) const {
				nicolasvasilache (done): Better name and doc please, this is much too confusing. In fact I think you can just do something like `llvm::find(map.getResults(), AffineDimExpr::get(input))` at the client and avoid adding more APIs to AffineMap.
				dcaballe (author, done): I had renamed this like 10 times. It's a difficult name. Hopefully it's better now :).
				nicolasvasilache (done): Wait, what do I see just above? Literally the same functionality modulo an assert. Can we just have a single `Optional<int64_t> AffineMap::getResultPosition(AffineExpr e) const { for (int64_t i = 0, numResults = getNumResults(); i < numResults; i++) if (getResult(i) == e) return i; return llvm::None; }` and let clients do the assertions they want? It seems very counterproductive to have all these special-case functions with slightly varying assertions and hard-to-grok names.
				dcaballe (author, done): Extracted this to https://reviews.llvm.org/D138946
				assert(isProjectedPermutation() && "invalid projected permutation request");
				for (unsigned i = 0, numResults = getNumResults(); i < numResults; i++)
				if (getDimPosition(i) == input)
				return i;
				return llvm::None;
				}
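
Note on the thread above: a minimal standalone sketch of the "single lookup, no assertion" shape the reviewer proposes, written against a hypothetical `ToyProjectedPermutation` rather than the real `AffineMap` (which compares `AffineExpr` results and, in this revision, returns `llvm::Optional`). All names here are illustrative assumptions, not MLIR API.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for a projected-permutation map:
// resultDims[i] is the input dimension that result i refers to.
struct ToyProjectedPermutation {
  std::vector<unsigned> resultDims;
};

// Returns the index of the result that refers to `inputDim`, or
// std::nullopt when that dimension is projected away. No assertion is made
// here; callers add whatever checks they need.
std::optional<unsigned> getResultPosition(const ToyProjectedPermutation &map,
                                          unsigned inputDim) {
  for (unsigned i = 0, e = static_cast<unsigned>(map.resultDims.size());
       i < e; ++i)
    if (map.resultDims[i] == inputDim)
      return i;
  return std::nullopt;
}
```

For a map modelling `(m, n, k) -> (k, n)`, i.e. `resultDims = {2, 1}`, looking up `k` (dimension 2) gives position 0, `n` (dimension 1) gives position 1, and `m` (dimension 0) gives an empty optional because it is projected away.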

	/// Folds the results of the application of an affine map on the provided			/// Folds the results of the application of an affine map on the provided
	/// operands to a constant if possible. Returns false if the folding happens,			/// operands to a constant if possible. Returns false if the folding happens,
	/// true otherwise.			/// true otherwise.
	LogicalResult			LogicalResult
	AffineMap::constantFold(ArrayRef<Attribute> operandConstants,			AffineMap::constantFold(ArrayRef<Attribute> operandConstants,
	SmallVectorImpl<Attribute> &results) const {			SmallVectorImpl<Attribute> &results) const {
	// Attempt partial folding.			// Attempt partial folding.
	SmallVector<int64_t, 2> integers;			SmallVector<int64_t, 2> integers;
[... 436 lines not shown ...]

mlir/test/Dialect/Linalg/vectorization.mlir

// RUN: mlir-opt %s -test-transform-dialect-interpreter -split-input-file | FileCheck %s		// RUN: mlir-opt %s -test-transform-dialect-interpreter -split-input-file | FileCheck %s

// -----

// CHECK-LABEL: contraction_dot		// CHECK-LABEL: contraction_dot
func.func @contraction_dot(%A: memref<1584xf32>, %B: memref<1584xf32>, %C: memref<f32>) {		func.func @contraction_dot(%A: memref<1584xf32>, %B: memref<1584xf32>, %C: memref<f32>) {

// CHECK: arith.mulf %{{.*}}, %{{.*}} : vector<1584xf32>		// CHECK: arith.mulf %{{.*}}, %{{.*}} : vector<1584xf32>
// CHECK: vector.multi_reduction <add>, %{{.*}}, {{.*}} [0] : vector<1584xf32> to f32		// CHECK: vector.multi_reduction <add>, %{{.*}}, {{.*}} [0] : vector<1584xf32> to f32
linalg.dot ins(%A, %B: memref<1584xf32>, memref<1584xf32>)		linalg.dot ins(%A, %B: memref<1584xf32>, memref<1584xf32>)
outs(%C: memref<f32>)		outs(%C: memref<f32>)
return		return
[... 112 lines not shown ...]
	indexing_maps = [
affine_map<(m, n, k) -> (k, n)>,		affine_map<(m, n, k) -> (k, n)>,
affine_map<(m, n, k) -> (n, m)>		affine_map<(m, n, k) -> (n, m)>
],		],
iterator_types = ["parallel", "parallel", "reduction"]		iterator_types = ["parallel", "parallel", "reduction"]
}		}

// CHECK-LABEL: func @generic_output_transpose		// CHECK-LABEL: func @generic_output_transpose
func.func @generic_output_transpose(%A: memref<8x16xf32>, %B: memref<16x32xf32>,		func.func @generic_output_transpose(%A: memref<8x16xf32>, %B: memref<16x32xf32>,
%C: memref<32x8xf32>) {		%C: memref<32x8xf32>) {
// CHECK: vector.transfer_read %{{.*}} : memref<8x16xf32>, vector<8x32x16xf32>		// CHECK: vector.transfer_read %{{.*}} : memref<8x16xf32>, vector<8x32x16xf32>
// CHECK: vector.transfer_read %{{.*}} : memref<16x32xf32>, vector<8x32x16xf32>		// CHECK: vector.transfer_read %{{.*}} : memref<16x32xf32>, vector<8x32x16xf32>
// CHECK: %[[ACC:.*]] = vector.transfer_read %{{.*}} : memref<32x8xf32>, vector<8x32xf32>		// CHECK: %[[ACC:.*]] = vector.transfer_read %{{.*}} : memref<32x8xf32>, vector<8x32xf32>
// CHECK: %[[MUL:.*]] = arith.mulf %{{.*}}, %{{.*}} : vector<8x32x16xf32>		// CHECK: %[[MUL:.*]] = arith.mulf %{{.*}}, %{{.*}} : vector<8x32x16xf32>
// CHECK: %[[R:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[ACC]] [2] : vector<8x32x16xf32> to vector<8x32xf32>		// CHECK: %[[R:.*]] = vector.multi_reduction <add>, %[[MUL]], %[[ACC]] [2] : vector<8x32x16xf32> to vector<8x32xf32>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}} : vector<8x32xf32>, memref<32x8xf32>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}} : vector<8x32xf32>, memref<32x8xf32>
linalg.generic #matmul_transpose_out_trait		linalg.generic #matmul_transpose_out_trait
ins(%A, %B : memref<8x16xf32>, memref<16x32xf32>)		ins(%A, %B : memref<8x16xf32>, memref<16x32xf32>)
[... 1,449 lines not shown ...]

Revision Contents

Diff 474162