This is an archive of the discontinued LLVM Phabricator instance.


Differential D137660

[MLIR] Vectorize tensor.extract on n-D tensor
Closed · Public

Authored by awarzynski on Nov 8 2022, 11:41 AM.

Details

Reviewers

aartbik
nicolasvasilache
dcaballe
pzread
bsmith
rsuderman

Commits

rGc181f21ac71f: [MLIR] Vectorize tensor.extract on n-D tensor (n >= 2)

Summary

This patch implements the vectorization of tensor.extract for arbitrary
tensors. It extends https://reviews.llvm.org/D133786 by adding support
for n-D tensors, which it does by flattening the indices into a single
offset.

While this patch allows for more cases of tensor.extract to be
vectorised, performance on the workloads that we tested
either regressed or didn't improve. For this reason, new functionality
is hidden behind a global command-line flag:

-linalg-vectorize-n-d-extract.

We may want to remove it once the implementation is refined and we
are happy with the performance.

Extra logic in the Linalg vectorizer is added to support the new
command line option.

Related discussion: https://github.com/iree-org/iree/issues/9198

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

awarzynski created this revision. · Nov 8 2022, 11:41 AM

Herald added a reviewer: aartbik. · Nov 8 2022, 11:41 AM

Herald added a project: Restricted Project.

Herald added subscribers: Moerafaat, zero9178, bzcheeseman and 21 others.

awarzynski requested review of this revision. · Nov 8 2022, 11:41 AM

Herald added a reviewer: nicolasvasilache. · Nov 8 2022, 11:41 AM

Herald added subscribers: pcwang-thead, stephenneuendorffer, nicolasvasilache.

Herald added a reviewer: dcaballe. · Nov 8 2022, 11:41 AM

Herald added a project: Restricted Project.

awarzynski added a reviewer: pzread. · Nov 8 2022, 11:43 AM

awarzynski added a reviewer: bsmith. · Nov 8 2022, 11:55 AM

Harbormaster completed remote builds in B196758: Diff 474063. · Nov 8 2022, 11:58 AM

rsuderman requested changes to this revision. · Nov 15 2022, 6:47 PM

rsuderman added a subscriber: rsuderman.

rsuderman added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
397	You can initially construct this using `numIndices` for the size and create the `arith::ConstantIndexOp` with value `0`. This avoids having branching behavior depending on the number of indices.
398	Make `const auto` or `const int64_t`.
404	This line will no longer be needed.
418	You actually don't need this saved to a vector. You only need an accumulator for mul-add. (More details below).
427	Ditto about needing a vector. You can iterate on values.
441	You can simplify the two loops below roughly with the IR equivalent below:
	    int64_t index[N];
	    int64_t shape[N];
	    int64_t offset = index[N - 1];
	    for (int i = N - 1; i > 0; --i) {
	      offset = offset * shape[i] + index[i - 1];
	    }
	It still requires the broadcast work but avoids the extraneous mul-by-1 and extra operations that are just cleaned up. It also avoids computing all the slice sizes when we just care about the global offset.
473	If you make the change to the gatherIndices declaration you can delete this entire loop.
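
For reference, the mul-add accumulation suggested for line 441 can be sketched in plain C++ on scalar values. This is only an illustration, not code from the patch: `flattenIndex` is a hypothetical name, the layout is assumed to be row-major, and the loop runs from the leading index to the trailing one, following the recurrence given in the doc comment of the offset helper in Diff 477477 rather than the rough loop in the comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the patch): linearise an n-D index into a
// single row-major offset using one mul-add per dimension and a single
// accumulator, without materialising per-dimension slice sizes.
int64_t flattenIndex(const std::vector<int64_t> &shape,
                     const std::vector<int64_t> &index) {
  assert(!index.empty() && shape.size() == index.size());
  int64_t offset = index[0];
  for (std::size_t i = 1; i < index.size(); ++i)
    offset = offset * shape[i] + index[i];
  return offset;
}
```

For a 45 x 80 x 15 tensor and index [1, 2, 3] this evaluates (1 * 80 + 2) * 15 + 3 = 1233. Note that `shape[0]` never enters the computation.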

This revision now requires changes to proceed. · Nov 15 2022, 6:47 PM

Herald added a subscriber: hanchung. · Nov 15 2022, 6:47 PM

Simplify as per comments from @rsuderman

Thank you for taking a look, Rob! Much appreciated comments and suggestions.

Herald added a subscriber: jsetoain. · Nov 21 2022, 10:26 AM

Harbormaster completed remote builds in B198819: Diff 476941. · Nov 21 2022, 5:58 PM

Thanks for working on this and addressing the feedback! LGTM. Just some minor comments!

Also wondering if we should land this under a flag (disabled by default) as we know performance will degrade until we have some gather optimizations in place.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
398	nit: gatherIndices -> baseIndices? Clearer?
406	Sorry, missing something probably trivial but, where is the transpose being generated?
409	Instead of special-casing this for numIndices == 1... would it be possible to have the logic below working for just one index (and do mostly nothing)? It looks like having one index should be a base case of that code.
431	There might be a few improvements related to the offset computation. Would it make sense to move all of this to a self-contained utility function? It's very likely that we will play with different approaches to compute the offsets, and having that isolated into a utility function would help.

Thanks for taking a look!

In D137660#3944636, @dcaballe wrote:

> Also wondering if we should land this under a flag (disabled by default) as we know performance will degrade until we have some gather optimizations in place.

I agree, but will need some pointers :) The test runs TestTransformDialectInterpreterPass, but I doubt that's the pass I'd want to update here.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
406	`vector.transpose` is inserted in vectorizeLinalgIndex, i.e. prior to entering this method. Note that we only need to broadcast the trailing index as that's the only one that has not been broadcast yet. ATM, I'm broadcasting "everything" to keep the code simple and rely on subsequent optimisations to remove redundant broadcasts. I'll update the broadcasting part to make this more explicit. I suggest that we look at removing `vector.transpose` in a separate patch. I admit that I've not looked at it at all. WDYT?
409	Good point, done!

Implement suggestions from @dcaballe

- moved offset calculation to a dedicated method, calculateOffsetForGLoad (perhaps there's a better name?)
- made sure that we only add a broadcast for the trailing index (other indices have already been broadcast by vectorizeLinalgIndex)
- renamed gatherIndices as baseIndices
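
The offset helper mentioned in these notes works on whole index vectors rather than scalars. A rough lane-by-lane model in plain C++ is given below. It is a simplification under stated assumptions, not the patch's code: `Lanes`, `broadcast` and `gatherOffsets` are hypothetical names, a flat `std::vector` stands in for an n-D vector value, and the trailing index is modelled as a single value that still needs broadcasting, while the leading indices already carry one entry per lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One value per vector lane (hypothetical stand-in for a vector of indices).
using Lanes = std::vector<int64_t>;

// Stand-in for a broadcast: replicate one value across all lanes.
Lanes broadcast(int64_t value, std::size_t numLanes) {
  return Lanes(numLanes, value);
}

// Per-lane gather offsets. `leading[d]` holds the already-broadcast index for
// dimension d; `trailing` is the index for the last dimension.
Lanes gatherOffsets(const std::vector<int64_t> &shape,
                    const std::vector<Lanes> &leading, int64_t trailing,
                    std::size_t numLanes) {
  assert(shape.size() == leading.size() + 1);
  const std::size_t rank = shape.size();
  Lanes offset = rank == 1 ? broadcast(trailing, numLanes) : leading[0];
  assert(offset.size() == numLanes);
  for (std::size_t i = 1; i < rank; ++i) {
    // Only the trailing index needs an extra broadcast in this model.
    const Lanes index =
        (i + 1 == rank) ? broadcast(trailing, numLanes) : leading[i];
    assert(index.size() == numLanes);
    for (std::size_t lane = 0; lane < numLanes; ++lane)
      offset[lane] = offset[lane] * shape[i] + index[lane];
  }
  return offset;
}
```

With shape 45 x 80 x 15, leading indices {1, 1} and {2, 5} and trailing index 3, the two lanes get offsets (1 * 80 + 2) * 15 + 3 and (1 * 80 + 5) * 15 + 3.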

Harbormaster completed remote builds in B199189: Diff 477477. · Nov 23 2022, 6:02 AM

Thanks for addressing the feedback! I'm ok with landing under a flag after addressing the feedback, as a stepping stone towards the final n-D gather solution. We can build on top of this iteratively.

> I agree, but will need some pointers :) The test runs TestTransformDialectInterpreterPass, but I doubt that's the pass I'd want to update here.

We can use a global llvm::cl::opt flag, as this flag is temporary and will go away. The other option is to add a bool parameter to vectorize, but then we would have to extend the Transform dialect, etc. to provide the right value. I don't think it's worth it given that this will go away.

> (perhaps there's a better name?)

calculateGatherOffset?

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
406	SG, thanks!

dcaballe accepted this revision. · Nov 23 2022, 1:23 PM

Hide the new functionality behind an llvm::cl option

Added a new global command line option for enabling the vectorisation of
tensor.extract when the input is an n-D tensor:
--linalg-vectorize-n-d-extract. The default value is "off", which means that
this patch will not change the default behaviour. I'll update the summary of
this change shortly.

Given that this is the Thanksgiving week, I'll wait until next week before
landing this. This way our friends celebrating get an opportunity to take
another look before I merge this. I will also add some reviewers that might
have an opinion about the new global options.

Harbormaster completed remote builds in B199421: Diff 477781. · Nov 24 2022, 7:26 AM

awarzynski edited the summary of this revision. · Nov 24 2022, 7:29 AM

Herald added a subscriber: limo1996. · Nov 24 2022, 7:29 AM

Please hold off with this, I am making significant changes to vectorization.
I hope I can share something tomorrow.

This revision now requires changes to proceed. · Nov 24 2022, 7:52 AM

Here is the WIP https://reviews.llvm.org/D138688

Unfortunately I am having issues on the IREE side that I cannot yet diagnose and do not know whether this is causing potentially major regressions that need to be addressed before landing.

In D137660#3951094, @nicolasvasilache wrote:

> Here is the WIP https://reviews.llvm.org/D138688
>
> Unfortunately I am having issues on the IREE side that I cannot yet diagnose and do not know whether this is causing potentially major regressions that need to be addressed before landing.

Thanks for the heads-up and for sharing! Those look like some really nice improvements, thank you!

I definitely would prefer to have this in sooner rather than later, but also don't mind waiting and definitely don't want to make anyone's life harder :) Shall I wait another week? Would that be sufficient for you?

In D137660#3951361, @awarzynski wrote:

> In D137660#3951094, @nicolasvasilache wrote:
>
>> Here is the WIP https://reviews.llvm.org/D138688
>>
>> Unfortunately I am having issues on the IREE side that I cannot yet diagnose and do not know whether this is causing potentially major regressions that need to be addressed before landing.
>
> Thanks for the heads-up and for sharing! Those look like some really nice improvements, thank you!
>
> I definitely would prefer to have this in sooner rather than later, but also don't mind waiting and definitely don't want to make anyone's life harder :) Shall I wait another week? Would that be sufficient for you?

Thanks!

This is definitely a long-standing simplification for at least 2 years, now is the right time to do it before too many things are added on top.

I estimate 1 week should be good, the main blocker atm is that I have no easy way of turning downstream IREE errors into actual test cases here.

In D137660#3951561, @nicolasvasilache wrote:

> In D137660#3951361, @awarzynski wrote:
>
>> In D137660#3951094, @nicolasvasilache wrote:
>>
>>> Here is the WIP https://reviews.llvm.org/D138688
>>>
>>> Unfortunately I am having issues on the IREE side that I cannot yet diagnose and do not know whether this is causing potentially major regressions that need to be addressed before landing.
>>
>> Thanks for the heads-up and for sharing! Those look like some really nice improvements, thank you!
>>
>> I definitely would prefer to have this in sooner rather than later, but also don't mind waiting and definitely don't want to make anyone's life harder :) Shall I wait another week? Would that be sufficient for you?
>
> Thanks!
>
> This is definitely a long-standing simplification for at least 2 years, now is the right time to do it before too many things are added on top.
>
> I estimate 1 week should be good, the main blocker atm is that I have no easy way of turning downstream IREE errors into actual test cases here.

We iterated with @dcaballe offline. I can remove that blocker and reshuffle things on my end.

However, the CL option needs to be reworked.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
52	This is problematic, you shouldn't expect that any of this is run in a pass pipeline. You could add a new attribute to the VectorizeOp (like we do for padding) and propagate it from there instead.
mlir/test/Dialect/Linalg/vectorization.mlir
1	This is problematic, you shouldn't expect that any of this is run in a pass pipeline. You could add a new attribute to the VectorizeOp (like we do for padding) and propagate it from there instead.

pzread added inline comments. · Nov 28 2022, 10:25 AM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
315–316	nit: This comment can be removed.
354	nit: I think it's clearer and safer to write: `extractOp.getTensor().getType()`. Also, do we check if the dim is dynamic? If it has been checked in another place, we can have an assert here.

awarzynski added inline comments. · Nov 28 2022, 11:52 AM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
52	> This is problematic, you shouldn't expect that any of this is run in a pass pipeline.
	Isn't a global command-line option orthogonal to whether this functionality is wrapped in a pass or not?
	> You could add a new attribute to the VectorizeOp (like we do for padding) and propagate it from there instead.
	I feel that this would only be sufficient for testing/experimenting with the Transform dialect. But I'd like more than that :) I'm not that familiar with the Transform dialect and might be missing something obvious - please let me know! I'm just about to send an update that preserves this global command-line flag and adds a Transform dialect attribute. Is that what you are suggesting?
354	> nit: I think it's clearer and safer to write:
	Thanks for the suggestion, I'll update this in the next revision!
	> Also do we check if the dim is dynamic? If it has been checked in another place, we can have an assert here.
	It is checked here. Should I add an assert regardless?

Address PR comments

Added an attribute to VectorizeOp so that the vectorization can be controlled from the Transform dialect level. @nicolasvasilache, is that what you had in mind?

Thanks for all the comments so far!

dcaballe added inline comments. · Nov 28 2022, 12:07 PM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
52	I suggested that we could go with a global flag as this is a temporary flag that will be removed once we stabilize the performance of the gather. I actually wanted to avoid having to extend the transform dialect just for this temporary thing but happy to have it working with the transform dialect as well. However, it's not clear to me why we need a struct (`MLIRLinalgVectorizerOptions`) and all the `MLIRLinalgVectorizerOptions` and `MLIRLinalgVectorizerOptions` related code. Wouldn't just adding the `llvm::cl` option work?

Harbormaster completed remote builds in B199819: Diff 478309. · Nov 28 2022, 12:46 PM

awarzynski added inline comments. · Nov 28 2022, 1:52 PM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
52	> However, it's not clear to me why we need a struct (MLIRLinalgVectorizerOptions) and all the MLIRLinalgVectorizerOptions and MLIRLinalgVectorizerOptions related code. Wouldn't just adding the llvm::cl option work?
	Without a struct we would have a global destructor (for the `vectorizeNDExtract`), and that's not an option, see https://github.com/llvm/llvm-project/blob/d620bae999465f4a418c38b6451b11d80523c6de/mlir/lib/CMakeLists.txt#L1-L2 :) This approach is actually fairly common these days (see how things are set up in mlir-opt). It allows you to control when particular sets of options are exposed. Otherwise, all "global" options would be available at all times.
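
The pattern described in this reply, grouping options in a struct that only comes into existence when a tool asks for it, can be illustrated without LLVM headers. The sketch below is a plain C++ stand-in, not the patch's code: `VectorizerOptions`, `getVectorizerOptions` and `registerVectorizerCLOptions` are hypothetical names, and a deliberately leaked heap object stands in for whatever lazy-initialisation mechanism the real code uses (in LLVM this would presumably be `llvm::ManagedStatic` holding `llvm::cl::opt` members, which is an assumption here).

```cpp
#include <cassert>

// Hypothetical option group; in the real patch the member would be an
// llvm::cl::opt<bool> rather than a plain bool.
struct VectorizerOptions {
  bool vectorizeNDExtract = false;  // off by default
};

// Created on first use and intentionally never destroyed, so this
// translation unit needs no global constructor or destructor.
VectorizerOptions &getVectorizerOptions() {
  static VectorizerOptions *options = new VectorizerOptions();
  return *options;
}

// A tool that wants to expose the options calls this explicitly; tools that
// never call it never create (or expose) the option group.
void registerVectorizerCLOptions() { (void)getVectorizerOptions(); }
```

The trade-off raised later in the review still applies to this shape: the option remains process-wide state, so every user of the vectorizer is affected by it regardless of how it was invoked.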

Ping.

Thanks for adding support for the Transform dialect! LGTM. I think we should be able to land this. Does anybody else have any comments?

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
52	Thanks! You are definitely more up-to-date on this than I am.
340	`const SmallVector<int64_t, 4> &` -> `const SmallVectorImpl<int64_t> &`

dcaballe accepted this revision. · Dec 5 2022, 9:35 AM

rsuderman accepted this revision. · Dec 5 2022, 11:11 AM

nicolasvasilache added inline comments. · Dec 5 2022, 11:48 AM

mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
269 ↗	(On Diff #478309)	oof this looks like a very unwelcome dependence at this level .. @rriddle , any comments / suggestions here ?

nicolasvasilache requested changes to this revision. · Dec 5 2022, 11:49 AM

nicolasvasilache added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
53	This leaks badly all the way up to the main binary.. I'd like to see this discussed.

This revision now requires changes to proceed. · Dec 5 2022, 11:49 AM

nicolasvasilache added inline comments. · Dec 5 2022, 11:54 AM

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1724 ↗	(On Diff #478309)	thanks for adding this
1779 ↗	(On Diff #478309)	why not have this work exactly the same way as the code 2 lines above?
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
60	If you really want a CL option, you could add a test pass (we don't want to start maintaining a pass with heuristics and perf regressions that people would depend on). But I think the value of that is limited these days.

I suggested a global flag to avoid having to extend the Transform dialect, as the flag will go away in a few weeks. Since it's already supported in the Transform dialect, I think we could remove the global flag and add a bool option to the vectorize method. That should work for both, the Transform dialect and any other users of the vectorizer.

I see that the global option is proving more controversial than we anticipated :)

@nicolasvasilache, we don't really need the changes in mlir-opt (@dcaballe ?), but I'd like to be able to control this vectorization through other tools (iree-compile in my case). AFAIK, a Transform dialect attribute is not sufficient for this (perhaps I'm missing something?). Global options are what folks within LLVM have been using for this sort of thing, so perhaps this is not too bad?

In D137660#3972178, @dcaballe wrote:

> Since it's already supported in the Transform dialect, I think we could remove the global flag and add a bool option to the vectorize method. That should work for both, the Transform dialect and any other users of the vectorizer.

This is actually a bit tricky. I'd need to find a way to pass it to tensorExtractVectorizationPrecondition, but it only accepts one argument (its type is CustomVectorizationPrecondition). Also, this would mean adding a bool argument in quite a few places, so not so tidy. I'll try again tomorrow.

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1779 ↗	(On Diff #478309)	You know this stuff much better than I do, so please point me in the right direction if I'm talking nonsense. Basically, hooks for vectorising `tensor.pad` and `tensor.extract` are registered differently ATM: `tensor.pad` goes through the OpRewritePattern API, while for `tensor.extract` we just use a CustomVectorizationHook (added here). I wanted to avoid refactoring, hence this slight inconsistency in how `tensor.pad` and `tensor.extract` are treated. Do you reckon that we should go via `OpRewritePattern` for `tensor.extract` instead? If yes, then presumably we'd want that in a separate patch?
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
53	> This leaks badly all the way up to the main binary
	Only if you register it through `linalg::registerLinalgVectorizerCLOptions();` like I did in MlirOptMain.cpp.
	> I'd like to see this discussed.
	Are you concerned about `mlir-opt` specifically? Because I don't really need this in `mlir-opt`. I use `iree-compile` instead, but that's orthogonal here.

Revert changes from mlir-opt

The changes in mlir-opt are not needed for this patch to work. I'm leaving
the llvm::cl option for now, as even extending vectorize to accept an additional
bool (as per Diego's suggestion) would require adding a global variable.
Otherwise, we'd need to propagate that bool to either
tensorExtractVectorizationPrecondition or vectorizeTensorExtract, but
that's not possible (i.e. we can't really change the corresponding interfaces).

Also rebased on top of main and replaced SmallVector with SmallVectorImpl.

Harbormaster completed remote builds in B201639: Diff 480813. · Dec 7 2022, 11:14 AM

nicolasvasilache added inline comments. · Dec 8 2022, 2:26 PM

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1779 ↗	(On Diff #478309)	I see, I mistook the fact that you are vectorizing a `tensor.xx` op for what you are actually doing, which is vectorizing something new within the Linalg body and injecting an orthogonal control to enable it. The easy way forward is to pass the boolean to the pattern constructor and to the vectorize function; feel free to wrap that in an options struct (i.e. your LinalgVectorizerOptions). If you don't want to pass that forward to `vectorizeAsLinalgGeneric`, feel free to hoist the hooks registration one level, which will also make them more visible and documented one level higher. Despite passing the information in 3 places, this is infinitely better than an orthogonal control injection with a global, which definitely feels like an abuse to me. Doing a quick grep in MLIR I do not see any such use and I am definitely not looking at having Linalg introduce such a mechanism, sorry. If you feel strongly about this, please start an RFC on Discourse to gather more feedback.

Removed the llvm::cl logic

I removed all the logic related to llvm::cl and replaced it entirely with:

- a Transform dialect attribute (vectorize_nd_extract)
- an extra bool option passed to vectorize

@nicolasvasilache, I didn't really need the global, it just happened to be an interface that I am very familiar with. I'm new to this area and it's not always obvious which approach would cause the least friction. This approach works for me, thanks for pointing me in the right direction!

I am not too happy about the extra argument in vectorizeLinalgOpPrecondition that's very specific to what I'm trying to do for tensor.extract. But equally, this hook is currently only used for tensor.extract.
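
The shape of this final design, where the caller passes the setting down explicitly instead of the precondition reading a global, can be sketched in plain C++. This is a simplified model under stated assumptions, not the landed code: `extractPrecondition` is a hypothetical name, the options struct borrows the `LinalgVectorizerOptions` name suggested in the review, and the real precondition inspects a `tensor::ExtractOp` rather than a bare index count.

```cpp
#include <cassert>
#include <cstddef>

// Options travel with the call instead of living in global state.
struct LinalgVectorizerOptions {
  bool vectorizeNDExtract = false;  // mirrors the vectorize_nd_extract attribute
};

// Hypothetical precondition: an extract with exactly one index is always
// accepted; any other index count is only accepted when the caller opted in.
bool extractPrecondition(std::size_t numIndices,
                         const LinalgVectorizerOptions &options) {
  if (numIndices != 1 && !options.vectorizeNDExtract)
    return false;
  return true;
}
```

Because the flag is an ordinary argument, two callers in the same process can make different choices, which a process-wide command-line option cannot offer.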

Harbormaster completed remote builds in B202386: Diff 481852. · Dec 10 2022, 6:58 AM

thanks!

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
36	spurious leftover include

This revision is now accepted and ready to land. · Dec 11 2022, 7:59 AM

Closed by commit rGc181f21ac71f: [MLIR] Vectorize tensor.extract on n-D tensor (n >= 2) (authored by awarzynski). · Dec 12 2022, 1:35 AM

This revision was automatically updated to reflect the committed changes.

awarzynski added a commit: rGc181f21ac71f: [MLIR] Vectorize tensor.extract on n-D tensor (n >= 2).

awarzynski mentioned this in D140781: [mlir] Broadcast scalars when vectorising tensor.extract. · Dec 30 2022, 7:46 AM

awarzynski mentioned this in D141998: [mlir][linalg] Vectorize tensor.extract using contiguous loads. · Jan 18 2023, 1:23 AM

awarzynski mentioned this in rG89b144ece330: [mlir][linalg] Vectorize tensor.extract using contiguous loads. · Feb 22 2023, 11:33 AM

awarzynski mentioned this in rG8ece85a682f0: [mlir][linalg] Vectorize tensor.extract using contiguous loads. · Mar 2 2023, 1:20 AM

Revision Contents

Path                                                    Size
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp    72 lines
mlir/test/Dialect/Linalg/vectorization.mlir             26 lines

Diff 477477

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

Show All 27 Lines
#include "mlir/IR/PatternMatch.h"		#include "mlir/IR/PatternMatch.h"
#include "mlir/Pass/Pass.h"		#include "mlir/Pass/Pass.h"
#include "mlir/Support/LLVM.h"		#include "mlir/Support/LLVM.h"
#include "mlir/Transforms/RegionUtils.h"		#include "mlir/Transforms/RegionUtils.h"
#include "llvm/ADT/ScopeExit.h"		#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/Sequence.h"		#include "llvm/ADT/Sequence.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/TypeSwitch.h"		#include "llvm/ADT/TypeSwitch.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
		nicolasvasilacheUnsubmitted Not Done Reply Inline Actions spurious leftover include nicolasvasilache: spurious leftover include
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <type_traits>		#include <type_traits>

using namespace mlir;		using namespace mlir;
using namespace mlir::linalg;		using namespace mlir::linalg;

#define DEBUG_TYPE "linalg-vectorization"		#define DEBUG_TYPE "linalg-vectorization"

#define DBGS() (llvm::dbgs() << '[' << DEBUG_TYPE << "] ")		#define DBGS() (llvm::dbgs() << '[' << DEBUG_TYPE << "] ")
#define LDBG(X) LLVM_DEBUG(DBGS() << X)		#define LDBG(X) LLVM_DEBUG(DBGS() << X)

/// Try to vectorize `convOp` as a convolution.		/// Try to vectorize `convOp` as a convolution.
static FailureOr<Operation *> vectorizeConvolution(OpBuilder &b,		static FailureOr<Operation *> vectorizeConvolution(OpBuilder &b,
LinalgOp convOp);		LinalgOp convOp);

/// Return the unique instance of OpType in `block` if it is indeed unique.		/// Return the unique instance of OpType in `block` if it is indeed unique.
		nicolasvasilacheUnsubmitted Not Done Reply Inline Actions This is problematic, you shouldn't expect that any of this is run in a pass pipeline. You could add a new attribute the to VectorizeOp (like we do for padding) and propagate it from there instead. nicolasvasilache: This is problematic, you shouldn't expect that any of this is run in a pass pipeline. You could…
		awarzynskiAuthorUnsubmitted Done Reply Inline Actions This is problematic, you shouldn't expect that any of this is run in a pass pipeline. Isn't a global command-line option orthogonal to whether this functionality is wrapped in a pass or not? You could add a new attribute the to VectorizeOp (like we do for padding) and propagate it from there instead. I feel that this would only be sufficient for testing/experimenting with the Transform dialect. But I'd like more than that :) I'm not that familiar with the Transform dialect and might be missing something obvious - please let me know! I'm just about send and updated that preserves this global command-line flag and adds a Transform dialect attribute. Is that what you are suggesting? awarzynski: > This is problematic, you shouldn't expect that any of this is run in a pass pipeline. Isn't…
		dcaballeUnsubmitted Not Done Reply Inline Actions I suggested that we could go with a global flag as this is a temporary flag that will be removed once we stabilize the performance of the gather. I actually wanted to avoid having to extend the transform dialect just for this temporary thing but happy to have it working with the transform dialect as well. However, it's not clear to me why we need a struct (`MLIRLinalgVectorizerOptions`) and all the `MLIRLinalgVectorizerOptions` and `MLIRLinalgVectorizerOptions` related code. Wouldn't just adding the `llvm::cl` option work? dcaballe: I suggested that we could go with a global flag as this is a temporary flag that will be…
		awarzynskiAuthorUnsubmitted Done Reply Inline Actions However, it's not clear to me why we need a struct (MLIRLinalgVectorizerOptions) and all the MLIRLinalgVectorizerOptions and MLIRLinalgVectorizerOptions related code. Wouldn't just adding the llvm::cl option work? Without a struct we would have a global destructor (for the `vectorizeNDExtract`), and that's not an option, see https://github.com/llvm/llvm-project/blob/d620bae999465f4a418c38b6451b11d80523c6de/mlir/lib/CMakeLists.txt#L1-L2 :) This approach is actually fairly common these days (see how things are set-up in mlir-opt - it allows you to control when particular sets of options are exposed. Otherwise, all "global" options would be available at all times. awarzynski: > However, it's not clear to me why we need a struct (MLIRLinalgVectorizerOptions) and all the…
		dcaballeUnsubmitted Not Done Reply Inline Actions Thanks! You are definitely more up-to-date on this than I am. dcaballe: Thanks! You are definitely more up-to-date on this than I am.
/// Return null if none or more than 1 instances exist.		/// Return null if none or more than 1 instances exist.
		nicolasvasilacheUnsubmitted Not Done Reply Inline Actions This leaks badly all the way up to the main binary.. I'd like to see this discussed. nicolasvasilache: This leaks badly all the way up to the main binary.. I'd like to see this discussed.
		awarzynskiAuthorUnsubmitted Done Reply Inline Actions This leaks badly all the way up to the main binary Only if you register it through `linalg::registerLinalgVectorizerCLOptions();` like I did in MLIROptMain.cpp. I'd like to see this discussed. Are you concerned about `mlir-opt` specifically? Because I don't really need this in `mlir-opt`. I use `iree-compile` instead, but that's orthogonal here. awarzynski: > This leaks badly all the way up to the main binary Only if you register it through ` linalg…
template <typename OpType>		template <typename OpType>
static OpType getSingleOpOfType(Block &block) {		static OpType getSingleOpOfType(Block &block) {
OpType res;		OpType res;
block.walk([&](OpType op) {		block.walk([&](OpType op) {
if (res) {		if (res) {
res = nullptr;		res = nullptr;
return WalkResult::interrupt();		return WalkResult::interrupt();
		nicolasvasilacheUnsubmitted Not Done Reply Inline Actions If you really want a CL option, you could add a test pass (we don't want to start maintaining a pass with heuristics and perf regressions that people would depend on). But I think the value of that is limited these days. nicolasvasilache: If you really want a CL option, you could add a test pass (we don't want to start maintaining…
}		}
res = op;		res = op;
return WalkResult::advance();		return WalkResult::advance();
});		});
return res;		return res;
}		}

/// Given an indexing `map` coming from a LinalgOp indexing, restricted to a		/// Given an indexing `map` coming from a LinalgOp indexing, restricted to a
▲ Show 20 Lines • Show All 238 Lines • ▼ Show 20 Lines
}		}

/// Helper function to check if the tensor.extract can be vectorized by the		/// Helper function to check if the tensor.extract can be vectorized by the
/// custom hook vectorizeTensorExtract.		/// custom hook vectorizeTensorExtract.
static LogicalResult tensorExtractVectorizationPrecondition(Operation *op) {		static LogicalResult tensorExtractVectorizationPrecondition(Operation *op) {
tensor::ExtractOp extractOp = dyn_cast<tensor::ExtractOp>(op);		tensor::ExtractOp extractOp = dyn_cast<tensor::ExtractOp>(op);
if (!extractOp)		if (!extractOp)
return failure();		return failure();

// Currently only supports extraction with an 1-D index.
if (extractOp.getIndices().size() != 1)
return failure();

if (!VectorType::isValidElementType(extractOp.getIndices()[0].getType()))		if (!VectorType::isValidElementType(extractOp.getIndices()[0].getType()))
		pzreadUnsubmitted Not Done Reply Inline Actions nit: This comment can be removed. pzread: nit: This comment can be removed.
return failure();		return failure();

if (llvm::any_of(extractOp->getResultTypes(), [](Type type) {		if (llvm::any_of(extractOp->getResultTypes(), [](Type type) {
return !VectorType::isValidElementType(type);		return !VectorType::isValidElementType(type);
})) {		})) {
return failure();		return failure();
}		}

return success();		return success();
}		}

		/// Calculates the offsets (`$index_vec`) for `vector.gather` operations
		/// generated from `tensor.extract`. The offset is calculated as follows
		/// (example using scalar values):
		///
		/// offset = extractOp.indices[0]
		/// for (i = 1; i < numIndices; i++)
		/// offset = extractOp.dimSize[i] * offset + extractOp.indices[i];
		///
		/// For tensor<45 x 80 x 15 x f32> and index [1, 2, 3], this leads to:
		/// offset = ( ( 1 ) * 80 + 2 ) * 15 + 3
		static Value
		calculateOffsetForGLoad(OpBuilder &b, tensor::ExtractOp extractOp,
		const BlockAndValueMapping &bvm,
		dcaballeUnsubmitted Not Done Reply Inline Actions `const SmallVector<int64, 4> &` -> `const SmallVectorImpl<int64_t> &` dcaballe: `const SmallVector<int64, 4> &` -> `const SmallVectorImpl<int64_t> &`
		const SmallVector<int64_t, 4> &targetShape) {
		// The vector of indices for GatherOp should be shaped as the output vector
		auto indexVecType = VectorType::get(targetShape, b.getIndexType());
		auto loc = extractOp.getLoc();

		Value offset = b.create<vector::BroadcastOp>(
		loc, indexVecType, bvm.lookup(extractOp.getIndices()[0]));

		const size_t numIndices = extractOp.getIndices().size();
		for (size_t i = 1; i < numIndices; i++) {
		auto dimSizeBcast = b.create<vector::BroadcastOp>(
		loc, indexVecType,
		b.create<arith::ConstantIndexOp>(
		loc,
		pzreadUnsubmitted Not Done Reply Inline Actions nit: I think it's clearer and safer to write: extractOp.getTensor().getType() Also do we check if the dim is dynamic? If it has been checked in another place, we can have an assert here. pzread: nit: I think it's clearer and safer to write: ``` extractOp.getTensor().getType() ``` Also do…
		awarzynskiAuthorUnsubmitted Done Reply Inline Actions nit: I think it's clearer and safer to write: Thanks for the suggestion, I'll update this in the next revision! Also do we check if the dim is dynamic? If it has been checked in another place, we can have an assert here. It is checked here. Should I add an assert regardless? awarzynski: >nit: I think it's clearer and safer to write: Thanks for the suggestion, I'll update this in…
		extractOp->getOperandTypes()[0].cast<ShapedType>().getDimSize(i)));
		offset = b.create<arith::MulIOp>(loc, offset, dimSizeBcast);

		auto originalIndexBcast = bvm.lookup(extractOp.getIndices()[i]);
		if (i == numIndices - 1) {
		// We only need an additional broadcast for the trailing index. All other
		// indices have already been broadcast by `vectorizeLinalgIndex` to match
		// the output size.
		originalIndexBcast = b.create<vector::BroadcastOp>(
		loc, indexVecType, bvm.lookup(extractOp.getIndices()[i]));
		}

		offset = b.create<arith::AddIOp>(loc, originalIndexBcast, offset);
		}

		return offset;
		}
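
The offset recurrence documented for `calculateOffsetForGLoad` is Horner's rule applied to a row-major index. A minimal scalar sketch follows, assuming a statically shaped tensor; `flattenIndex` is a hypothetical helper written for illustration and is not part of the patch or of MLIR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the patch): scalar version of the
// documented recurrence, i.e. a row-major linear index.
int64_t flattenIndex(const std::vector<int64_t> &shape,
                     const std::vector<int64_t> &indices) {
  assert(!indices.empty() && shape.size() == indices.size());
  // Start from the leading index, then fold in one dimension at a time.
  int64_t offset = indices[0];
  for (size_t i = 1; i < indices.size(); ++i)
    offset = shape[i] * offset + indices[i];
  return offset;
}
```

For `tensor<45 x 80 x 15 x f32>` and index `[1, 2, 3]` this evaluates `((1) * 80 + 2) * 15 + 3`. Note that the leading dimension size (45) never enters the computation.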

/// Helper function to vectorize the tensor.extract operations. Returns		/// Helper function to vectorize the tensor.extract operations. Returns
/// VectorizationStatus::NewOp to signal the vectorization algorithm that it		/// VectorizationStatus::NewOp to signal the vectorization algorithm that it
/// should map the produced operations. This function is meant to be used as a		/// should map the produced operations. This function is meant to be used as a
/// CustomVectorizationHook.		/// CustomVectorizationHook.
static VectorizationResult		static VectorizationResult
vectorizeTensorExtract(OpBuilder &b, Operation *op, LinalgOp linalgOp,		vectorizeTensorExtract(OpBuilder &b, Operation *op, LinalgOp linalgOp,
const BlockAndValueMapping &bvm) {		const BlockAndValueMapping &bvm) {
tensor::ExtractOp extractOp = dyn_cast<tensor::ExtractOp>(op);		tensor::ExtractOp extractOp = dyn_cast<tensor::ExtractOp>(op);
if (!extractOp)		if (!extractOp)
return VectorizationResult{VectorizationStatus::Failure, nullptr};		return VectorizationResult{VectorizationStatus::Failure, nullptr};
auto loc = extractOp.getLoc();		auto loc = extractOp.getLoc();

// Currently only supports extraction with an 1-D index. Checked in the
// tensorExtractVectorizationPrecondition.
assert(extractOp.getIndices().size() == 1);

auto indexVec = bvm.lookup(extractOp.getIndices()[0]);
// Compute the static loop sizes of the extract op.		// Compute the static loop sizes of the extract op.
auto targetShape = linalgOp.computeStaticLoopSizes();		auto targetShape = linalgOp.computeStaticLoopSizes();

SmallVector<Value> gatherIndices;		auto resultType =
gatherIndices.push_back(b.create<arith::ConstantIndexOp>(loc, 0));		VectorType::get(targetShape, extractOp.getResult().getType());

auto maskConstantOp = b.create<arith::ConstantOp>(		auto maskConstantOp = b.create<arith::ConstantOp>(
loc,		loc,
DenseIntElementsAttr::get(VectorType::get(targetShape, b.getI1Type()),		DenseIntElementsAttr::get(VectorType::get(targetShape, b.getI1Type()),
/*value=*/true));		/*value=*/true));

auto resultType =
VectorType::get(targetShape, extractOp.getResult().getType());
auto passThruConstantOp =		auto passThruConstantOp =
b.create<arith::ConstantOp>(loc, b.getZeroAttr(resultType));		b.create<arith::ConstantOp>(loc, b.getZeroAttr(resultType));

		// Base indices are currently set to 0. We will need to re-visit if more
		rsuderman (Not Done): You can initially construct this using `numIndices` for the size and create the `arith::ConstantIndexOp` with value `0`. This avoids having branching behavior depending on the number of indices.
		// generic scenarios are to be supported.
		rsuderman (Not Done): Make `const auto` or `const int64_t`.
		dcaballe (Done): nit: gatherIndices -> baseIndices? Clearer?
		SmallVector<Value> baseIndices(extractOp.getIndices().size(),
		b.create<arith::ConstantIndexOp>(loc, 0));

		Value offset = calculateOffsetForGLoad(b, extractOp, bvm, targetShape);

		// Generate the gather load
		rsuderman (Not Done): This line will no longer be needed.
auto gatherOp = b.create<vector::GatherOp>(		auto gatherOp = b.create<vector::GatherOp>(
loc, resultType, extractOp.getTensor(), gatherIndices, indexVec,		loc, resultType, extractOp.getTensor(), baseIndices, offset,
		dcaballe (Not Done): Sorry, missing something probably trivial but, where is the transpose being generated?
		awarzynski (Author, Done): `vector.transpose` is inserted in vectorizeLinalgIndex, i.e. prior to entering this method. Note that we only need to broadcast the trailing index as that's the only one that has not been broadcast yet. ATM, I'm broadcasting "everything" to keep the code simple and rely on subsequent optimisations to remove redundant broadcasts. I'll update the broadcasting part to make this more explicit. I suggest that we look at removing `vector.transpose` in a separate patch. I admit that I've not looked at all. WDYT?
		dcaballe (Not Done): SG, thanks!
maskConstantOp, passThruConstantOp);		maskConstantOp, passThruConstantOp);

return VectorizationResult{VectorizationStatus::NewOp, gatherOp};		return VectorizationResult{VectorizationStatus::NewOp, gatherOp};
		dcaballe (Not Done): Instead of special-casing this for numIndices == 1... would it be possible to have the logic below working for just one index (and do mostly nothing)? It looks like having one index should be a base case of that code.
		awarzynski (Author, Done): Good point, done!
}		}
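
The `vector::GatherOp` built above receives all-zero base indices, the flattened offset vector, an all-true mask, and a zero pass-through. A scalar model of that load is sketched below, assuming the base tensor is viewed as a flat row-major buffer (which is how this patch uses the index vector); `gatherFlat` is a hypothetical name and not an MLIR API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model (not MLIR API): each result element is loaded
// from `base` at its offset when the mask bit is set; otherwise the
// pass-through value is kept.
std::vector<float> gatherFlat(const std::vector<float> &base,
                              const std::vector<int64_t> &offsets,
                              const std::vector<bool> &mask,
                              const std::vector<float> &passThru) {
  assert(offsets.size() == mask.size() && mask.size() == passThru.size());
  std::vector<float> result(passThru);
  for (size_t i = 0; i < offsets.size(); ++i) {
    if (!mask[i])
      continue; // Masked-off lane: keep the pass-through value.
    assert(offsets[i] >= 0 && static_cast<size_t>(offsets[i]) < base.size());
    result[i] = base[offsets[i]];
  }
  return result;
}
```

With an all-true mask, as in this patch, the pass-through values are never observed; they only matter once masking of out-of-bounds lanes is introduced.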

/// Emit reduction operations if the shapes of the value to reduce is different		/// Emit reduction operations if the shapes of the value to reduce is different
/// that the result shape.		/// that the result shape.
static Operation *reduceIfNeeded(OpBuilder &b, LinalgOp linalgOp, Operation *op,		static Operation *reduceIfNeeded(OpBuilder &b, LinalgOp linalgOp, Operation *op,
Value reduceValue, Value initialValue,		Value reduceValue, Value initialValue,
const BlockAndValueMapping &bvm) {		const BlockAndValueMapping &bvm) {
Value reduceVec = bvm.lookup(reduceValue);		Value reduceVec = bvm.lookup(reduceValue);
Value outputVec = bvm.lookup(initialValue);		Value outputVec = bvm.lookup(initialValue);
		rsuderman (Not Done): You actually don't need this saved to a vector. You only need an accumulator for mul-add. (More details below).
auto reduceType = reduceVec.getType().dyn_cast<VectorType>();		auto reduceType = reduceVec.getType().dyn_cast<VectorType>();
auto outputType = outputVec.getType().dyn_cast<VectorType>();		auto outputType = outputVec.getType().dyn_cast<VectorType>();
// Reduce only if needed as the value may already have been reduce for		// Reduce only if needed as the value may already have been reduce for
// contraction vectorization.		// contraction vectorization.
if (!reduceType ||		if (!reduceType ||
(outputType && reduceType.getShape() == outputType.getShape()))		(outputType && reduceType.getShape() == outputType.getShape()))
return nullptr;		return nullptr;
SmallVector<bool> reductionMask = getReductionMask(linalgOp);		SmallVector<bool> reductionMask = getReductionMask(linalgOp);
return buildMultiDimReduce(b, op, reduceVec, outputVec, reductionMask);		return buildMultiDimReduce(b, op, reduceVec, outputVec, reductionMask);
		rsuderman (Not Done): Ditto about needing a vector. You can iterate on values.
}		}

/// Generic vectorization for a single operation `op`, given already vectorized		/// Generic vectorization for a single operation `op`, given already vectorized
/// operands carried by `bvm`. Vectorization occurs as follows:		/// operands carried by `bvm`. Vectorization occurs as follows:
		dcaballe (Not Done): There might be a few improvements related to the offset computation. Would it make sense to move all of this to a self-contained utility function? It's very likely that we play with different approaches to compute the offsets and having that isolated into a utility function would help.
/// 1. Try to apply any of the `customVectorizationHooks` and return its		/// 1. Try to apply any of the `customVectorizationHooks` and return its
/// result on success.		/// result on success.
/// 2. Clone any constant in the current scope without vectorization: each		/// 2. Clone any constant in the current scope without vectorization: each
/// consumer of the constant will later determine the shape to which the		/// consumer of the constant will later determine the shape to which the
/// constant needs to be broadcast to.		/// constant needs to be broadcast to.
/// 3. Fail on any remaining non `ElementwiseMappable` op. It is the purpose		/// 3. Fail on any remaining non `ElementwiseMappable` op. It is the purpose
/// of the `customVectorizationHooks` to cover such cases.		/// of the `customVectorizationHooks` to cover such cases.
/// 4. Clone `op` in vector form to a vector of shape prescribed by the first		/// 4. Clone `op` in vector form to a vector of shape prescribed by the first
/// operand of maximal rank. Other operands have smaller rank and are		/// operand of maximal rank. Other operands have smaller rank and are
/// broadcast accordingly. It is assumed this broadcast is always legal,		/// broadcast accordingly. It is assumed this broadcast is always legal,
		rsuderman (Not Done): You can simplify the two loops below roughly with the IR equivalent below:
		    int64_t index[N];
		    int64_t shape[N];
		    int64_t offset = index[N - 1];
		    for (int i = N - 1; i > 0; --i ){
		      offset = offset * shape[i] + index[i - 1]
		    }
		It still requires the broadcast work but avoids the extraneous mul-by-1 and extra operations that are just cleaned up. It also avoids computing all the slice sizes when we just care about the global offset.
/// otherwise, it means one of the `customVectorizationHooks` is incorrect.		/// otherwise, it means one of the `customVectorizationHooks` is incorrect.
///		///
/// This function assumes all operands of `op` have been vectorized and are in		/// This function assumes all operands of `op` have been vectorized and are in
/// the `bvm` mapping. As a consequence, this function is meant to be called on		/// the `bvm` mapping. As a consequence, this function is meant to be called on
/// a topologically-sorted list of ops.		/// a topologically-sorted list of ops.
/// This function does not update `bvm` but returns a VectorizationStatus that		/// This function does not update `bvm` but returns a VectorizationStatus that
/// instructs the caller what `bvm` update needs to occur.		/// instructs the caller what `bvm` update needs to occur.
static VectorizationResult		static VectorizationResult
[15 unchanged lines hidden; context: vectorizeOneOp(OpBuilder &b, LinalgOp linalgOp, Operation *op,]
// 2. Constant ops don't get vectorized but rather broadcasted at their users.		// 2. Constant ops don't get vectorized but rather broadcasted at their users.
// Clone so that the constant is not confined to the linalgOp block .		// Clone so that the constant is not confined to the linalgOp block .
if (isa<arith::ConstantOp, func::ConstantOp>(op))		if (isa<arith::ConstantOp, func::ConstantOp>(op))
return VectorizationResult{VectorizationStatus::NewOp, b.clone(*op)};		return VectorizationResult{VectorizationStatus::NewOp, b.clone(*op)};

// 3. Only ElementwiseMappable are allowed in the generic vectorization.		// 3. Only ElementwiseMappable are allowed in the generic vectorization.
if (!OpTrait::hasElementwiseMappableTraits(op))		if (!OpTrait::hasElementwiseMappableTraits(op))
return VectorizationResult{VectorizationStatus::Failure, nullptr};		return VectorizationResult{VectorizationStatus::Failure, nullptr};

		rsuderman (Not Done): If you make the change to the gatherIndices declaration you can delete this entire loop.
// 4 . Check if the operation is a reduction.		// 4 . Check if the operation is a reduction.
SmallVector<std::pair<Value, Value>> reductionOperands;		SmallVector<std::pair<Value, Value>> reductionOperands;
for (Value operand : op->getOperands()) {		for (Value operand : op->getOperands()) {
auto arg = operand.dyn_cast<BlockArgument>();		auto arg = operand.dyn_cast<BlockArgument>();
if (!arg || arg.getArgNumber() < linalgOp.getNumDpsInputs())		if (!arg || arg.getArgNumber() < linalgOp.getNumDpsInputs())
continue;		continue;
SmallVector<Operation *> reductionOps;		SmallVector<Operation *> reductionOps;
Value reduceValue = matchReduction(		Value reduceValue = matchReduction(
[1,479 unchanged lines hidden]

mlir/test/Dialect/Linalg/vectorization.mlir

// RUN: mlir-opt %s -test-transform-dialect-interpreter -split-input-file | FileCheck %s		// RUN: mlir-opt %s -test-transform-dialect-interpreter -split-input-file | FileCheck %s
		nicolasvasilache (Not Done): This is problematic, you shouldn't expect that any of this is run in a pass pipeline. You could add a new attribute to the VectorizeOp (like we do for padding) and propagate it from there instead.

// -----		// -----

// CHECK-LABEL: contraction_dot		// CHECK-LABEL: contraction_dot
func.func @contraction_dot(%A: memref<1584xf32>, %B: memref<1584xf32>, %C: memref<f32>) {		func.func @contraction_dot(%A: memref<1584xf32>, %B: memref<1584xf32>, %C: memref<f32>) {

// CHECK: arith.mulf %{{.*}}, %{{.*}} : vector<1584xf32>		// CHECK: arith.mulf %{{.*}}, %{{.*}} : vector<1584xf32>
// CHECK: vector.multi_reduction <add>, %{{.*}}, {{.*}} [0] : vector<1584xf32> to f32		// CHECK: vector.multi_reduction <add>, %{{.*}}, {{.*}} [0] : vector<1584xf32> to f32
[1,485 unchanged lines hidden; context: ^bb1(%arg1: !pdl.operation):]
%2 = transform.structured.vectorize %1		%2 = transform.structured.vectorize %1
}		}

// -----		// -----

#map0 = affine_map<(d0, d1, d2, d3) -> (d0, d2)>		#map0 = affine_map<(d0, d1, d2, d3) -> (d0, d2)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>		#map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>
#map2 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>		#map2 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
func.func @not_vectorize_nd_tensor_extract(%arg0: tensor<3x3xf32>, %arg1: tensor<4x3xi32>, %arg2: tensor<4x3xi32>, %arg3: tensor<4x7x2xf32>, %arg4: tensor<4x7x3x2xf32>) -> tensor<4x7x3x2xf32> {		func.func @vectorize_nd_tensor_extract(%arg0: tensor<3x3xf32>, %arg1: tensor<4x3xi32>, %arg2: tensor<4x3xi32>, %arg3: tensor<4x7x2xf32>, %arg4: tensor<4x7x3x2xf32>) -> tensor<4x7x3x2xf32> {
%2 = linalg.generic {		%2 = linalg.generic {
indexing_maps = [#map0, #map0, #map1, #map2],		indexing_maps = [#map0, #map0, #map1, #map2],
iterator_types = ["parallel", "parallel", "parallel", "parallel"]		iterator_types = ["parallel", "parallel", "parallel", "parallel"]
} ins(%arg1, %arg2, %arg3 : tensor<4x3xi32>, tensor<4x3xi32>, tensor<4x7x2xf32>) outs(%arg4 : tensor<4x7x3x2xf32>) {		} ins(%arg1, %arg2, %arg3 : tensor<4x3xi32>, tensor<4x3xi32>, tensor<4x7x2xf32>) outs(%arg4 : tensor<4x7x3x2xf32>) {
^bb0(%arg5: i32, %arg6: i32, %arg7: f32, %arg8: f32):		^bb0(%arg5: i32, %arg6: i32, %arg7: f32, %arg8: f32):
%3 = arith.index_cast %arg5 : i32 to index		%3 = arith.index_cast %arg5 : i32 to index
%4 = arith.index_cast %arg6 : i32 to index		%4 = arith.index_cast %arg6 : i32 to index
%7 = tensor.extract %arg0[%3, %4] : tensor<3x3xf32>		%7 = tensor.extract %arg0[%3, %4] : tensor<3x3xf32>
linalg.yield %7 : f32		linalg.yield %7 : f32
} -> tensor<4x7x3x2xf32>		} -> tensor<4x7x3x2xf32>
return %2 : tensor<4x7x3x2xf32>		return %2 : tensor<4x7x3x2xf32>
}		}
// CHECK-LABEL: func.func @not_vectorize_nd_tensor_extract		// CHECK-LABEL: func.func @vectorize_nd_tensor_extract
// CHECK: tensor.extract		// CHECK-SAME: %[[ARG0:.*]]: tensor<3x3xf32>
		// CHECK-SAME: %[[ARG1:arg1]]: tensor<4x3xi32>
		// CHECK-SAME: %[[ARG2:arg2]]: tensor<4x3xi32>
		// CHECK-SAME: %[[ARG3:.*]]: tensor<4x7x2xf32>
		// CHECK-SAME: %[[ARG4:.*]]: tensor<4x7x3x2xf32>
		// CHECK: %[[C0:.*]] = arith.constant 0 : index
		// CHECK: %[[C0_i32:.*]] = arith.constant 0 : i32
		// CHECK: %[[CST:.*]] = arith.constant dense<3> : vector<7x2x4x3xindex>
		// CHECK: %[[CST_1:.*]] = arith.constant dense<true> : vector<4x7x3x2xi1>
		// CHECK: %[[PASSTHRU:.*]] = arith.constant dense<0.000000e+00> : vector<4x7x3x2xf32>
		// CHECK: %[[V0:.*]] = vector.transfer_read %[[ARG1]][%[[C0]], %[[C0]]], %[[C0_i32]] {in_bounds = [true, true]} : tensor<4x3xi32>, vector<4x3xi32>
		// CHECK: %[[V1:.*]] = vector.transfer_read %[[ARG2]][%[[C0]], %[[C0]]], %[[C0_i32]] {in_bounds = [true, true]} : tensor<4x3xi32>, vector<4x3xi32>
		// CHECK: %[[CAST:.*]] = arith.index_cast %[[V0]] : vector<4x3xi32> to vector<4x3xindex>
		// CHECK: %[[B1:.*]] = vector.broadcast %[[CAST]] : vector<4x3xindex> to vector<7x2x4x3xindex>
		// CHECK: %[[CAST_1:.*]] = arith.index_cast %[[V1]] : vector<4x3xi32> to vector<4x3xindex>
		// CHECK: %[[B2:.*]] = vector.broadcast %[[CAST_1]] : vector<4x3xindex> to vector<7x2x4x3xindex>
		// CHECK: %[[MULI:.*]] = arith.muli %[[B1]], %[[CST]] : vector<7x2x4x3xindex>
		// CHECK: %[[ADDI:.*]] = arith.addi %[[B2]], %[[MULI]] : vector<7x2x4x3xindex>
		// CHECK: %[[T:.*]] = vector.transpose %[[ADDI]], [2, 0, 3, 1] : vector<7x2x4x3xindex> to vector<4x7x3x2xindex>
		// CHECK: %[[GATHER:.*]] = vector.gather %[[ARG0]][%[[C0]], %[[C0]]] [%[[T]]], %[[CST_1]], %[[PASSTHRU]] : tensor<3x3xf32>, vector<4x7x3x2xindex>, vector<4x7x3x2xi1>, vector<4x7x3x2xf32> into vector<4x7x3x2xf32>
		// CHECK: vector.transfer_write %[[GATHER]], %[[ARG4]][%[[C0]], %[[C0]], %[[C0]], %[[C0]]] {in_bounds = [true, true, true, true]} : vector<4x7x3x2xf32>, tensor<4x7x3x2xf32>

transform.sequence failures(propagate) {		transform.sequence failures(propagate) {
^bb1(%arg1: !pdl.operation):		^bb1(%arg1: !pdl.operation):
%0 = transform.structured.match ops{["linalg.generic"]} in %arg1		%0 = transform.structured.match ops{["linalg.generic"]} in %arg1
%1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation		%1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation
%2 = transform.structured.vectorize %1		%2 = transform.structured.vectorize %1
}		}
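
The CHECK lines for `@vectorize_nd_tensor_extract` build the offset as `B1 * 3 + B2` (the `dense<3>` constant is the size of dimension 1 of `tensor<3x3xf32>`) and then transpose it from `vector<7x2x4x3xindex>` to `vector<4x7x3x2xindex>`. The shape bookkeeping can be checked with the sketch below, assuming result dimension `k` of `vector.transpose` takes source dimension `perm[k]`, which is consistent with the types in the CHECK line; `permuteShape` is a hypothetical helper, not an MLIR function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not MLIR API): shape of the transposed vector,
// assuming result dimension k is source dimension perm[k].
std::vector<int64_t> permuteShape(const std::vector<int64_t> &shape,
                                  const std::vector<size_t> &perm) {
  assert(shape.size() == perm.size());
  std::vector<int64_t> result;
  result.reserve(perm.size());
  for (size_t p : perm) {
    assert(p < shape.size());
    result.push_back(shape[p]);
  }
  return result;
}
```

Applying the permutation `[2, 0, 3, 1]` to the shape `7x2x4x3` yields `4x7x3x2`, the shape of the gather result and of the output tensor.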

[65 unchanged lines hidden]