This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Vectorize tensor.extract on n-D tensor
ClosedPublic

Authored by awarzynski on Nov 8 2022, 11:41 AM.

Details

Summary

This patch implements the vectorization of tensor.extract for arbitrary
tensors. It basically extends https://reviews.llvm.org/D133786 by adding
support for n-D tensors. This works by basically flattening the
indices.

While this patch allows for more cases of tensor.extract to be
vectorised, performance on the workloads that we tested
either regressed or didn't improve. For this reason, new functionality
is hidden behind a global command-line flag:

  • -linalg-vectorize-n-d-extract.

We may want remove it once the implementation is refined and we
are happy with the performance.

Extra logic in the Linalg vectorizer is added to support the new
command line option.

Related discussion: https://github.com/iree-org/iree/issues/9198

Diff Detail

Event Timeline

awarzynski created this revision.Nov 8 2022, 11:41 AM
Herald added a project: Restricted Project. · View Herald Transcript
awarzynski requested review of this revision.Nov 8 2022, 11:41 AM
Herald added a project: Restricted Project. · View Herald Transcript
rsuderman requested changes to this revision.Nov 15 2022, 6:47 PM
rsuderman added a subscriber: rsuderman.
rsuderman added inline comments.
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
397

You can initially construct this using numIndices for the size and create the arith::ConstantIndexOp with value 0. This avoids having branching behavior depending on the number of indices.

398

Make const auto or const int64_t.

404

This line will no longer be needed.

418

You actually don't need this saved to a vector. You only need an accumulator for mul-add. (More details below).

427

Ditto about needing a vector. You can iterate on values.

441

You can simplify the two loops below roughly with the IR equivalent below:

int64_t index[N];
int64_t shape[N];
int64_t offset = index[N - 1];

for (int i = N - 1; i > 0; --i ){
  offset = offset * shape[i] + index[i - 1]
}

It still requires the broadcast work but avoids the extraneous mul-by-1 and extra operations that are just cleaned up. It also avoids computing all the slice sizes when we just care about the global offset.

473

If you make the change to the gatherIndices declaration you can delete this entire loop.

This revision now requires changes to proceed.Nov 15 2022, 6:47 PM

Simplify as per comments from @rsuderman

Thank you for taking a look, Rob! Much appreciated comments and suggestions.

Thanks for working on this and addressing the feedback! LGTM. Just some minor comments!

Also wondering if we should land this under a flag (disable by default) as we know performance will degrade until we have some gather optimizations in place.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
398

nit: gatherIndices -> baseIndices? Clearer?

406

Sorry, missing something probably trivial but, where is the transpose being generated?

409

Instead of special-casing this for numIndices == 1... would it be possible to have the logic below working for just one index (and do mostly nothing)? It looks like having one index should be a base case of that code.

431

There might be a few improvements related to the offset computation. Would it make sense to move all of this to self-contained utility function? It's very likely that we play with different approaches to compute the offsets and having a that isolated into a utility function would help.

awarzynski marked an inline comment as done.Nov 23 2022, 5:18 AM

Thanks for taking a look!

Also wondering if we should land this under a flag (disable by default) as we know performance will degrade until we have some gather optimizations in place.

I agree, but will need some pointers :) The test runs TestTransformDialectInterpreterPass, but I doubt that's the pass I'd want to update here.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
406

vector.transpose is inserted in vectorizeLinalgIndex, i.e. prior to entering this method.

Note that we only need to broadcast the trailing index as that's the only one that has not been broadcast yet. ATM, I'm broadcasting "everything" to keep the code simple and rely on subsequent optimisations to remove redundant broadcasts. I'll update the broadcasting part to make this more explicit.

I suggest that we look at removing vector.transpose in a separate patch. I admit that I've not looked at all. WDTY?

409

Good point, done!

Implement suggestions from @dcaballe

  • moved offset calculation to a dedicated method calculateOffsetForGLoad (perhaps there's a better name?)
  • made sure that we only add broadcast for the trainling index (other indices have already been broadcast by vectorizeLinalgIndex)
  • renamed gatherIndices as baseIndices

Thanks for addressing the feedback! I'm ok with landing under a flag after addressing the feedback, as stepping stone towards the final n-D gather solution. We can build on top of this iteratively.

I agree, but will need some pointers :) The test runs TestTransformDialectInterpreterPass, but I doubt that's the pass I'd want to update here.

We can use an global llvm::opt flag as this flag is temporary that will go away. The other options is to add a bool parameter to vectorize but we would have to extend the Transform dialect, etc. to provide the right value. I don't think it's worth it given that this will go away.

(perhaps there's a better name?)

calculateGatherOffset?

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
406

SG, thanks!

dcaballe accepted this revision.Nov 23 2022, 1:23 PM

Hide the new functionality behind an llvm::cl option

Added a new global command line option for enabling the vectorisaiton of
tensor.extract when the input is an n-D tensor:
--linalg-vectorize-n-d-extract. The default value is "off", which means that
this patch will not change the default behaviour. I'll update the summary of
this change shortly.

Given that this is the Thanksgiving week, I'll wait until next week before
landing this. This way our friends celebrating get an opportunity to take
another look before I merge this. I will also add some reviewers that might
have an opinion about the new global options.##

awarzynski edited the summary of this revision. (Show Details)Nov 24 2022, 7:29 AM
nicolasvasilache requested changes to this revision.Nov 24 2022, 7:52 AM

Please hold off with this, I am making significant changes to vectorization.
I hope I can share something tomorrow.

This revision now requires changes to proceed.Nov 24 2022, 7:52 AM

Here is the WIP https://reviews.llvm.org/D138688

Unfortunately I am having issues on the IREE side that I cannot yet diagnose and do not know whether this is causing potentially major regressions that need to be addressed before landing.

Here is the WIP https://reviews.llvm.org/D138688

Unfortunately I am having issues on the IREE side that I cannot yet diagnose and do not know whether this is causing potentially major regressions that need to be addressed before landing.

Thanks for the heads-up and for sharing! Those look like some really nice improvements, thank you!

I definitely would prefer to have this in sooner rather than later, but also don't mind waiting and definitely don't want to make anyone's life harder :) Shall I wait another week? Would that be sufficient for you?

Here is the WIP https://reviews.llvm.org/D138688

Unfortunately I am having issues on the IREE side that I cannot yet diagnose and do not know whether this is causing potentially major regressions that need to be addressed before landing.

Thanks for the heads-up and for sharing! Those look like some really nice improvements, thank you!

I definitely would prefer to have this in sooner rather than later, but also don't mind waiting and definitely don't want to make anyone's life harder :) Shall I wait another week? Would that be sufficient for you?

Thanks!

This is definitely a long-standing simplification for at least 2 years, now is the right time to do it before too many things are added on top.

I estimate 1 week should be good, the main blocker atm is that I have no easy way of turning downstream IREE errors into actual test cases here.

Here is the WIP https://reviews.llvm.org/D138688

Unfortunately I am having issues on the IREE side that I cannot yet diagnose and do not know whether this is causing potentially major regressions that need to be addressed before landing.

Thanks for the heads-up and for sharing! Those look like some really nice improvements, thank you!

I definitely would prefer to have this in sooner rather than later, but also don't mind waiting and definitely don't want to make anyone's life harder :) Shall I wait another week? Would that be sufficient for you?

Thanks!

This is definitely a long-standing simplification for at least 2 years, now is the right time to do it before too many things are added on top.

I estimate 1 week should be good, the main blocker atm is that I have no easy way of turning downstream IREE errors into actual test cases here.

We iterated with @dcaballe offline. I can remove that blocker and reshuffle things on my end.

However, the CL option needs to be reworked.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
52

This is problematic, you shouldn't expect that any of this is run in a pass pipeline.
You could add a new attribute the to VectorizeOp (like we do for padding) and propagate it from there instead.

mlir/test/Dialect/Linalg/vectorization.mlir
1

This is problematic, you shouldn't expect that any of this is run in a pass pipeline.
You could add a new attribute the to VectorizeOp (like we do for padding) and propagate it from there instead.

pzread added inline comments.Nov 28 2022, 10:25 AM
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
315–316

nit: This comment can be removed.

354

nit: I think it's clearer and safer to write:

extractOp.getTensor().getType()

Also do we check if the dim is dynamic? If it has been checked in another place, we can have an assert here.

awarzynski added inline comments.Nov 28 2022, 11:52 AM
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
52

This is problematic, you shouldn't expect that any of this is run in a pass pipeline.

Isn't a global command-line option orthogonal to whether this functionality is wrapped in a pass or not?

You could add a new attribute the to VectorizeOp (like we do for padding) and propagate it from there instead.

I feel that this would only be sufficient for testing/experimenting with the Transform dialect. But I'd like more than that :) I'm not that familiar with the Transform dialect and might be missing something obvious - please let me know! I'm just about send and updated that preserves this global command-line flag and adds a Transform dialect attribute. Is that what you are suggesting?

354

nit: I think it's clearer and safer to write:

Thanks for the suggestion, I'll update this in the next revision!

Also do we check if the dim is dynamic? If it has been checked in another place, we can have an assert here.

It is checked here. Should I add an assert regardless?

Address PR comments

Added an attribute to VectorizeOp so that the vectorization can be controlled from the Transfer dialect level. @nicolasvasilache, is that what you had in mind?

Thanks for all the comments so far!

dcaballe added inline comments.Nov 28 2022, 12:07 PM
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
52

I suggested that we could go with a global flag as this is a temporary flag that will be removed once we stabilize the performance of the gather. I actually wanted to avoid having to extend the transform dialect just for this temporary thing but happy to have it working with the transform dialect as well.

However, it's not clear to me why we need a struct (MLIRLinalgVectorizerOptions) and all the MLIRLinalgVectorizerOptions and MLIRLinalgVectorizerOptions related code. Wouldn't just adding the llvm::cl option work?

awarzynski added inline comments.Nov 28 2022, 1:52 PM
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
52

However, it's not clear to me why we need a struct (MLIRLinalgVectorizerOptions) and all the MLIRLinalgVectorizerOptions and MLIRLinalgVectorizerOptions related code. Wouldn't just adding the llvm::cl option work?

Without a struct we would have a global destructor (for the vectorizeNDExtract), and that's not an option, see https://github.com/llvm/llvm-project/blob/d620bae999465f4a418c38b6451b11d80523c6de/mlir/lib/CMakeLists.txt#L1-L2 :)

This approach is actually fairly common these days (see how things are set-up in mlir-opt - it allows you to control when particular sets of options are exposed. Otherwise, all "global" options would be available at all times.

Thanks for adding support for the Transform dialect! LGTM. I think we should be able to land this. Does anybody else have any comments?

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
52

Thanks! You are definitely more up-to-date on this than I am.

340

const SmallVector<int64, 4> & -> const SmallVectorImpl<int64_t> &

dcaballe accepted this revision.Dec 5 2022, 9:35 AM
rsuderman accepted this revision.Dec 5 2022, 11:11 AM
mlir/lib/Tools/mlir-opt/MlirOptMain.cpp
269 ↗(On Diff #478309)

oof this looks like a very unwelcome dependence at this level ..

@rriddle , any comments / suggestions here ?

nicolasvasilache requested changes to this revision.Dec 5 2022, 11:49 AM
nicolasvasilache added inline comments.
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
53

This leaks badly all the way up to the main binary.. I'd like to see this discussed.

This revision now requires changes to proceed.Dec 5 2022, 11:49 AM
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1724 ↗(On Diff #478309)

thanks for adding this

1779 ↗(On Diff #478309)

why not have this work exactly the same way as the code 2 lines above?

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
60

If you really want a CL option, you could add a test pass (we don't want to start maintaining a pass with heuristics and perf regressions that people would depend on).
But I think the value of that is limited these days.

I suggested a global flag to avoid having to extend the Transform dialect, as the flag will go away in a few weeks. Since it's already supported in the Transform dialect, I think we could remove the global flag and add a bool option to the vectorize method. That should work for both, the Transform dialect and any other users of the vectorizer.

I see that the global option is proving more controversial than we anticipated :)

@nicolasvasilache, we don't really need the changes in mlir-opt (@dcaballe ?), but I'd like to be able to control this vectorization through other tools ( iree-compile in my case). AFAIK, a transfer dialect attribute is not sufficient for this (perhaps I'm missing something?). Global options is what folks within LLVM have been using for this sort of things, so perhaps this is not too bad?

Since it's already supported in the Transform dialect, I think we could remove the global flag and add a bool option to the vectorize method. That should work for both, the Transform dialect and any other users of the vectorizer.

This actually a bit tricky. I'd need to find a way to pass it to tensorExtractVectorizationPrecondition, but it only accepts one argument (it's type is CustomVectorizationPrecondition). Also, this would mean adding a bool argument in quite a few places, so not so tidy. I'll try again tomorrow.

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1779 ↗(On Diff #478309)

You know this stuff much better than I do, so please point me in the right direction if I'm talking nonsense.

Basically, hooks for vectorising tensor.pad and tensor.extract are registered differently ATM:

I wanted to avoid refactoring, hence this slight inconsistency how tensor.pad and tensor.extract are treated. Do you reckon that we should go via OpRewritePattern for tensor.extract instead? If yes, then presumably we'd want that in a separate patch?

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
53

This leaks badly all the way up to the main binary

Only if you register it through linalg::registerLinalgVectorizerCLOptions(); like I did in MLIROptMain.cpp.

I'd like to see this discussed.

Are you concerned about mlir-opt specifically? Because I don't really need this in mlir-opt. I use iree-compile instead, but that's orthogonal here.

awarzynski updated this revision to Diff 480813.Dec 7 2022, 1:29 AM

Revert changes from mlir-opt

The changes in mlir-opt are not needed for this patch to work. I'm leaving
the llvm::cl option for now, as even extending vectorize to accept additional
bool (as per Diego's sugestion) would require adding a global variable.
Otherwise, we'd need to propagage that bool to either
tensorExtractVectorizationPrecondition or VectorizeTensorExtract, but
that's not possible (i.e. we can't really change the corresponding interfaces).

ALso rebased on top of main and replaced SmallVector with SmallVectorImpl.

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1779 ↗(On Diff #478309)

I see, I mistook the fact that you are vectorizing a tensor.xx op for what you are actually doing, which is vectorizing something new within the Linalg body and injecting an orthogonal control to enable it.

The easy way forward is to pass the boolean to the pattern constructor and to the vectorize function, feel free to wrap that in an options struct (i.e. your LinalgVectorizerOptions).
If you don't want to pass that forward to vectorizeAsLinalgGeneric, feel free to hoist the hooks registration one level which will also make them more visible and documented one level higher.

Despite passing the information in 3 places, this is infinitely better than an orthogonal control injection with a global, this definitely feels like an abuse to me.
Doing a quick grep in MLIR I do not see any such use and I am definitely not looking at having Linalg introduce such a mechanism, sorry.
If you feel strongly about this please start and RFC on discourse to gather more feedback.

Removed the llvm::cl logic

I removed all the logic related to llvm::cl and replaced it entirely with:

  • transfer dialect attribute (vectorize_nd_extract)
  • extra bool option passed to vectorize

@nicolasvasilache, I didn't really need the global, it just happened to be an interface that I am very familiar with. I'm new to this area and it's not always obvious which approach would be the least frictionless. This approach works for me, thanks for pointing me in the right direction!

I am not too happy about the extra argument in vectorizeLinalgOpPrecondition that's very specific that what I'm trying to do for tensor.extract. But equally, this hook is currently only used for tensor.extract.

nicolasvasilache accepted this revision.Dec 11 2022, 7:59 AM

thanks!

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
36

spurious leftover include

This revision is now accepted and ready to land.Dec 11 2022, 7:59 AM
This revision was automatically updated to reflect the committed changes.