This is an archive of the discontinued LLVM Phabricator instance.

[mlir] [VectorOps] Rewriting of vector.extract/insert_slices to other vector ops
ClosedPublic

Authored by aartbik on Jan 23 2020, 2:17 PM.

Details

Summary

Rewrites the extract/insert_slices operation in terms of
strided_slice/insert_strided_slice ops with intermediate
tuple uses (that should get optimimized away with typical
usage). This is done in a separate "pass" to enable testing
this particular rewriting in isolation.

Diff Detail

Event Timeline

aartbik created this revision.Jan 23 2020, 2:17 PM

I really like how this has evolved from the original point, it is almost good to go on my end.

mlir/lib/Dialect/VectorOps/VectorTransforms.cpp
774

drop trivial braces plz, this is the MLIR style.

798

So I really like what you're doing re exposing and classifying patterns by intention, other places in the codebase should also do that and document it: "this set of patterns is useful for X"

Now, the selection of patterns you chose to add is a bit trickier IMO and I think we should:

  1. name this populateVectorSlicesLoweringPatterns because it lowers out of these ops. Since it does not lower the type it is not a conversion so LoweringPatterns seems an appropriate name.
  2. insert all the extra patterns (all the necessary tuple stuff) that ensure the Insert/ExtractSlices indeed go away, otherwise it will be surprising that the VectorSlicesLoweringPatterns are not enough to lower the vector slices.
  3. explicitly list at the API doc level the patterns that are included (including the ) so people can easily look them up.

After we have enough of those, we will end up with pattern collections that implement behaviors.
This will have a granularity somewhere in between (1) individual patterns and (2) full transformations.
I expect this to be very powerful and independently testable like you do.

I am particularly sensitive to this in light of https://reviews.llvm.org/D73145 in which I could not break the phase ordering/dependence for now.

@rriddle what's your take on this?
Do you see a need / opportunity to have core infra support for collections of patterns and tests?

Side note:
So far I have shelved the debugging of why https://reviews.llvm.org/D73145 does not work with fused patterns but we need to resolve it.

mlir/test/lib/Transforms/TestVectorTransforms.cpp
41

Re pattern selection, it would be greate that populateVectorSlicesLoweringPatterns has everything it needs to make the test pass.
So concretely, this line should go away (and we should hunt other opportunities to improving other populateXXX methods by following your model).

oh yes and trivial braces everywhere plz, I just annotated one.

aartbik updated this revision to Diff 240039.Jan 23 2020, 4:05 PM
aartbik marked 6 inline comments as done.

addressed ntv's comments

aartbik added inline comments.Jan 23 2020, 4:09 PM
mlir/lib/Dialect/VectorOps/VectorTransforms.cpp
774

you would think I knew that by now, but old habits die hard....

798

Done all (renamed and added doc), except that we don't need any specific tuple stuff anymore! Dead tuples are removed by DCE (in the greedy rewriter at least) while get-tuples-on-tuples are folded away automatically as well!

mlir/test/lib/Transforms/TestVectorTransforms.cpp
41

It has. I simply overlooked this line. It is not needed!

Unit tests: fail. 62144 tests passed, 6 failed and 811 were skipped.

failed: MLIR.Dialect/VectorOps/vector-slices-transforms.mlir
failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

Unit tests: fail. 62151 tests passed, 5 failed and 811 were skipped.

failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

mlir/lib/Dialect/VectorOps/VectorTransforms.cpp
798

Sorry I don't understand how the tuples get magically removed.
Both rewrites insert tuple/tuple_get but I do not see what patterns removes them concretely.
Maybe in these particular tests you wrote things degenerate into DCE (and that's great!), but I imagine it is possible to write tests where tuple ops will remain right?

800

Also, let's plz rename to XXXLowering given the naming argument above.

aartbik updated this revision to Diff 240271.Jan 24 2020, 12:34 PM
aartbik marked 2 inline comments as done.

renaming, added more comments

mlir/lib/Dialect/VectorOps/VectorTransforms.cpp
798

Well if of course does not guarantee that all tuple ops are eliminated (it only deals with slices ops rewriting), since tuple values may "leak" going in already. Take for example

func @extract_slices(%arg0: vector<3x3xf32>) -> tuple<vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32>> {

%0 = vector.extract_slices %arg0, [2, 2], [1, 1]
  : vector<3x3xf32> into tuple<vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32>>
return %0 : tuple<vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32>>

}

this will be lowered to

func @extract_slices(%arg0: vector<3x3xf32>) -> tuple<vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32>> {

  %0 = vector.strided_slice %arg0 {offsets = [0, 0], sizes = [2, 2], strides = [1, 1]} : vector<3x3xf32> to vector<2x2xf32>
  %1 = vector.strided_slice %arg0 {offsets = [0, 2], sizes = [2, 1], strides = [1, 1]} : vector<3x3xf32> to vector<2x1xf32>
  %2 = vector.strided_slice %arg0 {offsets = [2, 0], sizes = [1, 2], strides = [1, 1]} : vector<3x3xf32> to vector<1x2xf32>
  %3 = vector.strided_slice %arg0 {offsets = [2, 2], sizes = [1, 1], strides = [1, 1]} : vector<3x3xf32> to vector<1x1xf32>
  %4 = vector.tuple %0, %1, %2, %3 : vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32>
  return %4 : tuple<vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32>>
}

Here the tuple leaks, because there is no way to remove it. However, the lowering guarantees that any uses of slices where the tuple values are consumed, are lowered into something without tuples, even for the newly introduced operations.

I have added the comment on the lowering API a bit to reflect this.

aartbik updated this revision to Diff 240273.Jan 24 2020, 12:45 PM

added "leaking" tuple to test

Unit tests: fail. 62151 tests passed, 5 failed and 811 were skipped.

failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

Unit tests: fail. 62151 tests passed, 5 failed and 811 were skipped.

failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

aartbik updated this revision to Diff 240282.Jan 24 2020, 1:33 PM

rebase to master

Unit tests: pass. 62191 tests passed, 0 failed and 815 were skipped.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

Well my point here is that if you added the patterns to fold the tuples, you would do as good a job as you could in the absence of region/function boundaries.
Here you created an example that leaks in an unfixable way (because we have conciously decided to no lower the tuple<vector> type to LLVMIR and take the strong position that it must canonicalize/fold/DCE away).

Really the only thing I am trying to get is to either have:

  1. the pattern collection be this:
void mlir::vector::populateVectorSlicesLoweringPatterns(
    OwningRewritePatternList &patterns, MLIRContext *context) {
  patterns.insert<ExtractSlicesOpLowering, InsertSlicesOpLowering, TupleGetFolderOp, OtherRelevantTuplePatterns>(context);
}
  1. a good justification why that would be a bad idea.

The outcome I am looking for is an API that makes is easy and unsurprising to add patterns that guarantee Slices are lowered away (and it should be a "compiler bug" if they are remaining).
All this because we refuse to have a representation for tuple<vector<4x8x16x32xf32>.... > in LLVMIR (unless strong data comes to suggest this inuition is wrong).

Does this make sense?

Hmm. I would argue that the tuple discussion makes less sense in the context of this particular rewriting (since it really deals with getting rid of extract_slices/insert_slices using other slice ops). Cleaning up tuple uses is a nice bonus, but not a requirement, since tuples are really a part of the vector dialect to start with, as the leaking example shows. If you want something that blocks any tuple uses, I would argue that we need another pattern population for that, and run that at places where we need to guarantee they are gone (such as the lowering to LLVM, although the "legality" part takes care of it there already).

Other than this comment, I am not sure what you would like to see added in this CL.

nicolasvasilache accepted this revision.Jan 24 2020, 2:32 PM

ok, fair enough, thank you Aart!

This revision is now accepted and ready to land.Jan 24 2020, 2:32 PM

Note: I was probably overfitting on the issues I have with https://reviews.llvm.org/D73145 :)

This revision was automatically updated to reflect the committed changes.
rriddle added inline comments.Jan 24 2020, 4:35 PM
mlir/include/mlir/Dialect/VectorOps/VectorOps.h
51

I don't understand this comment. I don't see how this lowering removes tuples. The pattern driver performs some DCE, but I don't see how that is related to these patterns.

aartbik marked an inline comment as done.Jan 24 2020, 4:51 PM
aartbik added inline comments.
mlir/include/mlir/Dialect/VectorOps/VectorOps.h
51

Yes, this assumes the patterns are run through the greedy driver (or a similar driver that calls DCE and folds as part of the rewriting). I was trying to convey that rewriting of a typical extract_slices or insert_slices where the individual parts are always consumed somehow leaves no tuple behind. I can send out a follow-up CL with some rephrased comments to continue the discussion there.