Rewrites the extract/insert_slices operation in terms of
strided_slice/insert_strided_slice ops with intermediate
tuple uses (that should get optimized away with typical
usage). This is done in a separate "pass" to enable testing
this particular rewriting in isolation.
Repository: rG LLVM Github Monorepo
I really like how this has evolved from the original point, it is almost good to go on my end.
mlir/lib/Dialect/VectorOps/VectorTransforms.cpp | ||
---|---|---|
774 | drop trivial braces plz, this is the MLIR style. | |
798 | So I really like what you're doing re: exposing and classifying patterns by intention; other places in the codebase should also do that and document it: "this set of patterns is useful for X". Now, the selection of patterns you chose to add is a bit trickier IMO and I think we should:
After we have enough of those, we will end up with pattern collections that implement behaviors. I am particularly sensitive to this in light of https://reviews.llvm.org/D73145 in which I could not break the phase ordering/dependence for now. @rriddle what's your take on this? Side note: | |
mlir/test/lib/Transforms/TestVectorTransforms.cpp | ||
41 | Re pattern selection, it would be great if populateVectorSlicesLoweringPatterns had everything it needs to make the test pass. |
mlir/lib/Dialect/VectorOps/VectorTransforms.cpp | ||
---|---|---|
774 | you would think I knew that by now, but old habits die hard.... | |
798 | Done all (renamed and added doc), except that we don't need any specific tuple stuff anymore! Dead tuples are removed by DCE (in the greedy rewriter at least) while get-tuples-on-tuples are folded away automatically as well! | |
mlir/test/lib/Transforms/TestVectorTransforms.cpp | ||
41 | It has. I simply overlooked this line. It is not needed! |
Unit tests: fail. 62144 tests passed, 6 failed and 811 were skipped.
failed: MLIR.Dialect/VectorOps/vector-slices-transforms.mlir
failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp
clang-tidy: pass.
clang-format: pass.
Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.
Unit tests: fail. 62151 tests passed, 5 failed and 811 were skipped.
failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp
clang-tidy: pass.
clang-format: pass.
mlir/lib/Dialect/VectorOps/VectorTransforms.cpp | ||
---|---|---|
798 | Sorry I don't understand how the tuples get magically removed. | |
800 | Also, let's plz rename to XXXLowering given the naming argument above. |
renaming, added more comments
mlir/lib/Dialect/VectorOps/VectorTransforms.cpp | ||
---|---|---|
798 | Well, it of course does not guarantee that all tuple ops are eliminated (it only deals with rewriting the slices ops), since tuple values may already "leak" going in. Take for example func @extract_slices(%arg0: vector<3x3xf32>) -> tuple<vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32>> { %0 = vector.extract_slices %arg0, [2, 2], [1, 1] : vector<3x3xf32> into tuple<vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32>> return %0 : tuple<vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32>> } this will be lowered to func @extract_slices(%arg0: vector<3x3xf32>) -> tuple<vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32>> { %0 = vector.strided_slice %arg0 {offsets = [0, 0], sizes = [2, 2], strides = [1, 1]} : vector<3x3xf32> to vector<2x2xf32> %1 = vector.strided_slice %arg0 {offsets = [0, 2], sizes = [2, 1], strides = [1, 1]} : vector<3x3xf32> to vector<2x1xf32> %2 = vector.strided_slice %arg0 {offsets = [2, 0], sizes = [1, 2], strides = [1, 1]} : vector<3x3xf32> to vector<1x2xf32> %3 = vector.strided_slice %arg0 {offsets = [2, 2], sizes = [1, 1], strides = [1, 1]} : vector<3x3xf32> to vector<1x1xf32> %4 = vector.tuple %0, %1, %2, %3 : vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32> return %4 : tuple<vector<2x2xf32>, vector<2x1xf32>, vector<1x2xf32>, vector<1x1xf32>> } Here the tuple leaks, because there is no way to remove it. However, the lowering guarantees that any uses of slices where the tuple values are consumed are lowered into something without tuples, even for the newly introduced operations. I have extended the comment on the lowering API a bit to reflect this. |
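For readers following the example, the offsets and sizes of the four strided_slice ops can be reproduced with a small standalone sketch. This is not the code from the patch: `Slice` and `computeSlices` are hypothetical names used only for illustration, and the sketch assumes that slices are enumerated in row-major order and clamped at the vector boundary, which is what the lowered IR above shows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper type for this sketch: one slice of the source vector.
struct Slice {
  std::vector<int64_t> offsets;
  std::vector<int64_t> sizes;
};

// Enumerates the slices of a vector of the given shape when it is tiled with
// the given (positive) tile sizes. Slices are produced in row-major order and
// boundary slices are clamped to the remaining extent.
std::vector<Slice> computeSlices(const std::vector<int64_t> &shape,
                                 const std::vector<int64_t> &tile) {
  assert(shape.size() == tile.size() && "shape and tile must have equal rank");
  const std::size_t rank = shape.size();
  std::vector<Slice> slices;
  std::vector<int64_t> offsets(rank, 0);
  while (true) {
    Slice s;
    s.offsets = offsets;
    for (std::size_t d = 0; d < rank; ++d)
      s.sizes.push_back(std::min(tile[d], shape[d] - offsets[d]));
    slices.push_back(s);
    if (rank == 0)
      return slices;
    // Advance the innermost dimension first, carrying into outer dimensions.
    std::size_t d = rank;
    while (d > 0) {
      --d;
      offsets[d] += tile[d];
      if (offsets[d] < shape[d])
        break;
      offsets[d] = 0;
      if (d == 0)
        return slices; // Wrapped around the outermost dimension: done.
    }
  }
}
```

Calling `computeSlices({3, 3}, {2, 2})` yields four slices whose offsets and sizes match the `offsets` and `sizes` attributes of `%0` through `%3` in the lowered function.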
Unit tests: pass. 62191 tests passed, 0 failed and 815 were skipped.
clang-tidy: pass.
clang-format: pass.
Well my point here is that if you added the patterns to fold the tuples, you would do as good a job as you could in the absence of region/function boundaries.
Here you created an example that leaks in an unfixable way (because we have consciously decided not to lower the tuple<vector> type to LLVMIR and take the strong position that it must canonicalize/fold/DCE away).
Really the only thing I am trying to get is to either have:
- the pattern collection be this:
    void mlir::vector::populateVectorSlicesLoweringPatterns(
        OwningRewritePatternList &patterns, MLIRContext *context) {
      patterns.insert<ExtractSlicesOpLowering, InsertSlicesOpLowering,
                      TupleGetFolderOp, OtherRelevantTuplePatterns>(context);
    }
- a good justification why that would be a bad idea.
The outcome I am looking for is an API that makes it easy and unsurprising to add patterns that guarantee Slices are lowered away (and it should be a "compiler bug" if any remain).
All this because we refuse to have a representation for tuple<vector<4x8x16x32xf32>.... > in LLVMIR (unless strong data comes to suggest this intuition is wrong).
Does this make sense?
Hmm. I would argue that the tuple discussion makes less sense in the context of this particular rewriting (since it really deals with getting rid of extract_slices/insert_slices using other slice ops). Cleaning up tuple uses is a nice bonus, but not a requirement, since tuples are really a part of the vector dialect to start with, as the leaking example shows. If you want something that blocks any tuple uses, I would argue that we need another pattern population for that, and run that at places where we need to guarantee they are gone (such as the lowering to LLVM, although the "legality" part takes care of it there already).
Other than this comment, I am not sure what you would like to see added in this CL.
Note: I was probably overfitting on the issues I have with https://reviews.llvm.org/D73145 :)
mlir/include/mlir/Dialect/VectorOps/VectorOps.h | ||
---|---|---|
51 | I don't understand this comment. I don't see how this lowering removes tuples. The pattern driver performs some DCE, but I don't see how that is related to these patterns. |
mlir/include/mlir/Dialect/VectorOps/VectorOps.h | ||
---|---|---|
51 | Yes, this assumes the patterns are run through the greedy driver (or a similar driver that calls DCE and folds as part of the rewriting). I was trying to convey that rewriting of a typical extract_slices or insert_slices where the individual parts are always consumed somehow leaves no tuple behind. I can send out a follow-up CL with some rephrased comments to continue the discussion there. |
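To make the fold-plus-DCE argument concrete, here is a toy model of it. This is a sketch only and deliberately does not use the MLIR API: `Op`, `foldTupleGets`, `eraseDeadOps`, and the two example builders are all hypothetical names, and the real greedy driver interleaves folding and DCE with pattern application rather than running them as two separate steps.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy IR for this sketch only: every op has a kind, the ids of the ops it
// uses, and (for "tuple_get") the index of the tuple element it extracts.
struct Op {
  std::string kind;          // "arg", "slice", "tuple", "tuple_get", "use"
  std::vector<int> operands; // indices into the op list
  int index;                 // element index, meaningful for "tuple_get"
  bool erased;
};

// Fold: redirect each use of tuple_get(tuple(e0, e1, ...), i) to e_i.
void foldTupleGets(std::vector<Op> &ops) {
  for (Op &op : ops) {
    if (op.erased)
      continue;
    for (int &operand : op.operands) {
      const Op &def = ops[operand];
      if (def.kind != "tuple_get" || ops[def.operands[0]].kind != "tuple")
        continue;
      const Op &tuple = ops[def.operands[0]];
      assert(def.index < static_cast<int>(tuple.operands.size()));
      operand = tuple.operands[def.index];
    }
  }
}

// DCE: erase ops without users until nothing changes. "use" stands in for an
// op with side effects (such as a return) and is never erased.
void eraseDeadOps(std::vector<Op> &ops) {
  bool changed = true;
  while (changed) {
    changed = false;
    std::vector<int> users(ops.size(), 0);
    for (const Op &op : ops)
      if (!op.erased)
        for (int operand : op.operands)
          ++users[operand];
    for (std::size_t i = 0; i < ops.size(); ++i) {
      Op &op = ops[i];
      if (!op.erased && op.kind != "use" && op.kind != "arg" && users[i] == 0) {
        op.erased = true;
        changed = true;
      }
    }
  }
}

bool hasLiveTuple(const std::vector<Op> &ops) {
  for (const Op &op : ops)
    if (!op.erased && (op.kind == "tuple" || op.kind == "tuple_get"))
      return true;
  return false;
}

// Typical usage: every tuple element is extracted and consumed.
std::vector<Op> consumedExample() {
  return {{"arg", {}, 0, false},        {"slice", {0}, 0, false},
          {"slice", {0}, 0, false},     {"tuple", {1, 2}, 0, false},
          {"tuple_get", {3}, 0, false}, {"tuple_get", {3}, 1, false},
          {"use", {4, 5}, 0, false}};
}

// Leaking usage: the tuple itself is consumed (e.g. returned).
std::vector<Op> leakingExample() {
  return {{"arg", {}, 0, false},
          {"slice", {0}, 0, false},
          {"slice", {0}, 0, false},
          {"tuple", {1, 2}, 0, false},
          {"use", {3}, 0, false}};
}
```

Running `foldTupleGets` followed by `eraseDeadOps` on `consumedExample()` leaves no live tuple ops, while on `leakingExample()` the tuple survives because a side-effecting op still uses it directly. That is the distinction drawn in this thread between typical usage and the leaking case.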