Uses progressive lowering to convert vector.extract_slices and vector_insert_slices to equivalent vector operations that can be subsequently lowered into LLVM.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Unit tests: pass. 61804 tests passed, 0 failed and 781 were skipped.
clang-tidy: unknown.
clang-format: pass.
Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
Please provide much more explanation in the commit message because the progressive lowering behavior may be unexpected for most people.
Nothing profound in my comments, just some things I think we should reorganize to keep our house in order.
This looks great, thanks Aart for pushing this forward!
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
899 | This looks more like a rewrite / folding pattern than a conversion to me (i.e. the types don't change). | |
900 | Well on second inspection you already have that in the VectorToVectorOps conversion. Structurally this is also an indication that VectorToLLVM should start including all the VectorToVector rewrites so that things compose properly. | |
910 | We should assert(operands[0].getDefiningOp is non-null. Please also make sure the message is informative enough to surface the context I just explained. At that point, the whole things should look like: assert(...); rewriter.replaceOp(cast<...>); return matchSuccess(); i.e. no need to dyn_cast + check, this is guaranteed by the pattern rewrite infra. | |
919 | Doc please, with before / after IR. | |
929 | Yes that is hacky indeed :) You could just return failure if the uses are not empty: if (!op->getResult()->use_empty()) return matchFailure(); The pattern will fail to apply until all canonicalizations occured at which point it will apply. | |
952 | Very cool, yay intuition :) ! Same comments as above apply still: bit more doc with before/after IR. Also, this should go through VectorToVector patterns and be included in the VectorToLLVM patterns (e.g. what if the target is not LLVM, some of our friends at Xilinx seem to be in this case for example). | |
976 | Strides is probably become overloaded, maybe we should rename strided_slice and insert_strided_slice as Insert/ExtractSlice and these ops as Insert/ExtractSliceTuple or something along those lines? In any case, we should have help functions for computing this. | |
987 | We should have help functions for computing this. | |
1007 | Great, hooray for progressive lowering and pattern/canonicalization compositions ! | |
mlir/lib/Dialect/VectorOps/VectorOps.cpp | ||
1701 ↗ | (On Diff #238363) | This can be called directly from other getXXXPatterns that want to use it, no need to duplicate the pattern. |
mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir | ||
548 | This should be made into a VectorToVector test, that more directly tests "extract_slices" -> "strided_slices" + the removal of the tuple op. |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
899 | Agreed in isolation, but since we introduce vector.tuple in other rewritings, I really need a ConversionPattern here, so that I see the newly introduced operands. | |
900 | I avoided the duplication for now by just having the lowering change. Once we have a proper vector2vector rewriting, we can move it into folding again and populate. | |
929 | Two complications I added to the comments. For newly introduced tuples, neither the op nor the op.getResult use_empty call is up-to-date (it is always empty). However, since this is a lowering pass, such new nodes need to be legalized anyway, so removing them is the right way. We still could assert if our rules somehow introduce a tuple we don't consume. | |
952 | It sounds like we really need to move some of this into a vector to vector pass. That will also avoid some of the complications above. | |
976 | I see similar TODOs elsewhere, so let's postpone that into one CL later. | |
mlir/lib/Dialect/VectorOps/VectorOps.cpp | ||
1701 ↗ | (On Diff #238363) | For now I do everything in lowering, leaving proper interaction with folding/canonicalization for later. |
mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir | ||
548 | I agree. Once we have a proper vector to vector pass, we should test this (and others) in intermediate stages too. |
Unit tests: pass. 61909 tests passed, 0 failed and 782 were skipped.
clang-tidy: unknown.
clang-format: pass.
Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
940 | I was under the impression we may introduce that in the future. Rephrased this a bit. |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
976 | I think with the changes and proto we discussed offline, this will change significantly before landing? Marking as "request changes" until we see the new E2E impl. |
PTAL
This new approach directly uses the canonicalization for vector.tuple_get on a tuple (so no need to dup the rule!). Furthermore, by using the vector to vector rewriting prior to the lowering, the introduction of a vector tuple is completely transparent, since subsequent rewriting removes them again. An additional advantage is that we get DCE for free, so need to introduce a DCE for just vector.tuple. The introduction of DCE forced me to fix a few test cases with dead code (good to do that anyway).
Note, since this code is much more structured, I also introduced vector.insert_slices in addition to vector_extract_slices, so that the final mechanism becomes more apparent.
Unit tests: pass. 62017 tests passed, 0 failed and 783 were skipped.
clang-tidy: unknown.
clang-format: pass.
Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
Unit tests: pass. 62017 tests passed, 0 failed and 783 were skipped.
clang-tidy: unknown.
clang-format: pass.
Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
mlir/lib/Dialect/VectorOps/VectorOps.cpp | ||
---|---|---|
1699 ↗ | (On Diff #239447) | This should just be in the 'fold' method instead of a canonicalization pattern. |
mlir/lib/Dialect/VectorOps/VectorOps.cpp | ||
---|---|---|
1699 ↗ | (On Diff #239447) | 'fold' should be used whenever possible because it is applicable in many more places, e.g. during dialect conversion and OpBuilder::createOrFold. |
mlir/lib/Dialect/VectorOps/VectorOps.cpp | ||
---|---|---|
1699 ↗ | (On Diff #239447) | Also, please split this out into a different revision as this is unrelated. |
sounds good, one question though
mlir/lib/Dialect/VectorOps/VectorOps.cpp | ||
---|---|---|
1699 ↗ | (On Diff #239447) | I followed the other "canonicalization" patterns in VectorOps. Just for my own understanding, should these strictly speaking have been written as folders too (the name "Folder" seems to imply that)? |
sounds good, one question though
mlir/lib/Dialect/VectorOps/VectorOps.cpp | ||
---|---|---|
1699 ↗ | (On Diff #239447) | Other question, isn't fold for constants only? Here the tuple itself does not need to be constant, the only thing that matters is that get-on-tuple becomes a pass through of the value at the corresponding index? |
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
1072 | We need a TODO to move this code into a VectorOpsUtils class, we have various versions of it floating around in VectorOps.cp and VectorTransforms.cpp |
mlir/lib/Dialect/VectorOps/VectorOps.cpp | ||
---|---|---|
1699 ↗ | (On Diff #239447) | fold has a slightly different contract than canonicalization patterns, and can be used (generally) for the following:
So with that being said, if the canonicalization relies on creating new operations then it must use a pattern. |
Hold off on reviewing this CL, since it will be broken up in three.
The final remaining part for the lowering will be super small :=)
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp | ||
---|---|---|
1072 | The TODO is already there, see line 1038 |
mlir/lib/Dialect/VectorOps/VectorOps.cpp | ||
---|---|---|
1699 ↗ | (On Diff #239447) | In particular, it was not immediately apparent to me that we *always* call fold(), passing in constants as arguments when available, but also allowing access to the non-constant operands through the regular getters (giving it potentially "two sources of truth"). Now that I have working, it makes sense, but a bit more API doc would have been helpful. |
PTAL (note, this will not build yet, but it shows how small this CL has become by defining the progressive lowering in a separate patterns)
Unit tests: unknown.
clang-tidy: fail. clang-tidy found 2 errors and 0 warnings. 0 of them are added as review comments below (why?).
clang-format: pass.
Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt
Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.
\o/ yeah, it has been a long road, but PTAL
this CL is much, much smaller now; extra test added, and rewritten some existing tests to ensure code is not dead on entry
Unit tests: pass. 62194 tests passed, 0 failed and 815 were skipped.
clang-tidy: pass.
clang-format: pass.
Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.
This looks more like a rewrite / folding pattern than a conversion to me (i.e. the types don't change).
I would prob go for a folding pattern in VectorOps.cpp but we also need to make sure it is accessible and
In any case please document with a before / after IR example.