This help expose more fusion opportunities.
Details
Diff Detail
Event Timeline
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp | ||
---|---|---|
935 | We had discussed a bunch of times in the past to move reshapes towards either function entry or return depending on whether it is an expanding or contracting reshape to make the "reshape"-free block of ops higher-dimensional. I thought @mravishankar had implemented some of those already. Could you please comment on how this overlaps with, complements or extends what (I think) exists? |
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp | ||
---|---|---|
935 | This is something we discussed with @mravishankar on discord. My understanding is that what has been done so far works to fuse reshape into elementwise when the conversion increase the rank. I believe there is no solution for the case where we need to reduce the rank of the generic op. Here I always try to sink the reshape because named op are fused with generic users so moving the reshape down allows removing the reshape between named ops and and its users. Let me know if you think we should discuss this more. |
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp | ||
---|---|---|
935 | ok, it seems I mistook discussions that turned into code with wishful thinking :) |
Very cool, nice to see this line of work taking shape (pun intended :) )!
Feel free to ignore the refactoring comments for now if that sounds too painful, we'll have a bunch of refactoring and helper functions to add anyway once the various reshapes ops are rebalanced.
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h | ||
---|---|---|
109 | "Sinking an op" involves a notion of loop nest in my mind. | |
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp | ||
1002 | Note: we discussed with @pifon2a and we are going to do some refactoring on the various reshape ops. The rationale is that maps were provisioned to allow more advanced reindexings than just permutations but in practice this is overkill and will be hard to use: composition with other ops will keep it simpler. | |
1020 | typo inserting. | |
1022 | this looks like it would be a good helper function too on reshape op. | |
1036 | Should be refactored somewhere as helpers on linalg.reshape/linalg.generic ? |
Initial comments. Will do a detailed review soon
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp | ||
---|---|---|
935 | Sorry, I missed this. Will take a look at this. But at the outset this is the converse of the existing pattern that expands generic/indexed_generic to fold away reshapes. It always makes sense to expand dimensions and execute in higher dimensionality when possible. Once that is done, we need to the converse. You reduce the dimensionality of generic/indexed_generic operations to fold away more reshapes. So both are I needed. Either way, I think there is a lot of opportunity to reuse/extend the logic used for the expansion mode here, and this looks like a rewrite/replication of that functionality. Maybe there is something missing in the above, would be good to unify these. |
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp | ||
---|---|---|
967 | This should work for IndexedGeneric too right? I would suggest moving the actual logic into a function which takes LinalgOp as an argument (as is done for other functions here) | |
971 | Better to check if (!op.hasTensorSemantics() || op.getNumParallelLoops() == op.getNumLoops()) return failure(); | |
995 | Kind of a very specific check. Actually would be better off separting the core logic of fusing a linalg.generic/indexed_generic operation with the linalg.tensor_reshape from heuristics that decide when to do the fusion. This check is better off in the pattern itself, but outside of the core transformation. | |
1007 | I am confused why there is a mergedDim and a removedDim. All dims are merged into another dim during collapse. | |
1036 | I think similar comment here. You can just use the constructor with SmallVector<ReassociationIndices> | |
1046 | Dont need to use the maps. You can just use the build method with SmallVector<ReassociationIndices> directly. | |
1064 | Mentioned this offline too, but noting it here. AFAIK, the build method used here is for the case where the source is collapsed to get the dest. Here the src is expanded to get the dest. So need to provide the output type specifically. | |
1424 | Nit: Would rename this to populateFoldReshapeOpsByCollapsingPatterns (its the dual of the populateFoldReshapeOpsByExpansionPatterns | |
mlir/test/Dialect/Linalg/fusion-sink-reshape.mlir | ||
100 ↗ | (On Diff #338221) | Add a dynamic shape test too? |
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h | ||
---|---|---|
109 | Changed the naming to push. | |
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp | ||
967 | Good point, I made it a template. | |
995 | This is just because I'm not inserting reshape for other operands so just checking that all the reshape operands match. As discuss I can extend it in a separate patch. | |
1002 | It does seem much simpler. I changed the code to be based SmallVector<ReassociationIndices> (as suggested by Mahesh below) which is basically a vector<vector<int>>. It should be easy to switch in the future. | |
1007 | This was a bit confusing indeed. This was to be able to figure out which dimensions was left after merging. I simplified this logic now everything is based on the reassociation indices as the reverse map so this should be much easier to understand. | |
1022 | I removed this code and use the reshape builder instead. | |
1424 | It doesn't really fold the reshape so I'm not sure about calling it fold. I renamed to push but I can change it again if you think there is a better name. | |
mlir/test/Dialect/Linalg/fusion-sink-reshape.mlir | ||
100 ↗ | (On Diff #338221) | Added a dynamic dim in the first test. |
Logic looks fine. Just some points about code structure. Thanks and sorry for nitpicking a bit. Trying to keep a consistent flow across patterns in this file.
mlir/lib/Dialect/Linalg/Transforms/FusionOnTensors.cpp | ||
---|---|---|
967 | I'd still like it refactored though. THe core logic of this has nothing specific to Generic or IndexedGeneric. So you can move the core logic to a method fuseWithReshapeByCollapsing(LinalgOp linalgOp, TensorReshapeOp reshapeOp, unsigned fusedTensorIndex, PatternRewriter &rewriter) similar to fuseWithReshapeByExpansion above. THere is a separate method isFusable* above that checks all the conditions for the fusion to be valid. So the approach has been bool isFusableWithReshapeByCollapsing(LinalgOp linalgOp, TensorReshapeOp reshapeOp, ...) { ... } static Optional<SmallVector<Value, 1>> fuseWithReshapeByCollapsing(...) { ...} template <typename GenericOpTy> struct FuseWithReshapeByCollapsing : OpRewritePattern<GenericOpTy> { LogicalResult matchAndRewrite(GenericOpTy genericOp, PatternRewriter &rewriter) const override { if (!isFusableWithReshapeByCollapsing(...)) return failure(); auto replacement = fuseWithReshapeByCollapsing(cast<LinalgOp>(genericOp.getOperation(), reshapeOp); if( !replacement) return failure(); rewriter.replaceOp(genericOp, (*replacement)[0]); return success(); } }; | |
999 | I dont think this is necessray. You can stop on the first TensorReshapeOp you find that is fusable by collapsing (see the use of isFusable... method above). You implement the fusion for that reshape -> generic op. If there are other reshape ops, they will be checked independent of the first match. It should still compose this way and this extra logic is not needed. |
After offlien conversation with Thomas, I realize I was missing one aspect of checking that all reshapes need to be folded at the same time here as opposed to other patterns where each reshape could be folded at a time (this is because the pattern does not introduce new reshapes). So LGTM this for now. Can revisit this later.
"Sinking an op" involves a notion of loop nest in my mind.
I'd prefer terminology such as "push" towards block end / "pull" towards block begin; other suggestions welcome.