This commit adds a pattern to perform constant folding on linalg
generic ops that are essentially transposes. We see real cases
where model importers generate such patterns.
Event Timeline
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp | ||
---|---|---|
934 ↗ | (On Diff #375432) | Will review this, but I am not sure this should go in as a general canonicalization pattern. It can increase your memory footprint by quite a lot: you could end up with the same constant twice, once with and once without the transpose. |
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp | ||
---|---|---|
868 ↗ | (On Diff #375432) | This is a heuristic. I would rather use a control function and allow the caller to decide when this should be done. Also, I would move this out of canonicalization and into the ElementwiseOpsFusion patterns, where all the patterns already use such a control function. |
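To illustrate the control-function idea: a minimal sketch, assuming a plain predicate over the constant op; the exact callback type used by the ElementwiseOpFusion patterns may differ, and the names below are made up for illustration.

```cpp
// Purely illustrative sketch of a caller-provided control function;
// ControlFoldingFn and foldOnlySingleUseConstants are hypothetical names.
#include <functional>

#include "mlir/IR/Operation.h"

using namespace mlir;

// The caller supplies a predicate; the folding pattern fires only when the
// predicate agrees that folding this constant is worthwhile.
using ControlFoldingFn = std::function<bool(Operation *constantOp)>;

// One policy a pipeline might pass in: only fold constants with a single
// use, so folding never duplicates a large constant.
static bool foldOnlySingleUseConstants(Operation *constantOp) {
  return constantOp->hasOneUse();
}
```

The point of the design is that the heuristic lives with the caller (the pass pipeline), not hard-coded into the pattern.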
Address comments
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp | ||
---|---|---|
868 ↗ | (On Diff #375432) | Done. |
934 ↗ | (On Diff #375432) | The logic requires the constant to have only one user, and the number of elements stays the same, so it should not increase or decrease the memory footprint at runtime. It can for the compiler, but that's because attributes are never released in MLIR. This is no longer relevant anyway, now that the decision is deferred to the caller's policy. |
Thanks for the modifications. Some more changes here will help land something really nice!
This has some of the pieces needed for the general implementation of not just transpose but all elementwise operations. All you need for the next step here is:
- Use the affine map + linearization to get the index for each input at every point in the iteration space.
- "interpret" the body of the linalg op to get the final yield value.
The space of ops to interpret can be large, but I think the code here could be restructured to set up those things so that generalizing it is just a matter of adding interpretation for different operations. So we can start with just the transpose, and generalizing that would be straightforward. (A rough sketch of this index bookkeeping follows below.)
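Not part of the patch: a minimal sketch of the index bookkeeping in the two bullets above, assuming a single input whose indexing map is a permutation, an identity output map, and row-major element order; mapOutputToInputIndex is a made-up helper name.

```cpp
#include "mlir/IR/AffineMap.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"

using namespace mlir;

// Maps a linear index into the output buffer to the linear index of the
// element to read from the input buffer, for one point of the iteration
// space.
static int64_t mapOutputToInputIndex(int64_t outputLinearIndex,
                                     ArrayRef<int64_t> loopBounds,
                                     AffineMap inputMap) {
  // De-linearize the output index into per-loop coordinates (row-major,
  // assuming the output indexing map is the identity).
  SmallVector<int64_t> coords(loopBounds.size(), 0);
  for (int64_t dim = static_cast<int64_t>(loopBounds.size()) - 1; dim >= 0;
       --dim) {
    coords[dim] = outputLinearIndex % loopBounds[dim];
    outputLinearIndex /= loopBounds[dim];
  }
  // Apply the permutation map to pick the loop coordinate feeding each input
  // dimension, then re-linearize against the input's (permuted) shape.
  int64_t inputLinearIndex = 0;
  for (unsigned r = 0, e = inputMap.getNumResults(); r < e; ++r) {
    unsigned loopDim = inputMap.getDimPosition(r);
    inputLinearIndex = inputLinearIndex * loopBounds[loopDim] + coords[loopDim];
  }
  return inputLinearIndex;
}
```

Calling this for every output linear index walks the whole iteration space and yields, for each point, where to read the input element.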
Also note that this doesn't have to anchor on generic ops. This could work for any LinalgOp; all LinalgOps have a region, so you could just use that.
mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp | ||
---|---|---|
1332 | If I am reading this right, this is going to create a new Attribute for every element of the constant. That's probably going to blow up the compilation time. Can we directly create the container that the DenseElementsAttr can hold, instead of first creating an attribute for each value and then building the DenseElementsAttr by individually extracting the value of each attribute? |
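As a sketch of that suggestion (not the patch itself, assuming float element types and a made-up helper name): gather the folded scalars into one contiguous buffer and build the DenseElementsAttr with a single bulk call, instead of materializing an Attribute per element.

```cpp
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/BuiltinTypes.h"
#include "llvm/ADT/SmallVector.h"

using namespace mlir;
using llvm::APFloat;

static DenseElementsAttr buildTransposedConstant(DenseElementsAttr input,
                                                 ShapedType resultType) {
  // getValues<APFloat> iterates the raw storage; no per-element Attributes.
  auto range = input.getValues<APFloat>();
  SmallVector<APFloat> elements(range.begin(), range.end());

  // ... permute `elements` according to the transpose here ...

  // One bulk construction of the result attribute.
  return DenseElementsAttr::get(resultType, elements);
}
```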
Do you have a concrete real-world pattern that would fit into this category? My approach is generally to avoid abstracting too early, because YAGNI. :) Without concrete examples, it's also hard to see how much we can reuse here. The checks on input/output types, indexing maps, and region contents are likely all specific to the particular pattern. It looks like the indexing linearization and de-linearization might be reused? I'm not exactly sure how we should best modularize it, though. Maybe some utility functions would be fine, or a function taking functors, or templates...
Anyway, it looks like we should defer this until we see another concrete real-world example.
> This has some of the pieces needed for the general implementation of not just transpose but all elementwise operations. All you need for the next step here is:
> - Use the affine map + linearization to get the index for each input at every point in the iteration space.
> - "interpret" the body of the linalg op to get the final yield value.
Talking generally, it would be nice to have a constexpr-like mode in MLIR. Effectively what we want is constant folding the region content at compile time. Simply pattern matching the region content can become unwieldy quickly.
mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp | ||
---|---|---|
1332 | Right. I've changed it to avoid creating Attributes all the time. Holding different element types in the same attribute is always hard to handle, but at least APInt/APFloat provides some common abstraction to avoid a full-blown switch over all possible int/float bitwidths. |
I am not asking to generalize right now, but rather to restructure the code so that it can handle such cases (I am not sure how to answer "real-world pattern"; I can give you an example that is real world but is also something I made up). The current implementation is very transpose-specific, and it will either have to be removed or reworked significantly even for slight variations. Here are some more concrete changes that might help:
- For now, check that the op has a single input and a single output with permutation maps (this is how it is right now).
- Instead of just checking for the body being a linalg.yield, iterate over the body with a map from Value -> {int64_t/float}. Then you have a dispatch to "evaluate" each instruction and update this map. Finally you just return the value the yield is mapped to. Today this dispatch could have no statements, and it just returns the yield (see the sketch after this list).
This is a minor restructuring that could be left as a placeholder for people to extend if needed, AFAICS.
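Not code from the patch: a hypothetical sketch of that restructuring, assuming float element types; evaluateBody and the bail-out behaviour are illustrative only. Block arguments are pre-seeded with the scalars read from the constant operands at the current iteration point; for a plain transpose the body contains only the linalg.yield, so the loop is empty and we simply return the value mapped to the yielded operand.

```cpp
#include "mlir/IR/Block.h"
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Optional.h"

using namespace mlir;
using llvm::APFloat;

static llvm::Optional<APFloat> evaluateBody(Block &body,
                                            ArrayRef<APFloat> inputScalars) {
  llvm::DenseMap<Value, APFloat> values;
  // Seed the block arguments with the scalars loaded for this point.
  for (unsigned i = 0, e = body.getNumArguments(); i < e; ++i)
    values.try_emplace(body.getArgument(i), inputScalars[i]);

  // The dispatch over body ops would live here; today nothing is handled
  // (a transpose body is just the yield), so bail out on anything else.
  for (Operation &op : body.without_terminator()) {
    (void)op;
    return llvm::None;
  }

  // The folded result is whatever scalar the terminator yields.
  Value yielded = body.getTerminator()->getOperand(0);
  auto it = values.find(yielded);
  if (it == values.end())
    return llvm::None;
  return it->second;
}
```

Generalizing then means adding cases to the dispatch loop that evaluate individual ops and extend the map, without touching the surrounding plumbing.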
mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp | ||
---|---|---|
1335 | Nit: Some comments on the reasoning here would be good as a reference, i.e. state that this is done to save compile time, etc. |
Okay, I basically rewrote the patterns to extract out a common base that can support N inputs, 1 output, and all-permutation indexing maps. Hopefully this abstraction is good for future cases.
Thanks! We can iterate on this if we need to.
mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp | ||
---|---|---|
1329 | nit: s/require all permutation maps for now./require all permutation maps for now to be permutations./ |
1343 | Don't see what this check is really doing. The actual implementation below always returns true. |
mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp | ||
---|---|---|
1343 | It's meant for subclasses to further check the indexing maps. For the only subclass we have, it checks genericOp.getIndexingMaps().size() == 2, which makes sure we have 1 input and 1 output. It can be false for cases with more inputs (which the base class allows). |
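For reference, a simplified sketch of what such a base/subclass split with an indexing-map hook could look like; class and method names here are illustrative, not necessarily what the patch uses. The base accepts N inputs, one output, and all-permutation indexing maps, and exposes a hook for subclasses to further constrain the maps; the transpose subclass's hook is the size() == 2 check quoted above.

```cpp
#include "mlir/Dialect/Linalg/IR/LinalgOps.h" // header path as of this review
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

struct FoldConstantBase : public OpRewritePattern<linalg::GenericOp> {
  using OpRewritePattern<linalg::GenericOp>::OpRewritePattern;

  // Subclasses refine which indexing-map configurations they accept.
  virtual bool matchIndexingMaps(linalg::GenericOp genericOp) const = 0;

  LogicalResult matchAndRewrite(linalg::GenericOp genericOp,
                                PatternRewriter &rewriter) const override {
    if (!matchIndexingMaps(genericOp))
      return failure();
    // ... shared N-input / 1-output constant-folding logic would go here ...
    return failure(); // Placeholder; the real pattern rewrites the op.
  }
};

struct FoldConstantTranspose : public FoldConstantBase {
  using FoldConstantBase::FoldConstantBase;

  bool matchIndexingMaps(linalg::GenericOp genericOp) const override {
    // Two indexing maps total: one input plus one output.
    return genericOp.getIndexingMaps().size() == 2;
  }
};
```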