This is an archive of the discontinued LLVM Phabricator instance.

[mlir][linalg] Constant fold linalg.generic that are transposes
ClosedPublic

Authored by antiagainst on Sep 27 2021, 4:21 PM.

Details

Summary

This commit adds a pattern to perform constant folding on linalg
generic ops which are essentially transposes. We see real cases
where model importers may generate such patterns.

Event Timeline

antiagainst created this revision.Sep 27 2021, 4:21 PM
antiagainst requested review of this revision.Sep 27 2021, 4:21 PM
mravishankar requested changes to this revision.Sep 28 2021, 11:30 AM
mravishankar added inline comments.
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
934 ↗(On Diff #375432)

Will review this, but I am not sure that this goes in as a general canonicalization pattern. This can increase your memory footprint by quite a lot. You could have same constant one with and without transpose.

This revision now requires changes to proceed.Sep 28 2021, 11:30 AM
mravishankar added inline comments.Sep 28 2021, 11:33 AM
mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
868 ↗(On Diff #375432)

THis is a heuristic. I would rather use a control function and allow the caller to decide when this should be done.

Also I would move this out of canonicalization and move it into the ElementwiseOpsFusion patterns where all the patterns already use such a control function.

antiagainst marked 2 inline comments as done.

Address comments

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
868 ↗(On Diff #375432)

Done.

934 ↗(On Diff #375432)

The logic checks only one user and the # of elements stays the same. So it should not increase/decrease memory footprint at runtime. It can for the compiler but that's due to the fact that all attributes are never released in MLIR.

But not relevant anymore given now the decision is deferred to the caller's policy.

mravishankar requested changes to this revision.Sep 30 2021, 11:11 AM

Thanks for modifications. Some more changes here will help land something really nice!

This has some of the pieces needed for the general implementation of not just transpose but all elementwise operation. All you need for the next step here is

  1. Use the affine map + linearization to get the index for each input at every point in the iteration space.
  2. "interpret" the body of the linalg op to get the final yield value.

The space of ops to interpret can be large, but I think the code here could be restructured to setup those things so that the generalization of this is just adding interpretation for different operations. So we can start with just the transpose, and generalizing that would be straight-forward.
Also note that this doesnt have to anchor on Generic ops. This could work for any LinalgOp. All LinalgOps have a region. So you could just use that.

mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
1332

If I am reading this right, this is going to create a new Attribute for every element of the constant. Thats probably going to blow up the compilation time. Can we create the container that the DenseElementsAttr can hold directly instead of first creating an attribute for each value and then using that to create the DenseElementsAttr by individually extracting the value of each attribute?

This revision now requires changes to proceed.Sep 30 2021, 11:11 AM
antiagainst marked an inline comment as done.

Use APInt/APFloat instead of Attribute

Thanks for modifications. Some more changes here will help land something really nice!

Do you have a concrete real-world pattern that would fit into this category? My approach is generally to avoid abstracting too early because YAGNI. :) Without concrete examples, it's also hard to see how much we can reuse here. The checking over input/output type, indexing maps, and region contents are likely all specific to the particular pattern. It looks the indexing linearization and de-linearization might be reused? Not exactly sure how we should best modularize it though. Maybe some utility functions would be fine, or a function taking functors, or templates..

Anyway, it looks we should defer this until we see another concrete real-world example.

This has some of the pieces needed for the general implementation of not just transpose but all elementwise operation. All you need for the next step here is

  1. Use the affine map + linearization to get the index for each input at every point in the iteration space.
  2. "interpret" the body of the linalg op to get the final yield value.

Talking generally, it would be nice to have a constexpr like mode in MLIR. Effectively what we want is const folding the region content at compile time. Simply pattern matching the region content can become unwieldy quickly.

mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
1332

Right. I've changed to avoid creating Attributes all the time. Holding different element types in the same attribute is always hard to handle, but at least APInt/APFloat provides some common abstraction to avoid a full blown switch among all possible int/float bitwdiths.

Thanks for modifications. Some more changes here will help land something really nice!

Do you have a concrete real-world pattern that would fit into this category? My approach is generally to avoid abstracting too early because YAGNI. :) Without concrete examples, it's also hard to see how much we can reuse here. The checking over input/output type, indexing maps, and region contents are likely all specific to the particular pattern. It looks the indexing linearization and de-linearization might be reused? Not exactly sure how we should best modularize it though. Maybe some utility functions would be fine, or a function taking functors, or templates..

I am not asking to generalize right now, but rather restructure the code so that it can handle such cases (not sure how to answer "real world pattern". I can give you an example that is real world that is also something I made up). The current implementation is very transpose specific, and then this will eitehr have to removed/reworked significantly even for slight variances. Here maybe are some more concrete changes that might help

  1. For now check that the op has single input and a single output with permutation maps (this is how it is right now).
  2. Instead of just checking for the body being a linalg.yield, just iterate over the body with a map from Value -> {int64_t/float}. Then you have a dispatch to "evaluate" each instruction to update this map. Finally you just return the value the yield is mapped to. Today this dispatch could have no statements, and it just returns the yield.

This is a minor restructuring that could be left as a placeholder for people if this needs to be extended AFAICS.

Anyway, it looks we should defer this until we see another concrete real-world example.

This has some of the pieces needed for the general implementation of not just transpose but all elementwise operation. All you need for the next step here is

  1. Use the affine map + linearization to get the index for each input at every point in the iteration space.
  2. "interpret" the body of the linalg op to get the final yield value.

Talking generally, it would be nice to have a constexpr like mode in MLIR. Effectively what we want is const folding the region content at compile time. Simply pattern matching the region content can become unwieldy quickly.

mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
1335

Nit: Some comments over the reasoning here would be good as a reference , i.e. state that this is done to save compile time, etc.

antiagainst marked an inline comment as done.

Address comments - rewrite the pattern to have a common base

I am not asking to generalize right now, but rather restructure the code so that it can handle such cases (not sure how to answer "real world pattern". I can give you an example that is real world that is also something I made up). The current implementation is very transpose specific, and then this will eitehr have to removed/reworked significantly even for slight variances. Here maybe are some more concrete changes that might help

  1. For now check that the op has single input and a single output with permutation maps (this is how it is right now).
  2. Instead of just checking for the body being a linalg.yield, just iterate over the body with a map from Value -> {int64_t/float}. Then you have a dispatch to "evaluate" each instruction to update this map. Finally you just return the value the yield is mapped to. Today this dispatch could have no statements, and it just returns the yield.

This is a minor restructuring that could be left as a placeholder for people if this needs to be extended AFAICS.

Okay, I basically rewrite the patterns to extract out a common base that can support N inputs, 1 output, all permutation maps. Hopefully this abstraction is good for future cases .

mravishankar accepted this revision.Oct 7 2021, 3:51 PM

Thanks! We can iterate on this if we need to.

mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
1329

nit: s/require all permutation maps for now./require all permutation maps for now to be permutations.

1343

Dont see what this check is really doing. The actual implementation below is always true.

This revision is now accepted and ready to land.Oct 7 2021, 3:51 PM
antiagainst marked 2 inline comments as done.Oct 8 2021, 5:08 AM
antiagainst added inline comments.
mlir/lib/Dialect/Linalg/Transforms/ElementwiseOpFusion.cpp
1343

It's meant for subclasses to further check the indexing maps. For the only subclass we have, it checks return genericOp.getIndexingMaps().size() == 2;. That makes sure we have 1 input and 1 output. It can be false for cases where we have more inputs (which is allowed in the base class).

This revision was automatically updated to reflect the committed changes.
antiagainst marked an inline comment as done.