This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][LinAlg] Start detensoring implementation.
ClosedPublic

Authored by ergawy on Feb 8 2021, 8:51 AM.

Details

Summary

This commit is the first baby step towards detensoring in
linalg-on-tensors.

Detensoring is the process through which a tensor value is convereted to one
or potentially more primitive value(s). During this process, operations with
such detensored operands are also converted to an equivalen form that works
on primitives.

The detensoring process is driven by linalg-on-tensor ops. In particular, a
linalg-on-tensor op is checked to see whether *all* its operands can be
detensored. If so, those operands are converted to thier primitive
counterparts and the linalg op is replaced by an equivalent op that takes
those new primitive values as operands. Therefore, the detensoring process
can be divided into 2 main logical phases:

  1. Detect/match an op that can be detensored.
  2. Detensor the operands of the op and replace it with a primitive equivalent.

These 2 logical phases are implemented by LinalgDetensoringPattern
which is documented in-place below.

This works towards handling github/google/iree#1159.

Diff Detail

Event Timeline

ergawy created this revision.Feb 8 2021, 8:51 AM
ergawy requested review of this revision.Feb 8 2021, 8:51 AM
silvas added a subscriber: silvas.Feb 8 2021, 11:33 AM

I would like to see a bit more of the pass that actually uses these patterns (e.g. the cost model).

Also, I left a few comments about the mechanics here.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
981 ↗(On Diff #322127)

Why this template?

994 ↗(On Diff #322127)

there should already be a hasTensorSemantics helper.

1048 ↗(On Diff #322127)

This should already be covered by existing canonicalizations.

Can you try using {ExtractOp,FromElementsOp}::getCanonicalizationPatterns

Thanks for pushing on this!
Some first comments.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
942 ↗(On Diff #322127)

equivalent

947 ↗(On Diff #322127)

their

960 ↗(On Diff #322127)

considered

964 ↗(On Diff #322127)

Please mention here that for now only 1-D tensors are supported.

988 ↗(On Diff #322127)

For this type of pattern, you might as well just use MatchAnyOpTypeTag for the improved flexibility.
Then the pattern impl can also live in the .cpp.
What you are doing here should generally work for any LinalgOp in general.

998 ↗(On Diff #322127)

Please merge this check with the following using getShapedOperandTypes() (that you can add to mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOpsInterface.td)

1010 ↗(On Diff #322127)

always use OpBuilder::InsertionGuard g(rewriter) before any setInsertionPoint.

1016 ↗(On Diff #322127)

If you go for a generalization, some LinalgOp may not have a body right now (but have a bodyBuilder).
This will change soon once a few things land.
No need to do anything for now, just mentioning in case you wanted to generalize the applicability of your pattern.

1019 ↗(On Diff #322127)

This would probably read better as something that resembles:

auto extracts = llvm::to_vector<4>(llvm::map_range(
  linalgOp.getInputOperands(), [&](Value v){ return rewriter.create(...); }
)); // use a loop that pushes_back if you prefer.
tensorToDetensoredOperandMapping.map(linalgOp.getInputOperands(), extracts);
1048 ↗(On Diff #322127)

+1, all these are canonicalizations, if some are missing they need to be added in the proper places.

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
654 ↗(On Diff #322127)

If we have a 1-D tensor of 1e9 elements, is this going to unconditionally blow up everything?
We prob want some safeguard on the total number of elements that can be unrolled ?

ergawy updated this revision to Diff 322319.Feb 9 2021, 2:18 AM
ergawy marked 14 inline comments as done.

Handle review comments.

ergawy added a comment.Feb 9 2021, 2:19 AM

Thanks for your review @silvas and @nicolasvasilache. Addressed your comments and looking further into how this can be generalized and used.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
981 ↗(On Diff #322127)

It can go away for sure. Removed.

1016 ↗(On Diff #322127)

Thanks for the heads-up. I certainly hope to generalize this as much as possible :).

1048 ↗(On Diff #322127)

Thanks for pointing that out, canonicalization does the job indeed.

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
654 ↗(On Diff #322127)

I think it makes sense to explicitly test for the number of elements in supported scenarios. For example, for now we explicitly check that tensorType.getNumElements() == 1 and add more sizes in the future (i.e. relax the condition a little bit more). I guess this makes it more difficult to unintentionally try to detensor some value not meant to be detensored.

I added such check, let me know if you prefer a more general cutoff condition from the get-go.

ergawy edited the summary of this revision. (Show Details)Feb 9 2021, 11:02 PM
nicolasvasilache accepted this revision.Feb 9 2021, 11:31 PM
nicolasvasilache added inline comments.
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
654 ↗(On Diff #322127)

This will probably end up as an argument to the pattern when we start using it more generally.
My only concern was to blow up the compilation process on mistakes, this is fine now.

This revision is now accepted and ready to land.Feb 9 2021, 11:31 PM
ergawy added a comment.EditedFeb 10 2021, 3:28 AM

Thanks for the approval.

Just to make sure we are properly aligned on the next steps. I started now on pushing the implementation further. This is mainly driven by these 2 examples:

Of course, after lowering them to linalg through: mlir-hlo-opt <example> -mhlo-legalize-control-flow -hlo-legalize-to-linalg.

For this we need to support the 0D case which comes with its own set of challenges I am trying to solve at the moment. We can also adapt the examples to work with <1x*> tensors but I think support for 0D is needed anyway.

I can either (1) commit this patch now since Nicolas already approved or (2) wait till I open a new patch (or patches) (on top of this one) with support for more advanced examples like the above ones.

If you don't have a preference then I will take the approval as: merge this patch please :D.

Adding an assertion that fails now for the 0-D case and iterating on it in the next revision SGTM.

@silvas Sorry forgot to reply to this point:

I would like to see a bit more of the pass that actually uses these patterns (e.g. the cost model).

Do you mean something like an analysis that computes the benefit of detensoring for a certain operation (for some definition of benefit, I didn't think this through yet)?

If so, then that can be step number 3. Step 2, as I mentioned in my previous comment is supporting detensoring on more complex CFGs like the while loops I linked in the comment.

Is that reasonable?

@silvas Sorry forgot to reply to this point:

I would like to see a bit more of the pass that actually uses these patterns (e.g. the cost model).

Do you mean something like an analysis that computes the benefit of detensoring for a certain operation (for some definition of benefit, I didn't think this through yet)?

If so, then that can be step number 3. Step 2, as I mentioned in my previous comment is supporting detensoring on more complex CFGs like the while loops I linked in the comment.

Is that reasonable?

Concretely, I would suggest:

  1. Write a pass that, given a function, sets up a TypeConverter and uses a ConversionPattern to detensorize all candidate ops (see "Type Conversions the Not-So-Hard Way" at https://mlir.llvm.org/talks/ for how to do that. especially slides "Setting up TypeConverter properly" and the surrounding slides). Basically you want a type converter that converts a tensor of static size to N individual SSA values (N = number of elements).
  2. Add a cost model to it.

Personally this patch feels too "sandboxy" / "work in progress"-ey for the bar that I usually use for reviewing things for inclusion upstream (for example, handling only special cases and not being correct in general, etc.).

That's not to degrade the patch. I think it's great progress. I just don't see any benefit to committing it upstream vs you iterating more locally. Nobody else is going to depend on this patch, and I strongly suspect that subsequent patches will likely significantly change the code anyway (and given that, spending reviewer time on code that is going to be rewritten is not efficient), and the code doesn't nontrivially intersect with any other code that might create merge conflicts.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
13 ↗(On Diff #322319)

don't include the .inc

I think we need to start with the pass (perhaps it only handles certain simple cases) and then gradually make the pass better.

One thing I don't like about this patch is that it adds this detensoring pattern in a header. I think it should be an implementation detail of the pass until somebody needs it.

I agree it's not mature enough yet, that's why I wanted to look into more advanced scenarios to derive it. Let me check the talk you suggested and try to reiterate on this. Please bear with me :).

ergawy updated this revision to Diff 323276.Feb 12 2021, 2:34 AM
  • Rewrite detensoring logic using the dialect conversion framework.
  • Add support for 0D tesnros.
  • Add a finalizing pass to detensor functions.
  • Add more tests.

Thanks, for the talk @silvas, it enabled me to shed some light on some new corners I didn't get to interact with before.

I rewrote everything using the dialect conversion framework and added a finalizing pass. Actually, I mostly reused your FuncBufferizePass to do detesoring as well (extracted everything in a shared util). Hopefully, this now goes in a better direction. Any further comments are more than welcome.

mlir/include/mlir/Dialect/Linalg/Passes.td
139

TODO: if this is a good direction, modify docs and more detailed descriptions here and below.

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
663 ↗(On Diff #323276)

TODO: if this is a good direction, modify docs and more detailed description.

mlir/test/Dialect/Linalg/detensorized_while.mlir
1 ↗(On Diff #323276)

TODO: if this is a good direction, add more tests that involve function calls.

ergawy added inline comments.Feb 12 2021, 2:42 AM
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
772 ↗(On Diff #323276)

I think it might be useful to extract this to be available as a general canonicalization pass. Let me know if you have any objections.

silvas added inline comments.Feb 12 2021, 5:52 PM
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
691 ↗(On Diff #323276)

Put this pass in its own file: Detensorize.cpp

692 ↗(On Diff #323276)

For now, please limit this to the rank0 case. That is the most important one in practice.

Also, it feels unnatural to handle the tensor<1xdtype> case but not the tensor<2xdtype>(and 3, 4, ..., but not ? obviously). That is, I think when you consider implementing the general case of statically shaped with rank>0 it will fall out naturally without any of the special handling you are doing here. (if we even need rank>0). Specifically, any statically shaped tensor converts to a set of values equal to its nuber of elements, and the detensorizing conversion does the equivalent of effectively converting the linalg.generic to loops and fully unrolling it and substituting the individual SSA values. (that's just the mechanics; of course a cost model will be needed to control that).

Anyway, I doubt we will need that level of genericity, or even just the 1D case. At least for a while. The cost model is higher priority, because, for example, a "reduce" operation can easily create a rank0 tensor, but that tensor is likely to live on device, and so detensorizing further operations on it is undesirable.

Priorities:

  1. initial boilerplate + the mechanics of handling rank0 without control flow
  2. adding the mechanics to handle control flow
  3. adding the cost model.
772 ↗(On Diff #323276)

I think if you start with just the rank0 case, you won't need this.

808 ↗(On Diff #323276)

Note: you can handle control flow here without changing the function type itself. To do it:

  • need to split populateBranchOpInterfaceAndReturnOpTypeConversionPattern into two functions: one that handles "return" and one the handles all other terminators. You will use the function for all other terminators and but not the one for "return".
  • need to write a pattern similar to the one in populateFuncOpTypeConversionPattern, but with one key change: it won't rewrite the function arguments itself. The key difference is that the only change this new pattern does is the rewriter.convertRegionTypes call, and it will pass a special TypeConverter::SignatureConversion for the third argument: it won't change the entry block types at all! You can call the function `populateRegionInternalBlockArgTypeConversionPatterns. This new pattern will be in the same file as populateBranchOpInterfaceAndReturnOpTypeConversionPattern.

(you probably want to leave that to a subsequent patch)

825 ↗(On Diff #323276)

Please don't include this pass in the initial commit. It's unclear if we ever want to to do this transformation at function boundaries (certainly we can't do it for publicly visible functions).

This pass is totally different from FuncBufferize because FuncBufferize changes even publicly visible functions (it changes the calling convention of the module!). That's not the case for this pass, which is just an optimization. We can also extend LinalgDetensorizePass to also be interprocedural subject to not transforming public functions.

If the need ever arises, we can have a separate DetensorizeModuleCallingConvention pass that just converts public function signatures. That's a totally disjoint purpose for a pass though than what we are doing here.

ergawy updated this revision to Diff 323679.Feb 15 2021, 12:48 AM
ergawy marked 3 inline comments as done.
  • Don't detensorize control flow.
  • Don't detensorize across function boundaries.
  • Support only rank0 for now.

Thanks @nicolasvasilache and @silvas again for your help in calibrating this to better fit the intended needs. I keep adding and deleting code but that's what happens when you are a newbie :).

I handled the comments for Sean. Any other comments are welcome.

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
692 ↗(On Diff #323276)

In my (admittedly still naive) understanding, 0D and <1xdtype> tensors are more or less equivalent (i.e. they actually wrap a single-element of the underlying type). That's why I wanted to support both in one go.

However, I see your point in that if a single-element tensor needs to be emitted by codegen, then 0D tensors will be used and not <1xdtype> ones.

Removed support for <1x...> tensors.

772 ↗(On Diff #323276)

This is actually needed specifically for the rank0 case. This is because source materializations are emitted as tensor.from_elements + linalg.tensor_reshape since tensor.from_elements results in a tensor<1xdtype>.

808 ↗(On Diff #323276)

Thanks for the pointers on how to do that. Will do in a follow up patch.

silvas added inline comments.Feb 16 2021, 3:07 PM
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
691 ↗(On Diff #323276)

It seems to still be in Transforms.cpp

772 ↗(On Diff #323276)

Ah, makes sense. I forgot that from_elements only handles 1D.

825 ↗(On Diff #323276)

Please remove FuncDetensorize and any changes related to FuncBufferize.

mlir/lib/Dialect/StandardOps/Transforms/FuncBufferize.cpp
32 ↗(On Diff #323276)

please don't touch this pass in this patch.

@silvas Your last comments all seem to be on an old diff :). FuncBufferize changes were already reverted and the detensoring code was moved to its own file.

silvas requested changes to this revision.Feb 17 2021, 11:21 AM

@silvas Your last comments all seem to be on an old diff :). FuncBufferize changes were already reverted and the detensoring code was moved to its own file.

Sorry about that!

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
1822 ↗(On Diff #323679)

nit: revert this change.

mlir/lib/Dialect/Linalg/Transforms/Detensorize.cpp
52

use rewriter.inlineRegionBefore to handle the generic case of the body having multiple ops. If you clone, then you will run into the issue described here: https://github.com/llvm/llvm-project/blob/892d2822b62ebcaa7aa0b006b5ea4f26593c1618/mlir/lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp#L44

136

StandardOpsDialect not needed after rewriter.inlineRegionBefore change.

mlir/test/Dialect/Linalg/detensorized_0d.mlir
2

do not run canonicalize as part of the test. that is fragile.

12

add a test case that uses an op from some unknown dialect in the body (to verify that the rewriting logic is agnostic to the body ops)

mlir/test/Dialect/Linalg/detensorized_while.mlir
1 ↗(On Diff #323679)

This test can be added when you add the control flow support. For now, it isn't testing much.

This revision now requires changes to proceed.Feb 17 2021, 11:21 AM
ergawy updated this revision to Diff 324899.Feb 19 2021, 12:21 AM
ergawy marked 6 inline comments as done.

Handle review comments:

  • Use inlining instead of cloning.
  • Clean-up unneeded code.
  • Add more tests.
ergawy retitled this revision from [WIP] -- [MLIR][LinAlg] Start detensoring implementation. to [MLIR][LinAlg] Start detensoring implementation..Feb 19 2021, 12:23 AM
ergawy updated this revision to Diff 324922.Feb 19 2021, 1:51 AM

Simplify converted code by merging some blocks.

mikeurbach added inline comments.
mlir/lib/Dialect/Linalg/Transforms/Detensorize.cpp
77

For entry block arguments, do we end up here? I think you'd want them to convert to themselves, i.e. end up on L78.

silvas accepted this revision.Feb 22 2021, 10:23 AM

Perfect!

This revision is now accepted and ready to land.Feb 22 2021, 10:23 AM
silvas added inline comments.Feb 22 2021, 10:23 AM
mlir/lib/Dialect/Linalg/Transforms/Detensorize.cpp
77

the patch doesn't rewrite block args yet.

mikeurbach added inline comments.Feb 22 2021, 12:22 PM
mlir/lib/Dialect/Linalg/Transforms/Detensorize.cpp
77

Yep, sorry for the noise, I just wanted to point this out after it was mentioned on Discourse. I couldn't find a simple way to link directly to this line in this revision to paste into Discourse.

This revision was landed with ongoing or failed builds.Feb 22 2021, 11:28 PM
This revision was automatically updated to reflect the committed changes.