This is an archive of the discontinued LLVM Phabricator instance.


Differential D96271

[MLIR][LinAlg] Start detensoring implementation.
ClosedPublic

Authored by ergawy on Feb 8 2021, 8:51 AM.


Details

Reviewers

nicolasvasilache
silvas

Commits

rG67e0d58de4d3: [MLIR][LinAlg] Start detensoring implementation.

Summary

This commit is the first baby step towards detensoring in
linalg-on-tensors.

Detensoring is the process through which a tensor value is converted to one
or potentially more primitive value(s). During this process, operations with
such detensored operands are also converted to an equivalent form that works
on primitives.

The detensoring process is driven by linalg-on-tensor ops. In particular, a
linalg-on-tensor op is checked to see whether *all* its operands can be
detensored. If so, those operands are converted to their primitive
counterparts and the linalg op is replaced by an equivalent op that takes
those new primitive values as operands. Therefore, the detensoring process
can be divided into 2 main logical phases:

1. Detect/match an op that can be detensored.
2. Detensor the operands of the op and replace it with a primitive equivalent.
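
As an illustration of the two phases, here is a minimal self-contained C++ sketch. It does not use MLIR: `ToyTensorType`, `ToyOp`, `matchDetensorable`, and `detensorOperands` are hypothetical stand-ins invented for this example, and the rank-0-only rule is an assumption taken from the later state of this revision rather than from the summary itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a ranked tensor type (not an MLIR class).
struct ToyTensorType {
  int rank;
  std::string elementType;
};

// Toy stand-in for a linalg-on-tensors op: a name plus its operand types.
struct ToyOp {
  std::string name;
  std::vector<ToyTensorType> operandTypes;
};

// Assumption for this sketch: only 0-D tensors can be detensored.
bool canBeDetensored(const ToyTensorType &type) { return type.rank == 0; }

// Phase 1: the op matches only if *all* of its operands can be detensored.
bool matchDetensorable(const ToyOp &op) {
  for (const ToyTensorType &type : op.operandTypes)
    if (!canBeDetensored(type))
      return false;
  return true;
}

// Phase 2: map every tensor operand to its primitive (element) type.
// Returns false and leaves `primitives` untouched if the op did not match.
bool detensorOperands(const ToyOp &op, std::vector<std::string> &primitives) {
  assert(primitives.empty() && "expects an empty output vector");
  if (!matchDetensorable(op))
    return false;
  for (const ToyTensorType &type : op.operandTypes)
    primitives.push_back(type.elementType);
  return true;
}
```

An op with two rank-0 `f32` operands yields two `f32` primitives, while an op with any higher-rank operand is left alone.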

These 2 logical phases are implemented by LinalgDetensoringPattern
which is documented in-place below.

This works towards handling github/google/iree#1159.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ergawy created this revision.Feb 8 2021, 8:51 AM

Herald added subscribers: mravishankar, teijeong, rdzhabarov and 15 others. · View Herald TranscriptFeb 8 2021, 8:51 AM

ergawy requested review of this revision.Feb 8 2021, 8:51 AM

Herald added a reviewer: nicolasvasilache. · View Herald TranscriptFeb 8 2021, 8:51 AM

Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2021, 8:51 AM

Herald added subscribers: limo1996, stephenneuendorffer, nicolasvasilache. · View Herald Transcript

Harbormaster completed remote builds in B88301: Diff 322127.Feb 8 2021, 9:23 AM

I would like to see a bit more of the pass that actually uses these patterns (e.g. the cost model).

Also, I left a few comments about the mechanics here.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
981 ↗	(On Diff #322127)	Why this template?
994 ↗	(On Diff #322127)	there should already be a hasTensorSemantics helper.
1048 ↗	(On Diff #322127)	This should already be covered by existing canonicalizations. Can you try using {ExtractOp,FromElementsOp}::getCanonicalizationPatterns

silvas added a reviewer: silvas.Feb 8 2021, 11:33 AM

Thanks for pushing on this!
Some first comments.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
942 ↗	(On Diff #322127)	equivalent
947 ↗	(On Diff #322127)	their
960 ↗	(On Diff #322127)	considered
964 ↗	(On Diff #322127)	Please mention here that for now only 1-D tensors are supported.
988 ↗	(On Diff #322127)	For this type of pattern, you might as well just use MatchAnyOpTypeTag for the improved flexibility. Then the pattern impl can also live in the .cpp. What you are doing here should generally work for any LinalgOp in general.
998 ↗	(On Diff #322127)	Please merge this check with the following using `getShapedOperandTypes()` (that you can add to mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOpsInterface.td)
1010 ↗	(On Diff #322127)	always use `OpBuilder::InsertionGuard g(rewriter)` before any `setInsertionPoint`.
1016 ↗	(On Diff #322127)	If you go for a generalization, some LinalgOp may not have a body right now (but have a bodyBuilder). This will change soon once a few things land. No need to do anything for now, just mentioning in case you wanted to generalize the applicability of your pattern.
1019 ↗	(On Diff #322127)	This would probably read better as something that resembles: auto extracts = llvm::to_vector<4>(llvm::map_range( linalgOp.getInputOperands(), [&](Value v){ return rewriter.create(...); } )); // use a loop that pushes_back if you prefer. tensorToDetensoredOperandMapping.map(linalgOp.getInputOperands(), extracts);
1048 ↗	(On Diff #322127)	+1, all these are canonicalizations, if some are missing they need to be added in the proper places.
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
654 ↗	(On Diff #322127)	If we have a 1-D tensor of 1e9 elements, is this going to unconditionally blow up everything? We prob want some safeguard on the total number of elements that can be unrolled ?

Handle review comments.

Thanks for your review @silvas and @nicolasvasilache. Addressed your comments and looking further into how this can be generalized and used.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
981 ↗	(On Diff #322127)	It can go away for sure. Removed.
1016 ↗	(On Diff #322127)	Thanks for the heads-up. I certainly hope to generalize this as much as possible :).
1048 ↗	(On Diff #322127)	Thanks for pointing that out, canonicalization does the job indeed.
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
654 ↗	(On Diff #322127)	I think it makes sense to explicitly test for the number of elements in supported scenarios. For example, for now we explicitly check that `tensorType.getNumElements() == 1` and add more sizes in the future (i.e. relax the condition a little bit more). I guess this makes it more difficult to unintentionally try to detensor some value not meant to be detensored. I added such check, let me know if you prefer a more general cutoff condition from the get-go.
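
The element-count safeguard discussed in this thread could look roughly like the following self-contained C++ sketch. It is not the code from the patch: `numElements` and `withinDetensorLimit` are hypothetical helpers, the shape is modelled as a plain vector with negative extents standing in for dynamic dimensions, and `maxElements` is an assumed parameter (the reply above describes a fixed check for exactly one element).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Counts the elements of a statically shaped tensor. A negative extent stands
// in for a dynamic dimension, in which case -1 is returned. Overflow for very
// large shapes is not handled in this sketch.
std::int64_t numElements(const std::vector<std::int64_t> &shape) {
  std::int64_t count = 1;
  for (std::int64_t dim : shape) {
    if (dim < 0)
      return -1;
    count *= dim;
  }
  return count;
}

// Returns true if the tensor is static and has at most `maxElements`
// elements. With maxElements == 1 this accepts only single-element tensors.
bool withinDetensorLimit(const std::vector<std::int64_t> &shape,
                         std::int64_t maxElements) {
  assert(maxElements >= 0 && "limit must be non-negative");
  std::int64_t count = numElements(shape);
  return count >= 0 && count <= maxElements;
}
```

With a limit of 1, a rank-0 shape and a `<1>` shape pass, while a 1-D tensor of 1e9 elements is rejected before any unrolling is attempted.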

Harbormaster completed remote builds in B88428: Diff 322319.Feb 9 2021, 2:54 AM

ergawy edited the summary of this revision. (Show Details)Feb 9 2021, 11:02 PM

nicolasvasilache accepted this revision.Feb 9 2021, 11:31 PM

nicolasvasilache added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
654 ↗	(On Diff #322127)	This will probably end up as an argument to the pattern when we start using it more generally. My only concern was to blow up the compilation process on mistakes, this is fine now.

This revision is now accepted and ready to land.Feb 9 2021, 11:31 PM

Thanks for the approval.

Just to make sure we are properly aligned on the next steps. I started now on pushing the implementation further. This is mainly driven by these 2 examples:

Of course, after lowering them to linalg through: mlir-hlo-opt <example> -mhlo-legalize-control-flow -hlo-legalize-to-linalg.

For this we need to support the 0D case which comes with its own set of challenges I am trying to solve at the moment. We can also adapt the examples to work with <1x*> tensors but I think support for 0D is needed anyway.

I can either (1) commit this patch now since Nicolas already approved or (2) wait till I open a new patch (or patches) (on top of this one) with support for more advanced examples like the above ones.

If you don't have a preference then I will take the approval as: merge this patch please :D.

Adding an assertion that fails now for the 0-D case and iterating on it in the next revision SGTM.

@silvas Sorry forgot to reply to this point:

I would like to see a bit more of the pass that actually uses these patterns (e.g. the cost model).

Do you mean something like an analysis that computes the benefit of detensoring for a certain operation (for some definition of benefit, I didn't think this through yet)?

If so, then that can be step number 3. Step 2, as I mentioned in my previous comment is supporting detensoring on more complex CFGs like the while loops I linked in the comment.

Is that reasonable?

In D96271#2554127, @ergawy wrote:

@silvas Sorry forgot to reply to this point:

I would like to see a bit more of the pass that actually uses these patterns (e.g. the cost model).

Do you mean something like an analysis that computes the benefit of detensoring for a certain operation (for some definition of benefit, I didn't think this through yet)?

If so, then that can be step number 3. Step 2, as I mentioned in my previous comment is supporting detensoring on more complex CFGs like the while loops I linked in the comment.

Is that reasonable?

Concretely, I would suggest:

1. Write a pass that, given a function, sets up a TypeConverter and uses a ConversionPattern to detensorize all candidate ops (see "Type Conversions the Not-So-Hard Way" at https://mlir.llvm.org/talks/ for how to do that, especially the slide "Setting up TypeConverter properly" and the surrounding slides). Basically you want a type converter that converts a tensor of static size to N individual SSA values (N = number of elements).
2. Add a cost model to it.
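
To make the "tensor of static size to N individual SSA values" idea concrete, here is a small self-contained C++ sketch of a 1-to-N type expansion. It only models the idea with plain data: `ToyStaticTensorType` and `convertToScalarTypes` are hypothetical names, and this is not the MLIR `TypeConverter` API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a statically shaped tensor type.
struct ToyStaticTensorType {
  std::vector<std::int64_t> shape; // every extent is assumed to be static
  std::string elementType;
};

// 1-to-N conversion: a static tensor with N elements becomes N values of the
// element type. A rank-0 tensor (empty shape) therefore becomes one value.
std::vector<std::string>
convertToScalarTypes(const ToyStaticTensorType &type) {
  std::int64_t count = 1;
  for (std::int64_t dim : type.shape) {
    assert(dim >= 0 && "dynamic extents are not convertible in this sketch");
    count *= dim;
  }
  return std::vector<std::string>(static_cast<std::size_t>(count),
                                  type.elementType);
}
```

A cost model would sit in front of such a conversion and decide whether expanding a given tensor is worthwhile at all.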

Personally this patch feels too "sandboxy" / "work in progress"-ey for the bar that I usually use for reviewing things for inclusion upstream (for example, handling only special cases and not being correct in general, etc.).

That's not to degrade the patch. I think it's great progress. I just don't see any benefit to committing it upstream vs you iterating more locally. Nobody else is going to depend on this patch, and I strongly suspect that subsequent patches will likely significantly change the code anyway (and given that, spending reviewer time on code that is going to be rewritten is not efficient), and the code doesn't nontrivially intersect with any other code that might create merge conflicts.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
13 ↗	(On Diff #322319)	don't include the .inc

I think we need to start with the pass (perhaps it only handles certain simple cases) and then gradually make the pass better.

One thing I don't like about this patch is that it adds this detensoring pattern in a header. I think it should be an implementation detail of the pass until somebody needs it.

I agree it's not mature enough yet, that's why I wanted to look into more advanced scenarios to derive it. Let me check the talk you suggested and try to reiterate on this. Please bear with me :).

Rewrite detensoring logic using the dialect conversion framework.
Add support for 0D tensors.
Add a finalizing pass to detensor functions.
Add more tests.

Thanks, for the talk @silvas, it enabled me to shed some light on some new corners I didn't get to interact with before.

I rewrote everything using the dialect conversion framework and added a finalizing pass. Actually, I mostly reused your FuncBufferizePass to do detensoring as well (extracted everything in a shared util). Hopefully, this now goes in a better direction. Any further comments are more than welcome.

mlir/include/mlir/Dialect/Linalg/Passes.td
139	TODO: if this is a good direction, modify docs and more detailed descriptions here and below.
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
663 ↗	(On Diff #323276)	TODO: if this is a good direction, modify docs and more detailed description.
mlir/test/Dialect/Linalg/detensorized_while.mlir
1 ↗	(On Diff #323276)	TODO: if this is a good direction, add more tests that involve function calls.

ergawy added inline comments.Feb 12 2021, 2:42 AM

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
772 ↗	(On Diff #323276)	I think it might be useful to extract this to be available as a general canonicalization pass. Let me know if you have any objections.

Harbormaster completed remote builds in B88970: Diff 323276.Feb 12 2021, 4:40 AM

silvas added inline comments.Feb 12 2021, 5:52 PM

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
691 ↗	(On Diff #323276)	Put this pass in its own file: Detensorize.cpp
692 ↗	(On Diff #323276)	For now, please limit this to the rank0 case. That is the most important one in practice. Also, it feels unnatural to handle the tensor<1xdtype> case but not the tensor<2xdtype> case (and 3, 4, ..., but not `?` obviously). That is, I think when you consider implementing the general case of statically shaped tensors with rank>0, it will fall out naturally without any of the special handling you are doing here (if we even need rank>0). Specifically, any statically shaped tensor converts to a set of values equal to its number of elements, and the detensorizing conversion does the equivalent of converting the linalg.generic to loops, fully unrolling it, and substituting the individual SSA values (that's just the mechanics; of course a cost model will be needed to control that). Anyway, I doubt we will need that level of genericity, or even just the 1D case, at least for a while. The cost model is higher priority because, for example, a "reduce" operation can easily create a rank0 tensor, but that tensor is likely to live on device, so detensorizing further operations on it is undesirable. Priorities: (1) initial boilerplate + the mechanics of handling rank0 without control flow; (2) adding the mechanics to handle control flow; (3) adding the cost model.
772 ↗	(On Diff #323276)	I think if you start with just the rank0 case, you won't need this.
808 ↗	(On Diff #323276)	Note: you can handle control flow here without changing the function type itself. To do it: (1) Split populateBranchOpInterfaceAndReturnOpTypeConversionPattern into two functions: one that handles "return" and one that handles all other terminators. You will use the function for all other terminators, but not the one for "return". (2) Write a pattern similar to the one in populateFuncOpTypeConversionPattern, but with one key change: it won't rewrite the function arguments. The only change this new pattern makes is the rewriter.convertRegionTypes call, and it will pass a special TypeConverter::SignatureConversion for the third argument: it won't change the entry block types at all! You can call the function `populateRegionInternalBlockArgTypeConversionPatterns`. This new pattern will be in the same file as populateBranchOpInterfaceAndReturnOpTypeConversionPattern. (You probably want to leave that to a subsequent patch.)
825 ↗	(On Diff #323276)	Please don't include this pass in the initial commit. It's unclear if we ever want to do this transformation at function boundaries (certainly we can't do it for publicly visible functions). This pass is totally different from FuncBufferize because FuncBufferize changes even publicly visible functions (it changes the calling convention of the module!). That's not the case for this pass, which is just an optimization. We can also extend LinalgDetensorizePass to also be interprocedural subject to not transforming public functions. If the need ever arises, we can have a separate DetensorizeModuleCallingConvention pass that just converts public function signatures. That's a totally disjoint purpose for a pass though than what we are doing here.
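
The control-flow suggestion in the comment on line 808 (convert block argument types inside the function while leaving the entry block alone) can be pictured with the following self-contained C++ sketch. It models a region as a list of blocks with plain data; `ToyType`, `ToyBlockArgs`, `convertType`, and `convertInternalBlockArgs` are hypothetical names and none of this is the MLIR rewriter API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy type: either a rank-0 tensor of `elementType` or the bare element type.
struct ToyType {
  bool isRank0Tensor;
  std::string elementType;
};

using ToyBlockArgs = std::vector<ToyType>;

// Rank-0 tensors become their element type; scalars are returned unchanged.
ToyType convertType(const ToyType &type) {
  assert(!type.elementType.empty() && "every toy type names an element type");
  return ToyType{false, type.elementType};
}

// Converts the argument types of every block except the entry block
// (index 0). The entry block mirrors the function signature, so leaving it
// alone keeps the function type unchanged.
void convertInternalBlockArgs(std::vector<ToyBlockArgs> &blocks) {
  for (std::size_t i = 1; i < blocks.size(); ++i)
    for (ToyType &arg : blocks[i])
      arg = convertType(arg);
}
```

Because only internal blocks change, callers of the function are unaffected, which matches the point that this pass is an optimization and not a calling-convention change.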

Don't detensorize control flow.
Don't detensorize across function boundaries.
Support only rank0 for now.

Herald added a subscriber: mgorny. · View Herald TranscriptFeb 15 2021, 12:48 AM

Thanks @nicolasvasilache and @silvas again for your help in calibrating this to better fit the intended needs. I keep adding and deleting code but that's what happens when you are a newbie :).

I handled the comments for Sean. Any other comments are welcome.

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
692 ↗	(On Diff #323276)	In my (admittedly still naive) understanding, 0D and <1xdtype> tensors are more or less equivalent (i.e. they actually wrap a single-element of the underlying type). That's why I wanted to support both in one go. However, I see your point in that if a single-element tensor needs to be emitted by codegen, then 0D tensors will be used and not <1xdtype> ones. Removed support for <1x...> tensors.
772 ↗	(On Diff #323276)	This is actually needed specifically for the rank0 case. This is because source materializations are emitted as `tensor.from_elements` + `linalg.tensor_reshape` since `tensor.from_elements` results in a `tensor<1xdtype>`.
808 ↗	(On Diff #323276)	Thanks for the pointers on how to do that. Will do in a follow up patch.
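
The reply on line 772 explains why a cleanup pattern is needed: a source materialization (`tensor.from_elements` followed by `linalg.tensor_reshape`) that is immediately consumed by a `tensor.extract` should collapse back to the original element. The self-contained C++ sketch below models that fold on a toy expression chain. `ToyNode`, `makeValue`, `makeOp`, and `foldExtract` are hypothetical names for this example and are not part of MLIR.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

enum class ToyKind { Value, FromElements, Reshape, Extract };

// Toy single-operand expression node.
struct ToyNode {
  ToyKind kind;
  std::string name;                 // only meaningful for ToyKind::Value
  std::shared_ptr<ToyNode> operand; // null for ToyKind::Value
};

using ToyNodePtr = std::shared_ptr<ToyNode>;

ToyNodePtr makeValue(const std::string &name) {
  return std::make_shared<ToyNode>(ToyNode{ToyKind::Value, name, nullptr});
}

ToyNodePtr makeOp(ToyKind kind, ToyNodePtr operand) {
  assert(operand && "ops in this sketch take exactly one operand");
  return std::make_shared<ToyNode>(ToyNode{kind, "", std::move(operand)});
}

// Folds extract(reshape(from_elements(x))) to x. Any other shape of
// expression is returned unchanged.
ToyNodePtr foldExtract(const ToyNodePtr &node) {
  if (!node || node->kind != ToyKind::Extract)
    return node;
  const ToyNodePtr &reshape = node->operand;
  if (!reshape || reshape->kind != ToyKind::Reshape)
    return node;
  const ToyNodePtr &fromElements = reshape->operand;
  if (!fromElements || fromElements->kind != ToyKind::FromElements)
    return node;
  return fromElements->operand;
}
```

Applied to the chain built from a value `x`, `foldExtract` returns `x` itself, so two back-to-back detensored ops exchange a primitive value without a round trip through a tensor.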

Harbormaster completed remote builds in B89193: Diff 323679.Feb 15 2021, 1:18 AM

silvas added inline comments.Feb 16 2021, 3:07 PM

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp
691 ↗	(On Diff #323276)	It seems to still be in Transforms.cpp
772 ↗	(On Diff #323276)	Ah, makes sense. I forgot that from_elements only handles 1D.
825 ↗	(On Diff #323276)	Please remove FuncDetensorize and any changes related to FuncBufferize.
mlir/lib/Dialect/StandardOps/Transforms/FuncBufferize.cpp
32 ↗	(On Diff #323276)	please don't touch this pass in this patch.

@silvas Your last comments all seem to be on an old diff :). FuncBufferize changes were already reverted and the detensoring code was moved to its own file.

In D96271#2567515, @ergawy wrote:

@silvas Your last comments all seem to be on an old diff :). FuncBufferize changes were already reverted and the detensoring code was moved to its own file.

Sorry about that!

mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp
1822 ↗	(On Diff #323679)	nit: revert this change.
mlir/lib/Dialect/Linalg/Transforms/Detensorize.cpp
52	use rewriter.inlineRegionBefore to handle the generic case of the body having multiple ops. If you clone, then you will run into the issue described here: https://github.com/llvm/llvm-project/blob/892d2822b62ebcaa7aa0b006b5ea4f26593c1618/mlir/lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp#L44
136	StandardOpsDialect not needed after rewriter.inlineRegionBefore change.
mlir/test/Dialect/Linalg/detensorized_0d.mlir
2	do not run canonicalize as part of the test. that is fragile.
12	add a test case that uses an op from some unknown dialect in the body (to verify that the rewriting logic is agnostic to the body ops)
mlir/test/Dialect/Linalg/detensorized_while.mlir
1 ↗	(On Diff #323679)	This test can be added when you add the control flow support. For now, it isn't testing much.

This revision now requires changes to proceed.Feb 17 2021, 11:21 AM

Handle review comments:

Use inlining instead of cloning.
Clean-up unneeded code.
Add more tests.

ergawy retitled this revision from [WIP] -- [MLIR][LinAlg] Start detensoring implementation. to [MLIR][LinAlg] Start detensoring implementation..Feb 19 2021, 12:23 AM

Harbormaster completed remote builds in B89879: Diff 324899.Feb 19 2021, 1:27 AM

Simplify converted code by merging some blocks.

Harbormaster completed remote builds in B89891: Diff 324922.Feb 19 2021, 2:34 AM

mikeurbach added a subscriber: mikeurbach.Feb 21 2021, 9:17 PM

mikeurbach added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Detensorize.cpp
77	For entry block arguments, do we end up here? I think you'd want them to convert to themselves, i.e. end up on L78.

Perfect!

This revision is now accepted and ready to land.Feb 22 2021, 10:23 AM

silvas added inline comments.Feb 22 2021, 10:23 AM

mlir/lib/Dialect/Linalg/Transforms/Detensorize.cpp
77	the patch doesn't rewrite block args yet.

mikeurbach added inline comments.Feb 22 2021, 12:22 PM

mlir/lib/Dialect/Linalg/Transforms/Detensorize.cpp
77	Yep, sorry for the noise, I just wanted to point this out after it was mentioned on Discourse. I couldn't find a simple way to link directly to this line in this revision to paste into Discourse.

This revision was landed with ongoing or failed builds.Feb 22 2021, 11:28 PM

Closed by commit rG67e0d58de4d3: [MLIR][LinAlg] Start detensoring implementation. (authored by ergawy).

This revision was automatically updated to reflect the committed changes.

ergawy added a commit: rG67e0d58de4d3: [MLIR][LinAlg] Start detensoring implementation..

Revision Contents

Path

Size

mlir/

include/

mlir/

Dialect/

Linalg/

Passes.h

4 lines

Passes.td

24 lines

lib/

Dialect/

Linalg/

Transforms/

CMakeLists.txt

1 line

Detensorize.cpp

173 lines

test/

Dialect/

Linalg/

detensorized_0d.mlir

107 lines

Diff 325691

mlir/include/mlir/Dialect/Linalg/Passes.h

	/// parallel loops.
	void populateElementwiseToLinalgConversionPatterns(
	OwningRewritePatternList &patterns, MLIRContext *ctx);

	/// Create a pass to conver named Linalg operations to Linalg generic
	/// operations.
	std::unique_ptr<OperationPass<FuncOp>> createLinalgGeneralizationPass();

				/// Create a pass to convert Linalg operations to equivalent operations that
				/// work on primitive types, if possible.
				std::unique_ptr<Pass> createLinalgDetensorizePass();

	/// Patterns to fold an expanding (collapsing) tensor_reshape operation with its
	/// producer (consumer) generic operation by expanding the dimensionality of the
	/// loop in the generic op.
	void populateFoldReshapeOpsByExpansionPatterns(
	MLIRContext *context, OwningRewritePatternList &patterns);

	/// Patterns to fold a collapsing (expanding) tensor_reshape operation with its
	/// producer (consumer) generic/indexed_generic operation by linearizing the

mlir/include/mlir/Dialect/Linalg/Passes.td

	}

	def LinalgGeneralization : FunctionPass<"linalg-generalize-named-ops"> {
	let summary = "Convert named ops into generic ops";
	let constructor = "mlir::createLinalgGeneralizationPass()";
	let dependentDialects = ["linalg::LinalgDialect"];
	}

				def LinalgDetensorize : FunctionPass<"linalg-detensorize"> {
				let summary = "Detensorize linalg ops";
				let constructor = "mlir::createLinalgDetensorizePass()";
				let dependentDialects = [];

				let description = [{
				Detensoring is the process through which a tensor value is convereted to one
				or potentially more primitive value(s). During this process, operations with
				such detensored operands are also converted to an equivalent form that works
				on primitives.

				The detensoring process is driven by linalg-on-tensor ops. In particular, a
				linalg-on-tensor op is checked to see whether all its operands can be
				detensored. If so, those operands are converted to their primitive
				counterparts and the linalg op is replaced by an equivalent op that takes
				those new primitive values as operands. Therefore, the detensoring process
				can be divided into 2 main logical phases:

				1. Detect/match an op that can be detensored.
				2. Detensor the operands of the op and replace it with a primitive
				equivalent.
				}];
				}

	#endif // MLIR_DIALECT_LINALG_PASSES

mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt

	add_mlir_dialect_library(MLIRLinalgTransforms
	Bufferize.cpp
	CodegenStrategy.cpp
				Detensorize.cpp
	DropUnitDims.cpp
	ElementwiseToLinalg.cpp
	Fusion.cpp
	FusionOnTensors.cpp
	Generalization.cpp
	Hoisting.cpp
	Interchange.cpp
	Loops.cpp

mlir/lib/Dialect/Linalg/Transforms/Detensorize.cpp

This file was added.

				//===- Detensorize.cpp - Linalg transformations as patterns ----------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "PassDetail.h"
				#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
				#include "mlir/Dialect/Linalg/IR/LinalgTypes.h"
				#include "mlir/Dialect/Linalg/Passes.h"
				#include "mlir/Dialect/StandardOps/Transforms/FuncConversions.h"
				#include "mlir/Dialect/Tensor/IR/Tensor.h"
				#include "mlir/IR/OpDefinition.h"
				#include "mlir/Transforms/DialectConversion.h"
				#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
				#include <iterator>
				#include <memory>

				using namespace mlir;
				using namespace mlir::linalg;

				namespace {
				/// Defines the criteria a TensorType must follow in order to be considered
				/// "detensorable".
				///
				/// NOTE: For now, only 0-D are supported.
				///
				/// Returns true if tensorType can be detensored.
				bool canBeDetensored(TensorType tensorType) {
				return tensorType.hasRank() && tensorType.getRank() == 0;
				}

				/// A conversion pattern for detensoring `linalg.generic` ops.
				class DetensorizeGenericOp : public OpConversionPattern<GenericOp> {
				public:
				using OpConversionPattern::OpConversionPattern;
				LogicalResult
				matchAndRewrite(GenericOp op, ArrayRef<Value> operands,
				ConversionPatternRewriter &rewriter) const override {
				Block *originalBlock = op->getBlock();

				// Gather some information about the op before inlining its region.
				Block *opEntryBlock = &*op.region().begin();
				YieldOp yieldOp = dyn_cast<YieldOp>(op.region().back().getTerminator());

				// Split the op's region before the op. This way, we have a clear insertion
				// point in which the op can be inlined.
				Block *newBlock = originalBlock->splitBlock(op);
				rewriter.inlineRegionBefore(op.region(), newBlock);
				// Now that op's region is inlined, the operands of its YieldOp are mapped
				// to the materialized target values. Therefore, we can replace the op's
				// uses with those of its YieldOp's operands.
				rewriter.replaceOp(op, yieldOp->getOperands());

				// No need for these intermediate blocks, merge them into 1.
				rewriter.mergeBlocks(opEntryBlock, originalBlock, operands);
				rewriter.mergeBlocks(newBlock, originalBlock, {});

				rewriter.eraseOp(&*Block::iterator(yieldOp));

				return success();
				}
				};

				class DetensorizeTypeConverter : public TypeConverter {
				public:
				DetensorizeTypeConverter() {
				addConversion([](Type type) { return type; });

				// A TensorType that can be detensored, is converted to the underlying
				// element type.
				addConversion([](TensorType tensorType) -> Type {
				if (canBeDetensored(tensorType))
				return tensorType.getElementType();

				return tensorType;
				});

				// A tensor value is detensored by extracting its element(s).
				addTargetMaterialization([](OpBuilder &builder, Type type,
				ValueRange inputs, Location loc) -> Value {
				return builder.create<tensor::ExtractOp>(loc, inputs[0], ValueRange{});
				});

				// A detensored value is converted back by creating a new tensor from its
				// element(s).
				addSourceMaterialization([](OpBuilder &builder, Type type,
				ValueRange inputs, Location loc) -> Value {
				auto createNewTensorOp = builder.create<tensor::FromElementsOp>(
				loc, inputs[0].getType(), inputs[0]);

				// FromElementsOp results in a tensor<1xdtype>, we need to reshape that to
				// a tensor<dtype> instead.
				return builder.create<linalg::TensorReshapeOp>(
				loc, type, createNewTensorOp, ArrayRef<ReassociationExprs>{});
				});
				}
				};

				/// Canonicalizes the pattern of the form
				///
				/// %tensor = tensor.from_elements(%element) : (i32) -> tensor<1xi32>
				/// %reshaped_tensor = linalg.tensor_reshape %tensor [] : tensor<1xi32> into
				/// tensor<i32>
				/// %extracted_element = tensor.extract %reshaped_tensor[] : tensor<i32>
				///
				/// to just %element.
				struct ExtractFromReshapeFromElements
				: public OpRewritePattern<tensor::ExtractOp> {
				using OpRewritePattern<tensor::ExtractOp>::OpRewritePattern;

				LogicalResult matchAndRewrite(tensor::ExtractOp extract,
				PatternRewriter &rewriter) const final {
				if (extract.indices().size() != 0)
				return failure();

				auto tensorReshape = extract.tensor().getDefiningOp<TensorReshapeOp>();
				if (tensorReshape == nullptr)
				return failure();

				auto tensorFromElements =
				tensorReshape.getOperand()
				.getDefiningOp<mlir::tensor::FromElementsOp>();
				if (tensorFromElements == nullptr)
				return failure();

				rewriter.replaceOp(extract, tensorFromElements.getOperand(0));
				return success();
				}
				};

				/// @see LinalgDetensorize in Linalg/Passes.td for more details.
				struct LinalgDetensorize : public LinalgDetensorizeBase<LinalgDetensorize> {
				void runOnFunction() override {
				    auto *context = &getContext();
				    DetensorizeTypeConverter typeConverter;
				    OwningRewritePatternList patterns;
				    ConversionTarget target(*context);

				    target.markUnknownOpDynamicallyLegal([](Operation *op) { return true; });
				    target.addLegalDialect<linalg::LinalgDialect>();
				    target.addDynamicallyLegalOp<GenericOp>([&](GenericOp op) {
				      // If any of the operands or results cannot be detensored, the op is
				      // considered legal and won't be detensored.
				      return llvm::any_of(
				          op.getShapedOperandTypes(), [](ShapedType shapedType) {
				            assert(shapedType.isa<TensorType>());
				            return !canBeDetensored(shapedType.cast<TensorType>());
				          });
				    });

				    patterns.insert<DetensorizeGenericOp>(typeConverter, context);

				    if (failed(
				            applyPartialConversion(getFunction(), target, std::move(patterns))))
				      signalPassFailure();

				    OwningRewritePatternList canonPatterns;
				    canonPatterns.insert<ExtractFromReshapeFromElements>(context);
				    if (failed(applyPatternsAndFoldGreedily(getFunction(),
				                                            std::move(canonPatterns))))
				      signalPassFailure();

				    // TODO Properly handle control flow within function boundaries.
				  }
				};
				} // namespace

				std::unique_ptr<Pass> mlir::createLinalgDetensorizePass() {
				return std::make_unique<LinalgDetensorize>();
				}

mlir/test/Dialect/Linalg/detensorized_0d.mlir

This file was added.

				// RUN: mlir-opt %s -allow-unregistered-dialect -linalg-detensorize | FileCheck %s

				silvas: do not run canonicalize as part of the test. that is fragile.
				#map = affine_map<() -> ()>

				func @detensor_simple(%arg1: tensor<f32>, %arg2: tensor<f32>) -> tensor<f32> attributes {iree.module.export} {
				  %0 = linalg.init_tensor [] : tensor<f32>
				  %1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = []}
				    ins(%arg1, %arg2 : tensor<f32>, tensor<f32>)
				    outs(%0 : tensor<f32>) {
				  ^bb0(%arg3: f32, %arg4: f32, %arg5: f32): // no predecessors
				    %2 = addf %arg3, %arg4 : f32
				    linalg.yield %2 : f32
				silvas: add a test case that uses an op from some unknown dialect in the body (to verify that the rewriting logic is agnostic to the body ops)
				  } -> tensor<f32>
				  return %1: tensor<f32>
				}
				// CHECK-LABEL: func @detensor_simple
				// CHECK-SAME: (%[[arg1:.*]]: tensor<f32>, %[[arg2:.*]]: tensor<f32>)
				// CHECK-DAG: %[[arg1_val:.*]] = tensor.extract %[[arg1]]
				// CHECK-DAG: %[[arg2_val:.*]] = tensor.extract %[[arg2]]
				// CHECK: %[[detensored_res:.*]] = addf %[[arg1_val]], %[[arg2_val]]
				// CHECK: %[[new_tensor_res:.*]] = tensor.from_elements %[[detensored_res]]
				// CHECK: %[[reshaped_tensor_res:.*]] = linalg.tensor_reshape %[[new_tensor_res]]
				// CHECK: return %[[reshaped_tensor_res]]

				func @detensor_op_sequence(%arg1: tensor<f32>, %arg2: tensor<f32>) -> tensor<f32> attributes {iree.module.export} {
				  %0 = linalg.init_tensor [] : tensor<f32>
				  %1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = []}
				    ins(%arg1, %arg2 : tensor<f32>, tensor<f32>)
				    outs(%0 : tensor<f32>) {
				  ^bb0(%arg3: f32, %arg4: f32, %arg5: f32): // no predecessors
				    %2 = addf %arg3, %arg4 : f32
				    linalg.yield %2 : f32
				  } -> tensor<f32>

				  %3 = linalg.init_tensor [] : tensor<f32>
				  %4 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = []}
				    ins(%arg1, %1 : tensor<f32>, tensor<f32>)
				    outs(%3 : tensor<f32>) {
				  ^bb0(%arg3: f32, %arg4: f32, %arg5: f32): // no predecessors
				    %5 = mulf %arg3, %arg4 : f32
				    linalg.yield %5 : f32
				  } -> tensor<f32>

				  %6 = linalg.init_tensor [] : tensor<f32>
				  %7 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = []}
				    ins(%1, %4 : tensor<f32>, tensor<f32>)
				    outs(%6 : tensor<f32>) {
				  ^bb0(%arg3: f32, %arg4: f32, %arg5: f32): // no predecessors
				    %5 = divf %arg3, %arg4 : f32
				    linalg.yield %5 : f32
				  } -> tensor<f32>

				  return %7: tensor<f32>
				}
				// CHECK-LABEL: func @detensor_op_sequence
				// CHECK-SAME: (%[[arg1:.*]]: tensor<f32>, %[[arg2:.*]]: tensor<f32>)
				// CHECK-DAG: %[[arg1_val:.*]] = tensor.extract %[[arg1]]
				// CHECK-DAG: %[[arg2_val:.*]] = tensor.extract %[[arg2]]
				// CHECK: %[[detensored_res:.*]] = addf %[[arg1_val]], %[[arg2_val]]
				// CHECK-DAG: %[[arg1_val2:.*]] = tensor.extract %[[arg1]]
				// CHECK: %[[detensored_res2:.*]] = mulf %[[arg1_val2]], %[[detensored_res]]
				// CHECK: %[[detensored_res3:.*]] = divf %[[detensored_res]], %[[detensored_res2]]
				// CHECK: %[[new_tensor_res:.*]] = tensor.from_elements %[[detensored_res3]]
				// CHECK: %[[reshaped_tensor_res:.*]] = linalg.tensor_reshape %[[new_tensor_res]]
				// CHECK: return %[[reshaped_tensor_res]]

				func @detensor_multiple_ops(%arg1: tensor<f32>, %arg2: tensor<f32>) -> tensor<f32> attributes {iree.module.export} {
				  %0 = linalg.init_tensor [] : tensor<f32>
				  %1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = []}
				    ins(%arg1, %arg2 : tensor<f32>, tensor<f32>)
				    outs(%0 : tensor<f32>) {
				  ^bb0(%arg3: f32, %arg4: f32, %arg5: f32): // no predecessors
				    %2 = addf %arg3, %arg4 : f32
				    %3 = mulf %2, %arg4 : f32
				    linalg.yield %3 : f32
				  } -> tensor<f32>
				  return %1: tensor<f32>
				}
				// CHECK-LABEL: func @detensor_multiple_ops
				// CHECK-SAME: (%[[arg1:.*]]: tensor<f32>, %[[arg2:.*]]: tensor<f32>)
				// CHECK-DAG: %[[arg1_val:.*]] = tensor.extract %[[arg1]]
				// CHECK-DAG: %[[arg2_val:.*]] = tensor.extract %[[arg2]]
				// CHECK: %[[detensored_res:.*]] = addf %[[arg1_val]], %[[arg2_val]]
				// CHECK: %[[detensored_res2:.*]] = mulf %[[detensored_res]], %[[arg2_val]]
				// CHECK: %[[new_tensor_res:.*]] = tensor.from_elements %[[detensored_res2]]
				// CHECK: %[[reshaped_tensor_res:.*]] = linalg.tensor_reshape %[[new_tensor_res]]
				// CHECK: return %[[reshaped_tensor_res]]

				func @detensor_foreign_op(%arg1: tensor<f32>, %arg2: tensor<f32>) -> tensor<f32> attributes {iree.module.export} {
				  %0 = linalg.init_tensor [] : tensor<f32>
				  %1 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = []}
				    ins(%arg1, %arg2 : tensor<f32>, tensor<f32>)
				    outs(%0 : tensor<f32>) {
				  ^bb0(%arg3: f32, %arg4: f32, %arg5: f32): // no predecessors
				    %2 = "foreign.do_something"(%arg3, %arg4) {} : (f32, f32) -> f32
				    linalg.yield %2 : f32
				  } -> tensor<f32>
				  return %1: tensor<f32>
				}
				// CHECK-LABEL: func @detensor_foreign_op
				// CHECK-SAME: (%[[arg1:.*]]: tensor<f32>, %[[arg2:.*]]: tensor<f32>)
				// CHECK-DAG: %[[arg1_val:.*]] = tensor.extract %[[arg1]]
				// CHECK-DAG: %[[arg2_val:.*]] = tensor.extract %[[arg2]]
				// CHECK: %[[detensored_res:.*]] = "foreign.do_something"(%[[arg1_val]], %[[arg2_val]])
				// CHECK: %[[new_tensor_res:.*]] = tensor.from_elements %[[detensored_res]]
				// CHECK: %[[reshaped_tensor_res:.*]] = linalg.tensor_reshape %[[new_tensor_res]]
				// CHECK: return %[[reshaped_tensor_res]]

Diff 325691
