This is an archive of the discontinued LLVM Phabricator instance.


Differential D138882

[mlir][tensor][linalg] Introduce DataLayoutPropagation pass.
ClosedPublic

Authored by hanchung on Nov 28 2022, 5:11 PM.

Details

Reviewers

nicolasvasilache
rengolin
mravishankar
chelini

Commits

rG0f297cad4d5b: [mlir][tensor][linalg] Introduce DataLayoutPropagation pass.

Summary

This revision introduces a pattern that rewrites linalg.generic followed by
tensor.pack into tensor.pack followed by linalg.generic. The pattern requires
all iterator types to be parallel and the indexing map of the output operand
to be the identity. Both restrictions can be relaxed in the future.
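
The rewrite relies on packing and an all-parallel elementwise computation commuting when no padding is involved. The following standalone C++ sketch models that property on plain arrays. It does not use MLIR; `Matrix`, `pack`, `mapElementwise`, and `packCommutesWithElementwise` are hypothetical names introduced for illustration, and the sketch assumes tile sizes that divide the shape exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Row-major 2-D matrix, a stand-in for a statically shaped tensor<MxNxf32>.
struct Matrix {
  std::size_t rows, cols;
  std::vector<float> data;
};

// Models tensor.pack with inner_dims_pos = [0, 1] and
// inner_tiles = [tileR, tileC], without padding or outer_dims_perm.
// Result layout: [rows / tileR][cols / tileC][tileR][tileC], flattened.
std::vector<float> pack(const Matrix &m, std::size_t tileR, std::size_t tileC) {
  assert(m.rows % tileR == 0 && m.cols % tileC == 0 &&
         "this sketch assumes no padding is needed");
  std::size_t outerC = m.cols / tileC;
  std::vector<float> packed(m.data.size());
  for (std::size_t i = 0; i < m.rows; ++i) {
    for (std::size_t j = 0; j < m.cols; ++j) {
      std::size_t dst =
          (((i / tileR) * outerC + j / tileC) * tileR + i % tileR) * tileC +
          j % tileC;
      packed[dst] = m.data[i * m.cols + j];
    }
  }
  return packed;
}

// Models an all-parallel linalg.generic with identity indexing maps.
std::vector<float> mapElementwise(std::vector<float> v,
                                  const std::function<float(float)> &f) {
  for (float &x : v)
    x = f(x);
  return v;
}

// Checks pack(generic(x)) == generic(pack(x)) for the given function.
bool packCommutesWithElementwise(const Matrix &m, std::size_t tileR,
                                 std::size_t tileC,
                                 const std::function<float(float)> &f) {
  Matrix mapped{m.rows, m.cols, mapElementwise(m.data, f)};
  return pack(mapped, tileR, tileC) ==
         mapElementwise(pack(m, tileR, tileC), f);
}
```

Because the elementwise function sees each value independently of its position, reordering the data before or after applying it gives the same result, which is the property the pattern exploits.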

The user can decide whether the propagation should be applied or not by
passing a control function.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hanchung created this revision.Nov 28 2022, 5:11 PM

Herald added a project: Restricted Project. · View Herald TranscriptNov 28 2022, 5:11 PM

Herald added subscribers: Moerafaat, bzcheeseman, sdasgup3 and 21 others. · View Herald Transcript

hanchung requested review of this revision.Nov 28 2022, 5:11 PM

Herald added a project: Restricted Project. · View Herald TranscriptNov 28 2022, 5:11 PM

Herald added subscribers: limo1996, stephenneuendorffer. · View Herald Transcript

remove unused variable

update indexing_map

Harbormaster completed remote builds in B199909: Diff 478427.Nov 28 2022, 6:23 PM

chelini added inline comments.Nov 29 2022, 5:19 AM

mlir/include/mlir/Dialect/Utils/IndexingUtils.h
83 ↗	(On Diff #478427)	I think we should update the comment here. You get [c, a].
mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp
151	is `llvm::None` padding? Can we also propagate if we have padding?
171	We can use: rewriter.inlineRegionBefore(genericOp.getRegion(), newGenericOp.getRegion(), newGenericOp.getRegion().begin());

nicolasvasilache added inline comments.Nov 29 2022, 7:10 AM

mlir/include/mlir/Dialect/Utils/IndexingUtils.h
85 ↗	(On Diff #478427)	Why do you need the `, N` part ?
mlir/test/Dialect/Linalg/data-layout-propagation.mlir
12	please use `linalg.map` for all the generics in this test, they are simple maps

nicolasvasilache requested changes to this revision.Nov 29 2022, 7:22 AM

nicolasvasilache added inline comments.

mlir/include/mlir/Dialect/Linalg/Passes.td
49 ↗	(On Diff #478427)	Should this be a test pass ? We don't want to start adding (and especially maintaining) heuristics at this point, right?
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
84	this is lacking docs
mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp
34	Please add this to PackOp static PackOp::createDestinationTensor(...) This is where one would expect to find this helper.
59	more docs plz, input IR, output IR etc.
68	The functionality should be implemented in a function that takes relevant pieces of IR (i.e. the packop) and returns the relevant pieces of IR (the new linalg op and the new unpack op). The pattern should just call those. This way we can connect a transform op and even replace that test pass.
87	this is lacking docs, and more especially why you need some non-serializable C++ lambda injected here
104	Seems like a very complex piece of code to leave undocumented and inlined here. What can be hoisted, what becomes a util on some op etc?

This revision now requires changes to proceed.Nov 29 2022, 7:22 AM

hanchung marked 2 inline comments as done.Nov 29 2022, 11:42 AM

hanchung added inline comments.

mlir/include/mlir/Dialect/Utils/IndexingUtils.h
83 ↗	(On Diff #478427)	I misunderstood the semantics of `outer_dims_perm`. It should be either a permutation or empty. The method does not make sense to me now.. We won't need this method anymore. I enhance the verifier in https://reviews.llvm.org/D138936
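
The constraint described in this reply (`outer_dims_perm` is either empty or a full permutation) can be sketched in standalone C++ as follows. This is only an illustration of the rule as stated here, not the verifier from D138936; `isPermutation` and `isValidOuterDimsPerm` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if `perm` contains each of 0 .. rank-1 exactly once.
bool isPermutation(const std::vector<int64_t> &perm, int64_t rank) {
  if (rank < 0 || perm.size() != static_cast<std::size_t>(rank))
    return false;
  std::vector<bool> seen(perm.size(), false);
  for (int64_t p : perm) {
    if (p < 0 || p >= rank || seen[p])
      return false;
    seen[p] = true;
  }
  return true;
}

// An empty vector means "no interchange of outer dimensions"; anything else
// must be a permutation of all outer dimensions.
bool isValidOuterDimsPerm(const std::vector<int64_t> &perm, int64_t rank) {
  return perm.empty() || isPermutation(perm, rank);
}
```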

address comments

hanchung marked 5 inline comments as done.Nov 30 2022, 5:10 PM

hanchung added inline comments.

mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp
34	The trade-off is that we can't use `tensor::createDimValues` if we add it to PackOp. Otherwise, there will be cyclic deps. TensorOps -> TensorUtils -> TensorOps
68	Do we want it being able to connect to a transform op? My understanding is that transform ops aims to apply the pattern once, while propagation patterns would be applied until IR gets converged. The pass is at the position similar to element-wise fusion, IMO.
87	It was intended to provide a callback function for users. So the users can decide what to propagate. Since the pass is in early development phase, we can drop the support; and add it back when needed.
151	It could introduce undefined behavior if we unconditionally propagate pack op through all the ops. E.g., if the padding value is zero and there are division ops in a generic op. Some values of padding area could be NaN (0/0). I disable the propagation in the pattern by default, and thinking to add an option for enabling aggressive propagation in the future.
mlir/test/Dialect/Linalg/data-layout-propagation.mlir
12	IIUC, `linalg.map` op is named op. I can't simply update the test file because the pattern works on linalg::GenericOp. Are you suggesting me to update the pattern taking linalg::LinalgOp instead? I think that could work for `linalg.map`, `linalg.transpose`, etc; I can take a stab at it, just wanna clarify before updating it.
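
The padding hazard mentioned in the reply on line 151 can be reproduced outside MLIR. The sketch below is a standalone C++ model under simplifying assumptions (1-D data, a single tile size, IEEE floating point); `packWithPadding`, `divideBySelf`, and `countNaNs` are hypothetical names used only for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Models a 1-D tensor.pack with a padding value: the data is extended to a
// multiple of `tile`, and the extra slots are filled with `pad`.
std::vector<float> packWithPadding(const std::vector<float> &src,
                                   std::size_t tile, float pad) {
  std::size_t numTiles = (src.size() + tile - 1) / tile;
  std::vector<float> packed(numTiles * tile, pad);
  for (std::size_t i = 0; i < src.size(); ++i)
    packed[i] = src[i];
  return packed;
}

// Models an elementwise generic op whose body contains a division.
std::vector<float> divideBySelf(std::vector<float> v) {
  for (float &x : v)
    x = x / x;
  return v;
}

// Counts the NaN entries in `v`.
std::size_t countNaNs(const std::vector<float> &v) {
  std::size_t n = 0;
  for (float x : v)
    if (std::isnan(x))
      ++n;
  return n;
}
```

With a source of three non-zero elements, a tile size of 2, and a zero padding value, computing first and packing afterwards leaves a plain 0 in the padded slot, while packing first and computing afterwards evaluates 0/0 there and produces a NaN. The two orders therefore differ in the padded area, which is why the pattern does not propagate padded packs by default.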

move the pass to test-only

Harbormaster completed remote builds in B200387: Diff 479111.Nov 30 2022, 5:26 PM

Harbormaster completed remote builds in B200393: Diff 479121.Nov 30 2022, 5:43 PM

mravishankar added inline comments.Nov 30 2022, 6:48 PM

mlir/include/mlir/Dialect/Linalg/Passes.td
49 ↗	(On Diff #478427)	+1 to this point. Might be better to expose the core utility method or patterns as populate methods, and move the pass itself to a test pass.
mlir/include/mlir/Dialect/Utils/IndexingUtils.h
85 ↗	(On Diff #478427)	If you want to update the `SmallVector` in place, you could use `MutableArrayRef<..>`.
mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp
68	It is still worth separating the core implementation into a stand-alone function (even if it is `static` and not exposed right now). Using fixed point is one way of applying the functionality. Even the elementwise op fusion patterns are just a wrapper around the core functions that implement the transformation.
87	This is the control that is based on call site constraints. Instead of using heuristics in the core method that decides when to apply this, we defer the control to the caller, and make a go/no-go decision....

nicolasvasilache added inline comments.Dec 1 2022, 2:26 AM

mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp
87	Injecting a lambda from above allows inversion of control that is convenient in the very short term and has almost always proven to turn very quickly into technical debt. Given recent offline discussions I had and other parts of the codebase I have seen, I am going to more seriously push back against this anti-pattern globally. Injecting C++ callback control from above is a sign of missing abstractions and should almost always be disallowed. The alternative is often to refactor multiple times until the right abstractions emerge. In other words, the transformations we add must be functional-style and statically return the pieces of IR (existing, created, or updated) that make sense for that transform. This is not something customizable: if you need more information, then statically return more information, with no backchannels through lambdas. If you need different behaviors, instead of injecting a dynamic mechanism through a lambda, write another transformation that takes more/different inputs and returns more/different outputs. Refactor the reusable utility helpers to avoid copy-pasta. These transformations can then be plugged into patterns and transform dialect ops, which can be responsible for the switch between different static behaviors.

nicolasvasilache mentioned this in D134307: [mlir][TilingInterface] Add callback to yield a produced value..Dec 1 2022, 2:29 AM

remove unneeded changes

move implementations to functions

hanchung added inline comments.Dec 1 2022, 1:55 PM

mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp
87	thanks for the details! So we provide methods upstream, and let upstream patterns and downstream projects use the methods. They can always wrap the methods into a custom patterns w/o a callback function. I moved them to functions and the pattern becomes a wrapper of the method.

nicolasvasilache added inline comments.Dec 1 2022, 2:31 PM

mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp
87	We discussed offline with Mahesh and this may be one of the cases that I could see as an exception to the rule. Some loose characteristics of potential green/red flags that seem to emerge are:
- the information conveyed by the callback cannot be serialized in IR in a reasonable form
- the callback use remains superficial and does not permeate the transform implementation but is called once at the high level (e.g. in the match part of a pattern or in a transform dialect op)
- the callback does not capture or bypass the return mechanism
- maybe others ?
I'd love a more principled discussion on the topic and that we come up with good guidelines we can follow. Removing the blocker, thanks for your patience. Edit: I don't know how to do this without accepting or resigning but consider my concern addressed :)
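
The shape this thread converges on (a core function that takes and returns pieces of IR, wrapped by a pattern that consults a control function once during matching) can be sketched in standalone C++. Everything here is a hypothetical stand-in: `Op`, `PropagationResult`, `bubbleUpPack`, `ControlFn`, and `matchAndRewrite` are illustrative names and do not reproduce the MLIR API or the code in this revision.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for an IR operation handle.
struct Op {
  std::string name;
  bool hasPadding;
};

// The pieces of IR the transformation hands back to its caller.
struct PropagationResult {
  Op newPack;
  Op newGeneric;
};

// Core transformation: takes the relevant ops and returns the new ones.
// It has no callback parameter and reports failure through its return value.
std::optional<PropagationResult> bubbleUpPack(const Op &pack,
                                              const Op &producer) {
  if (producer.name != "linalg.generic")
    return std::nullopt;
  return PropagationResult{Op{"tensor.pack", pack.hasPadding},
                           Op{"linalg.generic", false}};
}

// Go/no-go decision supplied by the caller of the pattern.
using ControlFn = std::function<bool(const Op &pack)>;

// Pattern-style wrapper: the control function is consulted once, before any
// rewriting, and the actual work is delegated to the core function.
std::optional<PropagationResult> matchAndRewrite(const Op &pack,
                                                 const Op &producer,
                                                 const ControlFn &control) {
  if (control && !control(pack))
    return std::nullopt;
  return bubbleUpPack(pack, producer);
}
```

In this arrangement the callback stays at the surface of the pattern, and other clients (for example a transform op) could call the core function directly without it.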

Harbormaster completed remote builds in B200619: Diff 479418.Dec 1 2022, 5:23 PM

hanchung added inline comments.Dec 1 2022, 5:27 PM

mlir/test/Dialect/Linalg/data-layout-propagation.mlir
12	It's quite tricky because the builder and clone method can't be reused. The most tricky part is about updating indexing_maps. The builder and clone method does not take indexing_maps into account. Also, there is not a clean way to use setting methods for them.

chelini added inline comments.Dec 5 2022, 6:03 AM

mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp
180	We also need to make sure that the result of the current linalg generic operation has no users. Otherwise, the linalg generic remains alive without a body.
230	struct instead of class?

Overall looks good to me. Just left two minor comments.

inlineRegionBefore -> cloneRegionBefore

class -> struct

rebase

Thanks! Overall looks good to me. Have a few nits.

mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp
82	Nit: Move the comments before the statement to make it read better.
85	Nice! TIL.

nicolasvasilache added a subscriber: pifon2a.Dec 5 2022, 2:30 PM

nicolasvasilache added inline comments.

mlir/test/Dialect/Linalg/data-layout-propagation.mlir
12	@pifon2a ?

move comments above the variable declaration

hanchung added inline comments.Dec 5 2022, 4:52 PM

mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp
180	I see the point. I think we should use cloneRegionBefore instead of inline*.
mlir/test/Dialect/Linalg/data-layout-propagation.mlir
12	I'm going to land the PR to unblock some work on @chelini side tomorrow. If it's an easy fix, I can give it a shot. If not, I'd prefer fixing it afterwards. Thanks!

Harbormaster completed remote builds in B201250: Diff 480284.Dec 5 2022, 11:15 PM

All the comments have been addressed except testing with linalg.map. I did some study and think that's not feasible to address. The pass works on the generic ops similar to ElementWiseFusion. If it can be done in elementwise fusion, it can also be done here. Since the other pass is testing with this form, I think it's not a big concern. I'm going to land the revision, and prefer addressing the issue afterwards if we know a way to address it.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 6 2022, 3:00 PM

Closed by commit rG0f297cad4d5b: [mlir][tensor][linalg] Introduce DataLayoutPropagation pass. (authored by hanchung).

This revision was automatically updated to reflect the committed changes.

hanchung added a commit: rG0f297cad4d5b: [mlir][tensor][linalg] Introduce DataLayoutPropagation pass..

nicolasvasilache added inline comments.Jan 12 2023, 9:59 AM

mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp
65	std::pair
82	nit: typo
85	This seems like it will crash if not a projected permutation. Plz add a clear assert at the top-level of the func if this can never happen or consider failing.

Revision Contents

Path

Size

mlir/

include/

mlir/

Dialect/

Linalg/

Transforms/

Transforms.h

3 lines

Tensor/

IR/

TensorOps.td

4 lines

lib/

Dialect/

Linalg/

Transforms/

CMakeLists.txt

1 line

DataLayoutPropagation.cpp

250 lines

Tensor/

IR/

TensorOps.cpp

31 lines

test/

Dialect/

Linalg/

data-layout-propagation.mlir

230 lines

lib/

Dialect/

Linalg/

CMakeLists.txt

1 line

TestDataLayoutPropagation.cpp

49 lines

tools/

mlir-opt/

mlir-opt.cpp

2 lines

Diff 479418

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h

  ...
  /// Patterns for fusing linalg operation on tensors.

  /// Pattern to fuse `linalg.generic` -> `linalg.generic` operations
  /// when both operations are fusable elementwise operations.
  void populateElementwiseOpsFusionPatterns(
      RewritePatternSet &patterns,
      const ControlFusionFn &controlElementwiseOpFusion);

+ /// Patterns to bubble up or down data layout ops across other operations.
+ void populateDataLayoutPropagationPatterns(RewritePatternSet &patterns);
+
  /// Pattern to remove dead operands and results of `linalg.generic` operations.
  /// This is effectively DCE for a linalg op.
  void populateEraseUnusedOperandsAndResultsPatterns(RewritePatternSet &patterns);

  /// Function type to control generic op dimension collapsing. It is expected
  /// to return an array of `ReassociationIndices` representing dimensions that
  /// should be merged.
  using GetCollapsableDimensionsFn =
  ...

mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td

  ...
  def Tensor_PackOp : Tensor_RelayoutOp<"pack", [
  ...
    let extraClassDeclaration = commonExtraClassDeclaration # [{
      // Method to get the `ShapedType` of the result based on the inner tiles,
      // position of the inner tiles (innerDimsPos) and interchange vector of
      // outer loops (outerDimsPerm).
      static ShapedType inferPackedType(ShapedType sourceType,
          ArrayRef<int64_t> innerTileSizes, ArrayRef<int64_t> innerDimsPos,
          ArrayRef<int64_t> outerDimsPerm = {});

+     static Value createDestinationTensor(OpBuilder &b, Location loc,
+         Value source, ArrayRef<OpFoldResult> innerTileSizes,
+         ArrayRef<int64_t> innerDimsPos, ArrayRef<int64_t> outerDimsPerm);
    }];
  }

  //===----------------------------------------------------------------------===//
  // UnPackOp
  //===----------------------------------------------------------------------===//

  def Tensor_UnPackOp : Tensor_RelayoutOp<"unpack"> {
  ...
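
To make the role of the inner tiles, `innerDimsPos`, and `outerDimsPerm` concrete, here is a standalone C++ sketch of how a packed shape can be derived for fully static shapes. It is an illustration under stated assumptions (static sizes only, ceiling division for partial tiles) and not the body of `inferPackedType` or `createDestinationTensor`; `ceilDiv` and `inferPackedShape` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ceiling division for positive integers.
int64_t ceilDiv(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Derives the packed shape from a static source shape:
//  1. each tiled dimension shrinks to ceil(size / tile),
//  2. the outer dimensions are optionally interchanged by `outerDimsPerm`,
//  3. the tile sizes are appended as the innermost dimensions.
std::vector<int64_t> inferPackedShape(std::vector<int64_t> shape,
                                      const std::vector<int64_t> &innerTiles,
                                      const std::vector<int64_t> &innerDimsPos,
                                      const std::vector<int64_t> &outerDimsPerm = {}) {
  assert(innerTiles.size() == innerDimsPos.size());
  for (std::size_t i = 0; i < innerDimsPos.size(); ++i)
    shape[innerDimsPos[i]] = ceilDiv(shape[innerDimsPos[i]], innerTiles[i]);

  if (!outerDimsPerm.empty()) {
    assert(outerDimsPerm.size() == shape.size());
    std::vector<int64_t> permuted;
    for (int64_t p : outerDimsPerm)
      permuted.push_back(shape[p]);
    shape = permuted;
  }

  shape.insert(shape.end(), innerTiles.begin(), innerTiles.end());
  return shape;
}
```

For a 128x256 source with `inner_dims_pos = [0, 1]` and `inner_tiles = [8, 2]`, the sketch yields 16x128x8x2, which has the same rank-4 form as the `tensor<?x?x8x2xf32>` results in this revision's examples.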

mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt

  add_mlir_dialect_library(MLIRLinalgTransforms
    BubbleUpExtractSlice.cpp
    BufferizableOpInterfaceImpl.cpp
    Bufferize.cpp
    ConstantFold.cpp
+   DataLayoutPropagation.cpp
    DecomposeLinalgOps.cpp
    Detensorize.cpp
    DropUnitDims.cpp
    ElementwiseOpFusion.cpp
    ElementwiseToLinalg.cpp
    EraseUnusedOperandsAndResults.cpp
    FusePadOpWithLinalgProducer.cpp
    Fusion.cpp
  ...

mlir/lib/Dialect/Linalg/Transforms/DataLayoutPropagation.cpp

This file was added.

				//===- DataLayoutPropagation.cpp -----------------------------------------===///
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/Linalg/Passes.h"

				#include "mlir/Dialect/Affine/IR/AffineOps.h"
				#include "mlir/Dialect/Linalg/IR/Linalg.h"
				#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
				#include "mlir/Dialect/Linalg/Utils/Utils.h"
				#include "mlir/Dialect/Tensor/IR/Tensor.h"
				#include "mlir/Dialect/Tensor/Utils/Utils.h"
				#include "mlir/Dialect/Utils/IndexingUtils.h"
				#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

				namespace mlir {
				#define GEN_PASS_DEF_LINALGDATALAYOUTPROPAGATION
				#include "mlir/Dialect/Linalg/Passes.h.inc"
				} // namespace mlir

				using namespace mlir;
				using namespace mlir::linalg;

				#define DEBUG_TYPE "linalg-data-layout-propagation"

				namespace {

				/// Returns a tuple for packed operand and indexing_map with the assumptions:
				/// 1) The generic op is the producer of the pack op.
				/// 2) The generic op has only one result.
				/// 3) The indexing map of the output operand is identity.
				/// If the operand is a scalar or packing dimensions are all irrelevant to the
				/// operand, the opreand and the updated indexing map will be returned.
				/// Otherwise, it returns the packed operand and the updated indexing map. E.g.,
				///
				/// #map0 = affine_map<(d0, d1) -> (d0, d1)>
				/// #map1 = affine_map<(d0, d1) -> (d0)>
				/// #map2 = affine_map<(d0, d1) -> (d1)>
				/// %0 = linalg.generic {indexing_maps = [#map1, #map2, #map0],
				/// iterator_types = ["parallel", "parallel"]}
				/// ins(%arg0, %arg1 : tensor<?xf32>, tensor<?xf32>)
				/// outs(%init : tensor<?x?xf32>) {
				/// ^bb0(%arg3: f32, %arg4: f32, %arg5: f32):
				/// %4 = arith.addf %arg3, %arg4 : f32
				/// linalg.yield %4 : f32
				/// } -> tensor<?x?xf32>
				/// %1 = tensor.pack %0
				/// inner_dims_pos = [0, 1]
				/// inner_tiles = [8, 2]
				/// into %dest : tensor<?x?xf32> -> tensor<?x?x8x2xf32>
				///
				/// Taking the first input operand as an example, the inner tile size of d1 is
				/// 8. Thus, the below operation and `affine_map<(d0, d1, d2, d3)> ->
				/// affine_map<(d1, d3)>` will be returned.
				///
				/// %pack = tensor.pack %arg0
				/// inner_dims_pos = [0]
				/// inner_tiles = [8]
				/// into %init : tensor<?xf32> -> tensor<?x8xf32>
				static std::tuple<Value, AffineMap>
				getOrCreatePackedViewOfOperand(OpBuilder &b, Location loc,
				tensor::PackOp packOp, GenericOp genericOp,
				OpOperand *opOperand) {
				int numOrigLoops = genericOp.getNumLoops();
				int64_t numInnerLoops = packOp.getInnerDimsPos().size();
				int64_t numLoops = numOrigLoops + numInnerLoops;
				AffineMap origIndexingMap = genericOp.getMatchingIndexingMap(opOperand);
				SmallVector<AffineExpr> exprs(origIndexingMap.getResults());

				if (genericOp.isScalar(opOperand))
				return std::make_tuple(
				opOperand->get(),
				AffineMap::get(numLoops, 0, exprs, packOp.getContext()));

				llvm::SetVector<int64_t> innerDimsPosSet(packOp.getInnerDimsPos().begin(),
				packOp.getInnerDimsPos().end());
				DenseMap<int64_t, int64_t>
				iterMapToDim; // Mapping from AffinDimExpr of indexing maps to the operand
				// shape dimension.
				for (auto [index, expr] : llvm::enumerate(origIndexingMap.getResults())) {
				int64_t dimPos = expr.cast<AffineDimExpr>().getPosition();
				if (!innerDimsPosSet.contains(dimPos))
				continue;
				iterMapToDim[dimPos] = index;
				}

				// Construct the information of packing data dimensions and new indexing maps
				// for the operand.
				SmallVector<int64_t> innerDimsPos;
				SmallVector<OpFoldResult> innerTileSizes;
				for (auto [index, value] : llvm::enumerate(
				llvm::zip(packOp.getInnerDimsPos(), packOp.getMixedTiles()))) {
				int64_t dimPos = std::get<0>(value);
				if (!iterMapToDim.count(dimPos))
				continue;
				innerDimsPos.push_back(iterMapToDim[dimPos]);
				innerTileSizes.push_back(std::get<1>(value));
				exprs.push_back(b.getAffineDimExpr(numOrigLoops + index));
				}
				auto indexingMap = AffineMap::get(numLoops, 0, exprs, packOp.getContext());

				SmallVector<int64_t> outerDimsPerm;
				for (auto outDim : packOp.getOuterDimsPerm()) {
				if (!iterMapToDim.count(outDim))
				continue;
				outerDimsPerm.push_back(iterMapToDim[outDim]);
				}

				// The operand does not have dimensions that relates to pack op.
				if (innerDimsPos.empty() && outerDimsPerm.empty())
				return std::make_tuple(opOperand->get(), indexingMap);

				auto empty = tensor::PackOp::createDestinationTensor(
				b, loc, opOperand->get(), innerTileSizes, innerDimsPos, outerDimsPerm);
				auto packedOperand = b.create<tensor::PackOp>(
				loc, opOperand->get(), empty, innerDimsPos, innerTileSizes,
				packOp.getPaddingValue(), outerDimsPerm);
				return std::make_tuple(packedOperand, indexingMap);
				}

				/// Bubbles up tensor.pack op through elementwise generic op. This
				/// swap pack(generic) to generic(pack). The new generic op works on packed
				/// domain; pack ops are created for input and output operands. E.g.,
				///
				/// #map0 = affine_map<(d0, d1) -> (d0, d1)>
				/// %0 = tensor.dim %arg0, %c0 : tensor<?x?xf32>
				/// %1 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
				/// %2 = tensor.empty(%0, %1) : tensor<?x?xf32>
				/// %3 = linalg.generic {indexing_maps = [#map0, #map0],
				/// iterator_types = ["parallel", "parallel"]}
				/// ins(%arg0 : tensor<?x?xf32>)
				/// outs(%2 : tensor<?x?xf32>) {
				/// ^bb0(%arg3: f32, %arg4: f32):
				/// %4 = arith.addf %arg3, %arg3 : f32
				/// linalg.yield %4 : f32
				/// } -> tensor<?x?xf32>
				/// %4 = tensor.pack %3
				/// inner_dims_pos = [0, 1]
				/// inner_tiles = [8, 2]
				/// into %dest : tensor<?x?xf32> -> tensor<?x?x8x2xf32>
				///
				/// will be converted to
				///
				/// #map = affine_map<()[s0] -> (s0 ceildiv 8)>
				/// #map1 = affine_map<()[s0] -> (s0 ceildiv 2)>
				/// #map2 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
				/// %dim = tensor.dim %arg0, %c0 : tensor<?x?xf32>
				/// %dim_0 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
				/// %0 = affine.apply #map()[%dim]
				/// %1 = affine.apply #map1()[%dim_0]
				/// %2 = tensor.empty(%0, %1) : tensor<?x?x8x2xf32>
				/// %pack = tensor.pack %arg0
				/// inner_dims_pos = [0, 1]
				/// inner_tiles = [8, 2]
				/// into %2 : tensor<?x?xf32> -> tensor<?x?x8x2xf32>
				/// %3 = linalg.generic {indexing_maps = [#map2, #map2],
				/// iterator_types = ["parallel", "parallel", "parallel", "parallel"]}
				/// ins(%pack : tensor<?x?x8x2xf32>)
				/// outs(%dest : tensor<?x?x8x2xf32>) {
				/// ^bb0(%in: f32, %out: f32):
				/// %4 = arith.addf %in, %in : f32
				/// linalg.yield %4 : f32
				/// } -> tensor<?x?x8x2xf32>
				static FailureOr<GenericOp>
				bubbleUpPackOpThroughElemGenericOp(RewriterBase &rewriter,
				tensor::PackOp packOp) {
				auto genericOp = packOp.getSource().getDefiningOp<GenericOp>();
				chelini (Done): We can use: `rewriter.inlineRegionBefore(genericOp.getRegion(), newGenericOp.getRegion(), newGenericOp.getRegion().begin());`
				if (!genericOp)
				return failure();

				if (!isElementwise(genericOp))
				return failure();

				// TODO: Relax the restriction. We are able to bubble up the pack op through
				// multi-result generic op. It just needs more work.
				if (genericOp.getNumResults() != 1)
				chelini (Not Done): We also need to make sure that the result of the current linalg generic operation has no users. Otherwise, the linalg generic remains alive without a body.
				hanchung (author, Done): I see the point. I think we should use cloneRegionBefore instead of inline.
				return failure();

				// TODO: Add an option for allowing padding values. It could introduce
				// undefined behavior if we unconditionally propagate pack op through all
				// the ops. E.g., if the padding value is zero and there are division ops in
				// a generic op. Some values of padding area could be NaN (0/0).
				if (packOp.getPaddingValue())
				return failure();

				OpOperand *opOperand = genericOp.getDpsInitOperand(0);
				// TODO: Add support for all permutation indexing maps.
				if (!genericOp.getMatchingIndexingMap(opOperand).isIdentity())
				return rewriter.notifyMatchFailure(
				packOp, "the result of generic op does not have identity indexing_map");

				Location loc = packOp.getLoc();
				SmallVector<Value> inputOperands;
				SmallVector<AffineMap> indexingMaps;
				for (OpOperand *inputOperand : genericOp.getDpsInputOperands()) {
				auto [packedOperand, packedIndexingMap] = getOrCreatePackedViewOfOperand(
				rewriter, loc, packOp, genericOp, inputOperand);
				inputOperands.push_back(packedOperand);
				indexingMaps.push_back(packedIndexingMap);
				}

				int64_t numLoops = genericOp.getNumLoops();
				int64_t numInnerLoops = packOp.getInnerDimsPos().size();
				int64_t newNumLoops = numLoops + numInnerLoops;
				SmallVector<utils::IteratorType> iterTypes =
				genericOp.getIteratorTypesArray();
				iterTypes.append(numInnerLoops, utils::IteratorType::parallel);

				SmallVector<AffineExpr> outExprs(
				genericOp.getMatchingIndexingMap(opOperand).getResults());
				for (int i = 0; i < numInnerLoops; ++i)
				outExprs.push_back(rewriter.getAffineDimExpr(numLoops + i));
				indexingMaps.push_back(
				AffineMap::get(newNumLoops, 0, outExprs, rewriter.getContext()));

				auto newGenericOp = rewriter.create<linalg::GenericOp>(
				loc, packOp.getDestType(), inputOperands, packOp.getDest(), indexingMaps,
				iterTypes, /*bodyBuild=*/nullptr,
				linalg::getPrunedAttributeList(genericOp));
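				// Clone rather than inline the body so that the original generic op stays
				// valid if its result still has other users (see the review thread above).
				rewriter.cloneRegionBefore(genericOp.getRegion(), newGenericOp.getRegion(),
				newGenericOp.getRegion().begin());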
				return newGenericOp;
				}
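
The loop bookkeeping in the function above (one extra parallel iterator per packed dimension, and an output map whose trailing results are the new loops) can be modelled outside MLIR. The following is a minimal standalone sketch, not the MLIR API: `DimMap`, `extendIteratorTypes`, and `extendOutputMap` are hypothetical names, and a map is reduced to a list of dimension indices.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for an AffineMap whose results are plain dimension
// references: results[i] == k means "result i is d_k".
struct DimMap {
  int64_t numDims;
  std::vector<int64_t> results;
};

// Appends one "parallel" iterator per packed (inner) dimension.
std::vector<std::string> extendIteratorTypes(std::vector<std::string> iterTypes,
                                             int64_t numInnerLoops) {
  for (int64_t i = 0; i < numInnerLoops; ++i)
    iterTypes.push_back("parallel");
  return iterTypes;
}

// Extends the output map with the new inner loops: the appended results are
// d_numLoops, d_numLoops+1, ..., mirroring the outExprs loop in the patch.
DimMap extendOutputMap(const DimMap &outMap, int64_t numInnerLoops) {
  assert(numInnerLoops >= 0 && "number of inner loops must be non-negative");
  DimMap packed{outMap.numDims + numInnerLoops, outMap.results};
  for (int64_t i = 0; i < numInnerLoops; ++i)
    packed.results.push_back(outMap.numDims + i);
  return packed;
}
```

For a 2-D identity output map and two inner tiles, this yields the 4-D identity map and four parallel iterators, which is what the CHECK lines in the accompanying test expect.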

				// Wrapper pattern that applies bubbleUpPackOpThroughElemGenericOp method.
				class BubbleUpPackOpThroughElemGenericOpPattern
				chelini (Not Done): struct instead of class?
				: public OpRewritePattern<tensor::PackOp> {
				public:
				using OpRewritePattern<tensor::PackOp>::OpRewritePattern;

				LogicalResult matchAndRewrite(tensor::PackOp packOp,
				PatternRewriter &rewriter) const override {
				auto genericOp = bubbleUpPackOpThroughElemGenericOp(rewriter, packOp);
				if (failed(genericOp))
				return failure();
				rewriter.replaceOp(packOp, genericOp.value().getResults());
				return success();
				}
				};
				} // namespace

				void mlir::linalg::populateDataLayoutPropagationPatterns(
				RewritePatternSet &patterns) {
				patterns.insert<BubbleUpPackOpThroughElemGenericOpPattern>(
				patterns.getContext());
				}
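
The pattern registered above bails out when the pack op carries a padding value. The reason given in the review (a zero padding value combined with a division in the generic body) can be reproduced with plain floats. This is an illustrative sketch only: `padToTile` and `divideBySelf` are hypothetical helpers, and it assumes IEEE 754 float arithmetic, where 0/0 produces NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pads `data` with `padValue` up to the next multiple of `tile`, standing in
// for a 1-D pack with a padding value.
std::vector<float> padToTile(std::vector<float> data, std::size_t tile,
                             float padValue) {
  assert(tile > 0 && "tile size must be positive");
  std::size_t paddedSize = (data.size() + tile - 1) / tile * tile;
  data.resize(paddedSize, padValue);
  return data;
}

// Elementwise x / x, standing in for a generic op body that divides.
std::vector<float> divideBySelf(const std::vector<float> &in) {
  std::vector<float> out;
  out.reserve(in.size());
  for (float x : in)
    out.push_back(x / x);
  return out;
}
```

Computing first and packing afterwards, `padToTile(divideBySelf(v), 8, 0.0f)`, leaves zeros in the padded area. The swapped order, `divideBySelf(padToTile(v, 8, 0.0f))`, evaluates 0/0 there instead, so the two orders no longer agree.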

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp

		// ...
		ShapedType PackOp::inferPackedType(ShapedType sourceType,
		// ...
		if (!outerDimsPerm.empty())
		applyPermutationToVector(resultShape, outerDimsPerm);

		// Append the inner tile dimensions.
		resultShape.append(innerTileSizes.begin(), innerTileSizes.end());
		return RankedTensorType::get(resultShape, sourceType.getElementType());
		}

		Value PackOp::createDestinationTensor(OpBuilder &b, Location loc, Value source,
		ArrayRef<OpFoldResult> innerTileSizes,
		ArrayRef<int64_t> innerDimsPos,
		ArrayRef<int64_t> outerDimsPerm) {
		AffineExpr dim0, dim1;
		bindDims(b.getContext(), dim0, dim1);
		auto ceilDiv = [&](OpFoldResult v1, OpFoldResult v2) -> OpFoldResult {
		return makeComposedFoldedAffineApply(b, loc, dim0.ceilDiv(dim1), {v1, v2});
		};

		SmallVector<OpFoldResult> mixedSizes;
		for (auto [index, value] :
		llvm::enumerate(source.getType().cast<RankedTensorType>().getShape())) {
		if (ShapedType::isDynamic(value))
		mixedSizes.push_back(b.create<DimOp>(loc, source, index).getResult());
		else
		mixedSizes.push_back(b.getIndexAttr(value));
		}
		for (auto it : llvm::zip(innerDimsPos, innerTileSizes)) {
		int64_t dimPos = std::get<0>(it);
		OpFoldResult tileSize = std::get<1>(it);
		mixedSizes[dimPos] = ceilDiv(mixedSizes[dimPos], tileSize);
		}
		if (!outerDimsPerm.empty())
		applyPermutationToVector<OpFoldResult>(mixedSizes, outerDimsPerm);

		mixedSizes.append(innerTileSizes.begin(), innerTileSizes.end());
		auto elemType = source.getType().cast<ShapedType>().getElementType();
		return b.create<tensor::EmptyOp>(loc, mixedSizes, elemType);
		}

		/// Returns true if the tiles and the tiled dims are constant.
		template <typename OpTy>
		bool areTilesAndTiledDimsAllConstant(OpTy op) {
		static_assert(llvm::is_one_of<OpTy, PackOp, UnPackOp>::value,
		"applies to only pack or unpack operations");
		ShapedType packedType = (std::is_same<OpTy, PackOp>::value)
		? op.getDestType()
		: op.getSourceType();
		// ...
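
The shape arithmetic in `createDestinationTensor` above (ceil-divide each tiled dimension, optionally permute the outer dimensions, then append the tile sizes) can be checked by hand for static shapes. Below is a minimal standalone sketch under that assumption; `packedShape` is a hypothetical helper operating on plain integers, not the MLIR API, and it does not handle dynamic sizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Computes the packed shape of a statically shaped tensor.
std::vector<int64_t> packedShape(std::vector<int64_t> shape,
                                 const std::vector<int64_t> &innerDimsPos,
                                 const std::vector<int64_t> &innerTiles,
                                 const std::vector<int64_t> &outerDimsPerm) {
  assert(innerDimsPos.size() == innerTiles.size());
  // Each tiled dimension shrinks to ceil(size / tile).
  for (std::size_t i = 0; i < innerDimsPos.size(); ++i) {
    int64_t &dim = shape[innerDimsPos[i]];
    dim = (dim + innerTiles[i] - 1) / innerTiles[i];
  }
  // Optionally permute the outer dimensions: result[i] = shape[perm[i]].
  if (!outerDimsPerm.empty()) {
    assert(outerDimsPerm.size() == shape.size());
    std::vector<int64_t> permuted;
    for (int64_t p : outerDimsPerm)
      permuted.push_back(shape[p]);
    shape = permuted;
  }
  // The tile sizes become the innermost dimensions.
  shape.insert(shape.end(), innerTiles.begin(), innerTiles.end());
  return shape;
}
```

With the shapes used in the tests of this revision, a `128x256` source with `inner_dims_pos = [1, 0]` and `inner_tiles = [16, 32]` gives `4x16x16x32`; adding `outer_dims_perm = [1, 0]` gives `16x4x16x32`.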

mlir/test/Dialect/Linalg/data-layout-propagation.mlir

This file was added.

				// RUN: mlir-opt %s -test-linalg-data-layout-propagation -split-input-file | FileCheck %s

				#map0 = affine_map<(d0, d1) -> (d0, d1)>
				func.func @dynamic_elem_pack(%arg0: tensor<?x?xf32>, %dest: tensor<?x?x8x2xf32>) -> tensor<?x?x8x2xf32>
				{
				%c0 = arith.constant 0 : index
				%c1 = arith.constant 1 : index
				%0 = tensor.dim %arg0, %c0 : tensor<?x?xf32>
				%1 = tensor.dim %arg0, %c1 : tensor<?x?xf32>
				%2 = tensor.empty(%0, %1) : tensor<?x?xf32>
				%3 = linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel", "parallel"]}
				ins(%arg0 : tensor<?x?xf32>)
				nicolasvasilache (Not Done): please use `linalg.map` for all the generics in this test, they are simple maps
				hanchung (author, Done): IIUC, `linalg.map` op is named op. I can't simply update the test file because the pattern works on linalg::GenericOp. Are you suggesting me to update the pattern taking linalg::LinalgOp instead? I think that could work for `linalg.map`, `linalg.transpose`, etc; I can take a stab at it, just wanna clarify before updating it.
				hanchung (author, Done): It's quite tricky because the builder and clone method can't be reused. The most tricky part is about updating indexing_maps. The builder and clone method does not take indexing_maps into account. Also, there is not a clean way to use setting methods for them.
				nicolasvasilache (Not Done): @pifon2a ?
				hanchung (author, Done): I'm going to land the PR to unblock some work on @chelini side tomorrow. If it's an easy fix, I can give it a shot. If not, I'd prefer fixing it afterwards. Thanks!
				outs(%2 : tensor<?x?xf32>) {
				^bb0(%arg3: f32, %arg4: f32):
				%4 = arith.addf %arg3, %arg3 : f32
				linalg.yield %4 : f32
				} -> tensor<?x?xf32>
				%4 = tensor.pack %3
				inner_dims_pos = [0, 1]
				inner_tiles = [8, 2]
				into %dest : tensor<?x?xf32> -> tensor<?x?x8x2xf32>
				return %4 : tensor<?x?x8x2xf32>
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0] -> (s0 ceildiv 8)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0] -> (s0 ceildiv 2)>
				// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
				// CHECK: func.func @dynamic_elem_pack
				// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]
				// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
				// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = arith.constant 1 : index
				// CHECK-DAG: %[[D0:.+]] = tensor.dim %[[ARG0]], %[[C0]]
				// CHECK-DAG: %[[D1:.+]] = tensor.dim %[[ARG0]], %[[C1]]
				// CHECK-DAG: %[[OUTER_D0:.+]] = affine.apply #[[MAP0]]()[%[[D0]]]
				// CHECK-DAG: %[[OUTER_D1:.+]] = affine.apply #[[MAP1]]()[%[[D1]]]
				// CHECK: %[[ARG0_EMPTY:.+]] = tensor.empty(%[[OUTER_D0]], %[[OUTER_D1]]) : tensor<?x?x8x2xf32>
				// CHECK: %[[PACK_ARG0:.+]] = tensor.pack %[[ARG0]]
				// CHECK-SAME: inner_dims_pos = [0, 1] inner_tiles = [8, 2]
				// CHECK-SAME: into %[[ARG0_EMPTY]]
				// CHECK: %[[ELEM:.+]] = linalg.generic
				// CHECK-SAME: indexing_maps = [#[[MAP2]], #[[MAP2]]]
				// CHECK-SAME: iterator_types = ["parallel", "parallel", "parallel", "parallel"]
				// CHECK-SAME: ins(%[[PACK_ARG0]]
				// CHECK-SAME: outs(%[[DEST]]
				// CHECK: return %[[ELEM]] : tensor<?x?x8x2xf32>

				// -----

				#map0 = affine_map<(d0, d1) -> (d0, d1)>
				func.func @elem_pack_transpose_inner_dims(%arg0: tensor<128x256xi32>, %dest: tensor<4x16x16x32xi32>) -> tensor<4x16x16x32xi32>{
				%init = tensor.empty() : tensor<128x256xi32>
				%elem = linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel", "parallel"]}
				ins(%arg0 : tensor<128x256xi32>)
				outs(%init : tensor<128x256xi32>) {
				^bb0(%arg3: i32, %arg4: i32):
				%4 = arith.addi %arg3, %arg3 : i32
				linalg.yield %4 : i32
				} -> tensor<128x256xi32>
				%pack = tensor.pack %elem
				inner_dims_pos = [1, 0]
				inner_tiles = [16, 32]
				into %dest : tensor<128x256xi32> -> tensor<4x16x16x32xi32>
				return %pack : tensor<4x16x16x32xi32>
				}
				// CHECK-DAG: #[[MAP:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
				// CHECK: func.func @elem_pack_transpose_inner_dims
				// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]
				// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
				// CHECK: %[[ARG0_EMPTY:.+]] = tensor.empty() : tensor<4x16x16x32xi32>
				// CHECK: %[[PACK_ARG0:.+]] = tensor.pack %[[ARG0]]
				// CHECK-SAME: inner_dims_pos = [1, 0] inner_tiles = [16, 32]
				// CHECK-SAME: into %[[ARG0_EMPTY]]
				// CHECK: %[[ELEM:.+]] = linalg.generic
				// CHECK-SAME: indexing_maps = [#[[MAP]], #[[MAP]]]
				// CHECK-SAME: iterator_types = ["parallel", "parallel", "parallel", "parallel"]
				// CHECK-SAME: ins(%[[PACK_ARG0]]
				// CHECK-SAME: outs(%[[DEST]]
				// CHECK: return %[[ELEM]] : tensor<4x16x16x32xi32>

				// -----

				#map0 = affine_map<(d0, d1) -> (d0, d1)>
				func.func @elem_pack_transpose_outer_dims(%arg0: tensor<128x256xi32>, %dest: tensor<16x4x32x16xi32>) -> tensor<16x4x32x16xi32>{
				%init = tensor.empty() : tensor<128x256xi32>
				%elem = linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel", "parallel"]}
				ins(%arg0 : tensor<128x256xi32>)
				outs(%init : tensor<128x256xi32>) {
				^bb0(%arg3: i32, %arg4: i32):
				%4 = arith.addi %arg3, %arg3 : i32
				linalg.yield %4 : i32
				} -> tensor<128x256xi32>
				%pack = tensor.pack %elem
				outer_dims_perm = [1, 0]
				inner_dims_pos = [0, 1]
				inner_tiles = [32, 16]
				into %dest : tensor<128x256xi32> -> tensor<16x4x32x16xi32>
				return %pack : tensor<16x4x32x16xi32>
				}
				// CHECK-DAG: #[[MAP:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
				// CHECK: func.func @elem_pack_transpose_outer_dims
				// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]
				// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
				// CHECK: %[[ARG0_EMPTY:.+]] = tensor.empty() : tensor<16x4x32x16xi32>
				// CHECK: %[[PACK_ARG0:.+]] = tensor.pack %[[ARG0]]
				// CHECK-SAME: outer_dims_perm = [1, 0] inner_dims_pos = [0, 1] inner_tiles = [32, 16]
				// CHECK-SAME: into %[[ARG0_EMPTY]]
				// CHECK: %[[ELEM:.+]] = linalg.generic
				// CHECK-SAME: indexing_maps = [#[[MAP]], #[[MAP]]]
				// CHECK-SAME: iterator_types = ["parallel", "parallel", "parallel", "parallel"]
				// CHECK-SAME: ins(%[[PACK_ARG0]]
				// CHECK-SAME: outs(%[[DEST]]
				// CHECK: return %[[ELEM]] : tensor<16x4x32x16xi32>

				// -----

				#map0 = affine_map<(d0, d1) -> (d0, d1)>
				func.func @elem_pack_transpose_inner_and_outer_dims(%arg0: tensor<128x256xi32>, %dest: tensor<16x4x16x32xi32>) -> tensor<16x4x16x32xi32>{
				%init = tensor.empty() : tensor<128x256xi32>
				%elem = linalg.generic {indexing_maps = [#map0, #map0], iterator_types = ["parallel", "parallel"]}
				ins(%arg0 : tensor<128x256xi32>)
				outs(%init : tensor<128x256xi32>) {
				^bb0(%arg3: i32, %arg4: i32):
				%4 = arith.addi %arg3, %arg3 : i32
				linalg.yield %4 : i32
				} -> tensor<128x256xi32>
				%pack = tensor.pack %elem
				outer_dims_perm = [1, 0]
				inner_dims_pos = [1, 0]
				inner_tiles = [16, 32]
				into %dest : tensor<128x256xi32> -> tensor<16x4x16x32xi32>
				return %pack : tensor<16x4x16x32xi32>
				}
				// CHECK-DAG: #[[MAP:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
				// CHECK: func.func @elem_pack_transpose_inner_and_outer_dims
				// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]
				// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
				// CHECK: %[[ARG0_EMPTY:.+]] = tensor.empty() : tensor<16x4x16x32xi32>
				// CHECK: %[[PACK_ARG0:.+]] = tensor.pack %[[ARG0]]
				// CHECK-SAME: outer_dims_perm = [1, 0] inner_dims_pos = [1, 0] inner_tiles = [16, 32]
				// CHECK-SAME: into %[[ARG0_EMPTY]]
				// CHECK: %[[ELEM:.+]] = linalg.generic
				// CHECK-SAME: indexing_maps = [#[[MAP]], #[[MAP]]]
				// CHECK-SAME: iterator_types = ["parallel", "parallel", "parallel", "parallel"]
				// CHECK-SAME: ins(%[[PACK_ARG0]]
				// CHECK-SAME: outs(%[[DEST]]
				// CHECK: return %[[ELEM]] : tensor<16x4x16x32xi32>

				// -----

				#map0 = affine_map<(d0, d1) -> (d0, d1)>
				#map1 = affine_map<(d0, d1) -> (d0)>
				#map2 = affine_map<(d0, d1) -> (d1)>
				func.func @dynamic_broadcast_pack(%arg0: tensor<?xf32>, %arg1: tensor<?xf32>, %dest: tensor<?x?x8x2xf32>) -> tensor<?x?x8x2xf32>
				{
				%c0 = arith.constant 0 : index
				%0 = tensor.dim %arg0, %c0 : tensor<?xf32>
				%1 = tensor.dim %arg1, %c0 : tensor<?xf32>
				%2 = tensor.empty(%0, %1) : tensor<?x?xf32>
				%3 = linalg.generic {indexing_maps = [#map1, #map2, #map0], iterator_types = ["parallel", "parallel"]}
				ins(%arg0, %arg1 : tensor<?xf32>, tensor<?xf32>)
				outs(%2 : tensor<?x?xf32>) {
				^bb0(%arg3: f32, %arg4: f32, %arg5: f32):
				%4 = arith.addf %arg3, %arg4 : f32
				linalg.yield %4 : f32
				} -> tensor<?x?xf32>
				%4 = tensor.pack %3
				inner_dims_pos = [0, 1]
				inner_tiles = [8, 2]
				into %dest : tensor<?x?xf32> -> tensor<?x?x8x2xf32>
				return %4 : tensor<?x?x8x2xf32>
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0] -> (s0 ceildiv 8)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0] -> (s0 ceildiv 2)>
				// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d2)>
				// CHECK-DAG: #[[MAP3:.+]] = affine_map<(d0, d1, d2, d3) -> (d1, d3)>
				// CHECK-DAG: #[[MAP4:.+]] = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
				// CHECK: func.func @dynamic_broadcast_pack
				// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]
				// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]
				// CHECK-SAME: %[[DEST:[a-zA-Z0-9]+]]
				// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
				// CHECK-DAG: %[[D0:.+]] = tensor.dim %[[ARG0]], %[[C0]]
				// CHECK-DAG: %[[OUTER_D0:.+]] = affine.apply #[[MAP0]]()[%[[D0]]]
				// CHECK: %[[ARG0_EMPTY:.+]] = tensor.empty(%[[OUTER_D0]]) : tensor<?x8xf32>
				// CHECK: %[[PACK_ARG0:.+]] = tensor.pack %[[ARG0]]
				// CHECK-SAME: inner_dims_pos = [0] inner_tiles = [8]
				// CHECK-SAME: into %[[ARG0_EMPTY]]
				// CHECK-DAG: %[[D1:.+]] = tensor.dim %[[ARG1]], %[[C0]]
				// CHECK-DAG: %[[OUTER_D1:.+]] = affine.apply #[[MAP1]]()[%[[D1]]]
				// CHECK: %[[ARG1_EMPTY:.+]] = tensor.empty(%[[OUTER_D1]]) : tensor<?x2xf32>
				// CHECK: %[[PACK_ARG1:.+]] = tensor.pack %[[ARG1]]
				// CHECK-SAME: inner_dims_pos = [0] inner_tiles = [2]
				// CHECK-SAME: into %[[ARG1_EMPTY]]
				// CHECK: %[[ELEM:.+]] = linalg.generic
				// CHECK-SAME: indexing_maps = [#[[MAP2]], #[[MAP3]], #[[MAP4]]]
				// CHECK-SAME: iterator_types = ["parallel", "parallel", "parallel", "parallel"]
				// CHECK-SAME: ins(%[[PACK_ARG0]], %[[PACK_ARG1]]
				// CHECK-SAME: outs(%[[DEST]]
				// CHECK: return %[[ELEM]] : tensor<?x?x8x2xf32>

				// -----

				#map0 = affine_map<(d0, d1) -> (d0, d1)>
				#map1 = affine_map<(d0, d1) -> (d0)>
				#map2 = affine_map<(d0, d1) -> (d1)>
				func.func @transpose_pack(%arg0: tensor<100x128x200x256xi32>, %arg1: tensor<100xi32>, %arg2: tensor<128xi32>, %dest: tensor<100x200x4x16x16x32xi32>) -> tensor<100x200x4x16x16x32xi32>
				{
				%init_transpose = tensor.empty() : tensor<100x200x128x256xi32>
				%transpose = linalg.generic {
				indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>,
				affine_map<(d0, d1, d2, d3) -> (d0)>,
				affine_map<(d0, d1, d2, d3) -> (d1)>,
				affine_map<(d0, d1, d2, d3) -> (d0, d2, d1, d3)>],
				iterator_types = ["parallel", "parallel", "parallel", "parallel"]}
				ins(%arg0, %arg1, %arg2 : tensor<100x128x200x256xi32>, tensor<100xi32>, tensor<128xi32>)
				outs(%init_transpose : tensor<100x200x128x256xi32>) {
				^bb0(%b0 : i32, %b1 : i32, %b2 : i32, %b3 : i32):
				%0 = arith.addi %b0, %b1 : i32
				%1 = arith.addi %0, %b2 : i32
				linalg.yield %1 : i32
				} -> tensor<100x200x128x256xi32>
				%4 = tensor.pack %transpose
				inner_dims_pos = [3, 2]
				inner_tiles = [16, 32]
				into %dest : tensor<100x200x128x256xi32> -> tensor<100x200x4x16x16x32xi32>
				return %4 : tensor<100x200x4x16x16x32xi32>
				}
				// CHECK: func.func @transpose_pack
				// CHECK: linalg.generic
				// CHECK: tensor.pack
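
The broadcast test above expects each input to be packed only along the dimensions it actually indexes: `(d0)` becomes `(d0, d2)` with tile 8, and `(d1)` becomes `(d1, d3)` with tile 2. The sketch below models that derivation on plain integers. It is an assumption-laden illustration, not the patch's `getOrCreatePackedViewOfOperand`: `PackedOperand` and `packOperandView` are hypothetical names, the output map is assumed to be the identity, and `outer_dims_perm` is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack parameters and extended indexing map for one input operand.
struct PackedOperand {
  std::vector<int64_t> innerDimsPos; // tiled positions in the operand
  std::vector<int64_t> innerTiles;   // tile size for each tiled position
  std::vector<int64_t> mapResults;   // result i is d_{mapResults[i]}
};

// operandMap[r] == k means the operand's r-th dimension is indexed by loop d_k.
// innerDimsPos/innerTiles describe the pack on the (identity-mapped) result.
PackedOperand packOperandView(const std::vector<int64_t> &operandMap,
                              int64_t numLoops,
                              const std::vector<int64_t> &innerDimsPos,
                              const std::vector<int64_t> &innerTiles) {
  assert(innerDimsPos.size() == innerTiles.size());
  PackedOperand packed;
  packed.mapResults = operandMap;
  for (std::size_t j = 0; j < innerDimsPos.size(); ++j) {
    auto it = std::find(operandMap.begin(), operandMap.end(), innerDimsPos[j]);
    if (it == operandMap.end())
      continue; // the operand does not index this loop, so nothing to tile
    packed.innerDimsPos.push_back(it - operandMap.begin());
    packed.innerTiles.push_back(innerTiles[j]);
    // The j-th inner tile is iterated by the new loop d_{numLoops + j}.
    packed.mapResults.push_back(numLoops + static_cast<int64_t>(j));
  }
  return packed;
}
```

Applied to the identity-mapped input of `@elem_pack_transpose_inner_dims`, the same helper returns `inner_dims_pos = [1, 0]`, `inner_tiles = [16, 32]`, and the 4-D identity map, consistent with the CHECK lines of that test.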

mlir/test/lib/Dialect/Linalg/CMakeLists.txt

	# Exclude tests from libMLIR.so
	add_mlir_library(MLIRLinalgTestPasses
	TestDataLayoutPropagation.cpp
	TestLinalgDecomposeOps.cpp
	TestLinalgElementwiseFusion.cpp
	TestLinalgFusionTransforms.cpp
	TestLinalgHoisting.cpp
	TestLinalgTransforms.cpp
	TestPadFusion.cpp

	EXCLUDE_FROM_LIBMLIR
	# ...

mlir/test/lib/Dialect/Linalg/TestDataLayoutPropagation.cpp

This file was added.

				//===- TestDataLayoutPropagation.cpp --------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

				#include "mlir/Dialect/Affine/IR/AffineOps.h"
				#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
				#include "mlir/Pass/Pass.h"
				#include "mlir/Pass/PassManager.h"
				#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

				using namespace mlir;

				namespace {
				struct TestDataLayoutPropagationPass
				: public PassWrapper<TestDataLayoutPropagationPass, OperationPass<>> {
				MLIR_DEFINE_EXPLICIT_INTERNAL_INLINE_TYPE_ID(TestDataLayoutPropagationPass)

				void getDependentDialects(DialectRegistry &registry) const override {
				registry
				.insert<AffineDialect, linalg::LinalgDialect, tensor::TensorDialect>();
				}

				StringRef getArgument() const final {
				return "test-linalg-data-layout-propagation";
				}
				StringRef getDescription() const final {
				return "Test data layout propagation";
				}

				void runOnOperation() override {
				MLIRContext *context = &getContext();
				RewritePatternSet patterns(context);
				linalg::populateDataLayoutPropagationPatterns(patterns);
				if (failed(
				applyPatternsAndFoldGreedily(getOperation(), std::move(patterns))))
				return signalPassFailure();
				}
				};
				} // namespace

				namespace mlir {
				namespace test {
				void registerTestDataLayoutPropagation() {
				PassRegistration<TestDataLayoutPropagationPass>();
				}
				} // namespace test
				} // namespace mlir

mlir/tools/mlir-opt/mlir-opt.cpp

		// ...
		void registerTestArithEmulateWideIntPass();
		void registerTestAliasAnalysisPass();
		void registerTestBuiltinAttributeInterfaces();
		void registerTestCallGraphPass();
		void registerTestConstantFold();
		void registerTestControlFlowSink();
		void registerTestGpuSerializeToCubinPass();
		void registerTestGpuSerializeToHsacoPass();
		void registerTestDataLayoutPropagation();
		void registerTestDataLayoutQuery();
		void registerTestDeadCodeAnalysisPass();
		void registerTestDecomposeCallGraphTypes();
		void registerTestDiagnosticsPass();
		void registerTestDialectConversionPasses();
		void registerTestDominancePass();
		void registerTestDynamicPipelinePass();
		void registerTestExpandMathPass();
		// ...
		void registerTestPasses() {
		// ...
		mlir::test::registerTestDialectConversionPasses();
		#if MLIR_CUDA_CONVERSIONS_ENABLED
		mlir::test::registerTestGpuSerializeToCubinPass();
		#endif
		#if MLIR_ROCM_CONVERSIONS_ENABLED
		mlir::test::registerTestGpuSerializeToHsacoPass();
		#endif
		mlir::test::registerTestDecomposeCallGraphTypes();
		mlir::test::registerTestDataLayoutPropagation();
		mlir::test::registerTestDataLayoutQuery();
		mlir::test::registerTestDeadCodeAnalysisPass();
		mlir::test::registerTestDominancePass();
		mlir::test::registerTestDynamicPipelinePass();
		mlir::test::registerTestExpandMathPass();
		mlir::test::registerTestFooAnalysisPass();
		mlir::test::registerTestComposeSubView();
		mlir::test::registerTestMultiBuffering();
		// ...

Revision Contents

Diff 479418
