This is an archive of the discontinued LLVM Phabricator instance.

Differential D106406

[mlir] Add an interface to allow operations to specify how they can be tiled.
Abandoned · Public

Authored by mravishankar on Jul 20 2021, 3:29 PM.

Details

Reviewers

nicolasvasilache

Summary

An interface to allow for tiling of operations is introduced
here. Using this interface a tiling algorithm is implemented that
tiles an operation that implements that interface.

The current tiling algorithm uses the same options as Linalg ops and
uses a similar control mechanism. To avoid leaking these options outside
of Linalg, the tiling algorithm is currently placed within the
Linalg dialect.

The test ops defined to check the tiling algorithm implement the
interface as an ExternalModel. Also implemented is the tiling of
tensor.extract_slice using a similar approach.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mravishankar created this revision. Jul 20 2021, 3:29 PM

Herald added subscribers: dcaballe, cota, teijeong and 17 others. Jul 20 2021, 3:29 PM

mravishankar requested review of this revision. Jul 20 2021, 3:29 PM

Herald added a reviewer: nicolasvasilache. Jul 20 2021, 3:29 PM

Herald added a project: Restricted Project.

Herald added subscribers: limo1996, stephenneuendorffer, nicolasvasilache.

CMake file fix.

Fix CMake builds.

troggo added a subscriber: troggo. Aug 11 2021, 6:16 PM

Herald added subscribers: wrengr, Chia-hungDuan. Aug 11 2021, 6:16 PM

springerm added a subscriber: springerm. Aug 15 2021, 7:47 PM

springerm added inline comments.

mlir/include/mlir/Interfaces/TilingInterface.td
23	them
58–61	I think there could be a bit more documentation around these and the method description. In particular, what could've helped me understand this faster (without looking at the code): `getTiledImplementation` generates the ExtractSlice ops (extracting tiles of size `sizes` at position `offsets`). The caller of `getTiledImplementation` is responsible for writing back the tile result to the output tensor via InsertSlice. The result offset and size are calculated by `getTiledImplementation`. The caller of `getTiledImplementation` is responsible for generating the loop nest. Maybe rename `sizes` to `tileSizes`. Should `resultOffsets` etc. be called `outputOffsets`?
mlir/lib/Dialect/Linalg/TilingInterface/Tiling.cpp
56–67	Can be deleted. These helper functions are in StaticValueUtils.
94–95	Just an implementation detail, but I would mention somewhere that this function generates just one loop and then calls itself recursively.
120	The `if` body could be a separate static helper function. Something like `insertTiledOpIntoOutput`.
147–149	Why not insert into `ret.results` directly?
184	What is a "buffer"? Is this needed to support memref and tensor outputs? (Probably not because the above code always generates an InsertSlice, so only tensor works...) A comment would be helpful.
229–230	Not sure what this means.
232	There's `getAsOpFoldResult` in StaticValueUtils.
239	Why are non-parallel loop iterators treated differently?
mlir/test/lib/Interfaces/TilingInterface/TestTilingInterface.cpp
46–51	Not exactly the same, but you could probably use `getConstantIntValue` from StaticValueUtils.
74	nit: `getDimValue` uses `Value()`
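
To make the structure discussed in the comments above concrete, here is a minimal stand-alone C++ sketch. It uses no MLIR APIs: `Tile`, `forEachTile`, and `tiledDouble` are hypothetical names invented for illustration, and clamping the last partial tile is an assumption rather than something taken from the patch. The recursion handles one dimension per call and then recurses, a tile size of 0 leaves a dimension untiled, and the callback at the innermost level plays the role of `getTiledImplementation`, receiving the `offsets` and `sizes` of one tile.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for one tile of the iteration space.
struct Tile {
  std::vector<int64_t> offsets;
  std::vector<int64_t> sizes;
};

using TileCallback = std::function<void(const Tile &)>;

// Visits every tile of the iteration space [0, bounds[d]) in each dimension d.
// Each call handles one dimension and recurses for the next one.
// tileSizes[d] == 0 means dimension d is left untiled.
void forEachTile(const std::vector<int64_t> &bounds,
                 const std::vector<int64_t> &tileSizes, Tile &current,
                 const TileCallback &callback) {
  std::size_t depth = current.offsets.size();
  if (depth == bounds.size()) {
    // Innermost level: hand the tile to the "tiled implementation".
    callback(current);
    return;
  }
  if (tileSizes[depth] == 0) {
    // Untiled dimension: no loop, the tile spans the whole dimension.
    current.offsets.push_back(0);
    current.sizes.push_back(bounds[depth]);
    forEachTile(bounds, tileSizes, current, callback);
    current.offsets.pop_back();
    current.sizes.pop_back();
    return;
  }
  int64_t step = tileSizes[depth];
  for (int64_t offset = 0; offset < bounds[depth]; offset += step) {
    current.offsets.push_back(offset);
    // Assumption: the last tile is clamped to the remaining extent.
    current.sizes.push_back(std::min(step, bounds[depth] - offset));
    forEachTile(bounds, tileSizes, current, callback);
    current.offsets.pop_back();
    current.sizes.pop_back();
  }
}

// Caller-side use: doubles a row-major matrix tile by tile. Reading the input
// tile models an extract-slice; writing the output tile models an
// insert-slice.
std::vector<int64_t> tiledDouble(const std::vector<int64_t> &input,
                                 int64_t rows, int64_t cols, int64_t tileRows,
                                 int64_t tileCols) {
  std::vector<int64_t> output(input.size(), 0);
  Tile current;
  forEachTile({rows, cols}, {tileRows, tileCols}, current,
              [&](const Tile &tile) {
                for (int64_t i = 0; i < tile.sizes[0]; ++i) {
                  for (int64_t j = 0; j < tile.sizes[1]; ++j) {
                    int64_t idx =
                        (tile.offsets[0] + i) * cols + tile.offsets[1] + j;
                    output[idx] = 2 * input[idx];
                  }
                }
              });
  return output;
}
```

In this model the tiled and untiled runs produce the same output; only the order and granularity of the work changes.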

nicolasvasilache requested changes to this revision. Aug 19 2021, 12:05 PM

nicolasvasilache added inline comments.

mlir/include/mlir/Dialect/Linalg/TilingInterface/Tiling.h
39	This looks quite similar to https://sourcegraph.com/github.com/llvm/llvm-project@dcc6b7b1d5e5a0f9537ce1bf919ac2338bd7ad7b/-/blob/mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h?L532. I am wary of adding a parallel tiling implementation that has not yet proven to compose with the other transformations we want (e.g. fusion, padding, distribution). I would be much more supportive of seeing the tiling interface integrated into the existing flow. In other words, there isn't room for multiple implementations of the same thing: either we replace the existing one completely in a single shot or we incrementally refactor it to use the interface.
mlir/include/mlir/Interfaces/TilingInterface.td
37	Part of the exercise here is to split the LinalgStructuredOpInterface into the parts that are useful for tiling. This should come with a matching deletion on the LinalgStructuredOpInterface side, otherwise this is duplication. If there are layering issues that make this difficult, we don't want them to hit us halfway into some future refactoring.
58–61	+1 this definitely needs to be heavily documented.
mlir/lib/Dialect/Linalg/TilingInterface/Tiling.cpp
111	This is not just tiling but tile and distribute fused into a single implementation. Besides the other points I made in other places regarding duplication of functionality and overlap, I also have deeper composability and evolution concerns with the current implementation and API.
175	Why are we talking about distribution here? This should be done in a composable way: tiling is one thing, distribution is another, and they should compose.
mlir/test/lib/Interfaces/TilingInterface/TestTilingInterface.cpp
2	I am having trouble understanding the purpose of this file. You propose a new tiling interface op that can generalize in the future, this is great. I do not understand why this interface is not hooked up to the existing tiling passes. Even if this is meant as a transient state, this introduces 3 extra "one-off" tiling implementations plus extra patterns: `TestFullSizeOutputTileTilingInterface::getTiledImplementation`, `TestMixedReduceParallelTilingInterface::getTiledImplementation`, and `InsertSliceTilingInterface::getTiledImplementation`. History shows that this will turn into more dead code in the end (either this file or the existing tiling). I have not yet seen evidence that we want to buy into these implementations of tiling. The interface is one thing that I support, as we previously discussed; the usage of the interface as proposed in this revision not so much.

This revision now requires changes to proceed. Aug 19 2021, 12:05 PM

Rebase and address a few comments.

Based on comments and offline discussion, going to abandon this change. Will try to find a way to land the interface without the tiling algorithm.

mlir/include/mlir/Dialect/Linalg/TilingInterface/Tiling.h
39	Yes it does. I agree that having two tiling algorithms is not ideal. I think integrating the interface into the existing Linalg tiling algorithm is going to be a more gradual process. My plan was to land a prototype and then evolve the existing tiling algorithm, making one of them eventually redundant. That being said, I am happy to not land the tiling algorithm implemented in this patch. So I will abandon this patch and keep this as a WIP. I will try to see if I can land the interface by using it for tiling the existing `linalg.pad_tensor` (I don't know how it is implemented, but it would be good to use this interface for that and see what I can learn from this).
mlir/include/mlir/Interfaces/TilingInterface.td
37	I suspect there are going to be some hairy issues to resolve. So it might be worth landing the interface first and then making structured ops implement more and more of this interface (and removing those methods from the structured ops interface).
58–61	Added some more documentation. More of an FYI. This patch is not meant to be landed at this point (based on feedback).
mlir/lib/Dialect/Linalg/TilingInterface/Tiling.cpp
94–95	Added comments about this. Thanks!
111	The existing tiling algorithm in Linalg is also doing tile + distribute at the same time (unless it has been changed from the last time I saw this). So this isn't doing anything different. From the last time we talked about this (ages ago), tile + distribute could happen at the same time since distribution is only a matter of changing the lower bound and step of the loop. Maybe there is a change in the thought here. I am all for composable steps, but this was just keeping the same functionality as the existing tiling algorithm.
120	Changed it now, so this is much simpler. After using this a bit, it is very awkward for the tiling algorithm to do these inserts. It's much cleaner for the interface method to do this. So the change has been refactored to reflect that. In any case, I am not going to land this patch, so the details of the tiling algorithm might be immaterial.
147–149	Unnecessary now.
175	(see above)
184	No, this works for both tensors and buffers. The above insert slice is generated only when the tiled operation has a result value (that happens only when the operands are `tensor` types). The structure of the loop generated is also different for tensors and buffers. So again here, the operation not returning a value is used as a proxy to say whether this is tiling on tensors or tiling on buffers. Anyway, we can revisit this since this algorithm is not going to be landed as is.
239	This has to do with distribution. Right now the algorithm here does not handle tiling the reduction dimension. The reduction is a bit more involved since I am not assuming that this is a commutative/associative operation. Maybe reduction is the wrong terminology and we need a new iterator type called "sequential" to truly capture the op semantics. TBD. (Again moot since this algorithm is not landing.)
mlir/test/lib/Interfaces/TilingInterface/TestTilingInterface.cpp
2	Agreed. To repeat from above: I wasn't sure I wanted to change the existing tiling algorithm to use the interface in one shot. So I tried to stage it with a separate tiling algorithm that uses the interface. Let's talk about how we can stage this process.
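
The remark above that distribution only changes the lower bound and step of a loop can be illustrated with a stand-alone C++ sketch of cyclic distribution. The formula used here (`lb + procId * step` as the new lower bound, `step * numProcs` as the new step) is the usual cyclic scheme and is an assumption about what the patch generates, since the relevant code is not shown in full. `LoopRange`, `distributeCyclic`, and `iterations` are hypothetical names, not MLIR APIs.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a loop `for (iv = lb; iv < ub; iv += step)`.
struct LoopRange {
  int64_t lb;
  int64_t ub;
  int64_t step;
};

// Cyclic distribution (assumed scheme): processor `procId` out of `numProcs`
// starts `procId` steps into the loop and strides over the iterations that
// belong to the other processors. The upper bound is unchanged.
LoopRange distributeCyclic(LoopRange loop, int64_t procId, int64_t numProcs) {
  return {loop.lb + procId * loop.step, loop.ub, loop.step * numProcs};
}

// Lists the induction-variable values a loop would visit.
std::vector<int64_t> iterations(LoopRange loop) {
  std::vector<int64_t> ivs;
  for (int64_t iv = loop.lb; iv < loop.ub; iv += loop.step)
    ivs.push_back(iv);
  return ivs;
}
```

For a loop from 0 to 10 with step 2 and two processors, processor 0 visits 0, 4, 8 and processor 1 visits 2, 6, so together they cover the original iterations exactly once.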

Harbormaster completed remote builds in B120455: Diff 367654. Aug 19 2021, 4:50 PM

mravishankar mentioned this in D108611: [mlir] Add an interface to allow operations to specify how they can be tiled. Aug 23 2021, 11:05 PM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Linalg/TilingInterface/Tiling.h	61 lines
mlir/include/mlir/Interfaces/CMakeLists.txt	1 line
mlir/include/mlir/Interfaces/TilingInterface.h	26 lines
mlir/include/mlir/Interfaces/TilingInterface.td	100 lines
mlir/lib/Dialect/Linalg/CMakeLists.txt	1 line
mlir/lib/Dialect/Linalg/TilingInterface/CMakeLists.txt	16 lines
mlir/lib/Dialect/Linalg/TilingInterface/Tiling.cpp	256 lines
mlir/lib/Interfaces/CMakeLists.txt	3 lines
mlir/lib/Interfaces/TilingInterface.cpp	17 lines
mlir/test/Interfaces/TilingInterface/tiling.mlir	427 lines
mlir/test/lib/CMakeLists.txt	1 line
mlir/test/lib/Dialect/Test/TestDialect.h	1 line
mlir/test/lib/Dialect/Test/TestOps.td	77 lines
mlir/test/lib/Interfaces/CMakeLists.txt	1 line
mlir/test/lib/Interfaces/TilingInterface/CMakeLists.txt	24 lines
mlir/test/lib/Interfaces/TilingInterface/TestTilingInterface.cpp	495 lines
mlir/tools/mlir-opt/CMakeLists.txt	1 line
mlir/tools/mlir-opt/mlir-opt.cpp	2 lines

Diff 367654

mlir/include/mlir/Dialect/Linalg/TilingInterface/Tiling.h

This file was added.

				//===- Tiling.h - Tiling transformation using TilingInterface ---*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef DIALECT_LINALG_TILINGINTERFACE_TILING_H_
				#define DIALECT_LINALG_TILINGINTERFACE_TILING_H_

				#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
				#include "mlir/Interfaces/TilingInterface.h"

				namespace mlir {

				/// Structure to represent the result of tiling operation using TilingInterface.
				struct TiledOp {
				/// Tiled op.
				Operation *op;
				/// Loops generated during tiling.
				SmallVector<Operation *> loops;
				/// Values that are replacements for the untiled operations.
				SmallVector<Value> results;
				};

				/// Main entry point for tiling LinalgExtOps using TilingInterface. If the `op`
				/// does not implement the `TilingInterface` returns a `TiledOp{}` value.
				FailureOr<TiledOp>
				tileOpUsingInterface(OpBuilder &b, TilingInterface tilableOp,
				const linalg::LinalgTilingOptions &options);

				/// Base pattern for tiling TilingInterface. Patterns can inherit from this
				/// class and implement the `matchAndRewrite` method to call into the
				/// `matchAndRewriteBase` method of this class. Note that this method does not
				/// delete the operation. Instead the `matchAndRewriteBase` method returns
				/// failure if an error was encountered. If the tiled implementation of the
				/// operation wasn't found, this method returns `TiledOp{}`.
				struct TilingInterfaceBasePattern : public RewritePattern {
				TilingInterfaceBasePattern(StringRef opName, MLIRContext *context,
				linalg::LinalgTilingOptions options,
				linalg::LinalgTransformationFilter filter =
				linalg::LinalgTransformationFilter(),
				PatternBenefit benefit = 1)
				: RewritePattern(opName, benefit, context), filter(filter),
				options(options) {}

				LogicalResult matchAndRewriteBase(TilingInterface tilableOp,
				PatternRewriter &rewriter,
				TiledOp &result) const;

				private:
				/// LinalgTransformMarker handles special attribute manipulations.
				linalg::LinalgTransformationFilter filter;
				/// Options to control tiling.
				linalg::LinalgTilingOptions options;
				};

				} // namespace mlir

				#endif // DIALECT_LINALG_TILINGINTERFACE_TILING_H_

mlir/include/mlir/Interfaces/CMakeLists.txt

  add_mlir_interface(CallInterfaces)
  add_mlir_interface(CastInterfaces)
  add_mlir_interface(ControlFlowInterfaces)
  add_mlir_interface(CopyOpInterface)
  add_mlir_interface(DerivedAttributeOpInterface)
  add_mlir_interface(InferTypeOpInterface)
  add_mlir_interface(LoopLikeInterface)
  add_mlir_interface(SideEffectInterfaces)
+ add_mlir_interface(TilingInterface)
  add_mlir_interface(VectorInterfaces)
  add_mlir_interface(ViewLikeInterface)

  set(LLVM_TARGET_DEFINITIONS DataLayoutInterfaces.td)
  mlir_tablegen(DataLayoutAttrInterface.h.inc -gen-attr-interface-decls)
  mlir_tablegen(DataLayoutAttrInterface.cpp.inc -gen-attr-interface-defs)
  mlir_tablegen(DataLayoutOpInterface.h.inc -gen-op-interface-decls)
  mlir_tablegen(DataLayoutOpInterface.cpp.inc -gen-op-interface-defs)
  ... (19 more lines not shown)

mlir/include/mlir/Interfaces/TilingInterface.h

This file was added.

				//===- TilingInterface.h - Interface for tiling operations -----*- C++ -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the definitions of the TilingInterface defined in
				// `TilingInterface.td`.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_INTERFACES_TILINGINTERFACE_H_
				#define MLIR_INTERFACES_TILINGINTERFACE_H_

				#include "mlir/IR/Builders.h"
				#include "mlir/IR/BuiltinTypes.h"
				#include "mlir/IR/Operation.h"
				#include "mlir/Interfaces/ViewLikeInterface.h"
				#include "mlir/Support/LLVM.h"

				/// Include the ODS generated interface header files.
				#include "mlir/Interfaces/TilingInterface.h.inc"

				#endif // MLIR_INTERFACES_TILINGINTERFACE_H_

mlir/include/mlir/Interfaces/TilingInterface.td

This file was added.

				//===- TilingInterface.td - Interface for tiling operations -*- tablegen -*-===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains an interface to allow operations to generate a tiled
				// implementation of themselves.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_TILINGINTERFACE
				#define MLIR_TILINGINTERFACE

				include "mlir/IR/OpBase.td"

				def TilingInterface : OpInterface<"TilingInterface"> {
				let description = [{
				Interface for allowing operations to expose information needed to
				tile them (similar to LinalgOp, but without having access to
				indexing maps)
				}];
				let cppNamespace = "::mlir";
				let methods = [
				InterfaceMethod<
				/*desc=*/[{
				Returns a list of operands into which the result of the
				tiled implementation is written. With `tensor`
				operands, this will be used as the initial tensor into which
				the tiled results are inserted. With `memref` operands,
				this will be the operand into which the result of the tiled
				operation is written.
				}],
				/*retType=*/"SmallVector<Value>",
				/*methodName=*/"getDestinationOperands",
				/*args=*/(ins "OpBuilder &":$b),
				/*methodBody=*/"",
				/*defaultImplementation=*/"return ValueRange{};"
				>,
				InterfaceMethod<
				/*desc=*/[{
				Returns a list of `StringRef`s that describe the number of
				loops and the iterator types of the operation. The list is
				expected to use
				`getParallelIteratorTypeName()`/`getReductionIteratorTypeName()`
				from MLIR Structured Op Utils.
				}],
				/*retType=*/"SmallVector<StringRef>",
				/*methodName=*/"getLoopIteratorTypes"
				>,
				InterfaceMethod<
				/*desc=*/[{
				Returns a list of ranges that describe the loop bounds and
				step for the loops of the operation.
				}],
				/*retTy=*/"SmallVector<Range>",
				/*methodName=*/"getLoopBounds",
				/*args=*/(ins "OpBuilder &":$b)
				>,
				InterfaceMethod<
				/*desc=*/[{
				Method to generate the tiled implementation of an operation.

				The iteration space of the operation is returned by
				`getLoopBounds`. The caller provides the information of the
				tile within this iteration space whose implementation the
				caller needs.
				- `offsets` provides the offset of the tile within the
				iteration space.
				- `sizes` provides the size of the tile.
				- `dest` are the Values into which the result of the tiled
				operation is to be inserted. The types of the `dest`
				Values are the same as the types returned by the
				`getDestinationOperands` method.
				- When the operands of the operation are `tensor` types, the
				results of the tiled operation are inserted into the
				corresponding `dest` values, and a new tensor is created
				(using the destructive-update paradigm). These new values are
				to be returned to the caller in the `results` vector. When
				the operands of the operation are `memref` types, this
				vector can be left empty.
				}],
				/*retType=*/"Operation *",
				/*methodName=*/"getTiledImplementation",
				/*args=*/(ins
				"OpBuilder &":$b,
				"ValueRange ":$dest,
				"ArrayRef<OpFoldResult> ":$offsets,
				"ArrayRef<OpFoldResult> ":$sizes,
				"SmallVector<Value> &":$results),
				/*methodBody=*/"",
				/*defaultImplementation=*/[{
				return nullptr;
				}]
				>
				];
				}
				#endif // MLIR_TILINGINTERFACE

mlir/lib/Dialect/Linalg/CMakeLists.txt

  add_subdirectory(Analysis)
  add_subdirectory(IR)
+ add_subdirectory(TilingInterface)
  add_subdirectory(Transforms)
  add_subdirectory(Utils)

mlir/lib/Dialect/Linalg/TilingInterface/CMakeLists.txt

This file was added.

				add_mlir_dialect_library(MLIRTilingTransform
				Tiling.cpp

				ADDITIONAL_HEADER_DIRS
				${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Linalg/

				LINK_LIBS_PUBLIC
				MLIRAffine
				MLIRIR
				MLIRLinalgTransforms
				MLIRMemRef
				MLIRSCF
				MLIRStandard
				MLIRTensor
				MLIRTilingInterface
				)

mlir/lib/Dialect/Linalg/TilingInterface/Tiling.cpp

This file was added.

				//===- Tiling.cpp - Implementation of Tiling using TilingInterface --------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the Tiling using Tiling Interface.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/Linalg/TilingInterface/Tiling.h"
				#include "mlir/Dialect/Affine/IR/AffineOps.h"
				#include "mlir/Dialect/SCF/SCF.h"
				#include "mlir/Dialect/StandardOps/IR/Ops.h"
				#include "mlir/Dialect/Tensor/IR/Tensor.h"
				#include "mlir/IR/Matchers.h"
				#include "mlir/Interfaces/TilingInterface.h"

				using namespace mlir;

				//===----------------------------------------------------------------------===//
				// Utility methods for tiling a linalg_ext operation that implements a
				// TilingInterface
				//===----------------------------------------------------------------------===//

				/// Returns failure if the options are unsupported.
				static LogicalResult
				verifySupportedTilingOptions(PatternRewriter &rewriter, Operation *op,
				const linalg::LinalgTilingOptions &options) {
				if (!options.interchangeVector.empty()) {
				return rewriter.notifyMatchFailure(op,
				"unsupported interchange during tiling");
				}
				if (options.paddingValueComputationFunction) {
				return rewriter.notifyMatchFailure(op, "unsupported tile + pad option");
				}
				if (options.loopType != linalg::LinalgTilingLoopType::Loops) {
				return rewriter.notifyMatchFailure(op,
				"only tiling with scf.for is supported");
				}
				if (options.distribution) {
				if (llvm::any_of(options.distribution->distributionMethod,
				[](linalg::DistributionMethod method) {
				return method != linalg::DistributionMethod::Cyclic;
				})) {
				return rewriter.notifyMatchFailure(op,
				"only cyclic distribution is allowed");
				}
				}
				return success();
				}

				/// Converts an `OpFoldResult` to a `Value` by building a constant op if
				/// the `OpFoldResult` is an `IntegerAttr`.
				static Value getValue(OpBuilder &builder, Location loc,
				OpFoldResult valueOrAttr) {
				if (auto attr = valueOrAttr.dyn_cast<Attribute>()) {
				return builder.create<ConstantIndexOp>(loc,
				attr.cast<IntegerAttr>().getInt());
				}
				return valueOrAttr.get<Value>();
				}

				/// Checks if `valueOrAttr` represents a constant value `val`.
				static bool isValue(OpFoldResult valueOrAttr, int64_t val) {
				auto attr = valueOrAttr.dyn_cast<Attribute>();
				return attr && attr.cast<IntegerAttr>().getValue() == val;
				}

				/// Returns true if loop is untiled. Only checks if the value is statically
				/// zero. It is assumed that a `Value` defined by a constant op is already
				/// converted to an `IntegerAttr` of that value. So here just return true if
				/// this is an attribute with a zero value.
				static bool isUntiledLoop(OpFoldResult valueOrAttr) {
				return isValue(valueOrAttr, 0);
				}

				/// Generates the tiled loops and the body by invoking the interface methods of
				/// TilingInterface. This method generates a single tiled loop and calls itself
				/// recursively to generate all the tiled loops (the recursive call increases
				/// the `loopDepth` value by 1). For every invocation the function appends to
				/// the `offsets` list the offset to be used within the body of the loop for
				/// the dimension of the iteration space being tiled.
				/// - `outputs` are the operands to use for outputs of the tiled operation.
				/// - `tileSizes` are tile sizes specified for all loops of the operation. If a
				/// loop is to be untiled it is set to 0.
				/// - `iteratorTypes` are the types of the loop iterators returned by the
				/// TilingInterface.
				/// - `loopBounds` are the bounds of all the loops of the op returned by the
				/// TilingInterface.
				/// - `loopDepth` is the current loop depth being processed.
				/// - `offsets` are the `Value`s that represent the position of the tile being
				/// operated on. The offsets are computed as the tiled loops are being
				/// generated.
				/// - `distributionInfo` is the proc_id and nprocs `Value`s to be used for
				/// distributed loops. It is a stack, and once an entry at the top of the
				/// stack is used for distribution it is popped before processing the inner
				/// loops.
				static FailureOr<TiledOp> tileOpUsingInterfaceImpl(
				OpBuilder &builder, TilingInterface tilableOp, ValueRange outputs,
				MutableArrayRef<OpFoldResult> tileSizes, ArrayRef<StringRef> iteratorTypes,
				ArrayRef<Range> loopBounds, unsigned loopDepth,
				SmallVectorImpl<OpFoldResult> &offsets,
				ArrayRef<linalg::ProcInfo> distributionInfo) {
				Location loc = tilableOp.getLoc();
				// If this is the innermost loop, then generate the tiled implementation of
				// the op by invoking the TilingInterface methods.
				if (loopDepth == tileSizes.size()) {
				TiledOp ret;
				ret.op = tilableOp.getTiledImplementation(builder, outputs, offsets,
				tileSizes, ret.results);
				if (!ret.op) {
				return static_cast<LogicalResult>(
				tilableOp.emitOpError("failed to get tiled implementation"));
				}
				return ret;
				}

				springerm (Not Done): The `if` body could be a separate static helper function, something like `insertTiledOpIntoOutput`.
				mravishankar (Author, Done): Changed it now, so this is much simpler. After using this a bit, it is very awkward for the tiling algorithm to do these inserts. It's much cleaner for the interface method to do this, so the change has been refactored to reflect that. In any case, I am not going to land this patch, so the details of the tiling algorithm might be immaterial.
				// If tile size at this depth is empty, do nothing.
				if (isUntiledLoop(tileSizes[loopDepth])) {
				auto zeroAttr = builder.getI64IntegerAttr(0);
				offsets.push_back(zeroAttr);
				assert(matchPattern(loopBounds[loopDepth].offset, m_Zero()) &&
				"expected loop bounds to have lower bound of zero");
				tileSizes[loopDepth] = getAsOpFoldResult(loopBounds[loopDepth].size);
				return tileOpUsingInterfaceImpl(builder, tilableOp, outputs, tileSizes,
				iteratorTypes, loopBounds, loopDepth + 1,
				offsets, distributionInfo);
				}

				// Generate an scf.for for the current loop depth.
				Value lb = loopBounds[loopDepth].offset;
				Value ub = loopBounds[loopDepth].size;
				if (!matchPattern(loopBounds[loopDepth].stride, m_One())) {
				return static_cast<LogicalResult>(
				tilableOp.emitOpError("expected stride to be 1"));
				}
				Value step = getValue(builder, loc, tileSizes[loopDepth]);

				// Update lb, ub and step for cyclic distribution.
				if (!distributionInfo.empty() &&
				iteratorTypes[loopDepth] == getParallelIteratorTypeName()) {
				linalg::updateBoundsForCyclicDistribution(
				builder, loc, distributionInfo.front().procId,
				distributionInfo.front().nprocs, lb, ub, step);
				distributionInfo = distributionInfo.drop_front();
				}
				springerm (Done): Why not insert into `ret.results` directly?
				mravishankar (Author, Done): Unnecessary now.
				FailureOr<TiledOp> innerReturnValue;
				bool isBufferTiling = tilableOp->getNumResults() == 0;
				ValueRange initValues(isBufferTiling ? ValueRange{} : outputs);
				auto forOp = builder.create<scf::ForOp>(
				loc, lb, ub, step, initValues,
				[&](OpBuilder &b, Location loc, Value iv, ValueRange args) {
				offsets.push_back(iv);
				auto affineMaps = AffineMap::inferFromExprList({ArrayRef<AffineExpr>{
				b.getAffineSymbolExpr(0),
				b.getAffineSymbolExpr(1) - b.getAffineDimExpr(0)}})[0];
				// Similar to linalg tiling, the tile size is the min(tileSizes, ub -
				// iv) to account for cases where tile size does not divide (ub - lb)
				// exactly.
				Value inBoundsTileSize = b.create<AffineMinOp>(
				loc, affineMaps,
				ValueRange{iv, getValue(builder, loc, tileSizes[loopDepth]), ub});
				tileSizes[loopDepth] = getAsOpFoldResult(inBoundsTileSize);
				// Recursively proceed to generate the tiled loop for the next level.
				innerReturnValue = tileOpUsingInterfaceImpl(
				b, tilableOp, (isBufferTiling ? outputs : args), tileSizes,
				iteratorTypes, loopBounds, loopDepth + 1, offsets,
				distributionInfo);
				if (failed(innerReturnValue))
				return;
				b.create<scf::YieldOp>(loc, innerReturnValue->results);
				});
				nicolasvasilache (Done): Why are we talking about distribution here? This should be done in a composable way: tiling is one thing, distribution is another, and they should compose.
				mravishankar (Author, Done): (see above)
				if (failed(innerReturnValue))
				return innerReturnValue;
				innerReturnValue->loops.insert(innerReturnValue->loops.begin(),
				forOp.getOperation());
				innerReturnValue->results = forOp.getResults();
				return innerReturnValue;
				}
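
The recursion in `tileOpUsingInterfaceImpl` can be hard to follow in IR-building form. Below is a minimal sketch of the same loop-nest structure on plain integers rather than MLIR `Value`s: a tile size of 0 means the loop is left untiled (offset 0, full extent), and a tiled loop clamps its last tile to `min(tileSize, ub - iv)`. The names `Tile`, `enumerateTiles` and `enumerateTilesImpl` are hypothetical and not part of the patch, and lower bounds are assumed to be 0 throughout. Unlike the patch, which builds each loop once, this sketch runs the loops, so it pops its per-depth state on the way back up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one generated tile: per-loop offsets and sizes.
struct Tile {
  std::vector<int64_t> offsets;
  std::vector<int64_t> sizes;
};

// Recursive helper; `depth` plays the role of `loopDepth` in the patch.
static void enumerateTilesImpl(const std::vector<int64_t> &extents,
                               const std::vector<int64_t> &tileSizes,
                               size_t depth, Tile &current,
                               std::vector<Tile> &out) {
  if (depth == extents.size()) {
    // Innermost level: this is where the tiled op would be created.
    out.push_back(current);
    return;
  }
  if (tileSizes[depth] == 0) {
    // Untiled loop: no loop is generated, the tile spans the full extent.
    current.offsets.push_back(0);
    current.sizes.push_back(extents[depth]);
    enumerateTilesImpl(extents, tileSizes, depth + 1, current, out);
    current.offsets.pop_back();
    current.sizes.pop_back();
    return;
  }
  for (int64_t iv = 0; iv < extents[depth]; iv += tileSizes[depth]) {
    current.offsets.push_back(iv);
    // Clamp the tile when the tile size does not divide the extent exactly.
    current.sizes.push_back(std::min(tileSizes[depth], extents[depth] - iv));
    enumerateTilesImpl(extents, tileSizes, depth + 1, current, out);
    current.offsets.pop_back();
    current.sizes.pop_back();
  }
}

// Enumerates all tiles of an iteration space with the given loop extents.
std::vector<Tile> enumerateTiles(const std::vector<int64_t> &extents,
                                 const std::vector<int64_t> &tileSizes) {
  assert(extents.size() == tileSizes.size() && "one tile size per loop");
  std::vector<Tile> out;
  Tile current;
  enumerateTilesImpl(extents, tileSizes, 0, current, out);
  return out;
}
```

For example, `enumerateTiles({25, 7}, {10, 0})` yields three tiles along the first loop, with the last one clamped to 5 rows and every tile covering all 7 columns.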

				FailureOr<TiledOp>
				springerm (Not Done): What is a "buffer"? Is this needed to support memref and tensor outputs? (Probably not because the above code always generates an InsertSlice, so only tensor works...) A comment would be helpful.
				mravishankar (Author, Done): No, this works for both tensors and buffers. The above insert slice is generated only when the tiled operation has a result value (that happens only when the operands are `tensor` types). The structure of the generated loop is also different for tensors and buffers. So again here, the operation not returning a value is used as a proxy to decide whether this is tiling on tensors or tiling on buffers. Anyway, we can revisit this since this algorithm is not going to be landed as is.
				mlir::tileOpUsingInterface(OpBuilder &b, TilingInterface tilableOp,
				const linalg::LinalgTilingOptions &options) {
				SmallVector<StringRef> iteratorTypes = tilableOp.getLoopIteratorTypes();
				SmallVector<Value, 4> tileSizesVals =
				options.tileSizeComputationFunction(b, tilableOp);
				auto zeroAttr = b.getI64IntegerAttr(0);

				// The actual tile sizes used convert `Value`s defined as constant 0 to
				// zero integer attributes. Currently, if the iterator type is not
				// "parallel", the tile size is required to be zero as well.
				auto tileSizes = getAsOpFoldResult(tileSizesVals);
				tileSizes.resize(iteratorTypes.size(), zeroAttr);
				for (auto en : llvm::enumerate(iteratorTypes)) {
				if (en.value() == getParallelIteratorTypeName())
				continue;
				if (!isUntiledLoop(tileSizes[en.index()])) {
				return static_cast<LogicalResult>(tilableOp.emitOpError(
				"unimplemented tiling of non-parallel loop iterator type"));
				}
				}

				// Trivial early exit case of tile sizes being zero for all parallel loops.
				if (llvm::all_of(tileSizes, isUntiledLoop))
				return TiledOp{tilableOp, {}, {}};

				SmallVector<Range> loopBounds = tilableOp.getLoopBounds(b);
				SmallVector<linalg::ProcInfo> distributionInfo;
				// If the tiled loops are distributed, get the proc_id and nprocs for the
				// distributed loops. First collect the parallel loops by iterating over the
				// tileSizes and getting the loops that are distributed, i.e.,
				// - parallel, i.e. iteratorTypes is "parallel"
				// - tiled, i.e. tileSize != 0
				if (options.distribution) {
				SmallVector<Range> distributedLoopRange;
				for (auto i : llvm::seq<unsigned>(0, tileSizes.size())) {
				if (isUntiledLoop(tileSizes[i]))
				continue;
				if (iteratorTypes[i] != getParallelIteratorTypeName())
				continue;
				distributedLoopRange.push_back(loopBounds[i]);
				}
				distributionInfo = options.distribution->procInfo(b, tilableOp.getLoc(),
				distributedLoopRange);
				}

				SmallVector<OpFoldResult> offsets;
				springerm (Not Done): Not sure what this means.
				SmallVector<Value> dest = tilableOp.getDestinationOperands(b);
				return tileOpUsingInterfaceImpl(b, tilableOp, dest, tileSizes, iteratorTypes,
				springerm (Done): There's `getAsOpFoldResult` in StaticValueUtils.
				loopBounds, 0, offsets, distributionInfo);
				}
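
As a rough illustration of the setup done in `tileOpUsingInterface`, the sketch below mirrors three steps on plain integers and strings: padding the tile sizes with zeros and rejecting a non-zero tile size on a non-parallel loop, collecting the loops that are both parallel and tiled, and adjusting loop bounds for cyclic distribution. The helper names (`normalizeTileSizes`, `distributedLoops`, `cyclicBounds`, `LoopBounds`) are hypothetical. The bound update (`lb + procId * step`, step `nprocs * step`) is an assumption about what `linalg::updateBoundsForCyclicDistribution` computes, chosen to be consistent with the offset and step checked in the `scatter_tiling_distribution` test of this revision.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pads `tileSizes` with zeros up to the number of loops and returns false if
// a loop that is not "parallel" has a non-zero tile size (the patch emits an
// op error in that case).
bool normalizeTileSizes(std::vector<int64_t> &tileSizes,
                        const std::vector<std::string> &iteratorTypes) {
  tileSizes.resize(iteratorTypes.size(), 0);
  for (size_t i = 0; i < iteratorTypes.size(); ++i)
    if (iteratorTypes[i] != "parallel" && tileSizes[i] != 0)
      return false;
  return true;
}

// Returns the indices of loops that are both tiled (tile size != 0) and
// parallel; these are the loops that get distributed.
std::vector<size_t>
distributedLoops(const std::vector<int64_t> &tileSizes,
                 const std::vector<std::string> &iteratorTypes) {
  assert(tileSizes.size() == iteratorTypes.size());
  std::vector<size_t> result;
  for (size_t i = 0; i < tileSizes.size(); ++i)
    if (tileSizes[i] != 0 && iteratorTypes[i] == "parallel")
      result.push_back(i);
  return result;
}

// Hypothetical plain-integer loop description.
struct LoopBounds {
  int64_t lb, ub, step;
};

// Assumed cyclic distribution: processor `procId` starts `procId` tiles in
// and strides over `nprocs` tiles at a time. The upper bound is unchanged.
LoopBounds cyclicBounds(LoopBounds b, int64_t procId, int64_t nprocs) {
  assert(nprocs > 0 && procId >= 0 && procId < nprocs);
  return {b.lb + procId * b.step, b.ub, nprocs * b.step};
}
```

With a tile size of 10 on a loop from 0 to 100, processor 2 of 4 would visit offsets 20 and 60 under this scheme, which is the sense in which distribution only changes the lower bound and step of the loop.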

				//===----------------------------------------------------------------------===//
				// Definition of methods for `TilingInterfaceBasePattern`.
				//===----------------------------------------------------------------------===//

				springerm (Not Done): Why are non-parallel loop iterators different?
				mravishankar (Author, Done): This has to do with distribution. Right now the algorithm here does not handle tiling the reduction dimension. The reduction is a bit more involved since I am not assuming that this is a commutative/associative operation. Maybe reduction is the wrong terminology and we need a new iterator type called "sequential" to truly capture the op semantics. TBD. (Again moot since this algorithm is not landing.)
				LogicalResult
				mlir::TilingInterfaceBasePattern::matchAndRewriteBase(TilingInterface tilableOp,
				PatternRewriter &rewriter,
				TiledOp &result) const {
				if (failed(filter.checkAndNotify(rewriter, tilableOp)))
				return failure();
				if (failed(verifySupportedTilingOptions(rewriter, tilableOp, options)))
				return failure();

				FailureOr<TiledOp> res = tileOpUsingInterface(rewriter, tilableOp, options);
				if (failed(res))
				return res;
				result = *res;
				if (result.op)
				filter.replaceLinalgTransformationFilter(rewriter, result.op);
				return success();
				}

mlir/lib/Interfaces/CMakeLists.txt

	set(LLVM_OPTIONAL_SOURCES			set(LLVM_OPTIONAL_SOURCES
	CallInterfaces.cpp			CallInterfaces.cpp
	CastInterfaces.cpp			CastInterfaces.cpp
	ControlFlowInterfaces.cpp			ControlFlowInterfaces.cpp
	CopyOpInterface.cpp			CopyOpInterface.cpp
	DataLayoutInterfaces.cpp			DataLayoutInterfaces.cpp
	DerivedAttributeOpInterface.cpp			DerivedAttributeOpInterface.cpp
	InferTypeOpInterface.cpp			InferTypeOpInterface.cpp
	LoopLikeInterface.cpp			LoopLikeInterface.cpp
	SideEffectInterfaces.cpp			SideEffectInterfaces.cpp
				TilingInterface.cpp
	VectorInterfaces.cpp			VectorInterfaces.cpp
	ViewLikeInterface.cpp			ViewLikeInterface.cpp
	)			)

	function(add_mlir_interface_library name)			function(add_mlir_interface_library name)
	add_mlir_library(MLIR${name}			add_mlir_library(MLIR${name}
	${name}.cpp			${name}.cpp

	(13 unchanged lines not shown)
	add_mlir_interface_library(CastInterfaces)			add_mlir_interface_library(CastInterfaces)
	add_mlir_interface_library(ControlFlowInterfaces)			add_mlir_interface_library(ControlFlowInterfaces)
	add_mlir_interface_library(CopyOpInterface)			add_mlir_interface_library(CopyOpInterface)
	add_mlir_interface_library(DataLayoutInterfaces)			add_mlir_interface_library(DataLayoutInterfaces)
	add_mlir_interface_library(DerivedAttributeOpInterface)			add_mlir_interface_library(DerivedAttributeOpInterface)
	add_mlir_interface_library(InferTypeOpInterface)			add_mlir_interface_library(InferTypeOpInterface)
	add_mlir_interface_library(LoopLikeInterface)			add_mlir_interface_library(LoopLikeInterface)
	add_mlir_interface_library(SideEffectInterfaces)			add_mlir_interface_library(SideEffectInterfaces)
				add_mlir_interface_library(TilingInterface)
	add_mlir_interface_library(VectorInterfaces)			add_mlir_interface_library(VectorInterfaces)
	add_mlir_interface_library(ViewLikeInterface)			add_mlir_interface_library(ViewLikeInterface)

mlir/lib/Interfaces/TilingInterface.cpp

This file was added.

				//===- TilingInterface.cpp - Tiling interface -------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains the definitions of the interface in `TilingInterface.td`.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Interfaces/TilingInterface.h"

				namespace mlir {
				#include "mlir/Interfaces/TilingInterface.cpp.inc"
				}
				Lint (pre-merge checks): clang-tidy: warning: namespace 'mlir' not terminated with a closing comment [llvm-namespace-comment]

mlir/test/Interfaces/TilingInterface/tiling.mlir

This file was added.

				// RUN: mlir-opt -test-tiling-interface -split-input-file %s | FileCheck %s

				func @scatter_tiling(
				%original: tensor<?x?xf32>, %indices: tensor<?x1xi32>,
				%update : tensor<?x?xf32>) -> tensor<?x?xf32> {
				%0 = test.scatter {__internal_linalg_transform__ = "tiling_input"}
				%update, %indices, %original : tensor<?x?xf32>, tensor<?x1xi32>, tensor<?x?xf32> -> tensor<?x?xf32>
				return %0 : tensor<?x?xf32>
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0)[s0, s1] -> (10, -d0 + s1)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0)[s0, s1] -> (20, -d0 + s1)>
				// CHECK: func @scatter_tiling(
				// CHECK-SAME: %[[ORIGINAL:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
				// CHECK-SAME: %[[INDICES:[a-zA-Z0-9_]+]]: tensor<?x1xi32>
				// CHECK-SAME: %[[UPDATES:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
				// CHECK-DAG: %[[C0:.+]] = constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = constant 1 : index
				// CHECK-DAG: %[[TILESIZEY:.+]] = constant 10 : index
				// CHECK-DAG: %[[TILESIZEX:.+]] = constant 20 : index
				// CHECK-DAG: %[[D0:.+]] = tensor.dim %[[UPDATES]], %[[C0]]
				// CHECK-DAG: %[[D1:.+]] = tensor.dim %[[UPDATES]], %[[C1]]
				// CHECK: %[[RESULT0:.+]] = scf.for %[[IV0:.+]] = %[[C0]] to %[[D0]] step %[[TILESIZEY]]
				// CHECK-SAME: iter_args(%[[INIT0:.+]] = %[[ORIGINAL]])
				// CHECK-DAG: %[[USED_TILESIZEY:.+]] = affine.min #[[MAP0]](%[[IV0]])[%[[TILESIZEY]], %[[D0]]]
				// CHECK: %[[RESULT1:.+]] = scf.for %[[IV1:.+]] = %[[C0]] to %[[D1]] step %[[TILESIZEX]]
				// CHECK-SAME: iter_args(%[[INIT1:.+]] = %[[INIT0]])
				// CHECK-DAG: %[[USED_TILESIZEX:.+]] = affine.min #[[MAP1]](%[[IV1]])[%[[TILESIZEX]], %[[D1]]]
				// CHECK: %[[UPDATE_SLICE:.+]] = tensor.extract_slice %[[UPDATES]][%[[IV0]], %[[IV1]]]
				// CHECK-SAME: [%[[USED_TILESIZEY]], %[[USED_TILESIZEX]]]
				// CHECK: %[[INDEX_SLICE:.+]] = tensor.extract_slice %[[INDICES]][%[[IV0]], 0]
				// CHECK-SAME: [%[[USED_TILESIZEY]], 1]
				// CHECK: %[[SLICE_D0:.+]] = tensor.dim %[[ORIGINAL]], %[[C0]]
				// CHECK: %[[SOURCE_SLICE:.+]] = tensor.extract_slice %[[INIT1]][0, %[[IV1]]]
				// CHECK-SAME: [%[[SLICE_D0]], %[[USED_TILESIZEX]]]
				// CHECK: %[[SCATTER_TILE:.+]] = test.scatter
				// CHECK-SAME: __internal_linalg_transform__ = "tiling_output"
				// CHECK-SAME: %[[UPDATE_SLICE]], %[[INDEX_SLICE]], %[[SOURCE_SLICE]]
				// CHECK: %[[YIELD:.+]] = tensor.insert_slice %[[SCATTER_TILE]] into %[[INIT1]][0, %[[IV1]]]
				// CHECK-SAME: [%[[SLICE_D0]], %[[USED_TILESIZEX]]]
				// CHECK: scf.yield %[[YIELD]]
				// CHECK: scf.yield %[[RESULT1]]
				// CHECK: return %[[RESULT0]]

				// -----

				func @scatter_tiling_memref(
				%original: memref<?x?xf32>, %indices: memref<?x1xi32>,
				%update : memref<?x?xf32>) {
				test.scatter {__internal_linalg_transform__ = "tiling_input"}
				%update, %indices, %original : memref<?x?xf32>, memref<?x1xi32>, memref<?x?xf32>
				return
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0)[s0, s1] -> (10, -d0 + s1)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0)[s0, s1] -> (20, -d0 + s1)>
				// CHECK: func @scatter_tiling_memref(
				// CHECK-SAME: %[[ORIGINAL:[a-zA-Z0-9_]+]]: memref<?x?xf32>
				// CHECK-SAME: %[[INDICES:[a-zA-Z0-9_]+]]: memref<?x1xi32>
				// CHECK-SAME: %[[UPDATES:[a-zA-Z0-9_]+]]: memref<?x?xf32>
				// CHECK-DAG: %[[C0:.+]] = constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = constant 1 : index
				// CHECK-DAG: %[[TILESIZEY:.+]] = constant 10 : index
				// CHECK-DAG: %[[TILESIZEX:.+]] = constant 20 : index
				// CHECK-DAG: %[[D0:.+]] = memref.dim %[[UPDATES]], %[[C0]]
				// CHECK-DAG: %[[D1:.+]] = memref.dim %[[UPDATES]], %[[C1]]
				// CHECK: scf.for %[[IV0:.+]] = %[[C0]] to %[[D0]] step %[[TILESIZEY]]
				// CHECK-DAG: %[[USED_TILESIZEY:.+]] = affine.min #[[MAP0]](%[[IV0]])[%[[TILESIZEY]], %[[D0]]]
				// CHECK: scf.for %[[IV1:.+]] = %[[C0]] to %[[D1]] step %[[TILESIZEX]]
				// CHECK-DAG: %[[USED_TILESIZEX:.+]] = affine.min #[[MAP1]](%[[IV1]])[%[[TILESIZEX]], %[[D1]]]
				// CHECK: %[[UPDATE_SLICE:.+]] = memref.subview %[[UPDATES]][%[[IV0]], %[[IV1]]]
				// CHECK-SAME: [%[[USED_TILESIZEY]], %[[USED_TILESIZEX]]]
				// CHECK: %[[INDEX_SLICE:.+]] = memref.subview %[[INDICES]][%[[IV0]], 0]
				// CHECK-SAME: [%[[USED_TILESIZEY]], 1]
				// CHECK: %[[SLICE_D0:.+]] = memref.dim %[[ORIGINAL]], %[[C0]]
				// CHECK: %[[SOURCE_SLICE:.+]] = memref.subview %[[ORIGINAL]][0, %[[IV1]]]
				// CHECK-SAME: [%[[SLICE_D0]], %[[USED_TILESIZEX]]]
				// CHECK: test.scatter
				// CHECK-SAME: __internal_linalg_transform__ = "tiling_output"
				// CHECK-SAME: %[[UPDATE_SLICE]], %[[INDEX_SLICE]], %[[SOURCE_SLICE]]

				// -----

				func @scatter_tiling_distribution(
				%original: tensor<?x?xf32>, %indices: tensor<?x1xi32>,
				%update : tensor<?x?xf32>) -> tensor<?x?xf32> {
				%0 = test.scatter {__internal_linalg_transform__ = "distribute_input"}
				%update, %indices, %original : tensor<?x?xf32>, tensor<?x1xi32>, tensor<?x?xf32> -> tensor<?x?xf32>
				return %0 : tensor<?x?xf32>
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0] -> (s0 * 10)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0)[s0, s1] -> (10, -d0 + s1)>
				// CHECK: func @scatter_tiling_distribution(
				// CHECK-SAME: %[[ORIGINAL:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
				// CHECK-SAME: %[[INDICES:[a-zA-Z0-9_]+]]: tensor<?x1xi32>
				// CHECK-SAME: %[[UPDATES:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
				// CHECK-DAG: %[[TILESIZE:.+]] = constant 10 : index
				// CHECK-DAG: %[[C0:.+]] = constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = constant 1 : index
				// CHECK-DAG: %[[D0:.+]] = tensor.dim %[[UPDATES]], %[[C0]]
				// CHECK-DAG: %[[D1:.+]] = tensor.dim %[[UPDATES]], %[[C1]]
				// CHECK-DAG: %[[ID:.+]] = "gpu.block_id"() {dimension = "x"}
				// CHECK-DAG: %[[COUNT:.+]] = "gpu.grid_dim"() {dimension = "x"}
				// CHECK-DAG: %[[OFFSET:.+]] = affine.apply #[[MAP0]]()[%[[ID]]]
				// CHECK-DAG: %[[STEP:.+]] = affine.apply #[[MAP0]]()[%[[COUNT]]]
				// CHECK: %[[RESULT:.+]] = scf.for %[[IV:.+]] = %[[OFFSET]] to %[[D0]] step %[[STEP]]
				// CHECK-SAME: iter_args(%[[INIT:.+]] = %[[ORIGINAL]])
				// CHECK-DAG: %[[USED_TILESIZE:.+]] = affine.min #[[MAP1]](%[[IV]])[%[[TILESIZE]], %[[D0]]]
				// CHECK: %[[UPDATE_SLICE:.+]] = tensor.extract_slice %[[UPDATES]][%[[IV]], 0]
				// CHECK-SAME: [%[[USED_TILESIZE]], %[[D1]]]
				// CHECK: %[[INDEX_SLICE:.+]] = tensor.extract_slice %[[INDICES]][%[[IV]], 0]
				// CHECK-SAME: [%[[USED_TILESIZE]], 1]
				// CHECK: %[[SLICE_D0:.+]] = tensor.dim %[[ORIGINAL]], %[[C0]]
				// CHECK: %[[SOURCE_SLICE:.+]] = tensor.extract_slice %[[INIT]][0, 0] [%[[SLICE_D0]], %[[D1]]]
				// CHECK: %[[SCATTER_TILE:.+]] = test.scatter
				// CHECK-SAME: __internal_linalg_transform__ = "distribute_output"
				// CHECK-SAME: %[[UPDATE_SLICE]], %[[INDEX_SLICE]], %[[SOURCE_SLICE]]
				// CHECK: %[[YIELD:.+]] = tensor.insert_slice %[[SCATTER_TILE]] into %[[INIT]][0, 0]
				// CHECK-SAME: [%[[SLICE_D0]], %[[D1]]]
				// CHECK: scf.yield %[[YIELD]]
				// CHECK: return %[[RESULT]]

				// -----

				func @scatter_no_tiling(
				%original: tensor<?x?xf32>, %indices: tensor<?x1xi32>,
				%update : tensor<?x?xf32>) -> tensor<?x?xf32> {
				%0 = test.scatter {__internal_linalg_transform__ = "no_tiling_input"}
				%update, %indices, %original : tensor<?x?xf32>, tensor<?x1xi32>, tensor<?x?xf32> -> tensor<?x?xf32>
				return %0 : tensor<?x?xf32>
				}
				// CHECK: func @scatter_no_tiling
				// CHECK-SAME: %[[ORIGINAL:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
				// CHECK-SAME: %[[INDICES:[a-zA-Z0-9_]+]]: tensor<?x1xi32>
				// CHECK-SAME: %[[UPDATES:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
				// CHECK: %[[RESULT:.+]] = test.scatter
				// CHECK-SAME: __internal_linalg_transform__ = "no_tiling_output"
				// CHECK-SAME: %[[UPDATES]], %[[INDICES]], %[[ORIGINAL]]
				// CHECK: return %[[RESULT]]

				// -----

				func @sort_1d(%arg0: tensor<?xi32>) -> tensor<?xi32> {
				%0 = test.sort reduce(0) {__internal_linalg_transform__ = "outer_reduce_input"}
				%arg0 : tensor<?xi32> -> tensor<?xi32>
				return %0 : tensor<?xi32>
				}
				// CHECK: func @sort_1d(
				// CHECK-SAME: %[[OPERAND:.+]]: tensor<?xi32>
				// CHECK: %[[RESULT:.+]] = test.sort reduce(0)
				// CHECK-SAME: __internal_linalg_transform__ = "outer_reduce_output"
				// CHECK-SAME: %[[OPERAND]]
				// CHECK: return %[[RESULT]]

				// -----

				func @sort_2d(%arg0: tensor<?x?xi32>) -> tensor<?x?xi32> {
				%0 = test.sort reduce(1) {__internal_linalg_transform__ = "inner_reduce_input"}
				%arg0 : tensor<?x?xi32> -> tensor<?x?xi32>
				return %0 : tensor<?x?xi32>
				}
				// CHECK: #[[MAP:.+]] = affine_map<(d0)[s0, s1] -> (10, -d0 + s1)>
				// CHECK: func @sort_2d(
				// CHECK-SAME: %[[OPERAND:.+]]: tensor<?x?xi32>
				// CHECK-DAG: %[[TILESIZE:.+]] = constant 10 : index
				// CHECK-DAG: %[[C0:.+]] = constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = constant 1 : index
				// CHECK-DAG: %[[D0:.+]] = tensor.dim %[[OPERAND]], %[[C0]]
				// CHECK-DAG: %[[D1:.+]] = tensor.dim %[[OPERAND]], %[[C1]]
				// CHECK: %[[RESULT:.+]] = scf.for %[[IV:.+]] = %[[C0]] to %[[D0]] step %[[TILESIZE]]
				// CHECK-SAME: iter_args(%[[INIT:.+]] = %[[OPERAND]])
				// CHECK-DAG: %[[USED_TILESIZE:.+]] = affine.min #[[MAP]](%[[IV]])[%[[TILESIZE]], %[[D0]]]
				// CHECK: %[[OPERAND_SLICE:.+]] = tensor.extract_slice %[[INIT]][%[[IV]], 0]
				// CHECK-SAME: [%[[USED_TILESIZE]], %[[D1]]]
				// CHECK: %[[SORT_TILE:.+]] = test.sort
				// CHECK-SAME: __internal_linalg_transform__ = "inner_reduce_output"
				// CHECK-SAME: %[[OPERAND_SLICE]]
				// CHECK: %[[YIELD:.+]] = tensor.insert_slice %[[SORT_TILE]] into %[[INIT]][%[[IV]], 0]
				// CHECK-SAME: [%[[USED_TILESIZE]], %[[D1]]]
				// CHECK: scf.yield %[[YIELD]]
				// CHECK: return %[[RESULT]]

				// -----

				func @sort_2d_inner_parallel(%arg0: tensor<?x?xi32>) -> tensor<?x?xi32> {
				%0 = test.sort reduce(0) {__internal_linalg_transform__ = "outer_reduce_input"}
				%arg0 : tensor<?x?xi32> -> tensor<?x?xi32>
				return %0 : tensor<?x?xi32>
				}
				// CHECK: #[[MAP:.+]] = affine_map<(d0)[s0, s1] -> (20, -d0 + s1)>
				// CHECK: func @sort_2d_inner_parallel(
				// CHECK-SAME: %[[OPERAND:.+]]: tensor<?x?xi32>
				// CHECK-DAG: %[[TILESIZE:.+]] = constant 20 : index
				// CHECK-DAG: %[[C0:.+]] = constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = constant 1 : index
				// CHECK-DAG: %[[D0:.+]] = tensor.dim %[[OPERAND]], %[[C0]]
				// CHECK-DAG: %[[D1:.+]] = tensor.dim %[[OPERAND]], %[[C1]]
				// CHECK: %[[RESULT:.+]] = scf.for %[[IV:.+]] = %[[C0]] to %[[D1]] step %[[TILESIZE]]
				// CHECK-SAME: iter_args(%[[INIT:.+]] = %[[OPERAND]])
				// CHECK-DAG: %[[USED_TILESIZE:.+]] = affine.min #[[MAP]](%[[IV]])[%[[TILESIZE]], %[[D1]]]
				// CHECK: %[[OPERAND_SLICE:.+]] = tensor.extract_slice %[[INIT]][0, %[[IV]]]
				// CHECK-SAME: [%[[D0]], %[[USED_TILESIZE]]]
				// CHECK: %[[SORT_TILE:.+]] = test.sort
				// CHECK-SAME: __internal_linalg_transform__ = "outer_reduce_output"
				// CHECK-SAME: %[[OPERAND_SLICE]]
				// CHECK: %[[YIELD:.+]] = tensor.insert_slice %[[SORT_TILE]] into %[[INIT]][0, %[[IV]]]
				// CHECK-SAME: [%[[D0]], %[[USED_TILESIZE]]]
				// CHECK: scf.yield %[[YIELD]]
				// CHECK: return %[[RESULT]]

				// -----

				func @sort_2d_multi_result(
				%arg0: tensor<?x?xi32>, %arg1: tensor<?x?xf32>)
				-> (tensor<?x?xi32>, tensor<?x?xf32>) {
				%0:2 = test.sort reduce(1) {__internal_linalg_transform__ = "inner_reduce_input"}
				%arg0, %arg1 : tensor<?x?xi32>, tensor<?x?xf32> -> tensor<?x?xi32>, tensor<?x?xf32>
				return %0#0, %0#1 : tensor<?x?xi32>, tensor<?x?xf32>
				}
				// CHECK: #[[MAP:.+]] = affine_map<(d0)[s0, s1] -> (10, -d0 + s1)>
				// CHECK: func @sort_2d_multi_result(
				// CHECK-SAME: %[[OPERAND1:.+]]: tensor<?x?xi32>
				// CHECK-SAME: %[[OPERAND2:.+]]: tensor<?x?xf32>
				// CHECK-DAG: %[[TILESIZE:.+]] = constant 10 : index
				// CHECK-DAG: %[[C0:.+]] = constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = constant 1 : index
				// CHECK-DAG: %[[D0:.+]] = tensor.dim %[[OPERAND1]], %[[C0]]
				// CHECK-DAG: %[[D1:.+]] = tensor.dim %[[OPERAND1]], %[[C1]]
				// CHECK: %[[RESULT:.+]]:2 = scf.for %[[IV:.+]] = %[[C0]] to %[[D0]] step %[[TILESIZE]]
				// CHECK-SAME: iter_args(%[[INIT1:.+]] = %[[OPERAND1]], %[[INIT2:.+]] = %[[OPERAND2]])
				// CHECK-DAG: %[[USED_TILESIZE:.+]] = affine.min #[[MAP]](%[[IV]])[%[[TILESIZE]], %[[D0]]]
				// CHECK: %[[OPERAND1_SLICE:.+]] = tensor.extract_slice %[[INIT1]][%[[IV]], 0]
				// CHECK-SAME: [%[[USED_TILESIZE]], %[[D1]]]
				// CHECK: %[[OPERAND2_SLICE:.+]] = tensor.extract_slice %[[INIT2]][%[[IV]], 0]
				// CHECK-SAME: [%[[USED_TILESIZE]], %[[D1]]]
				// CHECK: %[[SORT_TILE:.+]]:2 = test.sort
				// CHECK-SAME: __internal_linalg_transform__ = "inner_reduce_output"
				// CHECK-SAME: %[[OPERAND1_SLICE]], %[[OPERAND2_SLICE]]
				// CHECK: %[[YIELD1:.+]] = tensor.insert_slice %[[SORT_TILE]]#0 into %[[INIT1]][%[[IV]], 0]
				// CHECK-SAME: [%[[USED_TILESIZE]], %[[D1]]]
				// CHECK: %[[YIELD2:.+]] = tensor.insert_slice %[[SORT_TILE]]#1 into %[[INIT2]][%[[IV]], 0]
				// CHECK-SAME: [%[[USED_TILESIZE]], %[[D1]]]
				// CHECK: scf.yield %[[YIELD1]], %[[YIELD2]]
				// CHECK: return %[[RESULT]]#0, %[[RESULT]]#1

				// -----

				func @sort_2d_multi_result_memref(
				%arg0: memref<?x?xi32>, %arg1: memref<?x?xf32>) {
				test.sort reduce(0) {__internal_linalg_transform__ = "outer_reduce_input"}
				%arg0, %arg1 : memref<?x?xi32>, memref<?x?xf32>
				return
				}
				// CHECK: #[[MAP:.+]] = affine_map<(d0)[s0, s1] -> (20, -d0 + s1)>
				// CHECK: func @sort_2d_multi_result_memref(
				// CHECK-SAME: %[[OPERAND1:.+]]: memref<?x?xi32>
				// CHECK-SAME: %[[OPERAND2:.+]]: memref<?x?xf32>
				// CHECK-DAG: %[[TILESIZE:.+]] = constant 20 : index
				// CHECK-DAG: %[[C0:.+]] = constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = constant 1 : index
				// CHECK-DAG: %[[D0:.+]] = memref.dim %[[OPERAND1]], %[[C0]]
				// CHECK-DAG: %[[D1:.+]] = memref.dim %[[OPERAND1]], %[[C1]]
				// CHECK: scf.for %[[IV:.+]] = %[[C0]] to %[[D1]] step %[[TILESIZE]]
				// CHECK-DAG: %[[USED_TILESIZE:.+]] = affine.min #[[MAP]](%[[IV]])[%[[TILESIZE]], %[[D1]]]
				// CHECK: %[[OPERAND1_SLICE:.+]] = memref.subview %[[OPERAND1]][0, %[[IV]]]
				// CHECK-SAME: [%[[D0]], %[[USED_TILESIZE]]]
				// CHECK: %[[OPERAND2_SLICE:.+]] = memref.subview %[[OPERAND2]][0, %[[IV]]]
				// CHECK-SAME: [%[[D0]], %[[USED_TILESIZE]]]
				// CHECK: test.sort
				// CHECK-SAME: __internal_linalg_transform__ = "outer_reduce_output"
				// CHECK-SAME: %[[OPERAND1_SLICE]], %[[OPERAND2_SLICE]]

				// -----

				func @sort_3d_multi_result_distribute(
				%arg0: tensor<?x?x?xi32>, %arg1 : tensor<?x?x?xf32>)
				-> (tensor<?x?x?xi32>, tensor<?x?x?xf32>) {
				%0, %1 = test.sort reduce(1) {__internal_linalg_transform__ = "distribute_input"}
				%arg0, %arg1 : tensor<?x?x?xi32>, tensor<?x?x?xf32> -> tensor<?x?x?xi32>, tensor<?x?x?xf32>
				return %0, %1 : tensor<?x?x?xi32>, tensor<?x?x?xf32>
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0] -> (s0 * 10)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0)[s0, s1] -> (10, -d0 + s1)>
				// CHECK-DAG: #[[MAP2:.+]] = affine_map<()[s0] -> (s0 * 30)>
				// CHECK-DAG: #[[MAP3:.+]] = affine_map<(d0)[s0, s1] -> (30, -d0 + s1)>
				// CHECK: func @sort_3d_multi_result_distribute(
				// CHECK-SAME: %[[OPERAND1:[a-zA-Z0-9_]+]]: tensor<?x?x?xi32>
				// CHECK-SAME: %[[OPERAND2:[a-zA-Z0-9_]+]]: tensor<?x?x?xf32>
				// CHECK-DAG: %[[TILESIZE1:.+]] = constant 10 : index
				// CHECK-DAG: %[[TILESIZE2:.+]] = constant 30 : index
				// CHECK-DAG: %[[C0:.+]] = constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = constant 1 : index
				// CHECK-DAG: %[[C2:.+]] = constant 2 : index
				// CHECK-DAG: %[[D0:.+]] = tensor.dim %[[OPERAND1]], %[[C0]]
				// CHECK-DAG: %[[D1:.+]] = tensor.dim %[[OPERAND1]], %[[C1]]
				// CHECK-DAG: %[[D2:.+]] = tensor.dim %[[OPERAND1]], %[[C2]]
				// CHECK-DAG: %[[IDX:.+]] = "gpu.block_id"() {dimension = "x"}
				// CHECK-DAG: %[[COUNTX:.+]] = "gpu.grid_dim"() {dimension = "x"}
				// CHECK-DAG: %[[IDY:.+]] = "gpu.block_id"() {dimension = "y"}
				// CHECK-DAG: %[[COUNTY:.+]] = "gpu.grid_dim"() {dimension = "y"}
				// CHECK-DAG: %[[OFFSETY:.+]] = affine.apply #[[MAP0]]()[%[[IDY]]]
				// CHECK-DAG: %[[STEPY:.+]] = affine.apply #[[MAP0]]()[%[[COUNTY]]]
				// CHECK: %[[RESULT:.+]]:2 = scf.for %[[IV0:.+]] = %[[OFFSETY]] to %[[D0]] step %[[STEPY]]
				// CHECK-SAME: iter_args(%[[INIT1:.+]] = %[[OPERAND1]], %[[INIT2:.+]] = %[[OPERAND2]])
				// CHECK-DAG: %[[USED_TILESIZE1:.+]] = affine.min #[[MAP1]](%[[IV0]])[%[[TILESIZE1]], %[[D0]]]
				// CHECK-DAG: %[[OFFSETX:.+]] = affine.apply #[[MAP2]]()[%[[IDX]]]
				// CHECK-DAG: %[[STEPX:.+]] = affine.apply #[[MAP2]]()[%[[COUNTX]]]
				// CHECK: %[[RESULT_INNER:.+]]:2 = scf.for %[[IV1:.+]] = %[[OFFSETX]] to %[[D2]] step %[[STEPX]]
				// CHECK-SAME: iter_args(%[[INIT3:.+]] = %[[INIT1]], %[[INIT4:.+]] = %[[INIT2]])
				// CHECK-DAG: %[[USED_TILESIZE2:.+]] = affine.min #[[MAP3]](%[[IV1]])[%[[TILESIZE2]], %[[D2]]]
				// CHECK: %[[OPERAND1_SLICE:.+]] = tensor.extract_slice %[[INIT3]][%[[IV0]], 0, %[[IV1]]]
				// CHECK-SAME: [%[[USED_TILESIZE1]], %[[D1]], %[[USED_TILESIZE2]]]
				// CHECK: %[[OPERAND2_SLICE:.+]] = tensor.extract_slice %[[INIT4]][%[[IV0]], 0, %[[IV1]]]
				// CHECK-SAME: [%[[USED_TILESIZE1]], %[[D1]], %[[USED_TILESIZE2]]]
				// CHECK: %[[SORT_SLICE:.+]]:2 = test.sort
				// CHECK-SAME: __internal_linalg_transform__ = "distribute_output"
				// CHECK-SAME: %[[OPERAND1_SLICE]], %[[OPERAND2_SLICE]]
				// CHECK: %[[YIELD1:.+]] = tensor.insert_slice %[[SORT_SLICE]]#0
				// CHECK-SAME: into %[[INIT3]][%[[IV0]], 0, %[[IV1]]]
				// CHECK: %[[YIELD2:.+]] = tensor.insert_slice %[[SORT_SLICE]]#1
				// CHECK-SAME: into %[[INIT4]][%[[IV0]], 0, %[[IV1]]]
				// CHECK: scf.yield %[[YIELD1]], %[[YIELD2]]
				// CHECK: scf.yield %[[RESULT_INNER]]#0, %[[RESULT_INNER]]#1
				// CHECK: return %[[RESULT]]#0, %[[RESULT]]#1

				// -----

				func @sort_3d_multi_result_distribute_memref(
				%arg0: memref<?x?x?xi32>, %arg1 : memref<?x?x?xf32>) {
				test.sort reduce(1) {__internal_linalg_transform__ = "distribute_input"}
				%arg0, %arg1 : memref<?x?x?xi32>, memref<?x?x?xf32>
				return
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0] -> (s0 * 10)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0)[s0, s1] -> (10, -d0 + s1)>
				// CHECK-DAG: #[[MAP2:.+]] = affine_map<()[s0] -> (s0 * 30)>
				// CHECK-DAG: #[[MAP3:.+]] = affine_map<(d0)[s0, s1] -> (30, -d0 + s1)>
				// CHECK: func @sort_3d_multi_result_distribute_memref(
				// CHECK-SAME: %[[OPERAND1:[a-zA-Z0-9_]+]]: memref<?x?x?xi32>
				// CHECK-SAME: %[[OPERAND2:[a-zA-Z0-9_]+]]: memref<?x?x?xf32>
				// CHECK-DAG: %[[TILESIZE1:.+]] = constant 10 : index
				// CHECK-DAG: %[[TILESIZE2:.+]] = constant 30 : index
				// CHECK-DAG: %[[C0:.+]] = constant 0 : index
				// CHECK-DAG: %[[C1:.+]] = constant 1 : index
				// CHECK-DAG: %[[C2:.+]] = constant 2 : index
				// CHECK-DAG: %[[D0:.+]] = memref.dim %[[OPERAND1]], %[[C0]]
				// CHECK-DAG: %[[D1:.+]] = memref.dim %[[OPERAND1]], %[[C1]]
				// CHECK-DAG: %[[D2:.+]] = memref.dim %[[OPERAND1]], %[[C2]]
				// CHECK-DAG: %[[IDX:.+]] = "gpu.block_id"() {dimension = "x"}
				// CHECK-DAG: %[[COUNTX:.+]] = "gpu.grid_dim"() {dimension = "x"}
				// CHECK-DAG: %[[IDY:.+]] = "gpu.block_id"() {dimension = "y"}
				// CHECK-DAG: %[[COUNTY:.+]] = "gpu.grid_dim"() {dimension = "y"}
				// CHECK-DAG: %[[OFFSETY:.+]] = affine.apply #[[MAP0]]()[%[[IDY]]]
				// CHECK-DAG: %[[STEPY:.+]] = affine.apply #[[MAP0]]()[%[[COUNTY]]]
				// CHECK: scf.for %[[IV0:.+]] = %[[OFFSETY]] to %[[D0]] step %[[STEPY]]
				// CHECK-DAG: %[[USED_TILESIZE1:.+]] = affine.min #[[MAP1]](%[[IV0]])[%[[TILESIZE1]], %[[D0]]]
				// CHECK-DAG: %[[OFFSETX:.+]] = affine.apply #[[MAP2]]()[%[[IDX]]]
				// CHECK-DAG: %[[STEPX:.+]] = affine.apply #[[MAP2]]()[%[[COUNTX]]]
				// CHECK: scf.for %[[IV1:.+]] = %[[OFFSETX]] to %[[D2]] step %[[STEPX]]
				// CHECK-DAG: %[[USED_TILESIZE2:.+]] = affine.min #[[MAP3]](%[[IV1]])[%[[TILESIZE2]], %[[D2]]]
				// CHECK: %[[OPERAND1_SLICE:.+]] = memref.subview %[[OPERAND1]][%[[IV0]], 0, %[[IV1]]]
				// CHECK-SAME: [%[[USED_TILESIZE1]], %[[D1]], %[[USED_TILESIZE2]]]
				// CHECK: %[[OPERAND2_SLICE:.+]] = memref.subview %[[OPERAND2]][%[[IV0]], 0, %[[IV1]]]
				// CHECK-SAME: [%[[USED_TILESIZE1]], %[[D1]], %[[USED_TILESIZE2]]]
				// CHECK: test.sort
				// CHECK-SAME: __internal_linalg_transform__ = "distribute_output"
				// CHECK-SAME: %[[OPERAND1_SLICE]], %[[OPERAND2_SLICE]]
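
The CHECK lines for the distributed case above encode a cyclic distribution: each loop starts at `block_id * tileSize`, advances by `grid_dim * tileSize`, and clamps the tile with `affine.min(tileSize, dim - iv)`. The arithmetic can be sketched in plain C++ as follows. This is only an illustration of the index math, not code from the revision; `Tile` and `cyclicTiles` are hypothetical names, and a positive tile size and processor count are assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One tile: starting offset along the dimension and the clamped tile size.
struct Tile {
  int64_t offset;
  int64_t size;
};

// Tiles of a dimension of extent `dimSize` visited by processor `procId`
// (out of `numProcs`) under cyclic distribution with tile size `tileSize`.
// Assumes tileSize > 0 and numProcs > 0.
std::vector<Tile> cyclicTiles(int64_t dimSize, int64_t tileSize,
                              int64_t procId, int64_t numProcs) {
  std::vector<Tile> tiles;
  int64_t lb = procId * tileSize;     // mirrors affine.apply (s0 * tile)[id]
  int64_t step = numProcs * tileSize; // mirrors affine.apply (s0 * tile)[count]
  for (int64_t iv = lb; iv < dimSize; iv += step) {
    // Mirrors affine.min(tile, dim - iv): clamp the last, partial tile.
    tiles.push_back({iv, std::min(tileSize, dimSize - iv)});
  }
  return tiles;
}
```

For example, with an extent of 45, a tile size of 10 and two processors, processor 0 visits offsets 0, 20 and 40 (the last tile has size 5), while processor 1 visits offsets 10 and 30.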

				// -----

				func @slice_insert(%source :tensor<?x?xf32>, %dest: tensor<?x?xf32>,
				%idx0 : index, %idx1 : index) -> tensor<?x?xf32> {
				%c0 = constant 0 : index
				%c1 = constant 1 : index
				%0 = tensor.dim %source, %c0 : tensor<?x?xf32>
				%1 = tensor.dim %source, %c1 : tensor<?x?xf32>
				%2 = tensor.insert_slice %source into %dest[%idx0, %idx1] [%0, %1] [1, 1]
				{__internal_linalg_transform__ = "tiling_input"} : tensor<?x?xf32> into tensor<?x?xf32>
				return %2 : tensor<?x?xf32>
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0)[s0, s1] -> (10, -d0 + s1)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0)[s0, s1] -> (20, -d0 + s1)>
				// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0)[s0] -> (d0 + s0)>
				// CHECK: func @slice_insert(
				// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
				// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
				// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: index
				// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: index
				// CHECK: %[[RESULT:.+]] = scf.for %[[IV0:[a-zA-Z0-9]+]] =
				// CHECK-DAG: %[[YIELD1:.+]] = scf.for %[[IV1:[a-zA-Z0-9]+]] =
				// CHECK-DAG: %[[SLICE:.+]] = tensor.extract_slice %[[ARG0]][%[[IV0]], %[[IV1]]]
				// CHECK-DAG: %[[OFFSET0:.+]] = affine.apply #[[MAP2]](%[[IV0]])[%[[ARG2]]]
				// CHECK-DAG: %[[OFFSET1:.+]] = affine.apply #[[MAP2]](%[[IV1]])[%[[ARG3]]]
				// CHECK: %[[UPDATE:.+]] = tensor.insert_slice %[[SLICE]]
				// CHECK-SAME: into %{{.+}}[%[[OFFSET0]], %[[OFFSET1]]]
				// CHECK: scf.yield %[[UPDATE]]
				// CHECK: scf.yield %[[YIELD1]]
				// CHECK: return %[[RESULT]]

				// -----

				func @slice_insert_rank_reduce(%source :tensor<?x?xf32>, %dest: tensor<?x?x?xf32>,
				%idx0 : index, %idx1 : index) -> tensor<?x?x?xf32> {
				%c0 = constant 0 : index
				%c1 = constant 1 : index
				%0 = tensor.dim %source, %c0 : tensor<?x?xf32>
				%1 = tensor.dim %source, %c1 : tensor<?x?xf32>
				%2 = tensor.insert_slice %source into %dest[%idx0, 0, %idx1] [%0, 1, %1] [1, 1, 1]
				{__internal_linalg_transform__ = "tiling_input"} : tensor<?x?xf32> into tensor<?x?x?xf32>
				return %2 : tensor<?x?x?xf32>
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<(d0)[s0, s1] -> (10, -d0 + s1)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<(d0)[s0, s1] -> (20, -d0 + s1)>
				// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0)[s0] -> (d0 + s0)>
				// CHECK: func @slice_insert_rank_reduce(
				// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: tensor<?x?xf32>
				// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: tensor<?x?x?xf32>
				// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: index
				// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: index
				// CHECK: %[[RESULT:.+]] = scf.for %[[IV0:[a-zA-Z0-9]+]] =
				// CHECK: %[[YIELD1:.+]] = scf.for %[[IV1:[a-zA-Z0-9]+]] =
				// CHECK-DAG: %[[SLICE:.+]] = tensor.extract_slice %[[ARG0]][%[[IV0]], %[[IV1]]]
				// CHECK-DAG: %[[OFFSET0:.+]] = affine.apply #[[MAP2]](%[[IV0]])[%[[ARG2]]]
				// CHECK-DAG: %[[OFFSET1:.+]] = affine.apply #[[MAP2]](%[[IV1]])[%[[ARG3]]]
				// CHECK: %[[UPDATE:.+]] = tensor.insert_slice %[[SLICE]]
				// CHECK-SAME: into %{{.+}}[%[[OFFSET0]], 0, %[[OFFSET1]]]
				// CHECK: scf.yield %[[UPDATE]]
				// CHECK: scf.yield %[[YIELD1]]
				// CHECK: return %[[RESULT]]
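
The two `slice_insert` tests above check that the tiled `tensor.insert_slice` writes each tile at the op's original offset plus the loop offset (`#[[MAP2]]` is `d0 + s0`), and that the rank-reduced dimension keeps a constant offset. A minimal C++ sketch of that offset composition is shown below. The function name and the explicit `dropped` mask are assumptions made for illustration; the revision infers dropped dimensions from sizes instead, and for a dropped dimension this sketch simply keeps the op's own offset (0 in the test).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Offsets at which a tile of the source lands in the destination.
// `opOffsets` has one entry per destination dimension, `tileOffsets` one per
// source dimension, and `dropped[i]` marks destination dimensions that the
// rank-reducing insert does not take from the source.
std::vector<int64_t> composeInsertOffsets(
    const std::vector<int64_t> &opOffsets, const std::vector<bool> &dropped,
    const std::vector<int64_t> &tileOffsets) {
  assert(opOffsets.size() == dropped.size());
  std::vector<int64_t> result(opOffsets.size());
  size_t srcDim = 0;
  for (size_t destDim = 0; destDim < opOffsets.size(); ++destDim) {
    if (dropped[destDim]) {
      // No loop iterates over this dimension: keep the op's offset.
      result[destDim] = opOffsets[destDim];
      continue;
    }
    assert(srcDim < tileOffsets.size());
    // Tile offset within the source plus the op's offset into the dest.
    result[destDim] = tileOffsets[srcDim] + opOffsets[destDim];
    ++srcDim;
  }
  return result;
}
```

With op offsets `{5, 0, 7}`, the middle dimension dropped, and tile offsets `{10, 20}`, the tile is inserted at `{15, 0, 27}`.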

mlir/test/lib/CMakeLists.txt

	add_subdirectory(Analysis)			add_subdirectory(Analysis)
	add_subdirectory(Conversion)			add_subdirectory(Conversion)
	add_subdirectory(Dialect)			add_subdirectory(Dialect)
	add_subdirectory(IR)			add_subdirectory(IR)
				add_subdirectory(Interfaces)
	add_subdirectory(Pass)			add_subdirectory(Pass)
	add_subdirectory(Reducer)			add_subdirectory(Reducer)
	add_subdirectory(Rewrite)			add_subdirectory(Rewrite)
	add_subdirectory(Transforms)			add_subdirectory(Transforms)

mlir/test/lib/Dialect/Test/TestDialect.h

	Show All 25 Lines
	#include "mlir/IR/RegionKindInterface.h"			#include "mlir/IR/RegionKindInterface.h"
	#include "mlir/IR/SymbolTable.h"			#include "mlir/IR/SymbolTable.h"
	#include "mlir/Interfaces/CallInterfaces.h"			#include "mlir/Interfaces/CallInterfaces.h"
	#include "mlir/Interfaces/ControlFlowInterfaces.h"			#include "mlir/Interfaces/ControlFlowInterfaces.h"
	#include "mlir/Interfaces/CopyOpInterface.h"			#include "mlir/Interfaces/CopyOpInterface.h"
	#include "mlir/Interfaces/DerivedAttributeOpInterface.h"			#include "mlir/Interfaces/DerivedAttributeOpInterface.h"
	#include "mlir/Interfaces/InferTypeOpInterface.h"			#include "mlir/Interfaces/InferTypeOpInterface.h"
	#include "mlir/Interfaces/SideEffectInterfaces.h"			#include "mlir/Interfaces/SideEffectInterfaces.h"
				#include "mlir/Interfaces/TilingInterface.h"

	namespace mlir {			namespace mlir {
	class DLTIDialect;			class DLTIDialect;
	class RewritePatternSet;			class RewritePatternSet;
	} // namespace mlir			} // namespace mlir

	#include "TestOpEnums.h.inc"			#include "TestOpEnums.h.inc"
	#include "TestOpInterfaces.h.inc"			#include "TestOpInterfaces.h.inc"
	Show All 12 Lines

mlir/test/lib/Dialect/Test/TestOps.td

Show All 13 Lines
include "mlir/IR/OpAsmInterface.td"		include "mlir/IR/OpAsmInterface.td"
include "mlir/IR/RegionKindInterface.td"		include "mlir/IR/RegionKindInterface.td"
include "mlir/IR/SymbolInterfaces.td"		include "mlir/IR/SymbolInterfaces.td"
include "mlir/Interfaces/CallInterfaces.td"		include "mlir/Interfaces/CallInterfaces.td"
include "mlir/Interfaces/ControlFlowInterfaces.td"		include "mlir/Interfaces/ControlFlowInterfaces.td"
include "mlir/Interfaces/CopyOpInterface.td"		include "mlir/Interfaces/CopyOpInterface.td"
include "mlir/Interfaces/DataLayoutInterfaces.td"		include "mlir/Interfaces/DataLayoutInterfaces.td"
include "mlir/Interfaces/InferTypeOpInterface.td"		include "mlir/Interfaces/InferTypeOpInterface.td"
		include "mlir/Interfaces/TilingInterface.td"
include "mlir/Interfaces/SideEffectInterfaces.td"		include "mlir/Interfaces/SideEffectInterfaces.td"
include "TestInterfaces.td"		include "TestInterfaces.td"

def Test_Dialect : Dialect {		def Test_Dialect : Dialect {
let name = "test";		let name = "test";
let cppNamespace = "::test";		let cppNamespace = "::test";
let hasCanonicalizer = 1;		let hasCanonicalizer = 1;
let hasConstantMaterializer = 1;		let hasConstantMaterializer = 1;
▲ Show 20 Lines • Show All 2,006 Lines • ▼ Show 20 Lines	let extraClassDeclaration = [{
::mlir::Block::BlockArgListType getJoinArgs() {		::mlir::Block::BlockArgListType getJoinArgs() {
return getBody(2)->getArguments();		return getBody(2)->getArguments();
}		}
::mlir::OperandRange getSuccessorEntryOperands(unsigned index);		::mlir::OperandRange getSuccessorEntryOperands(unsigned index);
}];		}];
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// Test TilingInterface
		//===----------------------------------------------------------------------===//

		def TestScatter : TEST_Op<"scatter", []> {
		let description = [{
		Operation to test tiling interface.

		This operation represents a "scatter" like pattern. The
		computation represented is shown below when all operands are of
		`memref` types. The operands could be all `tensor` types as well.

		```mlir
		%c0 = constant 0 : index
		%c1 = constant 1 : index
		%c2 = constant 2 : index
		%d0 = memref.dim %update, %c0 : memref<?x?x?xf32>
		%d1 = memref.dim %update, %c1 : memref<?x?x?xf32>
		%d2 = memref.dim %update, %c2 : memref<?x?x?xf32>
		scf.for %iv0 = %c0 to %d0 step %c1
		scf.for %iv1 = %c0 to %d1 step %c1
		scf.for %iv2 = %c0 to %d2 step %c1
		%i0 = memref.load %indices[%iv0, %c0] : memref<?x2xi32>
		%indx0 = index_cast %i0 : i32 to index
		%i1 = memref.load %indices[%iv0, %c1] : memref<?x2xi32>
		%indx1 = index_cast %i1 : i32 to index
		%val = memref.load %update[%iv0, %iv1, %iv2] : memref<?x?x?xf32>
		memref.store %val, %source[%indx0, %indx1, %iv1, %iv2] : memref<?x?x?x?xf32>
		```
		}];
		let arguments = (ins AnyType:$update, AnyType:$indices, AnyType:$source);
		let results = (outs Optional<AnyType>:$result);
		let assemblyFormat = [{
		attr-dict $update `,` $indices `,` $source `:`
		type($update) `,` type($indices) `,` type($source) (`->` type($result)^)?
		}];
		}

		def TestSort : TEST_Op<"sort", []> {
		let description = [{
		Operation to test tiling interface.

		This operation represents a "sort-like" operation, where values
		along a single dimension are sorted. The computation represented
		is shown below when all operands are of `memref` types. The
		operands could be all `tensor` types as well.

		```mlir
		%c0 = constant 0 : index
		%c1 = constant 1 : index
		%c2 = constant 2 : index
		%d0 = memref.dim %operand0, %c0 : memref<?x?x?xf32>
		%d1 = memref.dim %operand0, %c1 : memref<?x?x?xf32>
		%d2 = memref.dim %operand0, %c2 : memref<?x?x?xf32>
		scf.for %iv0 = %c0 to %d0 step %c1
		scf.for %iv1 = %c0 to %d2 step %c1
		%sorted_slice0 = memref.subview %operand0[%iv0, 0, %iv1][1, %d1, 1][1, 1, 1]
		: memref<?x?x?xf32> to memref<?x?x?xf32, #map>
		%sorted_slice1 = memref.subview %operand1[%iv0, 0, %iv1][1, %d1, 1][1, 1, 1]
		: memref<?x?x?xi32> to memref<?x?x?xi32, #map>
		call sort_in_place(%sorted_slice0, %sorted_slice1)
		: (memref<?x?x?xf32, #map>, memref<?x?x?xi32, #map>) -> ()
		```
		}];

		let arguments = (ins Variadic<AnyType>:$sources, I64Attr:$reduce_dim);
		let results = (outs Variadic<AnyType>:$results);
		let assemblyFormat = [{
		`reduce` `(` $reduce_dim `)` attr-dict $sources
		`:` type($sources) (`->` type($results)^)?
		}];
		}

		//===----------------------------------------------------------------------===//


		//===----------------------------------------------------------------------===//
// Test TableGen generated build() methods		// Test TableGen generated build() methods
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

def TableGenConstant : TEST_Op<"tblgen_constant"> {		def TableGenConstant : TEST_Op<"tblgen_constant"> {
let results = (outs AnyType);		let results = (outs AnyType);
}		}

// No variadic args or results.		// No variadic args or results.
▲ Show 20 Lines • Show All 115 Lines • Show Last 20 Lines
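
The `test.scatter` description above can be read as the following reference loop nest, written here as a C++ sketch. Nested vectors stand in for the memrefs, and the names `Update`, `Indices`, `Source` and `scatter` are hypothetical; the sketch only restates the documented semantics (row `iv0` of the update is written at the position given by the two indices of row `iv0`) and is not part of the revision.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Update = std::vector<std::vector<std::vector<float>>>; // [d0][d1][d2]
using Indices = std::vector<std::array<int32_t, 2>>;         // [d0][2]
using Source =
    std::vector<std::vector<std::vector<std::vector<float>>>>; // [s0][s1][d1][d2]

// source[indices[iv0][0]][indices[iv0][1]][iv1][iv2] = update[iv0][iv1][iv2]
void scatter(const Update &update, const Indices &indices, Source &source) {
  assert(update.size() == indices.size());
  for (size_t iv0 = 0; iv0 < update.size(); ++iv0) {
    size_t idx0 = static_cast<size_t>(indices[iv0][0]);
    size_t idx1 = static_cast<size_t>(indices[iv0][1]);
    for (size_t iv1 = 0; iv1 < update[iv0].size(); ++iv1)
      for (size_t iv2 = 0; iv2 < update[iv0][iv1].size(); ++iv2)
        // .at() bounds-checks the scattered position.
        source.at(idx0).at(idx1).at(iv1).at(iv2) = update[iv0][iv1][iv2];
  }
}
```

Because the `iv1` and `iv2` loops do not influence the scattered position, tiling them only requires matching slices of the update and of the trailing source dimensions, which is what the scatter implementation of `getTiledImplementation` in this revision slices.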

mlir/test/lib/Interfaces/CMakeLists.txt

This file was added.

add_subdirectory(TilingInterface)

mlir/test/lib/Interfaces/TilingInterface/CMakeLists.txt

This file was added.

				# Exclude tests from libMLIR.so
				add_mlir_library(MLIRTestTilingInterface
				TestTilingInterface.cpp

				EXCLUDE_FROM_LIBMLIR

				LINK_LIBS PUBLIC
				MLIRAffine
				MLIRGPUOps
				MLIRIR
				MLIRLinalgTransforms
				MLIRMemRef
				MLIRPass
				MLIRSCF
				MLIRStandard
				MLIRTensor
				MLIRTilingInterface
				MLIRTilingTransform
				MLIRTransformUtils
				)

				include_directories(${CMAKE_CURRENT_SOURCE_DIR}/../../Dialect/Test)
				include_directories(${CMAKE_CURRENT_BINARY_DIR}/../../Dialect/Test)

mlir/test/lib/Interfaces/TilingInterface/TestTilingInterface.cpp

This file was added.

				//===- TestTilingInterface.cpp - Test pass for Tiling Interface patterns --===//
				//
				nicolasvasilache (inline comment, not done): I am having trouble understanding the purpose of this file. You propose a new tiling interface op that can generalize in the future, which is great. I do not understand why this interface is not hooked up to the existing tiling passes. Even if this is meant as a transient state, it introduces 3 extra "one-off" tiling implementations plus extra patterns:
				TestFullSizeOutputTileTilingInterface::getTiledImplementation
				TestMixedReduceParallelTilingInterface::getTiledImplementation
				InsertSliceTilingInterface::getTiledImplementation
				History shows that this will turn into more dead code in the end (either this file or the existing tiling). I have not yet seen evidence that we want to buy into these implementations of tiling. I support the interface itself, as we previously discussed; the usage of the interface as proposed in this revision, not so much.
				mravishankar (author, inline reply, done): Agreed. To repeat from above: I wasn't sure I wanted to change the existing tiling algorithm to use the interface in one shot, so I tried to stage it with a separate tiling algorithm that uses the interface. Let's talk about how we can stage this process.
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file tests the tiling transformations implemented using TilingInterface.
				//
				//===----------------------------------------------------------------------===//

				#include "TestDialect.h"
				#include "mlir/Dialect/Affine/IR/AffineOps.h"
				#include "mlir/Dialect/GPU/GPUDialect.h"
				#include "mlir/Dialect/Linalg/TilingInterface/Tiling.h"
				#include "mlir/Dialect/MemRef/IR/MemRef.h"
				#include "mlir/Dialect/SCF/SCF.h"
				#include "mlir/Dialect/StandardOps/IR/Ops.h"
				#include "mlir/Dialect/Tensor/IR/Tensor.h"
				#include "mlir/IR/Matchers.h"
				#include "mlir/IR/PatternMatch.h"
				#include "mlir/Interfaces/TilingInterface.h"
				#include "mlir/Pass/Pass.h"
				#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
				#include "llvm/ADT/TypeSwitch.h"

				using namespace mlir;

				//===----------------------------------------------------------------------===//
				// Utility methods to convert from `OpFoldResult` to `Value`
				//===----------------------------------------------------------------------===//

				/// Converts an `OpFoldResult` to a `Value` by building a constant op if
				/// the `OpFoldResult` is an `IntegerAttr`.
				static Value getValue(OpBuilder &builder, Location loc,
				OpFoldResult valueOrAttr) {
				if (auto attr = valueOrAttr.dyn_cast<Attribute>()) {
				return builder.create<ConstantIndexOp>(loc,
				attr.cast<IntegerAttr>().getInt());
				}
				return valueOrAttr.get<Value>();
				}

				/// Returns the constant value in `valueOrAttr` if it is not a dynamic `Value`.
				static Optional<int64_t> getConstantValue(OpFoldResult valueOrAttr) {
				if (auto attr = valueOrAttr.dyn_cast<Attribute>())
				return attr.cast<IntegerAttr>().getInt();
				return {};
				}

				springerm (inline comment, done): Not exactly the same, but you could probably use `getConstantIntValue` from StaticValueUtils.
				/// Checks if `valueOrAttr` represents a constant value `val`.
				static bool isValue(OpFoldResult valueOrAttr, int64_t val) {
				auto attr = valueOrAttr.dyn_cast<Attribute>();
				return attr && attr.cast<IntegerAttr>().getValue() == val;
				}

				/// Returns a memref.subview or a tensor.extract_slice based on the type of the
				/// `source`.
				static Value getSlice(OpBuilder &b, Location loc, Value source,
				ArrayRef<OpFoldResult> offsets,
				ArrayRef<OpFoldResult> sizes,
				ArrayRef<OpFoldResult> strides) {
				return TypeSwitch<Type, Value>(source.getType())
				.Case<RankedTensorType>([&](RankedTensorType t) -> Value {
				return b.create<tensor::ExtractSliceOp>(loc, source, offsets, sizes,
				strides);
				})
				.Case<MemRefType>([&](MemRefType type) -> Value {
				return b.create<memref::SubViewOp>(loc, source, offsets, sizes,
				strides);
				})
				.Default([&](Type t) { return nullptr; });
				}
				springerm (inline comment, not done): nit: `getDimValue` uses `Value()`.

				static Value getDimValue(OpBuilder &builder, Location loc, Value v,
				int64_t dim) {
				return TypeSwitch<Type, Value>(v.getType())
				.Case<RankedTensorType>([&](RankedTensorType t) -> Value {
				return builder.create<tensor::DimOp>(loc, v, dim);
				})
				.Case<MemRefType>([&](MemRefType t) -> Value {
				return builder.create<memref::DimOp>(loc, v, dim);
				})
				.Default([&](Type t) { return Value(); });
				}

				static OpFoldResult getDim(OpBuilder &builder, Location loc, Value v,
				int64_t dim) {
				auto t = v.getType().cast<ShapedType>();
				if (t.isDynamicDim(dim))
				return getDimValue(builder, loc, v, dim);
				return builder.getI64IntegerAttr(t.getDimSize(dim));
				}
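
The helpers above all revolve around `OpFoldResult`, which holds either a static attribute or a dynamic SSA value. The idea can be sketched outside MLIR with `std::variant`, as below. This is a stand-in model under stated assumptions, not the MLIR API: `FoldResult`, `DynamicValue` and `addOffsets` are hypothetical names, a static value is modelled as an `int64_t`, and a dynamic value is modelled by its printed name.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <variant>

// Hypothetical stand-ins: int64_t plays the role of an IntegerAttr, and
// DynamicValue plays the role of an SSA Value known only at run time.
struct DynamicValue {
  std::string name;
};
using FoldResult = std::variant<int64_t, DynamicValue>;

// Returns the constant if `v` is static, std::nullopt otherwise.
std::optional<int64_t> getConstantValue(const FoldResult &v) {
  if (const int64_t *c = std::get_if<int64_t>(&v))
    return *c;
  return std::nullopt;
}

// True only if `v` is static and equal to `expected`.
bool isValue(const FoldResult &v, int64_t expected) {
  std::optional<int64_t> c = getConstantValue(v);
  return c && *c == expected;
}

// Adds two offsets, folding to a constant only when both are static.
// Otherwise a new "dynamic" value describing the sum is produced, which is
// where the real code would build an affine.apply.
FoldResult addOffsets(const FoldResult &a, const FoldResult &b) {
  std::optional<int64_t> ca = getConstantValue(a);
  std::optional<int64_t> cb = getConstantValue(b);
  if (ca && cb)
    return *ca + *cb;
  auto show = [](const FoldResult &v) {
    if (const int64_t *c = std::get_if<int64_t>(&v))
      return std::to_string(*c);
    return std::get<DynamicValue>(v).name;
  };
  return DynamicValue{"(" + show(a) + " + " + show(b) + ")"};
}
```

Note that the optional has to be dereferenced before the addition; adding two optionals directly does not compile.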

				//===----------------------------------------------------------------------===//
				// Tiling Interface for Test Dialect operations.
				//===----------------------------------------------------------------------===//

				namespace {
				struct TestScatterTilingInterface
				: public TilingInterface::ExternalModel<TestScatterTilingInterface,
				test::TestScatter> {
				SmallVector<Value> getDestinationOperands(Operation *op, OpBuilder &b) const {
				return {cast<test::TestScatter>(op).source()};
				}

				SmallVector<StringRef> getLoopIteratorTypes(Operation *op) const {
				auto testOp = cast<test::TestScatter>(op);
				SmallVector<StringRef> iteratorTypes(
				testOp.update().getType().cast<ShapedType>().getRank(),
				getParallelIteratorTypeName());
				return iteratorTypes;
				}

				SmallVector<Range> getLoopBounds(Operation *op, OpBuilder &builder) const {
				auto testOp = cast<test::TestScatter>(op);
				Location loc = op->getLoc();
				Value zero = builder.create<ConstantIndexOp>(loc, 0);
				Value one = builder.create<ConstantIndexOp>(loc, 1);
				SmallVector<Range> ranges;
				Value updates = testOp.update();
				for (auto dim : llvm::seq<int64_t>(
				0, updates.getType().cast<ShapedType>().getRank())) {
				Value ub = getDimValue(builder, loc, updates, dim);
				ranges.emplace_back(Range{zero, ub, one});
				}
				return ranges;
				}

				Operation *getTiledImplementation(Operation *op, OpBuilder &builder,
				ValueRange dest,
				ArrayRef<OpFoldResult> offsets,
				ArrayRef<OpFoldResult> sizes,
				SmallVectorImpl<Value> &results) const {
				auto testOp = cast<test::TestScatter>(op);
				Location loc = op->getLoc();
				auto zeroAttr = builder.getI64IntegerAttr(0);
				auto oneAttr = builder.getI64IntegerAttr(1);

				// Slice of the updates.
				Value updates = testOp.update();
				auto updateRank = updates.getType().cast<ShapedType>().getRank();
				SmallVector<OpFoldResult> updateStrides(updateRank, oneAttr);
				Value tiledUpdate =
				getSlice(builder, loc, updates, offsets, sizes, updateStrides);
				assert(tiledUpdate && "failed to get slice of update");

				// Slice of indices.
				Value indices = testOp.indices();
				auto indicesRank = indices.getType().cast<ShapedType>().getRank();
				SmallVector<OpFoldResult> indicesOffsets(indicesRank, zeroAttr);
				SmallVector<OpFoldResult> indicesSizes(indicesRank, zeroAttr);
				indicesOffsets[0] = offsets[0];
				indicesSizes[0] = sizes[0];
				for (auto dim : llvm::seq<int64_t>(1, indicesRank))
				indicesSizes[dim] = getDim(builder, loc, indices, dim);
				SmallVector<OpFoldResult> indicesStrides(indicesRank, oneAttr);
				Value tiledIndices = getSlice(builder, loc, indices, indicesOffsets,
				indicesSizes, indicesStrides);
				assert(tiledIndices && "failed to get slice of indices");

				// Slice of the original
				Value source = testOp.source();
				auto sourceRank = source.getType().cast<ShapedType>().getRank();
				SmallVector<OpFoldResult> sourceOffsets(sourceRank, zeroAttr);
				SmallVector<OpFoldResult> sourceSizes(sourceRank);
				for (auto dim : llvm::seq<int64_t>(0, sourceRank - updateRank + 1))
				sourceSizes[dim] = getDim(builder, loc, source, dim);
				for (auto dim :
				llvm::seq<int64_t>(sourceRank - updateRank + 1, sourceRank)) {
				sourceOffsets[dim] = offsets[dim - (sourceRank - updateRank)];
				sourceSizes[dim] = sizes[dim - (sourceRank - updateRank)];
				}
				SmallVector<OpFoldResult> sourceStrides(sourceRank, oneAttr);
				Value tiledSource = getSlice(builder, loc, dest[0], sourceOffsets,
				sourceSizes, sourceStrides);
				assert(tiledSource && "failed to get slice of source tensor");

				SmallVector<Type> resultTypes;
				if (op->getNumResults()) {
				resultTypes.push_back(tiledSource.getType());
				}
				Operation *tiledOp = builder.create<test::TestScatter>(
				loc, resultTypes, tiledUpdate, tiledIndices, tiledSource);
				for (auto result : llvm::enumerate(tiledOp->getResults())) {
				auto insertSliceOp = builder.create<tensor::InsertSliceOp>(
				loc, result.value(), dest[0], sourceOffsets, sourceSizes,
				sourceStrides);
				results.push_back(insertSliceOp.getResult());
				}
				return tiledOp;
				}
				};

				struct TestSortTilingInterface
				: public TilingInterface::ExternalModel<TestSortTilingInterface,
				test::TestSort> {
				SmallVector<Value> getDestinationOperands(Operation *op, OpBuilder &b) const {
				return cast<test::TestSort>(op).sources();
				}

				SmallVector<StringRef> getLoopIteratorTypes(Operation *op) const {
				auto testOp = cast<test::TestSort>(op);
				// All loops except the dimension to sort along are parallel.
				int64_t operandRank =
				testOp.sources()[0].getType().cast<ShapedType>().getRank();
				SmallVector<StringRef> iteratorTypes(operandRank,
				getParallelIteratorTypeName());
				iteratorTypes[testOp.reduce_dim()] = getReductionIteratorTypeName();
				return iteratorTypes;
				}

				SmallVector<Range> getLoopBounds(Operation *op, OpBuilder &builder) const {
				auto testOp = cast<test::TestSort>(op);
				int64_t operandRank =
				testOp.sources()[0].getType().cast<ShapedType>().getRank();
				SmallVector<Range> loopBounds(operandRank);
				Location loc = op->getLoc();
				Value zero = builder.create<ConstantIndexOp>(loc, 0);
				Value one = builder.create<ConstantIndexOp>(loc, 1);
				Value source = testOp.sources()[0];
				for (auto dim : llvm::seq<int64_t>(0, operandRank)) {
				loopBounds[dim].offset = zero;
				loopBounds[dim].size = getDimValue(builder, loc, source, dim);
				loopBounds[dim].stride = one;
				}
				return loopBounds;
				}

				Operation *getTiledImplementation(Operation *op, OpBuilder &builder,
				ValueRange dest,
				ArrayRef<OpFoldResult> offsets,
				ArrayRef<OpFoldResult> sizes,
				SmallVectorImpl<Value> &results) const {
				auto testOp = cast<test::TestSort>(op);
				assert(dest.size() == testOp.sources().size());
				int64_t rank = testOp.sources()[0].getType().cast<ShapedType>().getRank();
				assert(offsets.size() == static_cast<size_t>(rank) &&
				sizes.size() == static_cast<size_t>(rank));
				auto oneAttr = builder.getI64IntegerAttr(1);
				SmallVector<OpFoldResult> strides(rank, oneAttr);
				Location loc = op->getLoc();
				SmallVector<Value> tiledOperands(dest.size());
				for (auto en : llvm::enumerate(dest)) {
				tiledOperands[en.index()] =
				getSlice(builder, loc, en.value(), offsets, sizes, strides);
				assert(tiledOperands[en.index()] && "failed to get slice of operand");
				}
				SmallVector<Type, 4> resultTypes;
				if (op->getNumResults()) {
				resultTypes = llvm::to_vector<4>(
				llvm::map_range(tiledOperands, [&](Value v) { return v.getType(); }));
				}
				Operation *tiledOp = builder.create<test::TestSort>(
				loc, resultTypes, tiledOperands, testOp.reduce_dim());
				for (auto result : llvm::enumerate(tiledOp->getResults())) {
				auto insertSliceOp = builder.create<tensor::InsertSliceOp>(
				loc, result.value(), dest[result.index()], offsets, sizes, strides);
				results.push_back(insertSliceOp.getResult());
				}
				return tiledOp;
				}
				};

				//===----------------------------------------------------------------------===//
				// Interface implementations for external operations.
				//===----------------------------------------------------------------------===//

				struct InsertSliceTilingInterface
				: public TilingInterface::ExternalModel<InsertSliceTilingInterface,
				tensor::InsertSliceOp> {
				SmallVector<Value> getDestinationOperands(Operation *op, OpBuilder &b) const {
				return {cast<tensor::InsertSliceOp>(op).dest()};
				}

				SmallVector<StringRef> getLoopIteratorTypes(Operation *op) const {
				auto insertSliceOp = cast<tensor::InsertSliceOp>(op);
				return SmallVector<StringRef>(insertSliceOp.getSourceType().getRank(),
				getParallelIteratorTypeName());
				}

				SmallVector<Range> getLoopBounds(Operation *op, OpBuilder &b) const {
				auto insertSliceOp = cast<tensor::InsertSliceOp>(op);
				Value source = insertSliceOp.source();
				RankedTensorType sourceType = insertSliceOp.getSourceType();
				Location loc = op->getLoc();
				Value zero = b.create<ConstantIndexOp>(loc, 0);
				Value one = b.create<ConstantIndexOp>(loc, 1);
				SmallVector<Range> loopBounds(sourceType.getRank(),
				Range{zero, nullptr, one});
				for (auto dim :
				llvm::seq<int64_t>(0, insertSliceOp.getSourceType().getRank()))
				loopBounds[dim].size = b.create<tensor::DimOp>(loc, source, dim);
				return loopBounds;
				}

				Operation *getTiledImplementation(Operation *op, OpBuilder &b,
				ValueRange dest,
				ArrayRef<OpFoldResult> offsets,
				ArrayRef<OpFoldResult> sizes,
				SmallVector<Value> &results) const {
				auto insertOp = cast<tensor::InsertSliceOp>(op);
				// Compute a subtensor of the source based on the offsets.
				auto opStrides = insertOp.getMixedStrides();
				if (!llvm::all_of(opStrides, [&](OpFoldResult valueOrAttr) {
				return isValue(valueOrAttr, 1);
				})) {
				op->emitOpError("unable to tile operation with non-unit stride");
				return nullptr;
				}
				Location loc = insertOp.getLoc();
				auto oneAttr = b.getI64IntegerAttr(1);
				SmallVector<OpFoldResult> strides(offsets.size(), oneAttr);
				auto extractSliceOp = b.create<tensor::ExtractSliceOp>(
				loc, insertOp.source(), offsets, sizes, strides);

				// The offsets for the insert are based on the op offsets plus the loop
				// offsets passed in.
				auto opOffsets = insertOp.getMixedOffsets();
				auto opSizes = insertOp.getMixedSizes();
				unsigned offsetIndex = 0;
				ArrayRef<int64_t> sourceShape = insertOp.getSourceType().getShape();
				int64_t destRank = insertOp.getType().getRank();
				SmallVector<OpFoldResult> resultOffsets(destRank);
				SmallVector<OpFoldResult> resultSizes(destRank);
				auto zeroAttr = b.getI64IntegerAttr(0);
				for (auto opOffset : llvm::enumerate(opOffsets)) {
				// Detect a rank-reduced dimension by checking that
				// 1) the corresponding opSize value is 1, and
				// 2) the current source dimension does not have size 1.
				// If both hold, the opOffset belongs to the rank-reduced dimension. Skip.
				unsigned opOffsetIndex = opOffset.index();
				if (isValue(opSizes[opOffsetIndex], 1) && sourceShape[offsetIndex] != 1) {
				resultOffsets[opOffsetIndex] = zeroAttr;
				resultSizes[opOffsetIndex] = oneAttr;
				continue;
				}
				OpFoldResult opOffsetVal = opOffset.value();
				OpFoldResult offset = offsets[offsetIndex];
				if (opOffsetVal.is<Attribute>() && offset.is<Attribute>()) {
				resultOffsets[opOffsetIndex] = b.getI64IntegerAttr(
				*getConstantValue(opOffsetVal) + *getConstantValue(offset));
				} else {
				AffineMap map = AffineMap::get(
				1, 1, {b.getAffineDimExpr(0) + b.getAffineSymbolExpr(0)});
				resultOffsets[opOffsetIndex] =
				b.create<AffineApplyOp>(loc, map,
				ValueRange{getValue(b, loc, offset),
				getValue(b, loc, opOffsetVal)})
				.getResult();
				}
				resultSizes[opOffsetIndex] = sizes[offsetIndex];
				offsetIndex++;
				}
				SmallVector<OpFoldResult> resultStrides(destRank, oneAttr);
				auto tiledInsertOp = b.create<tensor::InsertSliceOp>(
				loc, extractSliceOp.result(), dest[0], resultOffsets, resultSizes,
				resultStrides);
				results.push_back(tiledInsertOp.result());
				return extractSliceOp;
				}
				};

				template <typename OpTy>
				struct TestOpTilingPattern : public TilingInterfaceBasePattern {
				TestOpTilingPattern(MLIRContext *context, linalg::LinalgTilingOptions options,
				linalg::LinalgTransformationFilter filter =
				linalg::LinalgTransformationFilter(),
				PatternBenefit benefit = 1)
				: TilingInterfaceBasePattern(OpTy::getOperationName(), context, options,
				filter, benefit) {}

				LogicalResult matchAndRewrite(Operation *op,
				PatternRewriter &rewriter) const override {
				auto tilableOp = dyn_cast<TilingInterface>(op);
				if (!tilableOp)
				return failure();
				TiledOp tiledOp;
				// Check for failure.
				if (failed(TilingInterfaceBasePattern::matchAndRewriteBase(
				tilableOp, rewriter, tiledOp)))
				return failure();

				// Check for do-nothing case.
				if (!tiledOp.op)
				return failure();
				if (tiledOp.op != op) {
				if (tiledOp.results.empty())
				rewriter.eraseOp(op);
				else
				rewriter.replaceOp(op, tiledOp.results);
				}
				return success();
  }
};

struct TestTilingInterfacePass
    : public PassWrapper<TestTilingInterfacePass, FunctionPass> {
  StringRef getArgument() const final { return "test-tiling-interface"; }
  StringRef getDescription() const final { return "Test Tiling Interface."; }
  TestTilingInterfacePass() = default;
  TestTilingInterfacePass(const TestTilingInterfacePass &pass) {}
  void getDependentDialects(DialectRegistry &registry) const override {
    registry.insert<AffineDialect, gpu::GPUDialect, memref::MemRefDialect,
                    StandardOpsDialect, tensor::TensorDialect,
                    test::TestDialect, scf::SCFDialect>();
  }

  LogicalResult initialize(MLIRContext *context) override;
  void runOnFunction() override;
};
} // namespace

void TestTilingInterfacePass::runOnFunction() {
  FuncOp funcOp = getOperation();
  MLIRContext *context = funcOp.getContext();

  RewritePatternSet patterns(context);
  patterns.add<TestOpTilingPattern<test::TestScatter>>(
      context, linalg::LinalgTilingOptions().setTileSizes({10, 20}),
      linalg::LinalgTransformationFilter(
          Identifier::get("tiling_input", context),
          Identifier::get("tiling_output", context)));
  patterns.add<TestOpTilingPattern<test::TestScatter>>(
      context, linalg::LinalgTilingOptions().setTileSizes(ArrayRef<int64_t>{0}),
      linalg::LinalgTransformationFilter(
          Identifier::get("no_tiling_input", context),
          Identifier::get("no_tiling_output", context)));

  patterns.add<TestOpTilingPattern<test::TestSort>>(
      context, linalg::LinalgTilingOptions().setTileSizes({0, 20}),
      linalg::LinalgTransformationFilter(
          Identifier::get("outer_reduce_input", context),
          Identifier::get("outer_reduce_output", context)));
  patterns.add<TestOpTilingPattern<test::TestSort>>(
      context, linalg::LinalgTilingOptions().setTileSizes({10, 0, 0}),
      linalg::LinalgTransformationFilter(
          Identifier::get("inner_reduce_input", context),
          Identifier::get("inner_reduce_output", context)));

  static linalg::LinalgLoopDistributionOptions workgroupDistributionOptions = {
      [](OpBuilder &builder, Location loc, ArrayRef<Range> parallelLoopRanges) {
        auto numParallelDims = parallelLoopRanges.size();

        SmallVector<linalg::ProcInfo, 3> procInfo(numParallelDims);
        Type indexType = builder.getIndexType();
        std::string dimStr[3] = {"x", "y", "z"};
        for (size_t dim = 0; dim < numParallelDims; ++dim) {
          StringAttr attr = builder.getStringAttr(dimStr[dim]);
          procInfo[numParallelDims - dim - 1] = {
              builder.create<gpu::BlockIdOp>(loc, indexType, attr),
              builder.create<gpu::GridDimOp>(loc, indexType, attr)};
        }
        return procInfo;
      },
      {linalg::DistributionMethod::Cyclic, linalg::DistributionMethod::Cyclic,
       linalg::DistributionMethod::Cyclic},
      DenseMap<StringRef,
               std::function<linalg::ProcInfo(OpBuilder &, Location)>>()};

  patterns.add<TestOpTilingPattern<test::TestScatter>,
               TestOpTilingPattern<test::TestSort>>(
      context,
      linalg::LinalgTilingOptions()
          .setTileSizes(ArrayRef<int64_t>{10, 0, 30})
          .setDistributionOptions(workgroupDistributionOptions),
      linalg::LinalgTransformationFilter(
          Identifier::get("distribute_input", context),
          Identifier::get("distribute_output", context)));

  patterns.add<TestOpTilingPattern<tensor::InsertSliceOp>>(
      context, linalg::LinalgTilingOptions().setTileSizes({10, 20}),
      linalg::LinalgTransformationFilter(
          Identifier::get("tiling_input", context),
          Identifier::get("tiling_output", context)));

  if (failed(applyPatternsAndFoldGreedily(funcOp, std::move(patterns))))
    return signalPassFailure();
}

LogicalResult TestTilingInterfacePass::initialize(MLIRContext *context) {
  tensor::InsertSliceOp::attachInterface<InsertSliceTilingInterface>(*context);
  test::TestScatter::attachInterface<TestScatterTilingInterface>(*context);
  test::TestSort::attachInterface<TestSortTilingInterface>(*context);
  return success();
}

namespace mlir {
namespace test {
void registerTestTilingInterfacePass() {
  PassRegistration<TestTilingInterfacePass>();
}
} // namespace test
} // namespace mlir
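
Note on the distribution options in `runOnFunction`: `workgroupDistributionOptions` requests `linalg::DistributionMethod::Cyclic` for each distributed loop, with the processor id and processor count taken from `gpu::BlockIdOp` and `gpu::GridDimOp`. The following is a minimal standalone C++ sketch of the index arithmetic that cyclic distribution implies; it is not code from this patch. `Tile` and `cyclicTiles` are hypothetical names introduced for illustration, and the handling of a zero tile size (the dimension is left untiled and is not distributed) is an assumption based on the usual Linalg convention.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not part of the patch): the half-open range
// [begin, end) of iterations covered by one tile.
struct Tile {
  int64_t begin;
  int64_t end;
};

// Hypothetical helper (not part of the patch): lists the tiles of the loop
// [0, extent) that processor `procId` out of `numProcs` would execute under
// cyclic distribution.
std::vector<Tile> cyclicTiles(int64_t extent, int64_t tileSize,
                              int64_t procId, int64_t numProcs) {
  assert(tileSize >= 0 && numProcs > 0 && procId >= 0 && procId < numProcs);
  std::vector<Tile> tiles;
  if (tileSize == 0) {
    // Assumption: a tile size of 0 means "do not tile this dimension", so no
    // loop is generated and every processor sees the full range.
    tiles.push_back({0, extent});
    return tiles;
  }
  // Cyclic distribution: start at procId * tileSize and advance by
  // numProcs * tileSize, so consecutive tiles go to consecutive processors.
  for (int64_t lb = procId * tileSize; lb < extent; lb += numProcs * tileSize)
    tiles.push_back({lb, std::min(lb + tileSize, extent)});
  return tiles;
}
```

Under this sketch, an extent of 35 with tile size 10 and two processors gives processor 1 the tiles [10, 20) and [30, 35). In the pass itself the ids are assigned through `procInfo[numParallelDims - dim - 1]`, so the "x" block id is mapped to the innermost distributed loop.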

mlir/tools/mlir-opt/CMakeLists.txt

(23 lines not shown)
   set(test_libs
     MLIRStandardOpsTestPasses
     MLIRVectorTestPasses
     MLIRTestAnalysis
     MLIRTestDialect
     MLIRTestIR
     MLIRTestPass
     MLIRTestReducer
     MLIRTestRewrite
+    MLIRTestTilingInterface
     MLIRTestTransforms
     )
 endif()

 set(LIBS
   ${dialect_libs}
   ${conversion_libs}
   ${test_libs}
(33 lines not shown)

mlir/tools/mlir-opt/mlir-opt.cpp

(97 lines not shown)
 void registerTestMemRefStrideCalculation();
 void registerTestNumberOfBlockExecutionsPass();
 void registerTestNumberOfOperationExecutionsPass();
 void registerTestOpaqueLoc();
 void registerTestPDLByteCodePass();
 void registerTestPreparationPassWithAllowedMemrefResults();
 void registerTestRecursiveTypesPass();
 void registerTestSCFUtilsPass();
+void registerTestTilingInterfacePass();
 void registerTestVectorConversions();
 } // namespace test
 } // namespace mlir

 namespace test {
 void registerTestDialect(DialectRegistry &);
 } // namespace test

(70 lines not shown)
 #endif
   mlir::test::registerTestMemRefDependenceCheck();
   mlir::test::registerTestMemRefStrideCalculation();
   mlir::test::registerTestNumberOfBlockExecutionsPass();
   mlir::test::registerTestNumberOfOperationExecutionsPass();
   mlir::test::registerTestOpaqueLoc();
   mlir::test::registerTestPDLByteCodePass();
   mlir::test::registerTestRecursiveTypesPass();
   mlir::test::registerTestSCFUtilsPass();
+  mlir::test::registerTestTilingInterfacePass();
   mlir::test::registerTestVectorConversions();
 }
 #endif

 int main(int argc, char **argv) {
   registerAllPasses();
 #ifdef MLIR_INCLUDE_TESTS
   registerTestPasses();
(10 lines not shown)


Revision Contents

Diff 367654

