This is an archive of the discontinued LLVM Phabricator instance.


Differential D128443

[mlir] introduce multi-sized tiling transformation
Abandoned · Public

Authored by ftynse on Jun 23 2022, 7:23 AM.


Details

Reviewers

nicolasvasilache
shabalin

Summary

Introduce a new transformation on structured ops that tiles the iteration space
using two different tile sizes after splitting it into two parts. The tile
sizes are computed dynamically so that they do not exceed some target value,
both parts contain only full tiles, and, optionally, each size is divisible by
a certain (constant) value. This provides an alternative to padding and peeling
when the iteration space is not perfectly divisible by the target tile size,
with better guarantees on memory footprint.
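One way to compute such a pair of tile sizes is sketched below in plain C++. This is a hypothetical model, not the implementation from this revision (whose size computation is not shown on this page): `MultiSizeSpec` and `computeMultiSizes` are illustrative names. It picks the fewest tiles that respect the target size, then balances the remainder so the two sizes differ by exactly one divisor. It assumes positive inputs, a target size of at least the divisor, and a dimension size that is a multiple of the divisor (otherwise no combination of divisor-aligned full tiles can cover it).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: two tile sizes and the number of full tiles of each.
struct MultiSizeSpec {
  int64_t lowTileSize;
  int64_t lowTripCount;
  int64_t highTileSize;
  int64_t highTripCount;
};

// Sketch: split `dimSize` iterations into full tiles of two sizes that differ
// by `divisor`, are multiples of `divisor`, and do not exceed `targetSize`
// whenever they are actually used (trip count > 0).
inline MultiSizeSpec computeMultiSizes(int64_t dimSize, int64_t targetSize,
                                       int64_t divisor) {
  assert(dimSize > 0 && divisor > 0 && targetSize >= divisor);
  assert(dimSize % divisor == 0 && "dimension must be a multiple of divisor");
  int64_t a = dimSize / divisor;    // dimension size in units of `divisor`
  int64_t t = targetSize / divisor; // largest allowed tile, in the same units
  int64_t d = (a + t - 1) / t;      // fewest tiles that fit: ceil(a / t)
  int64_t s = a / d;                // smaller tile size, in units
  int64_t v = a % d;                // tiles that get one extra unit
  return {s * divisor, d - v, (s + 1) * divisor, v};
}
```

For a 30-iteration dimension with target size 8 and divisor 2, this yields one tile of 6 and three tiles of 8 (6 + 3 * 8 == 30).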

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ftynse created this revision. Jun 23 2022, 7:23 AM

Herald added a project: Restricted Project. Jun 23 2022, 7:23 AM

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 22 others.

ftynse requested review of this revision. Jun 23 2022, 7:23 AM

Herald added a project: Restricted Project. Jun 23 2022, 7:23 AM

Herald added a subscriber: stephenneuendorffer.

Harbormaster completed remote builds in B171607: Diff 439390. Jun 23 2022, 7:40 AM

High level comment, could we use TilingInterface for this. The advantage is that you don't need to couple this to the looping construct. I hope to purge the implicit coupling of scf within the implementation of tiling in Linalg. I found that separating the looping constructs from the tiling implementation makes things much simpler. It might help here as well

In D128443#3606656, @mravishankar wrote:

High level comment, could we use TilingInterface for this. The advantage is that you don't need to couple this to the looping construct. I hope to purge the implicit coupling of scf within the implementation of tiling in Linalg. I found that separating the looping constructs from the tiling implementation makes things much simpler. It might help here as well

I looked at the interface and it seems that it would be more involved to implement this over the interface rather than here because of the exponentially expanding loop structure. So it would be preferable to me to have the actual tiling approach reviewed and exercised first, before eventually porting it to the interface in a separate commit.
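The "exponentially expanding loop structure" can be quantified, assuming every one of the n dimensions is tiled and, as the header comment in this diff describes, each tiled dimension contributes two loops nested inside every loop of the previous dimension:

$$
\text{loops at depth } k = 2^k,\qquad
\text{total loops} = \sum_{k=1}^{n} 2^k = 2^{n+1} - 2,\qquad
\text{leaf ops} = 2^n
$$

For example, tiling a 3-D op this way yields 2 + 4 + 8 = 14 loops and 8 tiled ops.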

In D128443#3607624, @ftynse wrote:

I looked at the interface and it seems that it would be more involved to implement this over the interface rather than here because of the exponentially expanding loop structure. So it would be preferable to me to have the actual tiling approach reviewed and exercised first, before eventually porting it to the interface in a separate commit.

+1 we discussed offline and a separate commit is fine.

I wonder if there is an opportunity to split this PR into a few smaller and more composable pieces.

In particular, I would have a use for a general mechanism that splits Linalg ops into smaller Linalg ops that are known to be multiples of X and Y along each dimension while still remaining in Linalg land.
This would compose naturally with existing tiling transforms, reducing the need to port to TilingInterface by reusing as much as possible and gaining tiling to the different types of loops.

Staying in Linalg land also allows say ? -> ?x8 + ?x7 which makes it possible to implement tiling as an N-D Linalg -> 2*N-D Linalg transformation (instead of the current N-D Linalg -> N-D loops + N-D Linalg).
This will also open opportunities for distribution and other fun higher-order stuff.

So please, consider generalizing :)

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
186	should
192	imperfectly
199	The note deserves its own paragraph for highlighting I believe.
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
142	appearance
mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
325	Can we use `affine_apply` expressions that would make these quantities nicer to parse (and later compose) ?
411	Same Q here, we could literally write something like: create<AffineApply>( {iv * highTileSize + lowTrip * lowTileSize}, ...) and they would render as such. Despite the artificial limitations on dim/sym classification, I find the construct good to carry higher-level semantic information on such quantities.
456	emit ? oh my .. :) :p
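The offset expression quoted in the comment on line 411 can be modelled with plain integers. The hypothetical `Tile` and `tileOffsets` helpers below list every full tile along one dimension, assuming (as that expression suggests) that the loop over low-size tiles runs first, the loop over high-size tiles starts where it ends, and `iv` is the tile index within each loop. In the IR, the same quantity would be an affine expression over SSA values rather than C++ arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical (offset, size) pair for one full tile along a dimension.
struct Tile {
  int64_t offset;
  int64_t size;
};

// Lists all tiles of one dimension: `lowTrip` tiles of `lowTileSize` first,
// then `highTrip` tiles of `highTileSize`.
inline std::vector<Tile> tileOffsets(int64_t lowTileSize, int64_t lowTrip,
                                     int64_t highTileSize, int64_t highTrip) {
  assert(lowTrip >= 0 && highTrip >= 0);
  std::vector<Tile> tiles;
  for (int64_t iv = 0; iv < lowTrip; ++iv)
    tiles.push_back({iv * lowTileSize, lowTileSize});
  for (int64_t iv = 0; iv < highTrip; ++iv)
    // Same quantity as `iv * highTileSize + lowTrip * lowTileSize`.
    tiles.push_back({iv * highTileSize + lowTrip * lowTileSize, highTileSize});
  return tiles;
}
```

With one tile of 6 followed by three tiles of 8, the tiles start at 0, 6, 14 and 22, and the last one ends at 30.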

nicolasvasilache added inline comments. Jun 24 2022, 9:59 AM

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
448	Reviewed this part, LGTM.

nicolasvasilache added inline comments. Jun 24 2022, 10:21 AM

mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
307	computing.
310	I would add an example that mentions a rewrite such as `? -> ?x8 + ?x7`.
325	Just to be sure that I don't misinterpret the computation: this is the classical "load balancing" trick, right? I.e., instead of decomposing 801 into 8 * 100 + 1, one would decompose it into 8 * 94 + 7 * 7? I.e., to avoid a shriveled last tile, you take the remainder (8 - K) out of the last tiles (you take exactly 1 from each tile, so you take 1 from 7 tiles in this case). Is the result always N or N-1 sized tiles? The way this is described seems to suggest that it may not always be N or N-1, which would be different. I know there is a paper, but I find the description not super intuitive.
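Worked through for the reviewer's example (dimension 801, target tile size 8, divisor 1), and assuming a formulation that first picks the fewest tiles not exceeding the target and then balances their sizes, the "-1" decomposition comes out as:

$$
\begin{aligned}
d &= \lceil 801 / 8 \rceil = 101 && \text{tiles in total}\\
s &= \lfloor 801 / 101 \rfloor = 7 && \text{smaller tile size}\\
v &= 801 \bmod 101 = 94 && \text{tiles of size } s + 1 = 8\\
u &= d - v = 7 && \text{tiles of size } s = 7\\
801 &= 8 \cdot 94 + 7 \cdot 7
\end{aligned}
$$

Under this formulation the two sizes are always consecutive (multiples of the divisor), but not necessarily N and N-1: for a dimension of 9 with N = 8 it gives d = 2 and tiles of 5 and 4.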

nicolasvasilache added inline comments. Jun 27 2022, 12:54 AM

mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
325	Even simpler: instead of decomposing 801 into 8 * 100 + 1, one would decompose it into 8 * 99 + 9 * 1, just taking the mod and making the last mod tiles +1. Not sure why I went for -1 instead of +1; +1 is the old form we used way back in the Torch / Lua days ...
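The "+1" alternative keeps tile size N and spreads the remainder r = D mod N over r tiles of size N + 1. A hypothetical sketch (`PlusOneSplit` and `plusOneSplit` are illustrative names) follows; it only works when there are at least as many full tiles as leftover elements (r <= D / N), otherwise some tiles would have to grow by more than one.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical: counts of tiles of size n and n + 1 covering `dimSize` exactly.
struct PlusOneSplit {
  int64_t numTiles;       // tiles of size n
  int64_t numLargerTiles; // tiles of size n + 1
};

inline std::optional<PlusOneSplit> plusOneSplit(int64_t dimSize, int64_t n) {
  assert(dimSize > 0 && n > 0);
  int64_t q = dimSize / n; // number of full tiles of size n
  int64_t r = dimSize % n; // leftover elements, one per enlarged tile
  if (r > q)               // not enough tiles to absorb one extra element each
    return std::nullopt;
  return PlusOneSplit{q - r, r};
}
```

Note that this variant produces tiles larger than N, whereas the summary of this revision requires the computed tile sizes not to exceed the target.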

This was implemented as a combination of structured.split and structured.multitile_sizes.

Herald added subscribers: hanchung, Moerafaat, zero9178, jsetoain. Dec 12 2022, 4:08 AM

ftynse abandoned this revision. Dec 12 2022, 4:08 AM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td	34 lines
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h	22 lines
mlir/include/mlir/Dialect/Linalg/Utils/Utils.h	12 lines
mlir/lib/Dialect/Linalg/TransformOps/CMakeLists.txt	1 line
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp	79 lines
mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp	342 lines
mlir/lib/Dialect/Linalg/Utils/Utils.cpp	12 lines
mlir/python/mlir/dialects/_structured_transform_ops_ext.py	20 lines
mlir/test/Dialect/Linalg/transform-op-tile-multisize.mlir	261 lines
mlir/test/python/dialects/transform_structured_ext.py	12 lines
utils/bazel/llvm-project-overlay/mlir/BUILD.bazel	1 line

Diff 439390

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td

(173 lines not shown)	let arguments = (ins PDL_Operation:$target,
DefaultValuedAttr<I64ArrayAttr, "{}">:$sizes,		DefaultValuedAttr<I64ArrayAttr, "{}">:$sizes,
DefaultValuedAttr<I64ArrayAttr, "{}">:$interchange);		DefaultValuedAttr<I64ArrayAttr, "{}">:$interchange);
let results = (outs PDL_Operation:$tiled_linalg_op,		let results = (outs PDL_Operation:$tiled_linalg_op,
Variadic<PDL_Operation>:$loops);		Variadic<PDL_Operation>:$loops);

let hasCustomAssemblyFormat = 1;		let hasCustomAssemblyFormat = 1;
}		}

		def TileMultiSizeOp : Op<Transform_Dialect, "structured.tile.multisize",
		[FunctionalStyleTransformOpTrait, MemoryEffectsOpInterface,
		DeclareOpInterfaceMethods<TransformOpInterface>]> {
		let description = [{
		Indicates that the multi-sized tiling transformation shuold be applied to
		nicolasvasilache: should
		the given target. This transformation partitions each dimension of the
		structured op's iteration space into two parts such that the shape of each
		part is perfectly divisible by some value not exceeding the corresponding
		"target size". The value itself is divisible by the corresponding "target
		size divisor" (defaults to 1 if not provided). This produces a tree of
		imprefectly nested loops, at the leaves of which 2^n smaller-sized
		nicolasvasilache: imperfectly
		structured operations are created, where n is the rank of original iteration
		space.

		The op requires at least as many target sizes as the target op has iteration
		space dimensions. Extra sizes are ignored. Target size divisors may be
		provided for either all or none of the tile sizes, and do not need to evenly
		divide the provided sizes. Note that zero tile sizes indicating the absence
		nicolasvasilache: The note deserves its own paragraph for highlighting I believe.
		of tiling along the given dimension are not currently supported,
		therefore the tile sizes must be strictly positive.

		This op returns a handle to the flattened list of tiled ops, grouped by
		target op. For each op, the group of tiled ops covers the parts of the
		original iteration space in the lexicographical order of dimensions.
		}];

		let arguments = (ins PDL_Operation:$target,
		I64ArrayAttr:$target_sizes,
		DefaultValuedAttr<I64ArrayAttr, "{}">:$target_size_divisors);
		let results = (outs PDL_Operation:$tiled_linalg_ops);
		let assemblyFormat = "$target attr-dict";
		let hasVerifier = 1;
		}

def VectorizeOp : Op<Transform_Dialect, "structured.vectorize",		def VectorizeOp : Op<Transform_Dialect, "structured.vectorize",
[FunctionalStyleTransformOpTrait, MemoryEffectsOpInterface,		[FunctionalStyleTransformOpTrait, MemoryEffectsOpInterface,
TransformEachOpTrait, TransformOpInterface]> {		TransformEachOpTrait, TransformOpInterface]> {
let description = [{		let description = [{
Indicates that the given `target` op all the ops it contains should be		Indicates that the given `target` op all the ops it contains should be
vectorized with the configuration specified by the attributes of this op.		vectorized with the configuration specified by the attributes of this op.
This vectorization only handles structured ops that operate on shaped types		This vectorization only handles structured ops that operate on shaped types
and does not vectorize loops or straight-line. Internally, it applies a		and does not vectorize loops or straight-line. Internally, it applies a
(24 lines not shown)

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h

	(123 lines not shown)
	struct TiledLinalgOp {			struct TiledLinalgOp {
	LinalgOp op;			LinalgOp op;
	SmallVector<Operation *, 8> loops;			SmallVector<Operation *, 8> loops;
	SmallVector<Value, 4> tensorResults;			SmallVector<Value, 4> tensorResults;
	};			};
	FailureOr<TiledLinalgOp> tileLinalgOp(RewriterBase &b, LinalgOp op,			FailureOr<TiledLinalgOp> tileLinalgOp(RewriterBase &b, LinalgOp op,
	const LinalgTilingOptions &options);			const LinalgTilingOptions &options);

				/// Perform multi-size tiling of a single LinalgOp with desired tile sizes
				/// `targetSizes`. Multi-size tiling dynamically computes, for each desired tile
				/// size, two smaller tile sizes T1, T2 such that some linear combination of
				/// full tiles covers the entire iteration space dimension. That is, given the
				/// iteration space dimension D, T1n + T2m == D where n, m are some integer
				/// values. If `targetSizeDivisors` are provided, the computed tile sizes will
				/// be divisible by the corresponding value; this is useful for vectorization.
				/// Multi-size tiling produces an imperfectly-nested loop structure with two
				/// loops at each tiled dimension, each with different tile size. The result
				/// contains a list of LinalgOp instances operating on smaller data tiles, in
				/// their order of appearnace in the IR, which corresponds to the
				nicolasvasilache: appearance
				/// lexicographical order of tile indices. It also contains the tensor-valued
				/// outputs that were used to replace the original operation.
				struct MultiSizedTilingResult {
				SmallVector<LinalgOp> tiledOps;
				ValueRange tensorResults;
				};
				FailureOr<MultiSizedTilingResult>
				multiSizeTileLinalgOp(RewriterBase &b, LinalgOp linalgOp,
				ValueRange targetSizes,
				ValueRange targetSizeDivisors = {});

	/// Peel and canonicalize 'loops'.			/// Peel and canonicalize 'loops'.
	void peelLoops(RewriterBase &rewriter, ArrayRef<scf::ForOp> loops);			void peelLoops(RewriterBase &rewriter, ArrayRef<scf::ForOp> loops);

	/// Peel the loops of a TiledLinalgOp.			/// Peel the loops of a TiledLinalgOp.
	void peelTiledLinalgOp(RewriterBase &rewriter, TiledLinalgOp &res,			void peelTiledLinalgOp(RewriterBase &rewriter, TiledLinalgOp &res,
	ArrayRef<int64_t> peeledLoops,			ArrayRef<int64_t> peeledLoops,
	LinalgTilingLoopType loopType);			LinalgTilingLoopType loopType);

	(1,386 lines not shown)

mlir/include/mlir/Dialect/Linalg/Utils/Utils.h

(164 lines not shown)	SmallVector<Value> computeTileOffsets(OpBuilder &b, Location loc,
ValueRange ivs, ValueRange tileSizes);		ValueRange ivs, ValueRange tileSizes);

/// Compute tile sizes, given a list of `tileSizes` and dimension		/// Compute tile sizes, given a list of `tileSizes` and dimension
/// sizes (`sizeBounds`). In case a tile size is zero (i.e., no tiling), the		/// sizes (`sizeBounds`). In case a tile size is zero (i.e., no tiling), the
/// corresponding result size is the corresponding value from `sizeBounds`.		/// corresponding result size is the corresponding value from `sizeBounds`.
/// Note: The returned tile sizes are closed intervals.		/// Note: The returned tile sizes are closed intervals.
SmallVector<Value> computeTileSizes(OpBuilder &b, Location loc,		SmallVector<Value> computeTileSizes(OpBuilder &b, Location loc,
ValueRange tileSizes,		ValueRange tileSizes,
ArrayRef<Value> sizeBounds);		ValueRange sizeBounds);

/// Creates an extract_slice/subview op for a single `valueToTile` with		/// Creates an extract_slice/subview op for a single `valueToTile` with
/// `builder`. This new operation extracts a tile of `valueToTile`, starting		/// `builder`. This new operation extracts a tile of `valueToTile`, starting
/// at offsets `lbs` and with sizes `subShapeSizes`. `omitPartialTileCheck`		/// at offsets `lbs` and with sizes `subShapeSizes`. `omitPartialTileCheck`
/// controls whether to omit the partial/boundary tile condition check in cases		/// controls whether to omit the partial/boundary tile condition check in cases
/// where we statically know that it is unnecessary.		/// where we statically know that it is unnecessary.
Value makeTiledShape(OpBuilder &builder, Location loc, Value valueToTile,		Value makeTiledShape(OpBuilder &builder, Location loc, Value valueToTile,
ValueRange tileSizes, AffineMap map, ValueRange lbs,		ValueRange tileSizes, AffineMap map, ValueRange lbs,
ValueRange ubs, ValueRange subShapeSizes,		ValueRange ubs, ValueRange subShapeSizes,
bool omitPartialTileCheck);		bool omitPartialTileCheck);

/// Creates extract_slice/subview ops for all `valuesToTile` of the given		/// Creates extract_slice/subview ops for all `valuesToTile` of the given
/// `linalgOp` with `builder`, assuming `linalgOp` is being fused into a loop		/// `linalgOp` with `builder`, assuming `linalgOp` is being fused into a loop
/// nest for tiling with the given induction variables `ivs` and tile sizes		/// nest for tiling with the given induction variables `ivs` and tile sizes
/// `tileSizes`. `sizeBounds` are the iteration space bounds for all the		/// `tileSizes`. `sizeBounds` are the iteration space bounds for all the
/// implicit loops in `linalgOp`. `omitPartialTileCheck` controls whether to		/// implicit loops in `linalgOp`. `omitPartialTileCheck` controls whether to
/// omit the partial/boundary tile condition check in cases where we statically		/// omit the partial/boundary tile condition check in cases where we statically
/// know that it is unnecessary.		/// know that it is unnecessary.
///		///
/// Note that a constant zero in `tileSizes` means no tiling at that implicit		/// Note that a constant zero in `tileSizes` means no tiling at that implicit
/// loop. The number of non-zero values in `tileSizes` should be equal to the		/// loop. The number of non-zero values in `tileSizes` should be equal to the
/// number of values in `ivs`.		/// number of values in `ivs`.
SmallVector<Value, 4> makeTiledShapes(OpBuilder &builder, Location loc,		SmallVector<Value, 4>
LinalgOp linalgOp,		makeTiledShapes(OpBuilder &builder, Location loc, LinalgOp linalgOp,
ArrayRef<Value> valuesToTile,		ValueRange valuesToTile, ValueRange ivs, ValueRange tileSizes,
ValueRange ivs, ValueRange tileSizes,		ValueRange sizeBounds, bool omitPartialTileCheck);
ArrayRef<Value> sizeBounds,
bool omitPartialTileCheck);

/// Add the tile loop induction variables `ivs` to the IndexOp results found in		/// Add the tile loop induction variables `ivs` to the IndexOp results found in
/// the body of the `tiledOp` to account for the tile offset.		/// the body of the `tiledOp` to account for the tile offset.
void addTileLoopIvsToIndexOpResults(OpBuilder &b, LinalgOp tiledOp,		void addTileLoopIvsToIndexOpResults(OpBuilder &b, LinalgOp tiledOp,
ArrayRef<Value> ivs);		ArrayRef<Value> ivs);

using FusableOpDependencesTy = llvm::MapVector<		using FusableOpDependencesTy = llvm::MapVector<
Operation *,		Operation *,
(240 lines not shown)

mlir/lib/Dialect/Linalg/TransformOps/CMakeLists.txt

	add_mlir_dialect_library(MLIRLinalgTransformOps			add_mlir_dialect_library(MLIRLinalgTransformOps
	LinalgTransformOps.cpp			LinalgTransformOps.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Linalg/TransformOps			${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/Linalg/TransformOps

	DEPENDS			DEPENDS
	MLIRLinalgTransformOpsIncGen			MLIRLinalgTransformOpsIncGen

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
				MLIRArithmeticDialect
	MLIRIR			MLIRIR
	MLIRLinalgDialect			MLIRLinalgDialect
	MLIRLinalgTransforms			MLIRLinalgTransforms
	MLIRParser			MLIRParser
	MLIRPDLDialect			MLIRPDLDialect
	MLIRSideEffectInterfaces			MLIRSideEffectInterfaces
	MLIRTransformDialect			MLIRTransformDialect
	MLIRVectorDialect			MLIRVectorDialect
	)			)

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp

//===- LinalgTransformOps.cpp - Implementation of Linalg transform ops ----===//		//===- LinalgTransformOps.cpp - Implementation of Linalg transform ops ----===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.h"		#include "mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.h"

		#include "mlir/Dialect/Arithmetic/IR/Arithmetic.h"
#include "mlir/Dialect/Linalg/IR/Linalg.h"		#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"		#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/Dialect/PDL/IR/PDL.h"		#include "mlir/Dialect/PDL/IR/PDL.h"
#include "mlir/Dialect/PDL/IR/PDLTypes.h"		#include "mlir/Dialect/PDL/IR/PDLTypes.h"
#include "mlir/Dialect/Transform/IR/TransformDialect.h"		#include "mlir/Dialect/Transform/IR/TransformDialect.h"
#include "mlir/Parser/Parser.h"		#include "mlir/Parser/Parser.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"		#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

(407 lines not shown)

void TileOp::print(OpAsmPrinter &p) {		void TileOp::print(OpAsmPrinter &p) {
p << ' ';		p << ' ';
p << getTarget();		p << getTarget();
p.printOptionalAttrDict((*this)->getAttrs());		p.printOptionalAttrDict((*this)->getAttrs());
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// TileMultiSizeOp
		//===----------------------------------------------------------------------===//

		/// Emits the arithmetic constant operations defining index-typed values for the
		/// given list of constants using the provided builder and location.
		static SmallVector<Value> emitIndexConstants(OpBuilder &b, Location loc,
		ArrayRef<int64_t> values) {
		return llvm::to_vector(llvm::map_range(values, [&](int64_t value) -> Value {
		return b.create<arith::ConstantIndexOp>(loc, value);
		}));
		}

		DiagnosedSilenceableFailure
		transform::TileMultiSizeOp::apply(TransformResults &results,
		nicolasvasilache: Reviewed this part, LGTM.
		TransformState &state) {
		SmallVector<int64_t> tileSizes = extractI64Array(getTargetSizes());
		SmallVector<int64_t> tileSizeDivisors =
		extractI64Array(getTargetSizeDivisors());
		ArrayRef<Operation *> targets = state.getPayloadOps(getTarget());

		// Each tiled dimension doubles the number of linalg ops produced.
		SmallVector<Operation *> flattenedResults;
		flattenedResults.reserve((1 << tileSizes.size()) * targets.size());
		for (Operation *target : targets) {
		auto linalgOp = dyn_cast<LinalgOp>(target);
		if (!linalgOp) {
		DiagnosedSilenceableFailure diag =
		emitSilenceableError() << "only applies to Linalg structured ops";
		diag.attachNote(target->getLoc()) << "target op";
		return diag;
		}

		// Multi-sized tiling is designed for dynamic tile sizes provided as values,
		// so emit the constants as operations and use them to configure the
		// transformation.
		SimpleRewriter rewriter(getContext());
		rewriter.setInsertionPoint(target);
		SmallVector<Value> dynamicTileSizes =
		emitIndexConstants(rewriter, target->getLoc(), tileSizes);
		SmallVector<Value> dynamicTileSizeDivisors =
		emitIndexConstants(rewriter, target->getLoc(), tileSizeDivisors);
		FailureOr<MultiSizedTilingResult> result = multiSizeTileLinalgOp(
		rewriter, linalgOp, dynamicTileSizes, dynamicTileSizeDivisors);
		if (failed(result)) {
		DiagnosedSilenceableFailure diag = emitSilenceableError()
		<< "failed to apply";
		diag.attachNote(target->getLoc()) << "target op";
		return diag;
		}

		llvm::append_range(flattenedResults, result->tiledOps);
		}
		results.set(getTiledLinalgOps().cast<OpResult>(), flattenedResults);
		return DiagnosedSilenceableFailure::success();
		}

		LogicalResult transform::TileMultiSizeOp::verify() {
		SmallVector<int64_t> tileSizes = extractI64Array(getTargetSizes());
		SmallVector<int64_t> tileSizeDivisors =
		extractI64Array(getTargetSizeDivisors());
		if (tileSizes.size() != tileSizeDivisors.size() &&
		!tileSizeDivisors.empty()) {
		return emitOpError() << "expects as many divisors as tile sizes or none";
		}

		auto is_non_positive = [](int64_t value) { return value <= 0; };
		if (llvm::any_of(tileSizes, is_non_positive))
		return emitOpError() << "expects tile sizes to be strictly positive";
		if (llvm::any_of(tileSizeDivisors, is_non_positive))
		return emitOpError() << "expects divisors to be strictly positive";
		return success();
		}

		//===----------------------------------------------------------------------===//
// VectorizeOp		// VectorizeOp
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

FailureOr<Operation *> VectorizeOp::applyToOne(Operation *target) {		FailureOr<Operation *> VectorizeOp::applyToOne(Operation *target) {
if (!target->hasTrait<OpTrait::IsIsolatedFromAbove>()) {		if (!target->hasTrait<OpTrait::IsIsolatedFromAbove>()) {
InFlightDiagnostic diag = emitOpError()		InFlightDiagnostic diag = emitOpError()
<< "applies only to isolated-from-above targets";		<< "applies only to isolated-from-above targets";
diag.attachNote(target->getLoc()) << "non-isolated target";		diag.attachNote(target->getLoc()) << "non-isolated target";
(19 lines not shown)	FailureOr<Operation *> VectorizeOp::applyToOne(Operation *target) {
return target;		return target;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Transform op registration		// Transform op registration
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

namespace {		namespace {
/// Registers new ops and declares PDL as dependent dialect since the additional		/// Registers new ops and declares dependent dialects.
/// ops are using PDL types for operands and results.
class LinalgTransformDialectExtension		class LinalgTransformDialectExtension
: public transform::TransformDialectExtension<		: public transform::TransformDialectExtension<
LinalgTransformDialectExtension> {		LinalgTransformDialectExtension> {
public:		public:
LinalgTransformDialectExtension() {		LinalgTransformDialectExtension() {
		declareDependentDialect<arith::ArithmeticDialect>();
declareDependentDialect<pdl::PDLDialect>();		declareDependentDialect<pdl::PDLDialect>();
declareDependentDialect<scf::SCFDialect>();		declareDependentDialect<scf::SCFDialect>();
declareDependentDialect<vector::VectorDialect>();		declareDependentDialect<vector::VectorDialect>();
registerTransformOps<		registerTransformOps<
#define GET_OP_LIST		#define GET_OP_LIST
#include "mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp.inc"		#include "mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp.inc"
>();		>();
}		}
(10 lines not shown)

mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp

(20 lines not shown)
#include "mlir/Dialect/SCF/Transforms/Transforms.h"		#include "mlir/Dialect/SCF/Transforms/Transforms.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"		#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/Dialect/Utils/IndexingUtils.h"		#include "mlir/Dialect/Utils/IndexingUtils.h"
#include "mlir/IR/AffineExpr.h"		#include "mlir/IR/AffineExpr.h"
#include "mlir/IR/AffineMap.h"		#include "mlir/IR/AffineMap.h"
#include "mlir/Transforms/FoldUtils.h"		#include "mlir/Transforms/FoldUtils.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"		#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

		#include "llvm/ADT/ScopeExit.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
		#include "llvm/Support/Debug.h"

using namespace mlir;		using namespace mlir;
using namespace mlir::linalg;		using namespace mlir::linalg;
using namespace mlir::scf;		using namespace mlir::scf;

#define DEBUG_TYPE "linalg-tiling"		#define DEBUG_TYPE "linalg-tiling"
		#define DBGS() (llvm::dbgs() << "[" DEBUG_TYPE "] ")

static bool isZero(Value v) {		static bool isZero(Value v) {
if (auto cst = v.getDefiningOp<arith::ConstantIndexOp>())		if (auto cst = v.getDefiningOp<arith::ConstantIndexOp>())
return cst.value() == 0;		return cst.value() == 0;
return false;		return false;
}		}

std::tuple<SmallVector<Range, 4>, LoopIndexToRangeIndexMap>		std::tuple<SmallVector<Range, 4>, LoopIndexToRangeIndexMap>
(33 lines not shown)	for (auto &en : enumerate(allIvs)) {
auto rangeIndex = loopIndexToRangeIndex.find(en.index());		auto rangeIndex = loopIndexToRangeIndex.find(en.index());
if (rangeIndex == loopIndexToRangeIndex.end())		if (rangeIndex == loopIndexToRangeIndex.end())
continue;		continue;
en.value() = ivs[rangeIndex->second];		en.value() = ivs[rangeIndex->second];
}		}
addTileLoopIvsToIndexOpResults(b, op, allIvs);		addTileLoopIvsToIndexOpResults(b, op, allIvs);
}		}

// Insert a tile `source` into the destination tensor `dest`. The position at		/// Insert a tile `source` into the destination tensor `dest`. The position at
// which the tile is inserted (as well as size of tile) is taken from a given		/// which the tile is inserted (as well as size of tile) is taken from a given
// ExtractSliceOp `sliceOp`.		/// ExtractSliceOp `sliceOp`.
static Value insertSliceIntoTensor(RewriterBase &b, Location loc,		static Value insertSliceIntoTensor(OpBuilder &b, Location loc,
tensor::ExtractSliceOp sliceOp, Value source,		tensor::ExtractSliceOp sliceOp, Value source,
Value dest) {		Value dest) {
return b.create<tensor::InsertSliceOp>(		return b.create<tensor::InsertSliceOp>(
loc, sliceOp.source().getType(), source, dest, sliceOp.offsets(),		loc, sliceOp.source().getType(), source, dest, sliceOp.offsets(),
sliceOp.sizes(), sliceOp.strides(), sliceOp.static_offsets(),		sliceOp.sizes(), sliceOp.strides(), sliceOp.static_offsets(),
sliceOp.static_sizes(), sliceOp.static_strides());		sliceOp.static_sizes(), sliceOp.static_strides());
}		}

		/// Insert the result slices produced by the `tiled` op back into output tensor
		/// operands in case these operands are produced by slice extraction.
		static scf::ValueVector insertSlicesBack(OpBuilder &b, Location loc,
		LinalgOp tiled,
		ValueRange tiledOperands) {
		scf::ValueVector tensorResults;
		unsigned resultIdx = 0;
		for (OpOperand *opOperand : tiled.getOutputTensorOperands()) {
		// TODO: use an interface/adaptor to avoid leaking position in
		// `tiledOperands`.
		Value outputTensor = tiledOperands[opOperand->getOperandNumber()];
		if (auto sliceOp = outputTensor.getDefiningOp<tensor::ExtractSliceOp>()) {
		tensorResults.push_back(insertSliceIntoTensor(
		b, loc, sliceOp, tiled->getResult(resultIdx), sliceOp.source()));
		} else {
		tensorResults.push_back(tiled->getResult(resultIdx));
		}
		++resultIdx;
		}
		return tensorResults;
		}

		/// Clone the given `op` while adjusting its result types to match those of
		/// values taken as output tensor operands.
		static LinalgOp cloneWithSubshapeOperands(OpBuilder &b, Location loc,
		LinalgOp op, ValueRange operands) {
		// TODO: use an interface/adaptor to avoid leaking position in `operands`.
		SmallVector<Type, 4> resultTensorTypes;
		for (OpOperand *opOperand : op.getOutputTensorOperands())
		resultTensorTypes.push_back(
		operands[opOperand->getOperandNumber()].getType());

		return op.clone(b, loc, resultTensorTypes, operands);
		}

template <typename LoopTy>		template <typename LoopTy>
static FailureOr<TiledLinalgOp>		static FailureOr<TiledLinalgOp>
tileLinalgOpImpl(RewriterBase &b, LinalgOp op, ValueRange tileSizes,		tileLinalgOpImpl(RewriterBase &b, LinalgOp op, ValueRange tileSizes,
const LinalgTilingOptions &options) {		const LinalgTilingOptions &options) {
auto nLoops = op.getNumLoops();		auto nLoops = op.getNumLoops();
// Initial tile sizes may be too big, only take the first nLoops.		// Initial tile sizes may be too big, only take the first nLoops.
tileSizes = tileSizes.take_front(nLoops);		tileSizes = tileSizes.take_front(nLoops);

(71 lines not shown)	assert(operandValuesToUse.size() ==
static_cast<size_t>(op.getNumInputsAndOutputs()) &&		static_cast<size_t>(op.getNumInputsAndOutputs()) &&
"expect the number of operands and inputs and outputs to match");		"expect the number of operands and inputs and outputs to match");
SmallVector<Value> valuesToTile = operandValuesToUse;		SmallVector<Value> valuesToTile = operandValuesToUse;
auto sizeBounds =		auto sizeBounds =
applyMapToValues(b, loc, shapeSizesToLoopsMap, allShapeSizes);		applyMapToValues(b, loc, shapeSizesToLoopsMap, allShapeSizes);
SmallVector<Value, 4> tiledOperands =		SmallVector<Value, 4> tiledOperands =
makeTiledShapes(b, loc, op, valuesToTile, interchangedIvs, tileSizes,		makeTiledShapes(b, loc, op, valuesToTile, interchangedIvs, tileSizes,
sizeBounds, /*omitPartialTileCheck=*/false);		sizeBounds, /*omitPartialTileCheck=*/false);
		res = cloneWithSubshapeOperands(b, loc, op, tiledOperands);

// TODO: use an interface/adaptor to avoid leaking position in		// Insert an insert_slice for each output tensor.
// `tiledOperands`.		return insertSlicesBack(b, loc, res, tiledOperands);
SmallVector<Type, 4> resultTensorTypes;
for (OpOperand *opOperand : op.getOutputTensorOperands())
resultTensorTypes.push_back(
tiledOperands[opOperand->getOperandNumber()].getType());

res = op.clone(b, loc, resultTensorTypes, tiledOperands);

// Insert a insert_slice for each output tensor.
unsigned resultIdx = 0;
for (OpOperand *opOperand : op.getOutputTensorOperands()) {
// TODO: use an interface/adaptor to avoid leaking position in
// `tiledOperands`.
Value outputTensor = tiledOperands[opOperand->getOperandNumber()];
// TODO: Propagate RewriterBase everywhere.
IRRewriter rewriter(b);
if (auto sliceOp = outputTensor.getDefiningOp<tensor::ExtractSliceOp>()) {
tensorResults.push_back(insertSliceIntoTensor(rewriter, loc, sliceOp,
res->getResult(resultIdx),
sliceOp.source()));
} else {
tensorResults.push_back(res->getResult(resultIdx));
}
++resultIdx;
}
return scf::ValueVector(tensorResults.begin(), tensorResults.end());
};		};
GenerateLoopNest<LoopTy>::doit(b, op.getLoc(), loopRanges, op, iteratorTypes,		GenerateLoopNest<LoopTy>::doit(b, op.getLoc(), loopRanges, op, iteratorTypes,
tiledLoopBodyBuilder, options.distribution,		tiledLoopBodyBuilder, options.distribution,
options.distributionTypes);		options.distributionTypes);

// 3. Transform IndexOp results w.r.t. the tiling.		// 3. Transform IndexOp results w.r.t. the tiling.
transformIndexOps(b, res, ivs, loopIndexToRangeIndex);		transformIndexOps(b, res, ivs, loopIndexToRangeIndex);

(53 lines not shown)	case LinalgTilingLoopType::Loops:
return tileLinalgOpImpl<scf::ForOp>(b, op, options);		return tileLinalgOpImpl<scf::ForOp>(b, op, options);
case LinalgTilingLoopType::ParallelLoops:		case LinalgTilingLoopType::ParallelLoops:
return tileLinalgOpImpl<scf::ParallelOp>(b, op, options);		return tileLinalgOpImpl<scf::ParallelOp>(b, op, options);
default:;		default:;
}		}
return failure();		return failure();
}		}

		namespace {
		/// A description of a multi-size tiling comprising tile sizes and numbers of
		/// tiles, expressed as Values which may or may not be constant. Multi-size
		/// currently means two-size.
		struct MultiSizeSpecification {
		/// Tile sizes.
		Value lowTileSize, highTileSize;
		/// Number of tiles associated with each size.
		Value lowTripCount, highTripCount;
		};
		} // namespace

		/// Emit the IR computing the multi-sized tiling specification with two tile
		nicolasvasilache: computing.
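		/// sizes not exceeding `targetSize` rounded up to a multiple of `sizeDivisor`,
		/// each divisible by `sizeDivisor`, such that there exist numbers of tiles
		/// with these sizes that fully cover the `originalTripCount` iterations,
		/// provided `originalTripCount` is itself divisible by `sizeDivisor`.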
		nicolasvasilache: I would add an example that mentions a rewrite such as `? -> ?8 + ?7`.
		///
		/// The computation is as follows:
		///
		/// b = originalTripCount floordiv sizeDivisor
		/// t = (targetSize + sizeDivisor - 1) floordiv sizeDivisor
		/// d = (b + t - 1) floordiv t
		/// s = (b floordiv d) * sizeDivisor
		/// v = b % d
		/// u = d - v
		///
		/// where the tile sizes are `s` and `s` + `sizeDivisor`, and the numbers of
		/// the corresponding tiles are `u` and `v`, respectively. All four values are
		/// returned.
		static MultiSizeSpecification
		computeMultiSizeLoopTripCounts(ImplicitLocOpBuilder &b, Value targetSize,
		nicolasvasilache: Can we use `affine_apply` expressions that would make these quantities nicer to parse (and later compose) ?
		nicolasvasilache: Just to be sure that I don't misinterpret the computation: this is the classical "load balancing" trick right ? I.e. instead of decomposing 801 into 8 * 100 + 1, one would decompose in 8 * 94 + 7 * 7 ? I.e. To avoid a shriveled last tile, you take the remainder (8 - K) out of the last tiles (you take exactly 1 from each tile, so you take 1 from 7 tiles in this case). The result is always N or N-1 sized tiles ? The way this is described lends to suggest that it may not always be N or N-1 which would be different. I know there is a paper but I find the description to not be super intuitive.
		nicolasvasilache: Even simpler: 801 into 8 * 100 + 1, one would decompose in 8 * 99 + 9 * 1. Just taking the mod and making the last mod tile +1. Not sure why I went for -1 instead of +1; +1 is the old form we used way back in the torch / lua days ...
		Value sizeDivisor, Value originalTripCount,
		Value one) {
		assert(targetSize.getType() == originalTripCount.getType() &&
		sizeDivisor.getType() == originalTripCount.getType() &&
		"expected all types to match");
		Value dividedBound =
		b.create<arith::FloorDivSIOp>(originalTripCount, sizeDivisor);
		Value targetPlusDivisor = b.create<arith::AddIOp>(targetSize, sizeDivisor);
		Value targetPlusDivisorSubOne =
		b.create<arith::SubIOp>(targetPlusDivisor, one);
		Value roundedTarget =
		b.create<arith::FloorDivSIOp>(targetPlusDivisorSubOne, sizeDivisor);
		Value boundPlusTargetRounded =
		b.create<arith::AddIOp>(dividedBound, roundedTarget);
		Value boundPlusTargetRoundedSubOne =
		b.create<arith::SubIOp>(boundPlusTargetRounded, one);
		Value divisorRounded = b.create<arith::FloorDivSIOp>(
		boundPlusTargetRoundedSubOne, roundedTarget);
		Value unscaledLowTileSize =
		b.create<arith::FloorDivSIOp>(dividedBound, divisorRounded);
		Value lowTileSize = b.create<arith::MulIOp>(unscaledLowTileSize, sizeDivisor);
		Value highTileSize = b.create<arith::AddIOp>(lowTileSize, sizeDivisor);
		Value highTripCount = b.create<arith::RemSIOp>(dividedBound, divisorRounded);
		Value lowTripCount = b.create<arith::SubIOp>(divisorRounded, highTripCount);
		return {lowTileSize, highTileSize, lowTripCount, highTripCount};
		}
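The formula documented above can be checked on plain integers. The following Python sketch is not part of the patch; it mirrors the arithmetic of `computeMultiSizeLoopTripCounts`, and the names `MultiSizeSpec`, `compute_multi_size_spec` and `covered_iterations` are hypothetical. It assumes positive inputs with `trip_count >= size_divisor` (otherwise `d` would be zero and the division would fail).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiSizeSpec:
    """Two tile sizes and the number of tiles of each (one dimension)."""
    low_tile_size: int
    high_tile_size: int
    low_trip_count: int
    high_trip_count: int


def compute_multi_size_spec(trip_count: int, target_size: int,
                            size_divisor: int = 1) -> MultiSizeSpec:
    """Integer version of the b/t/d/s/v/u computation described above.

    Python's // and % agree with floordivsi/remsi for the non-negative
    operands assumed here.
    """
    b = trip_count // size_divisor                        # trip count in divisor units
    t = (target_size + size_divisor - 1) // size_divisor  # target rounded up, in units
    d = (b + t - 1) // t                                  # total number of tiles
    s = (b // d) * size_divisor                           # low tile size
    v = b % d                                             # number of high tiles
    u = d - v                                             # number of low tiles
    return MultiSizeSpec(s, s + size_divisor, u, v)


def covered_iterations(spec: MultiSizeSpec) -> int:
    """Total number of iterations covered by both parts of the tiling."""
    return (spec.low_trip_count * spec.low_tile_size
            + spec.high_trip_count * spec.high_tile_size)
```

For instance, `compute_multi_size_spec(10, 3)` gives two tiles of size 2 followed by two tiles of size 3, and `compute_multi_size_spec(34, 10)` gives two tiles of 8 and two of 9, matching the static shapes the CANON checks in the test expect. With a divisor, `compute_multi_size_spec(64, 10, 4)` yields sizes 8 and 12: the target is first rounded up to a multiple of the divisor, so the larger size can exceed `target_size`. The two sizes always differ by exactly `size_divisor`.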

		static ValueRange createMultiSizedTillingLoops(
		ImplicitLocOpBuilder &b, LinalgOp op, Value zero, Value one,
		ValueRange sizeBounds, ArrayRef<MultiSizeSpecification> specs,
		ValueRange initArgs, SmallVectorImpl<Value> &adjustedIterators,
		SmallVectorImpl<Value> &tileSizes, SmallVectorImpl<LinalgOp> &tiledOps);

		/// Emit the IR for one of the loops produced by multi-sized tiling, including
		/// all nested loops recursively. The recursion is bounded by the number of
		/// dimensions being tiled, which is known to be small (usually <10). `isLow`
		/// indicates whether the part being emitted is the first (lower indices) or
		/// the last (higher indices), which affects the index adjustment. `op` is the
		/// operation being tiled. `zero` and `one` correspond to index-typed constants
		/// visible in the loop. `sizeBounds` contains the dimensions of the original
		/// iteration space; `specs` contains the tile sizes and numbers of tiles yet to
		/// be generated, starting with the current one. `initArgs` are the values that
		/// are passed as `iter_args` of the loop being emitted, and typically
		/// correspond to results of the operation being tiled partially updated by
		/// previous parts, if any. `adjustedIterators` is a mutable list of values that
		/// replace the indices of the original iteration space. `tileSizes` is a
		/// mutable list of tile sizes used by the parent (already emitted) loops.
		/// `tiledOps` is a mutable list of smaller instances of `op` produced by tiling.
		/// All mutable lists are updated before entering the recursion.
		///
		/// The loop structure resembles:
		///
		/// %partial = scf.for %i = 0 to %numLowTiles step 1
		///     iter_args(%iter_args = %result_inits) {
		///   %adjusted = %i * %lowTileSize
		///   %slices... = tensor.extract_slice %original_inputs[...]
		///   %out_slices... = tensor.extract_slice %iter_args[%adjusted, ...]
		///   %res = linalg.op ins(%slices) outs(%out_slices)
		///   %loop_res = tensor.insert_slice %res into %iter_args[%adjusted, ...]
		///   scf.yield %loop_res
		/// }
		///
		/// where "linalg.op" can be further decomposed into pairs of loops implementing
		/// the multi-size tiling on deeper dimensions.
		static ValueRange createOneMultiSizedPart(
		ImplicitLocOpBuilder &b, LinalgOp op, Value zero, Value one,
		ValueRange sizeBounds, bool isLow, ArrayRef<MultiSizeSpecification> specs,
		ValueRange initArgs, SmallVectorImpl<Value> &adjustedIterators,
		SmallVectorImpl<Value> &tileSizes, SmallVectorImpl<LinalgOp> &tiledOps) {
		// Create the loop itself.
		auto loop = b.create<scf::ForOp>(
		zero, isLow ? specs[0].lowTripCount : specs[0].highTripCount, one,
		initArgs);

		// Emit IR recovering the indices in the original iteration space. For the
		// lower part, this accounts for the tile size. For the higher part, this
		// accounts for the tile size and for the pieces computed by the lower part.
		OpBuilder::InsertionGuard guard(b);
		b.setInsertionPointToStart(loop.getBody());
		if (isLow) {
		adjustedIterators.push_back(
		b.create<arith::MulIOp>(loop.getInductionVar(), specs[0].lowTileSize));
		tileSizes.push_back(specs[0].lowTileSize);
		} else {
		Value previousProduct =
		b.create<arith::MulIOp>(specs[0].lowTripCount, specs[0].lowTileSize);
		nicolasvasilache: Same Q here, we could literally write something like: `create<AffineApply>( {iv * highTileSize + lowTrip * lowTileSize}, ...)` and they would render as such. Despite the artificial limitations on dim/sym classification, I find the construct good to carry higher-level semantic information on such quantities.
		Value currentProduct =
		b.create<arith::MulIOp>(loop.getInductionVar(), specs[0].highTileSize);
		adjustedIterators.push_back(
		b.create<arith::AddIOp>(previousProduct, currentProduct));
		tileSizes.push_back(specs[0].highTileSize);
		}
		auto scope = llvm::make_scope_exit([&] {
		adjustedIterators.pop_back();
		tileSizes.pop_back();
		});

		// If this is not the innermost loop, recurse.
		if (specs.size() > 1) {
		ValueRange yielded = createMultiSizedTillingLoops(
		b, op, zero, one, sizeBounds, specs.drop_front(),
		loop.getRegionIterArgs(), adjustedIterators, tileSizes, tiledOps);
		b.create<scf::YieldOp>(yielded);
		return loop->getResults();
		}

		// Emit tiled operations in the innermost loop. By construction, both parts in
		// the multi-sized case have full tiles so omit the partial tile check code
		// generation.
		auto operandsToTile = llvm::to_vector(
		llvm::map_range(op.getInputOperands(),
		[](OpOperand *opOperand) { return opOperand->get(); }));
		llvm::append_range(operandsToTile, loop.getRegionIterArgs());
		SmallVector<Value, 4> tiledOperands =
		makeTiledShapes(b, b.getLoc(), op, operandsToTile, adjustedIterators,
		tileSizes, sizeBounds,
		/*omitPartialTileCheck=*/true);

		// Create the tiled operation.
		LinalgOp tiled = cloneWithSubshapeOperands(b, b.getLoc(), op, tiledOperands);
		addTileLoopIvsToIndexOpResults(b, tiled, adjustedIterators);
		tiledOps.push_back(tiled);

		// Insert partial results into tensors and yield them.
		scf::ValueVector results =
		insertSlicesBack(b, b.getLoc(), tiled, tiledOperands);
		b.create<scf::YieldOp>(results);
		return loop->getResults();
		}
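The loop structure documented for `createOneMultiSizedPart` can be simulated in one dimension, with Python lists standing in for tensors. This is a hypothetical sketch rather than the patch's code: `compute_tile`, `run_multi_size_1d` and `elem` are illustrative names, list slicing stands in for `tensor.extract_slice`/`tensor.insert_slice`, and the running `result` list plays the role of the `iter_args` value. Like the `@elem` body in the test, the sketch ignores the previous contents of the output slice.

```python
from typing import Callable, List

# Hypothetical stand-in for the tiled op's body: (input value, global index).
ElemFn = Callable[[float, int], float]


def compute_tile(inp: List[float], acc: List[float], offset: int, size: int,
                 elem: ElemFn) -> List[float]:
    """One loop iteration: extract the slice, run the tiled op, insert back."""
    in_slice = inp[offset:offset + size]  # like tensor.extract_slice
    # Local index k maps back to offset + k, mirroring how linalg.index
    # results are shifted by the adjusted iterator.
    res_slice = [elem(x, offset + k) for k, x in enumerate(in_slice)]
    # Like tensor.insert_slice: build a new value instead of mutating `acc`.
    return acc[:offset] + res_slice + acc[offset + size:]


def run_multi_size_1d(inp: List[float], out: List[float], low_size: int,
                      high_size: int, low_count: int, high_count: int,
                      elem: ElemFn) -> List[float]:
    """Simulate the low loop followed by the high loop for one dimension."""
    result = list(out)  # value threaded through iter_args
    for i in range(low_count):  # lower part: offset = i * low_size
        result = compute_tile(inp, result, i * low_size, low_size, elem)
    for i in range(high_count):  # higher part: shifted past the lower part
        offset = low_count * low_size + i * high_size
        result = compute_tile(inp, result, offset, high_size, elem)
    return result
```

With sizes 2 and 3 and two tiles of each (the specification for a trip count of 10 and a target of 3), the result equals applying `elem` to every element directly, which is the property the tiled loops must preserve.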

		/// Emit the IR for loops representing the lower and the higher part of the
		nicolasvasilache: emit ? oh my .. :) :p
		/// multi-sized tiling of the given operation `op`. Operates recursively per
		/// dimension. See createOneMultiSizedPart for documentation on arguments. The
		/// IR structure resembles the following:
		///
		/// %partial = scf.for %i = 0 to %numLowTiles step 1 iter_args(%init_args) {
		///   %adjusted = %i * %lowTileSize
		///   // see createOneMultiSizedPart for loop body
		/// }
		/// %full = scf.for %i = 0 to %numHighTiles step 1 iter_args(%partial) {
		///   %adjusted = %i * %highTileSize + %numLowTiles * %lowTileSize
		///   // see createOneMultiSizedPart for loop body
		/// }
		static ValueRange createMultiSizedTillingLoops(
		ImplicitLocOpBuilder &b, LinalgOp op, Value zero, Value one,
		ValueRange sizeBounds, ArrayRef<MultiSizeSpecification> specs,
		ValueRange initArgs, SmallVectorImpl<Value> &adjustedIterators,
		SmallVectorImpl<Value> &tileSizes, SmallVectorImpl<LinalgOp> &tiledOps) {
		assert(!specs.empty() && "expected a non-empty list of tile specs");

		ValueRange lowResults = createOneMultiSizedPart(
		b, op, zero, one, sizeBounds, /*isLow=*/true, specs, initArgs,
		adjustedIterators, tileSizes, tiledOps);
		return createOneMultiSizedPart(b, op, zero, one, sizeBounds, /*isLow=*/false,
		specs, lowResults, adjustedIterators,
		tileSizes, tiledOps);
		}
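The recursion in `createMultiSizedTillingLoops` nests a low loop and a high loop for each dimension inside every loop of the enclosing dimension. The hypothetical Python sketch below (`enumerate_tiles` and `SPECS_10X34` are illustrative names; each spec is assumed to be a `(low_size, high_size, low_count, high_count)` tuple) lists the tiles in the order such a nest would visit them.

```python
from typing import List, Sequence, Tuple

Spec = Tuple[int, int, int, int]  # (low_size, high_size, low_count, high_count)
Tile = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (offsets, sizes)


def enumerate_tiles(specs: Sequence[Spec]) -> List[Tile]:
    """Return (offsets, sizes) of every tile, outermost dimension first."""
    tiles: List[Tile] = []

    def recurse(dim: int, offsets: List[int], sizes: List[int]) -> None:
        if dim == len(specs):  # innermost level: where the op gets cloned
            tiles.append((tuple(offsets), tuple(sizes)))
            return
        low, high, n_low, n_high = specs[dim]
        for i in range(n_low):  # lower part: offset = i * low
            recurse(dim + 1, offsets + [i * low], sizes + [low])
        for i in range(n_high):  # higher part: shifted past the lower part
            recurse(dim + 1, offsets + [n_low * low + i * high], sizes + [high])

    recurse(0, [], [])
    return tiles


# Specs for the 10x34 test case with target sizes [3, 10].
SPECS_10X34 = [(2, 3, 2, 2), (8, 9, 2, 2)]
```

For `SPECS_10X34` this yields 16 tiles whose shapes first appear in the order 2x8, 2x9, 3x8, 3x9 (the four static shapes the CANON checks of the two-dimensional test look for), and together they cover all 340 points exactly once.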

		FailureOr<MultiSizedTilingResult>
		mlir::linalg::multiSizeTileLinalgOp(RewriterBase &b, LinalgOp linalgOp,
		ValueRange targetSizes,
		ValueRange targetSizeDivisors) {
		if (linalgOp.getNumLoops() > targetSizes.size()) {
		LLVM_DEBUG(
		DBGS() << "NYI: multi-size tiling only applies to all dimensions\n");
		return failure();
		}

		if (linalgOp.getNumWindowLoops() != 0) {
		LLVM_DEBUG(
		DBGS() << "NYI: multi-size tiling does not support window loops\n");
		return failure();
		}
		assert((targetSizeDivisors.empty() ||
		targetSizeDivisors.size() == targetSizes.size()) &&
		"expected the same number of divisors as target sizes");

		// No tiling is required.
		if (targetSizes.empty()) {
		MultiSizedTilingResult result;
		result.tiledOps.push_back(linalgOp);
		result.tensorResults = linalgOp->getResults();
		return result;
		}

		// Set up target divisors if necessary.
		ImplicitLocOpBuilder builder(linalgOp.getLoc(), b);
		SmallVector<Value> updatedTargetSizeDivisors;
		Value one = nullptr;
		if (targetSizeDivisors.empty()) {
		one = builder.create<arith::ConstantIndexOp>(1);
		updatedTargetSizeDivisors.resize(targetSizes.size(), one);
		targetSizeDivisors = llvm::makeArrayRef(updatedTargetSizeDivisors);
		}

		// Compute multi-sized tiling specifications. This includes tile sizes and the
		// number of tiles, the latter serve as trip counts for the produced loops.
		Location loc = linalgOp.getLoc();
		SmallVector<Value, 4> allShapes =
		linalgOp.createFlatListOfOperandDims(b, loc);
		AffineMap shapesToLoopsMap = linalgOp.getShapesToLoopsMap();
		if (!shapesToLoopsMap) {
		LLVM_DEBUG(DBGS() << "the op does not provide the shapes-to-loops map\n");
		return failure();
		}
		SmallVector<Value, 4> loopTripCounts =
		applyMapToValues(b, loc, shapesToLoopsMap, allShapes);
		unsigned numTiledDims = std::min(loopTripCounts.size(), targetSizes.size());
		one = one == nullptr ? builder.create<arith::ConstantIndexOp>(1) : one;
		SmallVector<MultiSizeSpecification> specs;
		specs.reserve(numTiledDims);
		for (unsigned i = 0; i < numTiledDims; ++i) {
		specs.push_back(computeMultiSizeLoopTripCounts(builder, targetSizes[i],
		targetSizeDivisors[i],
		loopTripCounts[i], one));
		}

		// Generate loops recursively for each iteration space dimension.
		SmallVector<Value, 4> sizeBounds =
		applyMapToValues(b, loc, shapesToLoopsMap, allShapes);
		Value zero = builder.create<arith::ConstantIndexOp>(0);
		auto tensorOutputs = llvm::to_vector(
		llvm::map_range(linalgOp.getOutputOperands(),
		[](OpOperand *operand) { return operand->get(); }));
		SmallVector<Value> adjustedIterators;
		SmallVector<Value> tileSizes;
		MultiSizedTilingResult result;
		result.tensorResults = createMultiSizedTillingLoops(
		builder, linalgOp, zero, one, sizeBounds, specs, tensorOutputs,
		adjustedIterators, tileSizes, result.tiledOps);

		b.replaceOp(linalgOp, result.tensorResults);
		return result;
		}
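Before emitting any loops, `multiSizeTileLinalgOp` rejects ops with more loops than target sizes, defaults the divisors to 1 when none are given, and computes one specification per tiled loop. A hypothetical Python sketch of that set-up step follows; `plan_multi_size_tiling` is an illustrative name, the window-loop check and IR emission are omitted, and `None` stands in for `failure()`.

```python
from typing import List, Optional, Sequence, Tuple

Spec = Tuple[int, int, int, int]  # (low_size, high_size, low_count, high_count)


def plan_multi_size_tiling(trip_counts: Sequence[int],
                           target_sizes: Sequence[int],
                           divisors: Optional[Sequence[int]] = None
                           ) -> Optional[List[Spec]]:
    """Per-loop multi-size specs, or None where the C++ driver fails."""
    # Every loop must receive a target size (reported as NYI in the patch).
    if len(trip_counts) > len(target_sizes):
        return None
    # Missing divisors default to 1, i.e. no divisibility constraint.
    if not divisors:
        divisors = [1] * len(target_sizes)
    assert len(divisors) == len(target_sizes), "one divisor per target size"
    specs: List[Spec] = []
    # zip stops at the shorter sequence, like the std::min on the sizes.
    for n, target, div in zip(trip_counts, target_sizes, divisors):
        b = n // div
        t = (target + div - 1) // div
        d = (b + t - 1) // t
        s = (b // d) * div
        v = b % d
        specs.append((s, s + div, d - v, v))
    return specs
```

Passing `[10, 34]` with targets `[3, 10]` reproduces the specifications of the two-dimensional test, while passing a single target size for a two-loop op fails, as in the negative test at the end of the test file.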

/// Generate a loop nest around a given tensor::PadOp (for tiling). `newPadOp`		/// Generate a loop nest around a given tensor::PadOp (for tiling). `newPadOp`
/// and `loopNest` are output parameters that return the new (tiled)		/// and `loopNest` are output parameters that return the new (tiled)
/// tensor::PadOp and the loop nest.		/// tensor::PadOp and the loop nest.
static LogicalResult tilePadOp(RewriterBase &builder, tensor::PadOp op,		static LogicalResult tilePadOp(RewriterBase &builder, tensor::PadOp op,
tensor::PadOp &newPadOp, LoopNest &loopNest,		tensor::PadOp &newPadOp, LoopNest &loopNest,
const LinalgTilingOptions &options) {		const LinalgTilingOptions &options) {
Location loc = op.getLoc();		Location loc = op.getLoc();
OpBuilder::InsertionGuard g(builder);		OpBuilder::InsertionGuard g(builder);
(209 lines not shown)

mlir/lib/Dialect/Linalg/Utils/Utils.cpp

(889 lines not shown)	for (unsigned idx = 0, idxIvs = 0, e = tileSizes.size(); idx < e; ++idx) {
LLVM_DEBUG(llvm::dbgs()		LLVM_DEBUG(llvm::dbgs()
<< "computeTileOffsets: " << offsets.back() << "\n");		<< "computeTileOffsets: " << offsets.back() << "\n");
}		}
return offsets;		return offsets;
}		}

SmallVector<Value> computeTileSizes(OpBuilder &b, Location loc,		SmallVector<Value> computeTileSizes(OpBuilder &b, Location loc,
ValueRange tileSizes,		ValueRange tileSizes,
ArrayRef<Value> sizeBounds) {		ValueRange sizeBounds) {
SmallVector<Value> sizes;		SmallVector<Value> sizes;
for (unsigned idx = 0, e = tileSizes.size(); idx < e; ++idx) {		for (unsigned idx = 0, e = tileSizes.size(); idx < e; ++idx) {
bool isTiled = !isZero(tileSizes[idx]);		bool isTiled = !isZero(tileSizes[idx]);
// Before composing, we need to make range a closed interval.		// Before composing, we need to make range a closed interval.
Value size = isTiled ? tileSizes[idx] : sizeBounds[idx];		Value size = isTiled ? tileSizes[idx] : sizeBounds[idx];
AffineExpr d0 = getAffineDimExpr(0, b.getContext());		AffineExpr d0 = getAffineDimExpr(0, b.getContext());
sizes.push_back(fullyComposeAndAffineApply(b, loc, d0 - 1, size));		sizes.push_back(fullyComposeAndAffineApply(b, loc, d0 - 1, size));
LLVM_DEBUG(llvm::dbgs() << "computeTileSizes: " << sizes.back() << "\n");		LLVM_DEBUG(llvm::dbgs() << "computeTileSizes: " << sizes.back() << "\n");
}		}
return sizes;		return sizes;
}		}

SmallVector<Value, 4> makeTiledShapes(OpBuilder &b, Location loc,		SmallVector<Value, 4>
LinalgOp linalgOp,		makeTiledShapes(OpBuilder &b, Location loc, LinalgOp linalgOp,
ArrayRef<Value> valuesToTile,		ValueRange valuesToTile, ValueRange ivs, ValueRange tileSizes,
ValueRange ivs, ValueRange tileSizes,		ValueRange sizeBounds, bool omitPartialTileCheck) {
ArrayRef<Value> sizeBounds,
bool omitPartialTileCheck) {
assert(ivs.size() == static_cast<size_t>(llvm::count_if(		assert(ivs.size() == static_cast<size_t>(llvm::count_if(
llvm::make_range(tileSizes.begin(), tileSizes.end()),		llvm::make_range(tileSizes.begin(), tileSizes.end()),
[](Value v) { return !isZero(v); })) &&		[](Value v) { return !isZero(v); })) &&
"expected as many ivs as non-zero sizes");		"expected as many ivs as non-zero sizes");

// Construct (potentially temporary) mins and maxes on which to apply maps		// Construct (potentially temporary) mins and maxes on which to apply maps
// that define tile subshapes.		// that define tile subshapes.
SmallVector<Value> lbs = computeTileOffsets(b, loc, ivs, tileSizes);		SmallVector<Value> lbs = computeTileOffsets(b, loc, ivs, tileSizes);
(53 lines not shown)

mlir/python/mlir/dialects/_structured_transform_ops_ext.py

(174 lines not shown)	super().__init__(
ip=ip)		ip=ip)

def __extract_values(self, attr: Optional[ArrayAttr]) -> List[int]:		def __extract_values(self, attr: Optional[ArrayAttr]) -> List[int]:
if not attr:		if not attr:
return []		return []
return [IntegerAttr(element).value for element in attr]		return [IntegerAttr(element).value for element in attr]


		class TileMultiSizeOp:
		  """Specialization for TileMultiSizeOp class."""

		  def __init__(self,
		               target: Union[Operation, Value],
		               *,
		               target_sizes: Union[ArrayAttr, IntOrAttrList],
		               target_size_divisors: OptionalIntList = None,
		               loc=None,
		               ip=None):
		    pdl_operation_type = pdl.OperationType.get()
		    super().__init__(
		        pdl_operation_type,
		        _get_op_result_or_value(target),
		        target_sizes=_get_int_array_attr(target_sizes),
		        target_size_divisors=_get_int_array_attr(target_size_divisors),
		        loc=loc,
		        ip=ip)


class VectorizeOp:		class VectorizeOp:
"""Specialization for VectorizeOp class."""		"""Specialization for VectorizeOp class."""

def __init__(self,		def __init__(self,
target: Union[Operation, Value],		target: Union[Operation, Value],
*,		*,
vectorize_padding: Union[bool, BoolAttr] = False,		vectorize_padding: Union[bool, BoolAttr] = False,
loc=None,		loc=None,
(10 lines not shown)

mlir/test/Dialect/Linalg/transform-op-tile-multisize.mlir

This file was added.

				// RUN: mlir-opt %s --test-transform-dialect-interpreter --split-input-file --verify-diagnostics | FileCheck %s
				// RUN: mlir-opt %s --test-transform-dialect-interpreter --canonicalize --split-input-file --verify-diagnostics | FileCheck %s --check-prefix=CANON

				transform.sequence {
				^bb0(%arg0: !pdl.operation):
				// expected-error @below {{expects tile sizes to be strictly positive}}
				transform.structured.tile.multisize %arg0 { target_sizes = [0, 10] }
				}

				// -----

				transform.sequence {
				^bb0(%arg0: !pdl.operation):
				// expected-error @below {{expects divisors to be strictly positive}}
				transform.structured.tile.multisize %arg0 {
				target_sizes = [10, 10],
				target_size_divisors = [1, -1]
				}
				}

				// -----

				transform.sequence {
				^bb0(%arg0: !pdl.operation):
				// expected-error @below {{expects as many divisors as tile sizes or none}}
				transform.structured.tile.multisize %arg0 {
				target_sizes = [10, 10],
				target_size_divisors = [3]
				}
				}

				// -----

				//
				// Checking the successful transformation.
				//

				transform.with_pdl_patterns {
				^bb0(%arg0: !pdl.operation):
				pdl.pattern @linalg_generic : benefit(1) {
				%0 = pdl.operands
				%1 = pdl.types
				%2 = pdl.operation "linalg.generic"(%0 : !pdl.range<value>) -> (%1 : !pdl.range<type>)
				pdl.rewrite %2 with "transform.dialect"
				}

				transform.sequence %arg0 {
				^bb1(%arg1: !pdl.operation):
				%0 = transform.pdl_match @linalg_generic in %arg1
				transform.structured.tile.multisize %0 { target_sizes = [3, 10] }
				}
				}

				// CHECK-DAG: #[[$MINUS_ONE:.+]] = affine_map<()[s0] -> (s0 - 1)>
				// CHECK-DAG: #[[$ID_1D:.+]] = affine_map<(d0) -> (d0)>
				// CHECK-DAG: #[[$ID_2D:.+]] = affine_map<(d0, d1) -> (d0, d1)>
				// CHECK-DAG: #[[$SUM:.+]] = affine_map<(d0, d1) -> (d0 + d1)>

				func.func private @elem(%arg0: f32, %arg1: index, %arg2: index) -> f32

				// CHECK-LABEL: @one_d
				// CHECK-SAME: %[[IN:.+]]: tensor<10xf32>, %[[OUT:.+]]: tensor<10xf32>
				// CANON-LABEL: @one_d
				// CANON-SAME: %[[IN:.+]]: tensor<10xf32>, %[[OUT:.+]]: tensor<10xf32>
				func.func @one_d(%arg0: tensor<10xf32>, %arg1: tensor<10xf32>) -> tensor<10xf32> {
				// CHECK: %[[SIZE1:.+]] = arith.constant 3
				// CHECK: %[[SIZE2:.+]] = arith.constant 10
				// CHECK: %[[ONE:.+]] = arith.constant 1 :
				// These are emitted by createOrFold on dims.
				// CHECK-COUNT-2: constant 10
				// CHECK: %[[SHAPE:.+]] = arith.constant 10
				// CHECK: %[[DIVIDED_BOUND:.+]] = arith.floordivsi %[[SHAPE]], %[[ONE]]
				// CHECK: %[[TARGET_PLUS_DIVISOR:.+]] = arith.addi %[[SIZE1]], %[[ONE]]
				// CHECK: %[[TARGET_PLUS_DIVISOR_SUB_ONE:.+]] = arith.subi %[[TARGET_PLUS_DIVISOR]], %[[ONE]]
				// CHECK: %[[ROUNDED_TARGET:.+]] = arith.floordivsi %[[TARGET_PLUS_DIVISOR_SUB_ONE]], %[[ONE]]
				// CHECK: %[[BOUND_PLUS_TARGET_ROUNDED:.+]] = arith.addi %[[DIVIDED_BOUND]], %[[ROUNDED_TARGET]]
				// CHECK: %[[BOUND_PLUS_TARGET_ROUNDED_SUB_ONE:.+]] = arith.subi %[[BOUND_PLUS_TARGET_ROUNDED]], %[[ONE]]
				// CHECK: %[[DIVISOR_ROUNDED:.+]] = arith.floordivsi %[[BOUND_PLUS_TARGET_ROUNDED_SUB_ONE]], %[[ROUNDED_TARGET]]
				// CHECK: %[[UNSCALED_LOW_TILE_SIZE:.+]] = arith.floordivsi %[[DIVIDED_BOUND]], %[[DIVISOR_ROUNDED]]
				// CHECK: %[[LOW_TILE_SIZE:.+]] = arith.muli %[[UNSCALED_LOW_TILE_SIZE]], %[[ONE]]
				// CHECK: %[[HIGH_TILE_SIZE:.+]] = arith.addi %[[LOW_TILE_SIZE]], %[[ONE]]
				// CHECK: %[[HIGH_TRIP_COUNT:.+]] = arith.remsi %[[DIVIDED_BOUND]], %[[DIVISOR_ROUNDED]]
				// CHECK: %[[LOW_TRIP_COUNT:.+]] = arith.subi %[[DIVISOR_ROUNDED]], %[[HIGH_TRIP_COUNT]]

				// CHECK: %[[ZERO:.+]] = arith.constant 0
				// CHECK: %[[PARTIAL:.+]] = scf.for %[[I:.+]] = %[[ZERO]] to %[[LOW_TRIP_COUNT]] step %[[ONE]] iter_args(%[[ITER_ARG_I:.+]] = %[[OUT]])
				// CHECK: %[[ADJUSTED_I:.+]] = arith.muli %[[I]], %[[LOW_TILE_SIZE]]
				// CHECK: %[[IN_SLICE:.+]] = tensor.extract_slice %[[IN]][%[[ADJUSTED_I]]] [%[[LOW_TILE_SIZE]]] [1]
				// CHECK: %[[OUT_SLICE:.+]] = tensor.extract_slice %[[ITER_ARG_I]][%[[ADJUSTED_I]]] [%[[LOW_TILE_SIZE]]] [1]
				// CHECK: %[[RES_SLICE:.+]] = linalg.generic {{.*}} ins(%[[IN_SLICE]] : tensor<?xf32>) outs(%[[OUT_SLICE]] : tensor<?xf32>)
				// CHECK: %[[INDEX_0:.+]] = linalg.index 0
				// CHECK: %[[ADJUSTED_INDEX_0:.+]] = affine.apply #[[$SUM]](%[[INDEX_0]], %[[ADJUSTED_I]])
				// CHECK: %[[YIELD:.+]] = tensor.insert_slice %[[RES_SLICE]] into %[[ITER_ARG_I]][%[[ADJUSTED_I]]] [%[[LOW_TILE_SIZE]]] [1]
				// CHECK: scf.yield %[[YIELD]]


				// CHECK: scf.for %[[I:.+]] = %[[ZERO]] to %[[HIGH_TRIP_COUNT]] step %[[ONE]] iter_args(%[[ITER_ARG_I:.+]] = %[[PARTIAL]])
				// CHECK: %[[LOW_PART:.+]] = arith.muli %[[LOW_TRIP_COUNT]], %[[LOW_TILE_SIZE]]
				// CHECK: %[[SCALED_I:.+]] = arith.muli %[[I]], %[[HIGH_TILE_SIZE]]
				// CHECK: %[[ADJUSTED_I:.+]] = arith.addi %[[LOW_PART]], %[[SCALED_I]]
				// CHECK: %[[IN_SLICE:.+]] = tensor.extract_slice %[[IN]][%[[ADJUSTED_I]]] [%[[HIGH_TILE_SIZE]]] [1]
				// CHECK: %[[OUT_SLICE:.+]] = tensor.extract_slice %[[ITER_ARG_I]][%[[ADJUSTED_I]]] [%[[HIGH_TILE_SIZE]]] [1]
				// CHECK: %[[RES_SLICE:.+]] = linalg.generic {{.*}} ins(%[[IN_SLICE]] : tensor<?xf32>) outs(%[[OUT_SLICE]] : tensor<?xf32>)
				// CHECK: %[[INDEX_0:.+]] = linalg.index 0
				// CHECK: %[[ADJUSTED_INDEX_0:.+]] = affine.apply #[[$SUM]](%[[INDEX_0]], %[[ADJUSTED_I]])
				// CHECK: %[[YIELD:.+]] = tensor.insert_slice %[[RES_SLICE]] into %[[ITER_ARG_I]][%[[ADJUSTED_I]]] [%[[HIGH_TILE_SIZE]]] [1]
				// CHECK: scf.yield %[[YIELD]]

				// Check that canonicalization is able to recover static shapes.
				// CANON-DAG: %[[C0:.+]] = arith.constant 0
				// CANON-DAG: %[[C1:.+]] = arith.constant 1
				// CANON-DAG: %[[C2:.+]] = arith.constant 2
				// CANON-DAG: %[[C3:.+]] = arith.constant 3
				// CANON-DAG: %[[C4:.+]] = arith.constant 4
				// CANON: scf.for %{{.*}} = %[[C0]] to %[[C2]] step %[[C1]]
				// CANON: tensor.extract_slice %[[IN]][%{{.*}}] [2] [1] : tensor<10xf32> to tensor<2xf32>
				// CANON: tensor.extract_slice %{{.*}}[%{{.*}}] [2] [1] : tensor<10xf32> to tensor<2xf32>
				// CANON: scf.for %{{.*}} = %[[C0]] to %[[C2]] step %[[C1]]
				// CANON: tensor.extract_slice %[[IN]][%{{.*}}] [3] [1] : tensor<10xf32> to tensor<3xf32>
				// CANON: tensor.extract_slice %{{.*}}[%{{.*}}] [3] [1] : tensor<10xf32> to tensor<3xf32>
				%0 = linalg.generic {
				indexing_maps = [affine_map<(i) -> (i)>, affine_map<(i) -> (i)>],
				iterator_types = ["parallel"]
				}
				ins(%arg0: tensor<10xf32>) outs(%arg1: tensor<10xf32>) {
				^bb0(%0: f32, %1: f32):
				%i = linalg.index 0 : index
				%call_res = func.call @elem(%0, %i, %i) : (f32, index, index) -> f32
				linalg.yield %call_res : f32
				} -> tensor<10xf32>
				return %0 : tensor<10xf32>
				}

				// CHECK-LABEL: @two_d
				// CHECK-SAME: %[[IN:.+]]: tensor<10x34xf32>, %[[OUT:.+]]: tensor<10x34xf32>
				// CANON-LABEL: @two_d
				// CANON-SAME: %[[IN:.+]]: tensor<10x34xf32>, %[[OUT:.+]]: tensor<10x34xf32>
				func.func @two_d(%arg0: tensor<10x34xf32>, %arg1: tensor<10x34xf32>) -> tensor<10x34xf32> {
				// Only check the overall nesting and computation structure.
				// CHECK: %[[PARTIAL_I:.+]] = scf.for %{{.*}} iter_args(%[[ITER_ARG_I:.+]] = %[[OUT]])
				// CHECK: %[[ADJUSTED_I:.+]] = arith.muli
				//
				// CHECK: %[[PARTIAL_J:.+]] = scf.for %{{.*}} iter_args(%[[ITER_ARG_J:.+]] = %[[ITER_ARG_I]])
				// CHECK: %[[ADJUSTED_J:.+]] = arith.muli
				// CHECK: tensor.extract_slice %[[IN]][%[[ADJUSTED_I]], %[[ADJUSTED_J]]]
				// CHECK: tensor.extract_slice %[[ITER_ARG_J]][%[[ADJUSTED_I]], %[[ADJUSTED_J]]]
				// CHECK: %[[RES:.+]] = linalg.generic
				// CHECK: %[[INSERTED_J:.+]] = tensor.insert_slice %[[RES]] into %[[ITER_ARG_J]]
				// CHECK: scf.yield %[[INSERTED_J]]
				//
				// CHECK: %[[FULL_J:.+]] = scf.for %{{.*}} iter_args(%[[ITER_ARG_J:.+]] = %[[PARTIAL_J]])
				// CHECK: arith.muli
				// CHECK: arith.muli
				// CHECK: %[[ADJUSTED_J:.+]] = arith.addi
				// CHECK: tensor.extract_slice %[[IN]][%[[ADJUSTED_I]], %[[ADJUSTED_J]]]
				// CHECK: tensor.extract_slice %[[ITER_ARG_J]][%[[ADJUSTED_I]], %[[ADJUSTED_J]]]
				// CHECK: %[[RES:.+]] = linalg.generic
				// CHECK: %[[INSERTED_J:.+]] = tensor.insert_slice %[[RES]] into %[[ITER_ARG_J]]
				// CHECK: scf.yield %[[INSERTED_J]]
				//
				// CHECK: %{{.+}} = scf.for %{{.*}} iter_args(%[[ITER_ARG_I:.+]] = %[[PARTIAL_I]])
				// CHECK: arith.muli
				// CHECK: arith.muli
				// CHECK: %[[ADJUSTED_I:.+]] = arith.addi
				//
				// CHECK: %[[PARTIAL_J:.+]] = scf.for %{{.*}} iter_args(%[[ITER_ARG_J:.+]] = %[[ITER_ARG_I]])
				// CHECK: %[[ADJUSTED_J:.+]] = arith.muli
				// CHECK: tensor.extract_slice %[[IN]][%[[ADJUSTED_I]], %[[ADJUSTED_J]]]
				// CHECK: tensor.extract_slice %[[ITER_ARG_J]][%[[ADJUSTED_I]], %[[ADJUSTED_J]]]
				// CHECK: %[[RES:.+]] = linalg.generic
				// CHECK: %[[INSERTED_J:.+]] = tensor.insert_slice %[[RES]] into %[[ITER_ARG_J]]
				// CHECK: scf.yield %[[INSERTED_J]]
				//
				// CHECK: %[[FULL_J:.+]] = scf.for %{{.*}} iter_args(%[[ITER_ARG_J:.+]] = %[[PARTIAL_J]])
				// CHECK: arith.muli
				// CHECK: arith.muli
				// CHECK: %[[ADJUSTED_J:.+]] = arith.addi
				// CHECK: tensor.extract_slice %[[IN]][%[[ADJUSTED_I]], %[[ADJUSTED_J]]]
				// CHECK: tensor.extract_slice %[[ITER_ARG_J]][%[[ADJUSTED_I]], %[[ADJUSTED_J]]]
				// CHECK: %[[RES:.+]] = linalg.generic
				// CHECK: %[[INSERTED_J:.+]] = tensor.insert_slice %[[RES]] into %[[ITER_ARG_J]]
				// CHECK: scf.yield %[[INSERTED_J]]

				// Check that canonicalization is able to recover static shapes.
				// CANON-COUNT-2: tensor.extract_slice {{.*}} : tensor<10x34xf32> to tensor<2x8xf32>
				// CANON-COUNT-2: tensor.extract_slice {{.*}} : tensor<10x34xf32> to tensor<2x9xf32>
				// CANON-COUNT-2: tensor.extract_slice {{.*}} : tensor<10x34xf32> to tensor<3x8xf32>
				// CANON-COUNT-2: tensor.extract_slice {{.*}} : tensor<10x34xf32> to tensor<3x9xf32>
				%0 = linalg.generic {
				indexing_maps = [affine_map<(i, j) -> (i, j)>,
				affine_map<(i, j) -> (i, j)>],
				iterator_types = ["parallel", "parallel"]
				}
				ins(%arg0: tensor<10x34xf32>)
				outs(%arg1: tensor<10x34xf32>) {
				^bb0(%0: f32, %1: f32):
				%i = linalg.index 0 : index
				%j = linalg.index 1 : index
				%call_res = func.call @elem(%0, %i, %j) : (f32, index, index) -> f32
				linalg.yield %call_res : f32
				} -> tensor<10x34xf32>
				return %0 : tensor<10x34xf32>
				}

				// -----

				// expected-note @below {{target op}}
				module {
				transform.sequence {
				^bb1(%arg1: !pdl.operation):
				// expected-error @below {{only applies to Linalg structured ops}}
				transform.structured.tile.multisize %arg1 { target_sizes = [3, 10] }
				}
				}

				// -----

				//
				// Check failure to apply due to an insufficient number of target sizes.
				//
				// TODO: this should have a better error message, but it needs to be surfaced
				// from the transformation code itself to avoid duplication.
				//

				func.func private @elem(%arg0: f32, %arg1: index, %arg2: index) -> f32

				func.func @two_d(%arg0: tensor<10x34xf32>, %arg1: tensor<10x34xf32>) -> tensor<10x34xf32> {
				// expected-note @below {{target op}}
				%0 = linalg.generic {
				indexing_maps = [affine_map<(i, j) -> (i, j)>,
				affine_map<(i, j) -> (i, j)>],
				iterator_types = ["parallel", "parallel"]
				}
				ins(%arg0: tensor<10x34xf32>)
				outs(%arg1: tensor<10x34xf32>) {
				^bb0(%0: f32, %1: f32):
				%i = linalg.index 0 : index
				%j = linalg.index 1 : index
				%call_res = func.call @elem(%0, %i, %j) : (f32, index, index) -> f32
				linalg.yield %call_res : f32
				} -> tensor<10x34xf32>
				return %0 : tensor<10x34xf32>
				}

				transform.with_pdl_patterns {
				^bb0(%arg0: !pdl.operation):
				pdl.pattern @linalg_generic : benefit(1) {
				%0 = pdl.operands
				%1 = pdl.types
				%2 = pdl.operation "linalg.generic"(%0 : !pdl.range<value>) -> (%1 : !pdl.range<type>)
				pdl.rewrite %2 with "transform.dialect"
				}

				transform.sequence %arg0 {
				^bb1(%arg1: !pdl.operation):
				%0 = transform.pdl_match @linalg_generic in %arg1
				// expected-error @below {{failed to apply}}
				transform.structured.tile.multisize %0 { target_sizes = [3] }
				}
				}
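The static tile shapes in the CANON-COUNT checks of the first test case follow from simple arithmetic. The Python sketch below is a hypothetical model of that arithmetic, not the code in Tiling.cpp. It rests on three assumptions. First, the 10x34 op uses target sizes [3, 10]; the transform invocation for that case lies outside this excerpt, but [3, 10] is consistent with the checked 2/3 and 8/9 tile sizes. Second, the divisor is 1. Third, the loop over the smaller tiles comes first. For each dimension, the sketch picks the fewest tiles that keep every tile within the target, then spreads the remainder so that all tiles are full and the two sizes differ by one. It also rejects having fewer target sizes than loops, which mirrors the last test case. All function names are illustrative.

```python
import itertools
from typing import List, Sequence, Set, Tuple


def multisize_split(n: int, target: int) -> Tuple[int, int, int, int]:
    """Return (low_size, low_count, high_size, high_count) for one loop of size n.

    Hypothetical model with the divisor fixed to 1 (an assumption): use the
    fewest tiles that keep every tile at most `target`, then spread n evenly so
    that every tile is full and the two sizes differ by exactly one.
    """
    if n <= 0 or target <= 0:
        raise ValueError("loop size and target size must be positive")
    num_tiles = -(-n // target)        # ceil(n / target)
    low_size = n // num_tiles          # floor(n / num_tiles)
    high_size = low_size + 1
    high_count = n % num_tiles         # tiles that absorb the remainder
    low_count = num_tiles - high_count
    return low_size, low_count, high_size, high_count


def tile_offsets(n: int, target: int) -> List[Tuple[int, int]]:
    """List (offset, size) pairs: a loop over low-size tiles, then a loop over
    high-size tiles starting where the first part ends (assumed order)."""
    low_size, low_count, high_size, high_count = multisize_split(n, target)
    # First loop: offset = iv * low_size.
    tiles = [(iv * low_size, low_size) for iv in range(low_count)]
    # Second loop: offset = low_count * low_size + iv * high_size.
    split_point = low_count * low_size
    tiles += [(split_point + iv * high_size, high_size) for iv in range(high_count)]
    return tiles


def tile_shapes(shape: Sequence[int], target_sizes: Sequence[int]) -> Set[Tuple[int, ...]]:
    """Distinct tile shapes produced for a fully parallel N-D op."""
    # Assumed rule: fewer target sizes than loops is an error. Any extra
    # target sizes are simply ignored by zip() in this sketch.
    if len(target_sizes) < len(shape):
        raise ValueError(
            f"expected {len(shape)} target sizes, got {len(target_sizes)}")
    per_dim = []
    for n, target in zip(shape, target_sizes):
        low_size, low_count, high_size, high_count = multisize_split(n, target)
        # Keep only the sizes that actually occur (count > 0).
        per_dim.append(
            [s for s, c in ((low_size, low_count), (high_size, high_count)) if c > 0])
    return set(itertools.product(*per_dim))


if __name__ == "__main__":
    print(tile_offsets(10, 3))                     # [(0, 2), (2, 2), (4, 3), (7, 3)]
    print(tile_offsets(34, 10))                    # [(0, 8), (8, 8), (16, 9), (25, 9)]
    print(sorted(tile_shapes([10, 34], [3, 10])))  # [(2, 8), (2, 9), (3, 8), (3, 9)]
```

In this model, each dimension of size 10 or 34 is split into two tiles of each size. That fits the four extract_slice shapes that appear twice each in the CANON checks. The offsets also have the same form as the ones in the CHECK lines. The first loop needs one multiplication. The second loop adds the end of the first part to a second product.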

mlir/test/python/dialects/transform_structured_ext.py

# ...
def testTileAttributes():
  # ...
  # CHECK-LABEL: TEST: testTileAttributes
  # CHECK: transform.sequence
  # CHECK: structured.tile
  # CHECK-DAG: interchange = [0, 1]
  # CHECK-DAG: sizes = [4, 8]


@run
def testTileMultiSize():
  sequence = transform.SequenceOp()
  with InsertionPoint(sequence.body):
    structured.TileMultiSizeOp(sequence.bodyTarget, target_sizes=[3, 7])
    transform.YieldOp()
  # CHECK-LABEL: TEST: testTileMultiSize
  # CHECK: transform.sequence
  # CHECK: structured.tile.multisize
  # CHECK: target_sizes = [3, 7]


@run
def testTileZero():
  sequence = transform.SequenceOp()
  with InsertionPoint(sequence.body):
    structured.TileOp(
        sequence.bodyTarget, sizes=[4, 0, 2, 0], interchange=[0, 1, 2, 3])
    transform.YieldOp()
  # CHECK-LABEL: TEST: testTileZero
  # CHECK: transform.sequence
  # ...

utils/bazel/llvm-project-overlay/mlir/BUILD.bazel

# ...
cc_library(
    # ...
    srcs = [
        "lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp",
    ],
    hdrs = [
        "include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.h",
    ],
    includes = ["include"],
    deps = [
        ":ArithmeticDialect",
        ":IR",
        ":LinalgDialect",
        ":LinalgTransformOpsIncGen",
        ":LinalgTransforms",
        ":PDLDialect",
        ":Parser",
        ":SideEffectInterfaces",
        ":TransformDialect",
        # ...
