This is an archive of the discontinued LLVM Phabricator instance.


Differential D131053

[mlir] Extract offsets-sizes-strides computation from `makeTiledShape(s)`.
ClosedPublic

Authored by pifon2a on Aug 3 2022, 1:02 AM.

Details

Reviewers

herhut
jreiffers
nicolasvasilache
mravishankar

Commits

rG42f32f69ade7: [mlir] Extract offsets-sizes-strides computation from `makeTiledShape(s)`.
rG56d94b3b902e: [mlir] Extract offsets-sizes-strides computation from `makeTiledShape(s)`.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pifon2a created this revision. Aug 3 2022, 1:02 AM

Herald added a project: Restricted Project. Aug 3 2022, 1:02 AM

Herald added subscribers: bzcheeseman, sdasgup3, wenzhicui and 19 others.

pifon2a requested review of this revision. Aug 3 2022, 1:02 AM

Herald added a reviewer: nicolasvasilache. Aug 3 2022, 1:02 AM

Herald added a project: Restricted Project.

Herald added a subscriber: stephenneuendorffer.

The tests are affected only because the materialization of extract_slice/subview was postponed.
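
To illustrate why postponing materialization changes only the order of the emitted IR, here is a minimal sketch in plain C++. It does not use MLIR; `OpLog`, `tileInterleaved`, and `tileComputeThenMaterialize` are hypothetical stand-ins that merely record the order in which ops would be created, assuming one offset computation and one slice op per operand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR builder: it only records the order in which
// "ops" are created. None of these names are MLIR APIs.
struct OpLog {
  std::vector<std::string> ops;
  void emit(const std::string &name) { ops.push_back(name); }
};

// Old structure (assumed): each operand's offset computation is immediately
// followed by the slice op that consumes it.
inline void tileInterleaved(OpLog &log, int numOperands) {
  assert(numOperands >= 0 && "operand count must be non-negative");
  for (int i = 0; i < numOperands; ++i) {
    log.emit("offset" + std::to_string(i));
    log.emit("slice" + std::to_string(i));
  }
}

// New structure (assumed): compute the parameters for all operands first,
// then materialize all slice ops.
inline void tileComputeThenMaterialize(OpLog &log, int numOperands) {
  assert(numOperands >= 0 && "operand count must be non-negative");
  for (int i = 0; i < numOperands; ++i)
    log.emit("offset" + std::to_string(i));
  for (int i = 0; i < numOperands; ++i)
    log.emit("slice" + std::to_string(i));
}
```

With two operands, the first function records offset0, slice0, offset1, slice1, while the second records offset0, offset1, slice0, slice1. The ops are the same and only their order differs, which is the kind of change that shows up in reordered CHECK lines.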

Harbormaster completed remote builds in B178962: Diff 449567. Aug 3 2022, 1:24 AM

Can you provide some rationale for this change in the description? I assume this enables reuse of the functionality independent of extract_slice and friends but making this clear in the description would be nice.

This seems NFC and enabling more reuse is a good thing, so LGTM.

This revision is now accepted and ready to land. Aug 3 2022, 8:10 AM

Echo-ing that there seems to be a missing rationale here (or at least the rationale here seems to be driven by out-of-tree uses). This is increasing the API surface area of Tiling transformation, when in reality we need to be going the other way, reducing the API surface area.

This revision now requires changes to proceed. Aug 3 2022, 9:56 AM

In D131053#3696874, @mravishankar wrote:

This is increasing the API surface area of Tiling transformation, when in reality we need to be going the other way, reducing the API surface area.

What this does is reduce coupling between tiling linalg operations and the specific way the tiling is expressed in IR. It is true that this enables out-of-tree uses, but MLIR is an infrastructure, so enabling out-of-tree uses should be a goal. There is no inherent reason why tiling a linalg operation should depend on using tensor operations.

You are right that this needs a bigger discussion, and finding the right interface for tiling that is more dialect-independent and captures only the essence of tiling (which I believe is the tiled operation and a description of the input/output tiles) would be desirable. However, until we find such a solution, it makes sense to me to extend the linalg API to enable more uses. This is a very local API change within linalg after all.

What this does is reduce coupling between tiling linalg operations and the specific way the tiling is expressed in IR. It is true that this enables out-of-tree uses, but MLIR is an infrastructure, so enabling out-of-tree uses should be a goal. There is no inherent reason why tiling a linalg operation should depend on using tensor operations.

I think a usage/API that is driven by a particular out-of-tree use of MLIR would become hard to maintain. I am looking for a more fleshed-out description of how any out-of-tree user is expected to use it (thereby making concrete the usage that is driving these changes).
I agree with what you are saying: tiling should not depend on using tensor operations. In the TilingInterface there is a method to get the offsets and sizes for result tiles. Maybe you want to add a new method there for getting the offsets and sizes for the operands. That would make sense to me in terms of decoupling the tiling implementation from using tensor.extract_slice operations. It would still need some justification as to why tensor.extract_slice does not serve the purpose, because all of this is still in rectangular domains anyway.

In D131053#3697777, @mravishankar wrote:

What this does is reduce coupling between tiling linalg operations and the specific way the tiling is expressed in IR. It is true that this enables out-of-tree uses, but MLIR is an infrastructure, so enabling out-of-tree uses should be a goal. There is no inherent reason why tiling a linalg operation should depend on using tensor operations.

I think a usage/API that is driven by a particular out-of-tree use of MLIR would become hard to maintain. I am looking for a more fleshed-out description of how any out-of-tree user is expected to use it (thereby making concrete the usage that is driving these changes).

It is essentially a first step to even enable the below tiling interface. If we were to change the interface, we would need to extract the functionality this patch makes available.

I agree with what you are saying: tiling should not depend on using tensor operations. In the TilingInterface there is a method to get the offsets and sizes for result tiles. Maybe you want to add a new method there for getting the offsets and sizes for the operands. That would make sense to me in terms of decoupling the tiling implementation from using tensor.extract_slice operations. It would still need some justification as to why tensor.extract_slice does not serve the purpose, because all of this is still in rectangular domains anyway.

So we seem to agree that we need to expose this functionality. As I said, I am happy to start the discussion on the TilingInterface, but adding a method to an interface seems to be the much bigger change and requires more careful API design than adding a utility function to linalg utils. In particular, I would be opposed to further complicating the interface with another method and would much rather split the creation of the tiled operation and the production of the slice operations into different interfaces/utilities.

Can we land this as a first minor step, gain some experience and then propose what a split interface would look like?

Rebase

Update

@herhut sure, we can.

This revision was not accepted when it landed; it landed in state Needs Review. Aug 4 2022, 2:28 AM

This revision was landed with ongoing or failed builds.

Closed by commit rG56d94b3b902e: [mlir] Extract offsets-sizes-strides computation from `makeTiledShape(s)`. (authored by pifon2a).

This revision was automatically updated to reflect the committed changes.

pifon2a added a commit: rG56d94b3b902e: [mlir] Extract offsets-sizes-strides computation from `makeTiledShape(s)`.

Harbormaster completed remote builds in B179218: Diff 449905. Aug 4 2022, 2:33 AM

I don't think it is good practice to land a change when there are still outstanding changes requested. IMO that is not following community contribution guidelines. Please revert.

@mravishankar What comments do you mean exactly?

Echo-ing that there seems to be a missing rationale here (or at least the rationale here seems to be driven by out-of-tree uses).

I added the description to the commit https://github.com/llvm/llvm-project/commit/56d94b3b902e21ff79b1ce9a6fb606a3f7c1c4db

This is increasing the API surface area of Tiling transformation, when in reality we need to be going the other way, reducing the API surface area.

This one has nothing to do with this PR. This PR just restructured two methods in Utils.h. I am not changing the TilingInterface here, nor do I intend to write an RFC for every small change.

This one has nothing to do with this PR. This PR just restructured two methods in Utils.h. I am not changing the TilingInterface here, nor do I intend to write an RFC for every small change.

That is just one of the comments. The main comment is that this is exposing an API that is not fully justified and has an unclear use case. Having a custom API exposed for each use case is going to be hard to maintain. It might be just a couple of methods, but if everyone who is using Linalg keeps exposing entry points to customize for their use case, then it is a death by a thousand cuts. There are other questions in the comments that weren't responded to before it was submitted. It would also be good to be sensitive to time zones. Having requested changes, I would expect to be given enough time to review the updates before it lands.

I don't see any changes in this patch that address the comments above. At the very least, I would wait for presumptive code owner @nicolasvasilache to break the tie (and I'll go with what he decides ultimately).

In D131053#3699457, @mravishankar wrote:

I don't think it is good practice to land a change when there are still outstanding changes requested. IMO that is not following community contribution guidelines. Please revert

Mahesh marked this revision as needing changes, clearly expressing disagreement with where this is going. I don't understand why this was ignored and landed.
More importantly, the follow-up asked for a revert: please do so promptly.

pifon2a added a reverting change: rG6b03bae34682: Revert "[mlir] Extract offsets-sizes-strides computation from `makeTiledShape…. Aug 5 2022, 5:54 AM

Thanks Mehdi, and thanks Alex for the revert.
In terms of the path forward, there is not much technical discussion to be had AFAICS; it is rather more about the API surface area of Linalg-based transformations. I'll go with whatever Nicolas suggests. If he is on board with it, then I apologize for the inconvenience caused here.

This revision now requires changes to proceed. Aug 7 2022, 4:24 PM

Reading Mahesh's comments, I worried that we may indeed increase the API surface of things that are scheduled to be deleted, but I do not think this is the case.

I think this is a step in the direction we want to go collectively to support other extract/insert/parallel_insert ops than the existing ones; gather comes to mind (teaser: https://reviews.llvm.org/D130348).

This is still super early and the API surface will need to change further, but this incremental step does not pose concerns to me as it stands.

mlir/include/mlir/Dialect/Linalg/Utils/Utils.h
227	Can we improve the doc here? I would move some of the description from `computeAllSliceParameters` here and refer to it in the next function.
247	The doc for this function could say: Computes SliceParamaters for all `valuesToTile` ... Calls computeSliceParameters. Some of the `valuesToTile` won't be affected by tiling. For these values, llvm::None will be returned.
mlir/lib/Dialect/Linalg/Utils/Utils.cpp
847–848	outdated comment.

nicolasvasilache added inline comments. Aug 9 2022, 9:02 AM

mlir/lib/Dialect/Linalg/Utils/Utils.cpp
805	Not for this PR obviously but this could become the basis for some SubsetOpInterface/SubsetExtractOpInterface. I have tried to sketch it out but found that we should prob. have the gather/scatter/parallel_scatter ops first and then generalize (teaser: https://reviews.llvm.org/D130348). This still requires discussion, I hope to send an RFC by EoW.

Withdrawing my objection based on Nicolas' comment.

This revision is now accepted and ready to land. Aug 9 2022, 10:32 AM

Closed by commit rG42f32f69ade7: [mlir] Extract offsets-sizes-strides computation from `makeTiledShape(s)`. (authored by pifon2a). Aug 10 2022, 6:38 AM

This revision was automatically updated to reflect the committed changes.

pifon2a marked an inline comment as done.

pifon2a added a commit: rG42f32f69ade7: [mlir] Extract offsets-sizes-strides computation from `makeTiledShape(s)`.

Thank you, Nicolas!

Looks like this broke the windows mlir buildbot: https://lab.llvm.org/buildbot/#/builders/13/builds/24387/steps/6/logs/stdio

In D131053#3712871, @stella.stamenova wrote:

Looks like this broke the windows mlir buildbot: https://lab.llvm.org/buildbot/#/builders/13/builds/24387/steps/6/logs/stdio

This should be fixed by https://github.com/llvm/llvm-project/commit/47cf00407669b0d67f6ffd74c3b39433934752d8

Revision Contents

Path                                                   Size
mlir/include/mlir/Dialect/Linalg/Utils/Utils.h         37 lines
mlir/lib/Dialect/Linalg/Utils/Utils.cpp                131 lines
mlir/test/Dialect/Linalg/tile-and-distribute.mlir      22 lines
mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir    2 lines
mlir/test/Dialect/Linalg/tile-conv.mlir                2 lines
mlir/test/Dialect/Linalg/tile-to-foreach-thread.mlir   16 lines
mlir/test/Dialect/Linalg/tile.mlir                     44 lines
mlir/test/Dialect/Linalg/transform-op-split.mlir       2 lines

Diff 451440

mlir/include/mlir/Dialect/Linalg/Utils/Utils.h

	[...]

	/// Turns an OpFoldResult into a value, creating an index-typed constant if			/// Turns an OpFoldResult into a value, creating an index-typed constant if
	/// necessary.			/// necessary.
	Value materializeOpFoldResult(ImplicitLocOpBuilder &builder,			Value materializeOpFoldResult(ImplicitLocOpBuilder &builder,
	OpFoldResult opFoldResult);			OpFoldResult opFoldResult);
	Value materializeOpFoldResult(OpBuilder &b, Location loc,			Value materializeOpFoldResult(OpBuilder &b, Location loc,
	OpFoldResult opFoldResult);			OpFoldResult opFoldResult);

				/// A struct containg offsets-sizes-strides arguments of the tiled shape.
				struct SliceParameters {
				SmallVector<OpFoldResult> offsets;
				SmallVector<OpFoldResult> sizes;
				SmallVector<OpFoldResult> strides;
				};

				/// Computes SliceParameters for a single `valueToTile` assuming that its user
				/// is being tiled with the given loop bounds `lbs` and `ubs` and the tile sizes
				/// `tileSizes`.
				///
				/// `omitPartialTileCheck` controls whether to omit the partial/boundary tile
				/// condition check in cases where we statically know that it is unnecessary.
				SliceParameters
				computeSliceParameters(OpBuilder &builder, Location loc, Value valueToTile,
				ArrayRef<OpFoldResult> tileSizes, AffineMap map,
				ArrayRef<OpFoldResult> lbs, ArrayRef<OpFoldResult> ubs,
				ArrayRef<OpFoldResult> subShapeSizes,
				bool omitPartialTileCheck);

				/// Computes SliceParamaters for all `valuesToTile` of the given `linalgOp`,
				/// assuming `linalgOp` is being fused into a loop nest. Calls
				/// `computeSliceParameters` for every individual value.
				///
				/// Note that a constant zero in `tileSizes` means no tiling at that implicit
				/// loop. The number of non-zero values in `tileSizes` should be equal to the
				/// number of values in `ivs`.
				///
				/// Some of the `valuesToTile` won't be affected by tiling. For these values,
				/// llvm::None will be returned.
				SmallVector<Optional<SliceParameters>>
				computeAllSliceParameters(OpBuilder &builder, Location loc, LinalgOp linalgOp,
				ValueRange valuesToTile, ArrayRef<OpFoldResult> ivs,
				ArrayRef<OpFoldResult> tileSizes,
				ArrayRef<OpFoldResult> sizeBounds,
				bool omitPartialTileCheck);

	/// Creates an extract_slice/subview op for a single `valueToTile` with			/// Creates an extract_slice/subview op for a single `valueToTile` with
	/// `builder`. This new operation extracts a tile of `valueToTile`, starting			/// `builder`. This new operation extracts a tile of `valueToTile`, starting
	/// at offsets `lbs` and with sizes `subShapeSizes`. `omitPartialTileCheck`			/// at offsets `lbs` and with sizes `subShapeSizes`. `omitPartialTileCheck`
	/// controls whether to omit the partial/boundary tile condition check in cases			/// controls whether to omit the partial/boundary tile condition check in cases
	/// where we statically know that it is unnecessary.			/// where we statically know that it is unnecessary.
	Value makeTiledShape(OpBuilder &builder, Location loc, Value valueToTile,			Value makeTiledShape(OpBuilder &builder, Location loc, Value valueToTile,
	ArrayRef<OpFoldResult> tileSizes, AffineMap map,			ArrayRef<OpFoldResult> tileSizes, AffineMap map,
	ArrayRef<OpFoldResult> lbs, ArrayRef<OpFoldResult> ubs,			ArrayRef<OpFoldResult> lbs, ArrayRef<OpFoldResult> ubs,
	[...]
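
As a rough illustration of the API shape declared above, the following standalone C++17 sketch mirrors the idea with plain integers and strings. `SimpleSliceParams`, `computeAllSimpleSliceParams`, and `materializeAll` are hypothetical names, `std::optional` stands in for `llvm::Optional`, and every operand is assumed to be 1-D. It is not the MLIR implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Simplified, hypothetical mirror of SliceParameters that uses plain integers
// instead of OpFoldResult.
struct SimpleSliceParams {
  std::vector<std::int64_t> offsets;
  std::vector<std::int64_t> sizes;
  std::vector<std::int64_t> strides;
};

// For every (assumed 1-D) operand, return slice parameters if the operand is
// tiled, or std::nullopt if tiling leaves it untouched.
inline std::vector<std::optional<SimpleSliceParams>>
computeAllSimpleSliceParams(const std::vector<bool> &isTiled,
                            std::int64_t offset, std::int64_t tileSize) {
  std::vector<std::optional<SimpleSliceParams>> all;
  all.reserve(isTiled.size());
  for (bool tiled : isTiled) {
    if (!tiled) {
      all.push_back(std::nullopt);
      continue;
    }
    all.push_back(SimpleSliceParams{{offset}, {tileSize}, {1}});
  }
  return all;
}

// The caller decides how to materialize. Here a "slice" is just a string, and
// operands without parameters are passed through unchanged.
inline std::vector<std::string>
materializeAll(const std::vector<std::string> &values,
               const std::vector<std::optional<SimpleSliceParams>> &all) {
  assert(values.size() == all.size() && "one entry per value expected");
  std::vector<std::string> result;
  result.reserve(values.size());
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (all[i].has_value())
      result.push_back("slice(" + values[i] + ")");
    else
      result.push_back(values[i]);
  }
  return result;
}
```

Because the parameters are returned as data, a different caller could consume them without creating any slice op at all, which appears to be the kind of reuse discussed in the review.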

mlir/lib/Dialect/Linalg/Utils/Utils.cpp

[...]
generateParallelLoopNest(
linalgOp.getInputAndOutputOperands();		linalgOp.getInputAndOutputOperands();
bodyBuilderFn(b, loc, ivs, operandValuesToUse);		bodyBuilderFn(b, loc, ivs, operandValuesToUse);
},		},
ivs, distributionMethod);		ivs, distributionMethod);

assert(ivs.size() == iteratorTypes.size() && "did not generate enough loops");		assert(ivs.size() == iteratorTypes.size() && "did not generate enough loops");
}		}

		static Value materializeTiledShape(OpBuilder &builder, Location loc,
		Value valueToTile,
		const SliceParameters &sliceParams) {
		auto shapedType = valueToTile.getType().dyn_cast<ShapedType>();
		auto sliceOp = TypeSwitch<ShapedType, Operation >(shapedType)
		.Case([&](MemRefType) {
		return builder.create<memref::SubViewOp>(
		loc, valueToTile, sliceParams.offsets,
		sliceParams.sizes, sliceParams.strides);
		})
		.Case([&](RankedTensorType) {
		return makeComposedExtractSliceOp(
		builder, loc, valueToTile, sliceParams.offsets,
		sliceParams.sizes, sliceParams.strides);
		})
		.Default([](ShapedType) -> Operation * {
		llvm_unreachable("Unexpected shaped type");
		});
		return sliceOp->getResult(0);
		}

Value makeTiledShape(OpBuilder &builder, Location loc, Value valueToTile,		Value makeTiledShape(OpBuilder &builder, Location loc, Value valueToTile,
ArrayRef<OpFoldResult> tileSizes, AffineMap map,		ArrayRef<OpFoldResult> tileSizes, AffineMap map,
ArrayRef<OpFoldResult> lbs, ArrayRef<OpFoldResult> ubs,		ArrayRef<OpFoldResult> lbs, ArrayRef<OpFoldResult> ubs,
ArrayRef<OpFoldResult> subShapeSizes,		ArrayRef<OpFoldResult> subShapeSizes,
bool omitPartialTileCheck) {		bool omitPartialTileCheck) {
		SliceParameters sliceParams =
		computeSliceParameters(builder, loc, valueToTile, tileSizes, map, lbs,
		ubs, subShapeSizes, omitPartialTileCheck);
		return materializeTiledShape(builder, loc, valueToTile, sliceParams);
		}

		SliceParameters
		computeSliceParameters(OpBuilder &builder, Location loc, Value valueToTile,
		ArrayRef<OpFoldResult> tileSizes, AffineMap map,
		ArrayRef<OpFoldResult> lbs, ArrayRef<OpFoldResult> ubs,
		ArrayRef<OpFoldResult> subShapeSizes,
		bool omitPartialTileCheck) {
auto shapedType = valueToTile.getType().dyn_cast<ShapedType>();		auto shapedType = valueToTile.getType().dyn_cast<ShapedType>();
assert(shapedType && "only shaped types can be tiled");		assert(shapedType && "only shaped types can be tiled");
ArrayRef<int64_t> shape = shapedType.getShape();		ArrayRef<int64_t> shape = shapedType.getShape();
int64_t rank = shapedType.getRank();		int64_t rank = shapedType.getRank();

// Construct a new subview / extract_slice for the tile.		// Compute offsets/sizes/strides for the tile.
SmallVector<OpFoldResult, 4> offsets, sizes, strides;		SliceParameters sliceParams;
offsets.reserve(rank);		sliceParams.offsets.reserve(rank);
sizes.reserve(rank);		sliceParams.sizes.reserve(rank);
strides.reserve(rank);		sliceParams.strides.reserve(rank);
for (unsigned r = 0; r < rank; ++r) {		for (unsigned r = 0; r < rank; ++r) {
LLVM_DEBUG(llvm::dbgs() << "makeTiledShape: for dim#" << r);		LLVM_DEBUG(llvm::dbgs() << "computeSliceParameters: for dim#" << r);
if (!isTiled(map.getSubMap({r}), tileSizes)) {		if (!isTiled(map.getSubMap({r}), tileSizes)) {
offsets.push_back(builder.getIndexAttr(0));		sliceParams.offsets.push_back(builder.getIndexAttr(0));
OpFoldResult dim = createFoldedDimOp(builder, loc, valueToTile, r);		OpFoldResult dim = createFoldedDimOp(builder, loc, valueToTile, r);
sizes.push_back(dim);		sliceParams.sizes.push_back(dim);
strides.push_back(builder.getIndexAttr(1));		sliceParams.strides.push_back(builder.getIndexAttr(1));
LLVM_DEBUG(llvm::dbgs() << ": not tiled: use size: " << dim << "\n");		LLVM_DEBUG(llvm::dbgs() << ": not tiled: use size: " << dim << "\n");
continue;		continue;
}		}
LLVM_DEBUG(llvm::dbgs() << ": tiled: figure out subsize...\n");		LLVM_DEBUG(llvm::dbgs() << ": tiled: figure out subsize...\n");

// Tiling creates a new slice at the proper index, the slice step is 1		// Tiling creates a new slice at the proper index, the slice step is 1
// (i.e. the op does not subsample, stepping occurs in the loop).		// (i.e. the op does not subsample, stepping occurs in the loop).
auto m = map.getSubMap({r});		auto m = map.getSubMap({r});
LLVM_DEBUG(llvm::dbgs() << "makeTiledShape: submap: " << m << "\n");		LLVM_DEBUG(llvm::dbgs() << "computeSliceParameters: submap: " << m << "\n");
IRRewriter rewriter(builder);		IRRewriter rewriter(builder);
OpFoldResult offset = makeComposedFoldedAffineApply(rewriter, loc, m, lbs);		OpFoldResult offset = makeComposedFoldedAffineApply(rewriter, loc, m, lbs);
offsets.push_back(offset);		sliceParams.offsets.push_back(offset);
OpFoldResult closedIntSize =		OpFoldResult closedIntSize =
makeComposedFoldedAffineApply(rewriter, loc, m, subShapeSizes);		makeComposedFoldedAffineApply(rewriter, loc, m, subShapeSizes);
// Resulting size needs to be made half open interval again.		// Resulting size needs to be made half open interval again.
AffineExpr s0 = getAffineSymbolExpr(0, builder.getContext());		AffineExpr s0 = getAffineSymbolExpr(0, builder.getContext());
OpFoldResult size =		OpFoldResult size =
makeComposedFoldedAffineApply(rewriter, loc, s0 + 1, closedIntSize);		makeComposedFoldedAffineApply(rewriter, loc, s0 + 1, closedIntSize);
LLVM_DEBUG(llvm::dbgs() << "makeTiledShape: raw size: " << size << "\n");
LLVM_DEBUG(llvm::dbgs()		LLVM_DEBUG(llvm::dbgs()
<< "makeTiledShape: new offset: " << offset << "\n");		<< "computeSliceParameters: raw size: " << size << "\n");
strides.push_back(builder.getIndexAttr(1));		LLVM_DEBUG(llvm::dbgs()
		<< "computeSliceParameters: new offset: " << offset << "\n");
		sliceParams.strides.push_back(builder.getIndexAttr(1));

if (omitPartialTileCheck) {		if (omitPartialTileCheck) {
// We statically know that the partial/boundary tile condition is		// We statically know that the partial/boundary tile condition is
// unnecessary.		// unnecessary.
LLVM_DEBUG(llvm::dbgs() << "makeTiledShape: new size: " << size << "\n");		LLVM_DEBUG(llvm::dbgs() << "makeTiledShape: new size: " << size << "\n");
sizes.push_back(size);		sliceParams.sizes.push_back(size);
continue;		continue;
}		}

// The size of the subview / extract_slice should be trimmed to avoid		// The size of the subview / extract_slice should be trimmed to avoid
// out-of-bounds accesses, unless:		// out-of-bounds accesses, unless:
// a. We statically know the subshape size divides the shape size evenly.		// a. We statically know the subshape size divides the shape size evenly.
// b. The subshape size is 1. According to the way the loops are set up,		// b. The subshape size is 1. According to the way the loops are set up,
// tensors with "0" dimensions would never be constructed.		// tensors with "0" dimensions would never be constructed.
[...]
if (!hasTileSizeOne && !dividesEvenly) {
// Compute min(dim - offset, size) to avoid out-of-bounds accesses.		// Compute min(dim - offset, size) to avoid out-of-bounds accesses.
AffineMap minMap = AffineMap::inferFromExprList(		AffineMap minMap = AffineMap::inferFromExprList(
{ArrayRef<AffineExpr>{dim1 - dim2, dim0}})		{ArrayRef<AffineExpr>{dim1 - dim2, dim0}})
.front();		.front();
size =		size =
makeComposedFoldedAffineMin(rewriter, loc, minMap, {size, d, offset});		makeComposedFoldedAffineMin(rewriter, loc, minMap, {size, d, offset});
}		}
LLVM_DEBUG(llvm::dbgs() << "makeTiledShape: new size: " << size << "\n");		LLVM_DEBUG(llvm::dbgs() << "makeTiledShape: new size: " << size << "\n");
sizes.push_back(size);		sliceParams.sizes.push_back(size);
}		}
		return sliceParams;
auto sliceOp = TypeSwitch<ShapedType, Operation >(shapedType)
.Case([&](MemRefType) {
return builder.create<memref::SubViewOp>(
loc, valueToTile, offsets, sizes, strides);
})
.Case([&](RankedTensorType) {
return makeComposedExtractSliceOp(
builder, loc, valueToTile, offsets, sizes, strides);
})
.Default([](ShapedType) -> Operation * {
llvm_unreachable("Unexpected shaped type");
});
return sliceOp->getResult(0);
}		}

SmallVector<OpFoldResult> computeTileOffsets(OpBuilder &b, Location loc,		SmallVector<OpFoldResult> computeTileOffsets(OpBuilder &b, Location loc,
ArrayRef<OpFoldResult> ivs,		ArrayRef<OpFoldResult> ivs,
ArrayRef<OpFoldResult> tileSizes) {		ArrayRef<OpFoldResult> tileSizes) {
SmallVector<OpFoldResult> offsets;		SmallVector<OpFoldResult> offsets;
for (unsigned idx = 0, idxIvs = 0, e = tileSizes.size(); idx < e; ++idx) {		for (unsigned idx = 0, idxIvs = 0, e = tileSizes.size(); idx < e; ++idx) {
LLVM_DEBUG(llvm::dbgs() << "makeTiledShapes: for loop#" << idx << "\n");		LLVM_DEBUG(llvm::dbgs() << "makeTiledShapes: for loop#" << idx << "\n");
[...]
}		}

Value materializeOpFoldResult(OpBuilder &builder, Location loc,		Value materializeOpFoldResult(OpBuilder &builder, Location loc,
OpFoldResult opFoldResult) {		OpFoldResult opFoldResult) {
ImplicitLocOpBuilder b(loc, builder);		ImplicitLocOpBuilder b(loc, builder);
return materializeOpFoldResult(b, opFoldResult);		return materializeOpFoldResult(b, opFoldResult);
}		}

SmallVector<Value> makeTiledShapes(OpBuilder &b, Location loc,		SmallVector<Optional<SliceParameters>>
LinalgOp linalgOp, ValueRange valuesToTile,		computeAllSliceParameters(OpBuilder &builder, Location loc, LinalgOp linalgOp,
ArrayRef<OpFoldResult> ivs,		ValueRange valuesToTile, ArrayRef<OpFoldResult> ivs,
ArrayRef<OpFoldResult> tileSizes,		ArrayRef<OpFoldResult> tileSizes,
ArrayRef<OpFoldResult> sizeBounds,		ArrayRef<OpFoldResult> sizeBounds,
bool omitPartialTileCheck) {		bool omitPartialTileCheck) {
assert(ivs.size() == static_cast<size_t>(llvm::count_if(		assert(ivs.size() == static_cast<size_t>(llvm::count_if(
llvm::make_range(tileSizes.begin(), tileSizes.end()),		llvm::make_range(tileSizes.begin(), tileSizes.end()),
[](OpFoldResult v) { return !isZero(v); })) &&		[](OpFoldResult v) { return !isZero(v); })) &&
"expected as many ivs as non-zero sizes");		"expected as many ivs as non-zero sizes");

// Construct (potentially temporary) mins and maxes on which to apply maps		// Construct (potentially temporary) mins and maxes on which to apply maps
// that define tile subshapes.		// that define tile subshapes.
SmallVector<OpFoldResult> lbs = computeTileOffsets(b, loc, ivs, tileSizes);		SmallVector<OpFoldResult> lbs =
		computeTileOffsets(builder, loc, ivs, tileSizes);
SmallVector<OpFoldResult> subShapeSizes =		SmallVector<OpFoldResult> subShapeSizes =
computeTileSizes(b, loc, tileSizes, sizeBounds);		computeTileSizes(builder, loc, tileSizes, sizeBounds);

assert(static_cast<int64_t>(valuesToTile.size()) ==		assert(static_cast<int64_t>(valuesToTile.size()) ==
linalgOp.getNumInputsAndOutputs() &&		linalgOp.getNumInputsAndOutputs() &&
"expected one value to tile for every operand");		"expected one value to tile for every operand");
SmallVector<Value> tiledShapes;		SmallVector<Optional<SliceParameters>> allSliceParams;
tiledShapes.reserve(valuesToTile.size());		allSliceParams.reserve(valuesToTile.size());
for (OpOperand *opOperand : linalgOp.getInputAndOutputOperands()) {		for (OpOperand *opOperand : linalgOp.getInputAndOutputOperands()) {
Value shapedOp = valuesToTile[opOperand->getOperandNumber()];		Value shapedOp = valuesToTile[opOperand->getOperandNumber()];
LLVM_DEBUG(llvm::dbgs() << "makeTiledShapes: for operand " << shapedOp);		LLVM_DEBUG(llvm::dbgs() << "makeTiledShapes: for operand " << shapedOp);
AffineMap map = linalgOp.getTiedIndexingMap(opOperand);		AffineMap map = linalgOp.getTiedIndexingMap(opOperand);
// Use `opOperand` as is if it is not tiled and not an output tensor. Having		// Use `opOperand` as is if it is not tiled and not an output tensor. Having
// an extract/insert slice pair for all output tensors simplifies follow up		// an extract/insert slice pair for all output tensors simplifies follow up
// transformations such as padding and bufferization since the		// transformations such as padding and bufferization since the
// extract/insert slice pairs make the accessed iteration argument		// extract/insert slice pairs make the accessed iteration argument
// subdomains explicit.		// subdomains explicit.
if (!isTiled(map, tileSizes) && !linalgOp.isOutputTensor(opOperand)) {		if (!isTiled(map, tileSizes) && !linalgOp.isOutputTensor(opOperand)) {
tiledShapes.push_back(shapedOp);		allSliceParams.push_back(llvm::None);
LLVM_DEBUG(llvm::dbgs() << ": not tiled: use shape: "		LLVM_DEBUG(llvm::dbgs() << ": not tiled: use shape: "
<< opOperand->get().getType() << "\n");		<< opOperand->get().getType() << "\n");
continue;		continue;
}		}
LLVM_DEBUG(llvm::dbgs() << ": tiled: figure out subshape...\n");		LLVM_DEBUG(llvm::dbgs() << ": tiled: figure out subshape...\n");

tiledShapes.push_back(makeTiledShape(b, loc, shapedOp, tileSizes, map, lbs,		allSliceParams.push_back(computeSliceParameters(
sizeBounds, subShapeSizes,		builder, loc, shapedOp, tileSizes, map, lbs, sizeBounds, subShapeSizes,
omitPartialTileCheck));		omitPartialTileCheck));
}		}

		return allSliceParams;
		}

		SmallVector<Value> makeTiledShapes(OpBuilder &builder, Location loc,
		LinalgOp linalgOp, ValueRange valuesToTile,
		ArrayRef<OpFoldResult> ivs,
		ArrayRef<OpFoldResult> tileSizes,
		ArrayRef<OpFoldResult> sizeBounds,
		bool omitPartialTileCheck) {
		SmallVector<Optional<SliceParameters>> allSliceParameter =
		computeAllSliceParameters(builder, loc, linalgOp, valuesToTile, ivs,
		tileSizes, sizeBounds, omitPartialTileCheck);
		SmallVector<Value> tiledShapes;
		for (auto item : llvm::zip(valuesToTile, allSliceParameter)) {
		Value valueToTile = std::get<0>(item);
		Optional<SliceParameters> sliceParams = std::get<1>(item);
		tiledShapes.push_back(
		sliceParams.hasValue()
		? materializeTiledShape(builder, loc, valueToTile, *sliceParams)
		: valueToTile);
		}
return tiledShapes;		return tiledShapes;
}		}

void offsetIndices(OpBuilder &b, LinalgOp linalgOp,		void offsetIndices(OpBuilder &b, LinalgOp linalgOp,
ArrayRef<OpFoldResult> offsets) {		ArrayRef<OpFoldResult> offsets) {
IRRewriter rewriter(b);		IRRewriter rewriter(b);
offsetIndices(rewriter, linalgOp, offsets);		offsetIndices(rewriter, linalgOp, offsets);
}		}
[...]
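
The boundary handling in `computeSliceParameters` boils down to `min(dim - offset, size)` for each tiled dimension. The following standalone sketch works through that arithmetic with plain integers. `clampedTileSize` and `tileSizesAlongDim` are hypothetical helpers that are not part of the patch, and a single dimension with an identity indexing map is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Size of the tile starting at `offset`, trimmed so that it never reads past
// the end of a dimension of extent `dim`: min(dim - offset, tileSize).
inline std::int64_t clampedTileSize(std::int64_t dim, std::int64_t offset,
                                    std::int64_t tileSize) {
  assert(tileSize > 0 && "tile size must be positive");
  assert(offset >= 0 && offset < dim && "offset must lie inside the dimension");
  return std::min(dim - offset, tileSize);
}

// Sizes of all tiles produced by a loop that steps through `dim` in
// increments of `tileSize`.
inline std::vector<std::int64_t> tileSizesAlongDim(std::int64_t dim,
                                                   std::int64_t tileSize) {
  assert(tileSize > 0 && "tile size must be positive");
  std::vector<std::int64_t> sizes;
  for (std::int64_t offset = 0; offset < dim; offset += tileSize)
    sizes.push_back(clampedTileSize(dim, offset, tileSize));
  return sizes;
}
```

For a dimension of 10 and a tile size of 4, this yields sizes 4, 4, 2. For a dimension of 8, the tile size divides the extent evenly and no tile is trimmed, which corresponds to case (a) in the code comment, where the `min` can be skipped.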

mlir/test/Dialect/Linalg/tile-and-distribute.mlir

	[...]
	// CHECK: func @gemm1(			// CHECK: func @gemm1(
	// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?x?xf32>			// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?x?xf32>
	// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<?x?xf32>			// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<?x?xf32>
	// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]*]]: memref<?x?xf32>			// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]*]]: memref<?x?xf32>
	// CHECK-DAG: %[[BIDY:.*]] = gpu.block_id y			// CHECK-DAG: %[[BIDY:.*]] = gpu.block_id y
	// CHECK-DAG: %[[BIDX:.*]] = gpu.block_id x			// CHECK-DAG: %[[BIDX:.*]] = gpu.block_id x
	// CHECK: scf.for %[[ARG3:.*]] =			// CHECK: scf.for %[[ARG3:.*]] =
	// CHECK: %[[OFFSETY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]			// CHECK: %[[OFFSETY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
	// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[OFFSETY]], %[[ARG3]]]
	// CHECK: %[[OFFSETX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]			// CHECK: %[[OFFSETX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]
	// CHECK: %[[SV2:.*]] = memref.subview %[[ARG1]][%[[ARG3]], %[[OFFSETX]]]
	// CHECK: %[[OFFSETY_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]			// CHECK: %[[OFFSETY_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
	// CHECK: %[[OFFSETX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]			// CHECK: %[[OFFSETX_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]
	// CHECK: %[[SV3:.*]] = memref.subview %[[ARG2]][%[[OFFSETY_2]], %[[OFFSETX]]]			// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[OFFSETY]], %[[ARG3]]]
				// CHECK: %[[SV2:.*]] = memref.subview %[[ARG1]][%[[ARG3]], %[[OFFSETX]]]
				// CHECK: %[[SV3:.*]] = memref.subview %[[ARG2]][%[[OFFSETY_2]], %[[OFFSETX_2]]]
	// CHECK: linalg.matmul ins(%[[SV1]], %[[SV2]]{{.*}} outs(%[[SV3]]			// CHECK: linalg.matmul ins(%[[SV1]], %[[SV2]]{{.*}} outs(%[[SV3]]

	// -----			// -----

	func.func @gemm2(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>)			func.func @gemm2(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>)
	{			{
	linalg.matmul {__internal_linalg_transform__ = "distribute2"}			linalg.matmul {__internal_linalg_transform__ = "distribute2"}
	ins(%a, %b: memref<?x?xf32>, memref<?x?xf32>)			ins(%a, %b: memref<?x?xf32>, memref<?x?xf32>)
	// CHECK: %[[ITERY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]			// CHECK: %[[ITERY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
	// CHECK: %[[ITERX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]			// CHECK: %[[ITERX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]
	// CHECK: %[[INBOUNDSY:.*]] = arith.cmpi slt, %[[ITERY]], %{{.*}}			// CHECK: %[[INBOUNDSY:.*]] = arith.cmpi slt, %[[ITERY]], %{{.*}}
	// CHECK: %[[INBOUNDSX:.*]] = arith.cmpi slt, %[[ITERX]], %{{.*}}			// CHECK: %[[INBOUNDSX:.*]] = arith.cmpi slt, %[[ITERX]], %{{.*}}
	// CHECK: %[[INBOUNDS:.*]] = arith.andi %[[INBOUNDSY]], %[[INBOUNDSX]]			// CHECK: %[[INBOUNDS:.*]] = arith.andi %[[INBOUNDSY]], %[[INBOUNDSX]]
	// CHECK: scf.if %[[INBOUNDS]]			// CHECK: scf.if %[[INBOUNDS]]
	// CHECK: scf.for %[[ARG3:.*]] =			// CHECK: scf.for %[[ARG3:.*]] =
	// CHECK: %[[OFFSETY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]			// CHECK: %[[OFFSETY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
	// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[OFFSETY]], %[[ARG3]]]
	// CHECK: %[[OFFSETX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]			// CHECK: %[[OFFSETX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]
	// CHECK: %[[SV2:.*]] = memref.subview %[[ARG1]][%[[ARG3]], %[[OFFSETX]]]
	// CHECK: %[[OFFSETY_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]			// CHECK: %[[OFFSETY_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
	// CHECK: %[[OFFSETX_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]			// CHECK: %[[OFFSETX_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]
				// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[OFFSETY]], %[[ARG3]]]
				// CHECK: %[[SV2:.*]] = memref.subview %[[ARG1]][%[[ARG3]], %[[OFFSETX]]]
	// CHECK: %[[SV3:.*]] = memref.subview %[[ARG2]][%[[OFFSETY_2]], %[[OFFSETX_2]]]			// CHECK: %[[SV3:.*]] = memref.subview %[[ARG2]][%[[OFFSETY_2]], %[[OFFSETX_2]]]
	// CHECK: linalg.matmul ins(%[[SV1]], %[[SV2]]{{.*}} outs(%[[SV3]]			// CHECK: linalg.matmul ins(%[[SV1]], %[[SV2]]{{.*}} outs(%[[SV3]]

	// -----			// -----

	func.func @gemm3(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>)			func.func @gemm3(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>)
	{			{
	linalg.matmul {__internal_linalg_transform__ = "distribute3"}			linalg.matmul {__internal_linalg_transform__ = "distribute3"}
	// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]*]]: memref<?x?xf32>			// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]*]]: memref<?x?xf32>
	// CHECK-DAG: %[[BIDY:.*]] = gpu.block_id y			// CHECK-DAG: %[[BIDY:.*]] = gpu.block_id y
	// CHECK-DAG: %[[BIDX:.*]] = gpu.block_id x			// CHECK-DAG: %[[BIDX:.*]] = gpu.block_id x
	// CHECK: %[[LBX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]			// CHECK: %[[LBX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]
	// CHECK: %[[INBOUNDS:.*]] = arith.cmpi slt, %[[LBX]], %{{.*}}			// CHECK: %[[INBOUNDS:.*]] = arith.cmpi slt, %[[LBX]], %{{.*}}
	// CHECK: scf.if %[[INBOUNDS]]			// CHECK: scf.if %[[INBOUNDS]]
	// CHECK: scf.for %[[ARG3:.*]] =			// CHECK: scf.for %[[ARG3:.*]] =
	// CHECK: %[[OFFSETY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]			// CHECK: %[[OFFSETY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
	// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[OFFSETY]], %[[ARG3]]]
	// CHECK: %[[OFFSETX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]			// CHECK: %[[OFFSETX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]
	// CHECK: %[[SV2:.*]] = memref.subview %[[ARG1]][%[[ARG3]], %[[OFFSETX]]]
	// CHECK: %[[OFFSETY_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]			// CHECK: %[[OFFSETY_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
	// CHECK: %[[OFFSETX_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]			// CHECK: %[[OFFSETX_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]
				// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[OFFSETY]], %[[ARG3]]]
				// CHECK: %[[SV2:.*]] = memref.subview %[[ARG1]][%[[ARG3]], %[[OFFSETX]]]
	// CHECK: %[[SV3:.*]] = memref.subview %[[ARG2]][%[[OFFSETY_2]], %[[OFFSETX_2]]]			// CHECK: %[[SV3:.*]] = memref.subview %[[ARG2]][%[[OFFSETY_2]], %[[OFFSETX_2]]]
	// CHECK: linalg.matmul ins(%[[SV1]], %[[SV2]]{{.*}} outs(%[[SV3]]			// CHECK: linalg.matmul ins(%[[SV1]], %[[SV2]]{{.*}} outs(%[[SV3]]

	// -----			// -----

	func.func @gemm5(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>)			func.func @gemm5(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>)
	{			{
	linalg.matmul {__internal_linalg_transform__ = "distribute5"}			linalg.matmul {__internal_linalg_transform__ = "distribute5"}
	// CHECK: %[[LBY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]			// CHECK: %[[LBY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
	// CHECK: %[[LBX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]			// CHECK: %[[LBX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]
	// CHECK: %[[STEPX:.*]] = affine.apply #[[MAP0]]()[%[[NBLOCKSX]]]			// CHECK: %[[STEPX:.*]] = affine.apply #[[MAP0]]()[%[[NBLOCKSX]]]
	// CHECK: %[[INBOUNDS:.*]] = arith.cmpi slt, %[[LBY]], %{{.*}}			// CHECK: %[[INBOUNDS:.*]] = arith.cmpi slt, %[[LBY]], %{{.*}}
	// CHECK: scf.if %[[INBOUNDS]]			// CHECK: scf.if %[[INBOUNDS]]
	// CHECK: scf.parallel (%[[ARG3:.*]]) = (%[[LBX]]) to (%{{.*}}) step (%[[STEPX]])			// CHECK: scf.parallel (%[[ARG3:.*]]) = (%[[LBX]]) to (%{{.*}}) step (%[[STEPX]])
	// CHECK: scf.for %[[ARG4:.*]] =			// CHECK: scf.for %[[ARG4:.*]] =
	// CHECK: %[[OFFSETY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]			// CHECK: %[[OFFSETY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
				// CHECK: %[[OFFSETY_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
	// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[OFFSETY]], %[[ARG4]]]			// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[OFFSETY]], %[[ARG4]]]
	// CHECK: %[[SV2:.*]] = memref.subview %[[ARG1]][%[[ARG4]], %[[ARG3]]]			// CHECK: %[[SV2:.*]] = memref.subview %[[ARG1]][%[[ARG4]], %[[ARG3]]]
	// CHECK: %[[OFFSETY_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
	// CHECK: %[[SV3:.*]] = memref.subview %[[ARG2]][%[[OFFSETY_2]], %[[ARG3]]]			// CHECK: %[[SV3:.*]] = memref.subview %[[ARG2]][%[[OFFSETY_2]], %[[ARG3]]]
	// CHECK: linalg.matmul ins(%[[SV1]], %[[SV2]]{{.*}} outs(%[[SV3]]			// CHECK: linalg.matmul ins(%[[SV1]], %[[SV2]]{{.*}} outs(%[[SV3]]

	// -----			// -----

	func.func @gemm6(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>)			func.func @gemm6(%a : memref<?x?xf32>, %b : memref<?x?xf32>, %c : memref<?x?xf32>)
	{			{
	linalg.matmul {__internal_linalg_transform__ = "distribute6"}			linalg.matmul {__internal_linalg_transform__ = "distribute6"}
	ins(%a, %b: memref<?x?xf32>, memref<?x?xf32>)			ins(%a, %b: memref<?x?xf32>, memref<?x?xf32>)
	outs(%c: memref<?x?xf32>)			outs(%c: memref<?x?xf32>)
	return			return
	}			}
	// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> (s0 * 8)>			// CHECK-DAG: #[[MAP0:.*]] = affine_map<()[s0] -> (s0 * 8)>
	// CHECK: func @gemm6(			// CHECK: func @gemm6(
	// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?x?xf32>			// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]*]]: memref<?x?xf32>
	// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<?x?xf32>			// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]*]]: memref<?x?xf32>
	// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]*]]: memref<?x?xf32>			// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]*]]: memref<?x?xf32>
	// CHECK-DAG: %[[BIDY:.*]] = gpu.block_id y			// CHECK-DAG: %[[BIDY:.*]] = gpu.block_id y
	// CHECK-DAG: %[[NBLOCKSY:.*]] = gpu.grid_dim y			// CHECK-DAG: %[[NBLOCKSY:.*]] = gpu.grid_dim y
	// CHECK-DAG: %[[BIDX:.*]] = gpu.block_id x			// CHECK-DAG: %[[BIDX:.*]] = gpu.block_id x
	// CHECK: %[[LBY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]			// CHECK: %[[LBY:.*]] = affine.apply #[[MAP0]]()[%[[BIDY]]]
	// CHECK: %[[STEPY:.*]] = affine.apply #[[MAP0]]()[%[[NBLOCKSY]]]			// CHECK: %[[STEPY:.*]] = affine.apply #[[MAP0]]()[%[[NBLOCKSY]]]
	// CHECK: scf.parallel (%[[ARG3:.*]]) = (%[[LBY]]) to (%{{.*}}) step (%[[STEPY]])			// CHECK: scf.parallel (%[[ARG3:.*]]) = (%[[LBY]]) to (%{{.*}}) step (%[[STEPY]])
	// CHECK: scf.for %[[ARG4:.*]] =			// CHECK: scf.for %[[ARG4:.*]] =
	// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[ARG3]], %[[ARG4]]]
	// CHECK: %[[OFFSETX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]			// CHECK: %[[OFFSETX:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]
	// CHECK: %[[SV2:.*]] = memref.subview %[[ARG1]][%[[ARG4]], %[[OFFSETX]]]
	// CHECK: %[[OFFSETX_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]			// CHECK: %[[OFFSETX_2:.*]] = affine.apply #[[MAP0]]()[%[[BIDX]]]
				// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[ARG3]], %[[ARG4]]]
				// CHECK: %[[SV2:.*]] = memref.subview %[[ARG1]][%[[ARG4]], %[[OFFSETX]]]
	// CHECK: %[[SV3:.*]] = memref.subview %[[ARG2]][%[[ARG3]], %[[OFFSETX_2]]]			// CHECK: %[[SV3:.*]] = memref.subview %[[ARG2]][%[[ARG3]], %[[OFFSETX_2]]]
	// CHECK: linalg.matmul ins(%[[SV1]], %[[SV2]]{{.*}} outs(%[[SV3]]			// CHECK: linalg.matmul ins(%[[SV1]], %[[SV2]]{{.*}} outs(%[[SV3]]

	// -----			// -----

	// CHECK: #[[MULMAP:.+]] = affine_map<()[s0, s1] -> (s0 * s1)>			// CHECK: #[[MULMAP:.+]] = affine_map<()[s0, s1] -> (s0 * s1)>
	// CHECK: #[[ADDMAP:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>			// CHECK: #[[ADDMAP:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
	// CHECK: func @matmul_tensors(			// CHECK: func @matmul_tensors(

mlir/test/Dialect/Linalg/tile-and-fuse-tensors.mlir

	// CHECK-NEXT: %[[OFFSET_OH:.+]] = affine.apply #[[X2_MAP]](%[[IV1]])			// CHECK-NEXT: %[[OFFSET_OH:.+]] = affine.apply #[[X2_MAP]](%[[IV1]])
	// CHECK-NEXT: %[[SIZE_INPUT_H:.+]] = affine.min #[[INPUT_BOUND]](%[[IV1]], %[[SIZE_ELEM_OH]])[%[[FILL_H]], %[[FILTER_H]]]			// CHECK-NEXT: %[[SIZE_INPUT_H:.+]] = affine.min #[[INPUT_BOUND]](%[[IV1]], %[[SIZE_ELEM_OH]])[%[[FILL_H]], %[[FILTER_H]]]
	// CHECK-NEXT: %[[SIZE_ELEM_OH_2:.+]] = affine.min #[[BOUND16_MAP_2]](%[[IV1]])[%[[FILL_H]], %[[ELEM_OH]]]			// CHECK-NEXT: %[[SIZE_ELEM_OH_2:.+]] = affine.min #[[BOUND16_MAP_2]](%[[IV1]])[%[[FILL_H]], %[[ELEM_OH]]]
	// CHECK-NEXT: scf.for %[[IV2:.+]] = %{{.+}} to %[[ELEM_OW]]			// CHECK-NEXT: scf.for %[[IV2:.+]] = %{{.+}} to %[[ELEM_OW]]
	// CHECK-NEXT: %[[SIZE_ELEM_OW:.+]] = affine.min #[[BOUND4_MAP]](%[[IV2]])[%[[ELEM_OW]]]			// CHECK-NEXT: %[[SIZE_ELEM_OW:.+]] = affine.min #[[BOUND4_MAP]](%[[IV2]])[%[[ELEM_OW]]]
	// CHECK-NEXT: %[[SIZE_ELEM_OC:.+]] = affine.min #[[BOUND2_MAP]](%[[IV2]])[%[[ELEM_OC]]]			// CHECK-NEXT: %[[SIZE_ELEM_OC:.+]] = affine.min #[[BOUND2_MAP]](%[[IV2]])[%[[ELEM_OC]]]
	// CHECK-NEXT: %[[OFFSET_OW:.+]] = affine.apply #[[X2_MAP]](%[[IV2]])			// CHECK-NEXT: %[[OFFSET_OW:.+]] = affine.apply #[[X2_MAP]](%[[IV2]])
	// CHECK-NEXT: %[[SIZE_INPUT_W:.+]] = affine.min #[[INPUT_BOUND]](%[[IV2]], %[[SIZE_ELEM_OW]])[%[[FILL_W]], %[[FILTER_W]]]			// CHECK-NEXT: %[[SIZE_INPUT_W:.+]] = affine.min #[[INPUT_BOUND]](%[[IV2]], %[[SIZE_ELEM_OW]])[%[[FILL_W]], %[[FILTER_W]]]
				// CHECK-NEXT: %[[SIZE_ELEM_OW_2:.+]] = affine.min #[[BOUND4_MAP_2]](%[[IV2]])[%[[FILL_W]], %[[ELEM_OW]]]
	// CHECK-NEXT: %[[ST_INPUT:.+]] = tensor.extract_slice %[[INPUT]][%[[IV0]], %[[OFFSET_OH]], %[[OFFSET_OW]], 0]			// CHECK-NEXT: %[[ST_INPUT:.+]] = tensor.extract_slice %[[INPUT]][%[[IV0]], %[[OFFSET_OH]], %[[OFFSET_OW]], 0]
	// CHECK-SAME: [%[[SIZE_INPUT_N]], %[[SIZE_INPUT_H]], %[[SIZE_INPUT_W]], %[[INPUT_C]]]			// CHECK-SAME: [%[[SIZE_INPUT_N]], %[[SIZE_INPUT_H]], %[[SIZE_INPUT_W]], %[[INPUT_C]]]
	// CHECK-NEXT: %[[SIZE_ELEM_OW_2:.+]] = affine.min #[[BOUND4_MAP_2]](%[[IV2]])[%[[FILL_W]], %[[ELEM_OW]]]
	// CHECK-NEXT: scf.for %[[IV3:.+]] = %{{.+}} to %[[ELEM_OC]] step %{{.+}} iter_args(%[[ARG:[a-z0-9]+]]			// CHECK-NEXT: scf.for %[[IV3:.+]] = %{{.+}} to %[[ELEM_OC]] step %{{.+}} iter_args(%[[ARG:[a-z0-9]+]]
	// CHECK-NEXT: %[[ST_ELEM:.+]] = tensor.extract_slice %[[ELEM]][%[[IV0]], %[[IV1]], %[[IV2]], %[[IV3]]]			// CHECK-NEXT: %[[ST_ELEM:.+]] = tensor.extract_slice %[[ELEM]][%[[IV0]], %[[IV1]], %[[IV2]], %[[IV3]]]
	// CHECK-SAME: [%[[SIZE_ELEM_N]], %[[SIZE_ELEM_OH]], %[[SIZE_ELEM_OW]], %[[SIZE_ELEM_OC]]]			// CHECK-SAME: [%[[SIZE_ELEM_N]], %[[SIZE_ELEM_OH]], %[[SIZE_ELEM_OW]], %[[SIZE_ELEM_OC]]]
	// CHECK-NEXT: %[[ST_ARG:.+]] = tensor.extract_slice %[[ARG]][%[[IV0]], %[[IV1]], %[[IV2]], %[[IV3]]]			// CHECK-NEXT: %[[ST_ARG:.+]] = tensor.extract_slice %[[ARG]][%[[IV0]], %[[IV1]], %[[IV2]], %[[IV3]]]
	// CHECK-SAME: [%[[SIZE_ELEM_N]], %[[SIZE_ELEM_OH]], %[[SIZE_ELEM_OW]], %[[SIZE_ELEM_OC]]]			// CHECK-SAME: [%[[SIZE_ELEM_N]], %[[SIZE_ELEM_OH]], %[[SIZE_ELEM_OW]], %[[SIZE_ELEM_OC]]]
	// CHECK-NEXT: %[[SIZE_ELEM_OC_2:.+]] = affine.min #[[BOUND2_MAP_2]](%[[IV3]], %[[IV2]])[%[[FILTER_OC]], %[[ELEM_OC]]]			// CHECK-NEXT: %[[SIZE_ELEM_OC_2:.+]] = affine.min #[[BOUND2_MAP_2]](%[[IV3]], %[[IV2]])[%[[FILTER_OC]], %[[ELEM_OC]]]
	// CHECK-NEXT: %[[ST_FILTER:.+]] = tensor.extract_slice %[[FILTER]][0, 0, 0, %[[IV3]]]			// CHECK-NEXT: %[[ST_FILTER:.+]] = tensor.extract_slice %[[FILTER]][0, 0, 0, %[[IV3]]]
	// CHECK-SAME: [%[[FILTER_H]], %[[FILTER_W]], %[[FILTER_IC]], %[[SIZE_ELEM_OC_2]]]			// CHECK-SAME: [%[[FILTER_H]], %[[FILTER_W]], %[[FILTER_IC]], %[[SIZE_ELEM_OC_2]]]

mlir/test/Dialect/Linalg/tile-conv.mlir

	// CHECK-DAG: %[[T0:.*]] = memref.dim %[[ARG1]], %[[C0]]			// CHECK-DAG: %[[T0:.*]] = memref.dim %[[ARG1]], %[[C0]]
	// CHECK-DAG: %[[T1:.*]] = memref.dim %[[ARG1]], %[[C1]]			// CHECK-DAG: %[[T1:.*]] = memref.dim %[[ARG1]], %[[C1]]
	// CHECK-DAG: %[[T2:.*]] = memref.dim %[[ARG2]], %[[C0]]			// CHECK-DAG: %[[T2:.*]] = memref.dim %[[ARG2]], %[[C0]]
	// CHECK-DAG: %[[T3:.*]] = memref.dim %[[ARG2]], %[[C1]]			// CHECK-DAG: %[[T3:.*]] = memref.dim %[[ARG2]], %[[C1]]
	// CHECK: scf.for %[[ARG3:.*]] = %[[C0]] to %[[T2]] step %[[C2]]			// CHECK: scf.for %[[ARG3:.*]] = %[[C0]] to %[[T2]] step %[[C2]]
	// CHECK: scf.for %[[ARG4:.*]] = %[[C0]] to %[[T3]] step %[[C3]]			// CHECK: scf.for %[[ARG4:.*]] = %[[C0]] to %[[T3]] step %[[C3]]
	// CHECK: %[[T4:.*]] = affine.min #[[MAP0]](%[[ARG3]])[%[[T2]], %[[T0]]]			// CHECK: %[[T4:.*]] = affine.min #[[MAP0]](%[[ARG3]])[%[[T2]], %[[T0]]]
	// CHECK: %[[T5:.*]] = affine.min #[[MAP1]](%[[ARG4]])[%[[T3]], %[[T1]]]			// CHECK: %[[T5:.*]] = affine.min #[[MAP1]](%[[ARG4]])[%[[T3]], %[[T1]]]
	// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[ARG3]], %[[ARG4]]] [%[[T4]], %[[T5]]]
	// CHECK: %[[T6:.*]] = affine.min #[[MAP2]](%[[ARG3]])[%[[T2]]			// CHECK: %[[T6:.*]] = affine.min #[[MAP2]](%[[ARG3]])[%[[T2]]
	// CHECK: %[[T7:.*]] = affine.min #[[MAP3]](%[[ARG4]])[%[[T3]]]			// CHECK: %[[T7:.*]] = affine.min #[[MAP3]](%[[ARG4]])[%[[T3]]]
				// CHECK: %[[SV1:.*]] = memref.subview %[[ARG0]][%[[ARG3]], %[[ARG4]]] [%[[T4]], %[[T5]]]
	// CHECK: %[[SV2:.*]] = memref.subview %[[ARG2]][%[[ARG3]], %[[ARG4]]] [%[[T6]], %[[T7]]]			// CHECK: %[[SV2:.*]] = memref.subview %[[ARG2]][%[[ARG3]], %[[ARG4]]] [%[[T6]], %[[T7]]]
	// CHECK: linalg.conv_2d			// CHECK: linalg.conv_2d
	// CHECK-SAME: ins(%[[SV1]], %[[ARG1]]			// CHECK-SAME: ins(%[[SV1]], %[[ARG1]]
	// CHECK-SAME: outs(%[[SV2]]			// CHECK-SAME: outs(%[[SV2]]

mlir/test/Dialect/Linalg/tile-to-foreach-thread.mlir

func.func @matmul_static(%A: tensor<100x200xf32>, %B: tensor<200x300xf32>, %C: tensor<100x300xf32>) -> tensor<100x300xf32> {
// CHECK-DAG: %[[c10:.+]] = arith.constant 10 : index		// CHECK-DAG: %[[c10:.+]] = arith.constant 10 : index
// CHECK-DAG: %[[c21:.+]] = arith.constant 21 : index		// CHECK-DAG: %[[c21:.+]] = arith.constant 21 : index
// CHECK: scf.foreach_thread (%[[IV0:.+]], %[[IV1:.+]]) in (%[[c10]], %[[c21]])		// CHECK: scf.foreach_thread (%[[IV0:.+]], %[[IV1:.+]]) in (%[[c10]], %[[c21]])
// CHECK: %[[TSMIN:.+]] = affine.min #[[$map0]](%[[IV1]])		// CHECK: %[[TSMIN:.+]] = affine.min #[[$map0]](%[[IV1]])
// CHECK: %[[TS:.+]] = affine.max #[[$map1]](%[[TSMIN]])		// CHECK: %[[TS:.+]] = affine.max #[[$map1]](%[[TSMIN]])
// CHECK-NOT: affine.min		// CHECK-NOT: affine.min
// CHECK-NOT: affine.max		// CHECK-NOT: affine.max
// CHECK: %[[LB0:.+]] = affine.apply #[[$map2]](%[[IV0]])		// CHECK: %[[LB0:.+]] = affine.apply #[[$map2]](%[[IV0]])
// CHECK: %[[tA:.+]] = tensor.extract_slice %[[A]][%[[LB0]], 0] [10, 200] [1, 1] :
// CHECK: %[[LB1:.+]] = affine.apply #[[$map3]](%[[IV1]])		// CHECK: %[[LB1:.+]] = affine.apply #[[$map3]](%[[IV1]])
		// CHECK: %[[LB0_1:.+]] = affine.apply #[[$map2]](%[[IV0]])
		// CHECK: %[[LB1_1:.+]] = affine.apply #[[$map3]](%[[IV1]])
		// CHECK: %[[tA:.+]] = tensor.extract_slice %[[A]][%[[LB0]], 0] [10, 200] [1, 1] :
// CHECK: %[[tB:.+]] = tensor.extract_slice %[[B]][0, %[[LB1]]] [200, %[[TS]]] [1, 1] :		// CHECK: %[[tB:.+]] = tensor.extract_slice %[[B]][0, %[[LB1]]] [200, %[[TS]]] [1, 1] :
// CHECK: %[[LB0:.+]] = affine.apply #[[$map2]](%[[IV0]])		// CHECK: %[[tC:.+]] = tensor.extract_slice %[[C]][%[[LB0_1]], %[[LB1_1]]] [10, %[[TS]]] [1, 1] :
// CHECK: %[[LB1:.+]] = affine.apply #[[$map3]](%[[IV1]])
// CHECK: %[[tC:.+]] = tensor.extract_slice %[[C]][%[[LB0]], %[[LB1]]] [10, %[[TS]]] [1, 1] :
// CHECK: linalg.matmul		// CHECK: linalg.matmul
// CHECK: scf.foreach_thread.perform_concurrently		// CHECK: scf.foreach_thread.perform_concurrently
// CHECK-NEXT: tensor.parallel_insert_slice		// CHECK-NEXT: tensor.parallel_insert_slice
%0 = linalg.matmul ins(%A, %B : tensor<100x200xf32>, tensor<200x300xf32>)		%0 = linalg.matmul ins(%A, %B : tensor<100x200xf32>, tensor<200x300xf32>)
outs(%C : tensor<100x300xf32>) -> (tensor<100x300xf32>)		outs(%C : tensor<100x300xf32>) -> (tensor<100x300xf32>)
return %0 : tensor<100x300xf32>		return %0 : tensor<100x300xf32>
}		}

func.func @matmul_tile_size_static(%A: tensor<100x200xf32>, %B: tensor<200x300xf32>, %C: tensor<100x300xf32>) -> tensor<100x300xf32> {		func.func @matmul_tile_size_static(%A: tensor<100x200xf32>, %B: tensor<200x300xf32>, %C: tensor<100x300xf32>) -> tensor<100x300xf32> {
// CHECK-DAG: %[[c10:.+]] = arith.constant 10 :		// CHECK-DAG: %[[c10:.+]] = arith.constant 10 :
// CHECK-DAG: %[[c15:.+]] = arith.constant 15 :		// CHECK-DAG: %[[c15:.+]] = arith.constant 15 :
// CHECK: scf.foreach_thread (%[[IV0:.+]], %[[IV1:.+]]) in (%[[c10]], %[[c15]])		// CHECK: scf.foreach_thread (%[[IV0:.+]], %[[IV1:.+]]) in (%[[c10]], %[[c15]])
// CHECK: %[[TS:.+]] = affine.min #[[$map0]](%[[IV1]])		// CHECK: %[[TS:.+]] = affine.min #[[$map0]](%[[IV1]])
// CHECK-NOT: affine.max		// CHECK-NOT: affine.max
// CHECK-NOT: affine.min		// CHECK-NOT: affine.min
// CHECK: %[[LB0:.+]] = affine.apply #[[$map2]](%[[IV0]])		// CHECK: %[[LB0:.+]] = affine.apply #[[$map2]](%[[IV0]])
// CHECK: %[[tA:.+]] = tensor.extract_slice %[[A]][%[[LB0]], 0] [10, 200] [1, 1] :
// CHECK: %[[LB1:.+]] = affine.apply #[[$map3]](%[[IV1]])		// CHECK: %[[LB1:.+]] = affine.apply #[[$map3]](%[[IV1]])
		// CHECK: %[[LB0_1:.+]] = affine.apply #[[$map2]](%[[IV0]])
		// CHECK: %[[LB1_1:.+]] = affine.apply #[[$map3]](%[[IV1]])
		// CHECK: %[[tA:.+]] = tensor.extract_slice %[[A]][%[[LB0]], 0] [10, 200] [1, 1] :
// CHECK: %[[tB:.+]] = tensor.extract_slice %[[B]][0, %[[LB1]]] [200, %[[TS]]] [1, 1] :		// CHECK: %[[tB:.+]] = tensor.extract_slice %[[B]][0, %[[LB1]]] [200, %[[TS]]] [1, 1] :
// CHECK: %[[LB0:.+]] = affine.apply #[[$map2]](%[[IV0]])		// CHECK: %[[tC:.+]] = tensor.extract_slice %[[C]][%[[LB0_1]], %[[LB1_1]]] [10, %[[TS]]] [1, 1] :
// CHECK: %[[LB1:.+]] = affine.apply #[[$map3]](%[[IV1]])
// CHECK: %[[tC:.+]] = tensor.extract_slice %[[C]][%[[LB0]], %[[LB1]]] [10, %[[TS]]] [1, 1] :
// CHECK: linalg.matmul		// CHECK: linalg.matmul
// CHECK: scf.foreach_thread.perform_concurrently		// CHECK: scf.foreach_thread.perform_concurrently
// CHECK-NEXT: tensor.parallel_insert_slice		// CHECK-NEXT: tensor.parallel_insert_slice
%0 = linalg.matmul ins(%A, %B : tensor<100x200xf32>, tensor<200x300xf32>)		%0 = linalg.matmul ins(%A, %B : tensor<100x200xf32>, tensor<200x300xf32>)
outs(%C : tensor<100x300xf32>) -> (tensor<100x300xf32>)		outs(%C : tensor<100x300xf32>) -> (tensor<100x300xf32>)
return %0 : tensor<100x300xf32>		return %0 : tensor<100x300xf32>
}		}

transform.with_pdl_patterns {		transform.with_pdl_patterns {
^bb0(%arg0: !pdl.operation):		^bb0(%arg0: !pdl.operation):
transform.sequence %arg0 {		transform.sequence %arg0 {
^bb1(%arg1: !pdl.operation):		^bb1(%arg1: !pdl.operation):
%0 = transform.structured.match ops{["linalg.matmul"]} in %arg1		%0 = transform.structured.match ops{["linalg.matmul"]} in %arg1
%1:2 = transform.structured.tile_to_foreach_thread_op %0 tile_sizes [10, 21]		%1:2 = transform.structured.tile_to_foreach_thread_op %0 tile_sizes [10, 21]
}		}
}		}

mlir/test/Dialect/Linalg/tile.mlir

	}			}
	// TILE-2-LABEL: func @matmul(			// TILE-2-LABEL: func @matmul(
	// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {			// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {
	// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]
	// TILE-2: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szM]], %[[K]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: %[[szK:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[szK:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]
	// TILE-2: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
				// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szM]], %[[K]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szK]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szK]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: linalg.matmul ins(%[[sAi]]{{.*}} outs(%[[sCi]]			// TILE-2: linalg.matmul ins(%[[sAi]]{{.*}} outs(%[[sCi]]

	// TILE-02-LABEL: func @matmul(			// TILE-02-LABEL: func @matmul(
	// TILE-02-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-02-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-02-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-02-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-02: %[[N:.*]] = memref.dim %arg1, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[N:.*]] = memref.dim %arg1, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: scf.for %[[J:.*]] = %{{.*}} to %[[N]] step %{{.*}} {			// TILE-02: scf.for %[[J:.*]] = %{{.*}} to %[[N]] step %{{.*}} {
	// TILE-02: %[[K:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[K:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: %[[szN:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[N]]]			// TILE-02: %[[szN:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[N]]]
	// TILE-02: %[[sBj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[K]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: %[[szK:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[N]]]			// TILE-02: %[[szK:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[N]]]
				// TILE-02: %[[sBj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[K]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: %[[sCj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[M]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[sCj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[M]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: linalg.matmul ins(%{{.*}}, %[[sBj]]{{.*}} outs(%[[sCj]]			// TILE-02: linalg.matmul ins(%{{.*}}, %[[sBj]]{{.*}} outs(%[[sCj]]

	// TILE-002-LABEL: func @matmul(			// TILE-002-LABEL: func @matmul(
	// TILE-002-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-002-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-002-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-002-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-002: %[[ubK:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-002: %[[ubK:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-002: scf.for %[[K:.*]] = %{{.*}}{{.*}} to %[[ubK]] step %{{.*}} {			// TILE-002: scf.for %[[K:.*]] = %{{.*}}{{.*}} to %[[ubK]] step %{{.*}} {
	// TILE-002: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-002: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-002: %[[szK:.*]] = affine.min #[[$bound_map]](%[[K]])[%[[ubK]]]			// TILE-002: %[[szK:.*]] = affine.min #[[$bound_map]](%[[K]])[%[[ubK]]]
	// TILE-002: %[[sAj:.*]] = memref.subview %{{.*}}[0, %[[K]]] [%[[M]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-002: %[[szK_1:.*]] = affine.min #[[$bound_map]](%[[K]])[%[[ubK]]]
	// TILE-002: %[[szK:.*]] = affine.min #[[$bound_map]](%[[K]])[%[[ubK]]]
	// TILE-002: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-002: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-002: %[[sBj:.*]] = memref.subview %{{.*}}[%[[K]], 0] [%[[szK]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-002: %[[sAj:.*]] = memref.subview %{{.*}}[0, %[[K]]] [%[[M]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
				// TILE-002: %[[sBj:.*]] = memref.subview %{{.*}}[%[[K]], 0] [%[[szK_1]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-002: linalg.matmul ins(%[[sAj]], %[[sBj]]{{.*}} outs(%{{.*}}			// TILE-002: linalg.matmul ins(%[[sAj]], %[[sBj]]{{.*}} outs(%{{.*}}

	// TILE-234-LABEL: func @matmul(			// TILE-234-LABEL: func @matmul(
	// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-234-DAG: %[[C3:.*]] = arith.constant 3 : index			// TILE-234-DAG: %[[C3:.*]] = arith.constant 3 : index
	// TILE-234-DAG: %[[C4:.*]] = arith.constant 4 : index			// TILE-234-DAG: %[[C4:.*]] = arith.constant 4 : index
	// TILE-234: %[[ubM:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[ubM:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: %[[ubK:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[ubK:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: %[[ubN:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[ubN:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[ubM]] step %{{.*}} {			// TILE-234: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[ubM]] step %{{.*}} {
	// TILE-234: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[ubN]] step %{{.*}} {			// TILE-234: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[ubN]] step %{{.*}} {
	// TILE-234: scf.for %[[K:.*]] = %{{.*}}{{.*}} to %[[ubK]] step %{{.*}} {			// TILE-234: scf.for %[[K:.*]] = %{{.*}}{{.*}} to %[[ubK]] step %{{.*}} {
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubM]]]			// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubM]]]
	// TILE-234: %[[szK:.*]] = affine.min #[[$bound_map_4]](%[[K]])[%[[ubK]]]			// TILE-234: %[[szK:.*]] = affine.min #[[$bound_map_4]](%[[K]])[%[[ubK]]]
	// TILE-234: %[[sAik:.*]] = memref.subview %{{.*}}[%[[I]], %[[K]]] [%[[szM]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[szK_1:.*]] = affine.min #[[$bound_map_4]](%[[K]])[%[[ubK]]]
	// TILE-234: %[[szK:.*]] = affine.min #[[$bound_map_4]](%[[K]])[%[[ubK]]]
	// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[ubN]]]			// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[ubN]]]
	// TILE-234: %[[sBkj:.*]] = memref.subview %{{.*}}[%[[K]], %[[J]]] [%[[szK]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[szM_1:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubM]]]
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubM]]]			// TILE-234: %[[szN_1:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[ubN]]]
	// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[ubN]]]			// TILE-234: %[[sAik:.*]] = memref.subview %{{.*}}[%[[I]], %[[K]]] [%[[szM]], %[[szK]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: %[[sCij:.*]] = memref.subview %{{.*}}[%[[I]], %[[J]]] [%[[szM]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[sBkj:.*]] = memref.subview %{{.*}}[%[[K]], %[[J]]] [%[[szK_1]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
				// TILE-234: %[[sCij:.*]] = memref.subview %{{.*}}[%[[I]], %[[J]]] [%[[szM_1]], %[[szN_1]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	//			//
	// TILE-234: linalg.matmul ins(%[[sAik]], %[[sBkj]]{{.*}} outs(%[[sCij]]			// TILE-234: linalg.matmul ins(%[[sAik]], %[[sBkj]]{{.*}} outs(%[[sCij]]
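
The `affine.min` size computations in the checks above clamp the last tile along a dimension so that it does not run past the buffer. The arithmetic can be sketched in plain C++. The bound-map definitions are not shown in this hunk, so the form `min(tileSize, dim - iv)` used here is an assumption, and `tileSizeAt` and `allTilesFull` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Size of the tile that starts at induction value `iv`, assuming the bound
// map has the form min(tileSize, dim - iv).
int64_t tileSizeAt(int64_t iv, int64_t tileSize, int64_t dim) {
  assert(tileSize > 0 && iv >= 0 && iv < dim && "tile must start in bounds");
  return std::min(tileSize, dim - iv);
}

// Returns true when every tile along the dimension is a full tile. This holds
// exactly when `dim` is divisible by `tileSize`, which is the case where the
// "min" is redundant and could be folded away for static shapes.
bool allTilesFull(int64_t tileSize, int64_t dim) {
  for (int64_t iv = 0; iv < dim; iv += tileSize)
    if (tileSizeAt(iv, tileSize, dim) != tileSize)
      return false;
  return true;
}
```

For example, with `dim = 10` and `tileSize = 4` the tile starting at `iv = 8` has size 2, so the clamp is needed. With `dim = 10` and `tileSize = 2` every tile is full and the clamp never takes effect.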

	// When the buffer shapes are known at compile time, it is possible to avoid			// When the buffer shapes are known at compile time, it is possible to avoid
	// the "min" in subview size computation. This test uses buffer sizes divisible			// the "min" in subview size computation. This test uses buffer sizes divisible
	// by respective tile sizes (M=10 divisible by 2, N=12 divisible by 2 and 3,			// by respective tile sizes (M=10 divisible by 2, N=12 divisible by 2 and 3,
	// K=16 divisible by 2 and 4).			// K=16 divisible by 2 and 4).
	func.func @matmul_static(%arg0: memref<10x16xf32, offset: ?, strides: [?, 1]>,			func.func @matmul_static(%arg0: memref<10x16xf32, offset: ?, strides: [?, 1]>,
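The comment on `@matmul_static` says the "min" can be avoided when the buffer size is divisible by the tile size. A minimal Python sketch of why, assuming the bound maps compute `min(tile, upper_bound - iv)` (the map definitions are not shown in this excerpt, so that form is an assumption); `tile_sizes` is a hypothetical helper, not an MLIR API.

```python
def tile_sizes(upper_bound, tile):
    """Size of each tile along one dimension: min(tile, upper_bound - iv)."""
    return [min(tile, upper_bound - iv) for iv in range(0, upper_bound, tile)]


if __name__ == "__main__":
    # Static shapes named in the comment: M=10, N=12, K=16.
    for upper_bound, tile in [(10, 2), (12, 3), (16, 4)]:
        sizes = tile_sizes(upper_bound, tile)
        # Divisible case: every tile is full, so the min is redundant.
        print(upper_bound, tile, all(s == tile for s in sizes))
    # Non-divisible case: the last tile is partial, so the min is needed.
    print(tile_sizes(10, 4))
```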
	// TILE-2-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref			// TILE-2-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref
	// TILE-2-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref			// TILE-2-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref
	// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {			// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {
	// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]
	// TILE-2: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-2: %[[N:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szM]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: %[[szN:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[szN:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]
				// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]], 0] [%[[szM]], %[[N]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-2: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szN]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-2: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szN]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-2: linalg.matvec ins(%[[sAi]], %{{.*}} outs(%[[sCi]]			// TILE-2: linalg.matvec ins(%[[sAi]], %{{.*}} outs(%[[sCi]]

	// TILE-02-LABEL: func @matvec(			// TILE-02-LABEL: func @matvec(
	// TILE-02-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref			// TILE-02-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref
	// TILE-02-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref			// TILE-02-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref
	// TILE-02-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref			// TILE-02-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref
	// TILE-02-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-02-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-02-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-02-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-02: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[K]] step %{{.*}} {			// TILE-02: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[K]] step %{{.*}} {
	// TILE-02: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: %[[szN:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[K]]]			// TILE-02: %[[szN:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[K]]]
				// TILE-02: %[[szN_1:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[K]]]
	// TILE-02: %[[sAj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[M]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-02: %[[sAj:.*]] = memref.subview %{{.*}}[0, %[[J]]] [%[[M]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-02: %[[szN:.*]] = affine.min #[[$bound_map]](%[[J]])[%[[K]]]			// TILE-02: %[[sBj:.*]] = memref.subview %{{.*}}[%[[J]]] [%[[szN_1]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-02: %[[sBj:.*]] = memref.subview %{{.*}}[%[[J]]] [%[[szN]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-02: linalg.matvec ins(%[[sAj]], %[[sBj]]{{.*}} outs(%{{.*}}			// TILE-02: linalg.matvec ins(%[[sAj]], %[[sBj]]{{.*}} outs(%{{.*}}

	// TILE-002-LABEL: func @matvec(			// TILE-002-LABEL: func @matvec(
	// TILE-002-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref			// TILE-002-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref
	// TILE-002-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref			// TILE-002-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref
	// TILE-002-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref			// TILE-002-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref
	// TILE-002-NOT: scf.for			// TILE-002-NOT: scf.for

	// TILE-234-LABEL: func @matvec(			// TILE-234-LABEL: func @matvec(
	// TILE-234-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref			// TILE-234-SAME: %[[ARG0:[0-9a-zA-Z]*]]: memref
	// TILE-234-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref			// TILE-234-SAME: %[[ARG1:[0-9a-zA-Z]*]]: memref
	// TILE-234-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref			// TILE-234-SAME: %[[ARG2:[0-9a-zA-Z]*]]: memref
	// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-234-DAG: %[[C3:.*]] = arith.constant 3 : index			// TILE-234-DAG: %[[C3:.*]] = arith.constant 3 : index
	// TILE-234: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[K:.*]] = memref.dim %{{.*}}, %c1 : memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {			// TILE-234: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {
	// TILE-234: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[K]] step %{{.*}} {			// TILE-234: scf.for %[[J:.*]] = %{{.*}}{{.*}} to %[[K]] step %{{.*}} {
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[M]]]			// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[M]]]
	// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[K]]]			// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[K]]]
				// TILE-234: %[[szN_1:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[K]]]
				// TILE-234: %[[szM_1:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[M]]]
	// TILE-234: %[[sAij:.*]] = memref.subview %{{.*}}[%[[I]], %[[J]]] [%[[szM]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>			// TILE-234: %[[sAij:.*]] = memref.subview %{{.*}}[%[[I]], %[[J]]] [%[[szM]], %[[szN]]] [1, 1] : memref<?x?xf32, #[[$strided2D]]> to memref<?x?xf32, #[[$strided2D]]>
	// TILE-234: %[[szN:.*]] = affine.min #[[$bound_map_3]](%[[J]])[%[[K]]]			// TILE-234: %[[sBj:.*]] = memref.subview %{{.*}}[%[[J]]] [%[[szN_1]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-234: %[[sBj:.*]] = memref.subview %{{.*}}[%[[J]]] [%[[szN]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-234: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM_1]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[M]]]
	// TILE-234: %[[sCi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	//			//
	// TILE-234: linalg.matvec ins(%[[sAij]], %[[sBj]]{{.*}} outs(%[[sCi]]			// TILE-234: linalg.matvec ins(%[[sAij]], %[[sBj]]{{.*}} outs(%[[sCi]]

	func.func @dot(%arg0: memref<?xf32, offset: ?, strides: [1]>, %arg1: memref<?xf32, offset: ?, strides: [1]>, %arg2: memref<f32>) {			func.func @dot(%arg0: memref<?xf32, offset: ?, strides: [1]>, %arg1: memref<?xf32, offset: ?, strides: [1]>, %arg2: memref<f32>) {
	linalg.dot			linalg.dot
	ins(%arg0, %arg1: memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>)			ins(%arg0, %arg1: memref<?xf32, offset: ?, strides: [1]>, memref<?xf32, offset: ?, strides: [1]>)
	outs(%arg2: memref<f32>)			outs(%arg2: memref<f32>)
	return			return
	}			}
	// TILE-2-LABEL: func @dot(			// TILE-2-LABEL: func @dot(
	// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-2-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-2-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?xf32, #[[$strided1D]]>			// TILE-2: %[[M:.*]] = memref.dim %{{.*}}, %c0 : memref<?xf32, #[[$strided1D]]>
	// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {			// TILE-2: scf.for %[[I:.*]] = %{{.*}}{{.*}} to %[[M]] step %{{.*}} {
	// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]
				// TILE-2: %[[szM_1:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]
	// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-2: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-2: %[[szM:.*]] = affine.min #[[$bound_map]](%[[I]])[%[[M]]]			// TILE-2: %[[sBi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM_1]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-2: %[[sBi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-2: linalg.dot ins(%[[sAi]], %[[sBi]]{{.*}} outs(			// TILE-2: linalg.dot ins(%[[sAi]], %[[sBi]]{{.*}} outs(

	// TILE-02-LABEL: func @dot(			// TILE-02-LABEL: func @dot(
	// TILE-02-NOT: scf.for			// TILE-02-NOT: scf.for

	// TILE-002-LABEL: func @dot(			// TILE-002-LABEL: func @dot(
	// TILE-002-NOT: scf.for			// TILE-002-NOT: scf.for

	// TILE-234-LABEL: func @dot(			// TILE-234-LABEL: func @dot(
	// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index			// TILE-234-DAG: %[[C0:.*]] = arith.constant 0 : index
	// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index			// TILE-234-DAG: %[[C2:.*]] = arith.constant 2 : index
	// TILE-234: %[[ubK:.*]] = memref.dim %{{.*}}, %c0 : memref<?xf32, #[[$strided1D]]>			// TILE-234: %[[ubK:.*]] = memref.dim %{{.*}}, %c0 : memref<?xf32, #[[$strided1D]]>
	// TILE-234: scf.for %[[I:.*]] = %{{.*}} to %[[ubK]] step %{{.*}} {			// TILE-234: scf.for %[[I:.*]] = %{{.*}} to %[[ubK]] step %{{.*}} {
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubK]]]			// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubK]]]
				// TILE-234: %[[szM_1:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubK]]]
	// TILE-234: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>			// TILE-234: %[[sAi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-234: %[[szM:.*]] = affine.min #[[$bound_map_2]](%[[I]])[%[[ubK]]]			// TILE-234: %[[sBi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM_1]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-234: %[[sBi:.*]] = memref.subview %{{.*}}[%[[I]]] [%[[szM]]] [1] : memref<?xf32, #[[$strided1D]]> to memref<?xf32, #[[$strided1D]]>
	// TILE-234: linalg.dot ins(%[[sAi]], %[[sBi]]{{.*}} outs(			// TILE-234: linalg.dot ins(%[[sAi]], %[[sBi]]{{.*}} outs(

	func.func @fill_static(%arg0: memref<127x99xf32>, %arg1: f32) {			func.func @fill_static(%arg0: memref<127x99xf32>, %arg1: f32) {
	linalg.fill ins(%arg1 : f32) outs(%arg0 : memref<127x99xf32>)			linalg.fill ins(%arg1 : f32) outs(%arg0 : memref<127x99xf32>)
	return			return
	}			}
	// TILE-2-LABEL: func @fill_static			// TILE-2-LABEL: func @fill_static
	// TILE-2: for			// TILE-2: for

mlir/test/Dialect/Linalg/transform-op-split.mlir

func.func @dynamic(%arg0: tensor<100xf32>, %arg1: tensor<100xf32>) -> tensor<100xf32> {
// CHECK: %[[IN_SLICE_LOW:.+]] = tensor.extract_slice %[[IN:.+]][0] [%[[SPLIT_LOW]]] [1] : tensor<100xf32> to tensor<?xf32>		// CHECK: %[[IN_SLICE_LOW:.+]] = tensor.extract_slice %[[IN:.+]][0] [%[[SPLIT_LOW]]] [1] : tensor<100xf32> to tensor<?xf32>
// CHECK: %[[OUT_SLICE_LOW:.+]] = tensor.extract_slice %[[OUT:.+]][0] [%[[SPLIT_LOW]]] [1] : tensor<100xf32> to tensor<?xf32>		// CHECK: %[[OUT_SLICE_LOW:.+]] = tensor.extract_slice %[[OUT:.+]][0] [%[[SPLIT_LOW]]] [1] : tensor<100xf32> to tensor<?xf32>
// CHECK: %[[RES_SLICE_LOW:.+]] = linalg.generic		// CHECK: %[[RES_SLICE_LOW:.+]] = linalg.generic
// CHECK: ins(%[[IN_SLICE_LOW]]		// CHECK: ins(%[[IN_SLICE_LOW]]
// CHECK: outs(%[[OUT_SLICE_LOW]]		// CHECK: outs(%[[OUT_SLICE_LOW]]
// CHECK: %[[PARTIAL:.+]] = tensor.insert_slice %[[RES_SLICE_LOW]] into %[[OUT]][0] [%[[SPLIT_LOW]]] [1]		// CHECK: %[[PARTIAL:.+]] = tensor.insert_slice %[[RES_SLICE_LOW]] into %[[OUT]][0] [%[[SPLIT_LOW]]] [1]
//		//
// CHECK: %[[SPLIT_HIGH_2:.+]] = affine.apply #[[$MAP_S_MINUS_100]]()[%[[SPLIT_LOW]]]		// CHECK: %[[SPLIT_HIGH_2:.+]] = affine.apply #[[$MAP_S_MINUS_100]]()[%[[SPLIT_LOW]]]
// CHECK: %[[IN_SLICE_HIGH:.+]] = tensor.extract_slice %[[IN:.+]][%[[SPLIT_LOW]]] [%[[SPLIT_HIGH_2]]] [1] : tensor<100xf32> to tensor<?xf32>
// CHECK: %[[SPLIT_HIGH_3:.+]] = affine.apply #[[$MAP_S_MINUS_100]]()[%[[SPLIT_LOW]]]		// CHECK: %[[SPLIT_HIGH_3:.+]] = affine.apply #[[$MAP_S_MINUS_100]]()[%[[SPLIT_LOW]]]
		// CHECK: %[[IN_SLICE_HIGH:.+]] = tensor.extract_slice %[[IN:.+]][%[[SPLIT_LOW]]] [%[[SPLIT_HIGH_2]]] [1] : tensor<100xf32> to tensor<?xf32>
// CHECK: %[[OUT_SLICE_HIGH:.+]] = tensor.extract_slice %[[PARTIAL:.+]][%[[SPLIT_LOW]]] [%[[SPLIT_HIGH_3]]] [1] : tensor<100xf32> to tensor<?xf32>		// CHECK: %[[OUT_SLICE_HIGH:.+]] = tensor.extract_slice %[[PARTIAL:.+]][%[[SPLIT_LOW]]] [%[[SPLIT_HIGH_3]]] [1] : tensor<100xf32> to tensor<?xf32>
// CHECK: %[[RES_SLICE_HIGH:.+]] = linalg.generic		// CHECK: %[[RES_SLICE_HIGH:.+]] = linalg.generic
// CHECK: ins(%[[IN_SLICE_HIGH]]		// CHECK: ins(%[[IN_SLICE_HIGH]]
// CHECK: outs(%[[OUT_SLICE_HIGH]]		// CHECK: outs(%[[OUT_SLICE_HIGH]]
// CHECK: %[[SPLIT_HIGH_4:.+]] = affine.apply #[[$MAP_S_MINUS_100]]()[%[[SPLIT_LOW]]]		// CHECK: %[[SPLIT_HIGH_4:.+]] = affine.apply #[[$MAP_S_MINUS_100]]()[%[[SPLIT_LOW]]]
// CHECK: tensor.insert_slice %[[RES_SLICE_HIGH]] into %[[PARTIAL]][%[[SPLIT_LOW]]] [%[[SPLIT_HIGH_4]]] [1]		// CHECK: tensor.insert_slice %[[RES_SLICE_HIGH]] into %[[PARTIAL]][%[[SPLIT_LOW]]] [%[[SPLIT_HIGH_4]]] [1]
%0 = func.call @get_size() : () -> index		%0 = func.call @get_size() : () -> index
%1 = linalg.generic {		%1 = linalg.generic {


Revision Contents

Diff 451440
