This is an archive of the discontinued LLVM Phabricator instance.

Differential D155205

[mlir][vector] Transfer ops: one `in_bounds` bool per memref/tensor dim
Changes Planned · Public

Authored by springerm on Jul 13 2023, 7:27 AM.

Details

Reviewers

nicolasvasilache
aartbik
ftynse
bondhugula
ThomasRaoux
dcaballe
herhut

Summary

The in_bounds attribute specifies which dimensions of a vector transfer are in-bounds and which are out-of-bounds. Until now, it was expressed in terms of transfer dimensions. This was confusing because "in-bounds" is a property of shaped value indexing.

With this change, in_bounds is expressed in terms of shaped value ("source") dimensions. I.e.:

The number of in_bounds bools must match the rank of the shaped value (and not the rank of the vector).
Special rules around broadcast dimensions are no longer needed. (Previously, broadcast dimensions had to be "in-bounds".)

This change is in preparation for allowing the starting offsets to be out-of-bounds (in a future change). This rule is *not* changed yet with this revision: starting points must be in-bounds and that is verified for non-transfer dimensions. (It cannot be verified statically for transfer dimensions of size >1.)

The following examples illustrate how transfer ops are changing.

Example 1:

// previously:
%0 = vector.transfer_read %m[%i, %j], %cst {in_bounds = [true, false], permutation_map = affine_map<(d0, d1) -> (d1, d0)>} : memref<?x?xf32>, vector<5x6xf32>
// now: in_bounds is swapped because it is now expressed in terms of shaped value dimensions
%0 = vector.transfer_read %m[%i, %j], %cst {in_bounds = [false, true], permutation_map = affine_map<(d0, d1) -> (d1, d0)>} : memref<?x?xf32>, vector<5x6xf32>

Example 2:

// previously:
%0 = vector.transfer_read %t[%i, %j, %k], %cst {in_bounds = [false]} : tensor<?x?x?xf32>, vector<5xf32>
// now: There are 3 in_bounds values. The first two in_bounds must be "true" because they are non-transfer dimensions.
%0 = vector.transfer_read %t[%i, %j, %k], %cst {in_bounds = [true, true, false]} : tensor<?x?x?xf32>, vector<5xf32>

Note that in the above example, in_bounds used to be optional. In the absence of in_bounds, all dimensions are considered out-of-bounds. But this would be incorrect with the new op semantics because the non-transfer dimensions would be out-of-bounds.
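
The relationship between the two encodings in Examples 1 and 2 can be sketched in standalone C++. This is only an illustration under stated assumptions: `toSourceInBounds` and its encoding of the permutation map (one optional source dimension per vector dimension, `std::nullopt` for a broadcast) are hypothetical and are not part of MLIR or of this revision.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper, not an MLIR API: re-express an old-style `in_bounds`
// array (one bool per vector dimension) in the proposed style (one bool per
// memref/tensor dimension).
//
// `vectorDimToSourceDim[i]` is the source dimension that vector dimension `i`
// reads from, or std::nullopt if vector dimension `i` is a broadcast.
std::vector<bool> toSourceInBounds(
    const std::vector<bool> &oldInBounds,
    const std::vector<std::optional<unsigned>> &vectorDimToSourceDim,
    unsigned sourceRank) {
  assert(oldInBounds.size() == vectorDimToSourceDim.size() &&
         "expected one old in_bounds value per vector dimension");
  // Non-transfer dimensions are accessed only at their starting index, which
  // must be in-bounds, so they start out as `true`.
  std::vector<bool> newInBounds(sourceRank, true);
  for (std::size_t i = 0; i < oldInBounds.size(); ++i) {
    if (!vectorDimToSourceDim[i])
      continue; // Broadcast dimensions have no source dimension.
    unsigned src = *vectorDimToSourceDim[i];
    assert(src < sourceRank && "source dimension out of range");
    newInBounds[src] = oldInBounds[i];
  }
  return newInBounds;
}
```

With the map `(d0, d1) -> (d1, d0)` encoded as `{1, 0}`, the old value `[true, false]` becomes `[false, true]`, as in Example 1. With the minor identity map of Example 2 encoded as `{2}` and a source rank of 3, `[false]` becomes `[true, true, false]`.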

Depends On: D155277

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

springerm created this revision. · Jul 13 2023, 7:27 AM

Herald added a reviewer: aartbik. · Jul 13 2023, 7:27 AM

Herald added a reviewer: ftynse.

Herald added a reviewer: aartbik.

Herald added a reviewer: bondhugula.

Herald added a reviewer: ThomasRaoux.

Herald added a reviewer: dcaballe.

Herald added a project: Restricted Project.

Herald added subscribers: gysit, Dinistro, bviyer and 33 others.

springerm requested review of this revision. · Jul 13 2023, 7:27 AM

Herald added a reviewer: herhut. · Jul 13 2023, 7:27 AM

Herald added a project: Restricted Project.

Herald added subscribers: wangpc, stephenneuendorffer.

springerm added a parent revision: D155196: [mlir][vector] VectorToSCF: Omit redundant out-of-bounds check. · Jul 13 2023, 7:27 AM

add invalid.mlir test case

Harbormaster completed remote builds in B245125: Diff 540037. · Jul 13 2023, 9:37 AM

springerm edited the summary of this revision. · Jul 14 2023, 2:13 AM

springerm edited parent revisions, added: D155277: [mlir][vector][NFC] Minor VectorTransferOpInterface cleanup; removed: D155196: [mlir][vector] VectorToSCF: Omit redundant out-of-bounds check.

rebase

Harbormaster completed remote builds in B245332: Diff 540333. · Jul 14 2023, 3:34 AM

rebase

springerm mentioned this in D155303: [mlir][vector] Lowering of transfers with out-of-bounds non-transfer dims. · Jul 14 2023, 8:38 AM

springerm added a child revision: D155303: [mlir][vector] Lowering of transfers with out-of-bounds non-transfer dims. · Jul 14 2023, 8:39 AM

Harbormaster completed remote builds in B245401: Diff 540433. · Jul 14 2023, 9:38 AM

Hey Matthias!

Big change! This seems to be significantly changing the meaning of the in_bounds attribute and I'm not sure the new definition is a superset of the previous definition. The in_bounds attribute in xfer ops refers to the transferred elements themselves. That is, at least, how it was originally defined. The starting address could be "in bounds" but the xfer operation can still read/write out-of-bounds, which is what in_bounds is supposed to model. Example:

%0 = vector.transfer_read %t[%c2], %cst {in_bounds = [?]} : tensor<4xf32>, vector<5xf32>

Should in_bounds be true or false for this example with the new definition? According to the original definition, it should be false...

It's not totally clear either what we are trying to model with an in_bounds value per index. For example, what does the following example mean? Is it valid? Why do we care if the first index is in_bounds or not?

%5 = vector.transfer_read %0[%c2, %c3], %cst {permutation_map = #map1, in_bounds = [false, true]} : memref<?x?xf32>, vector<5xf32>

Hopefully you can help me understand the motivation. I'm not against this change but it's a big one and I'm not sure I totally understand the implications of it. Would it make sense to send an RFC?
It would also be great to learn more about what you are trying to model.

Thanks!
Diego
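
The distinction drawn in the comment above, between an in-bounds starting index and an in-bounds transfer, can be made concrete for the 1-D example (offset 2, `tensor<4xf32>`, `vector<5xf32>`). The helpers below are hypothetical and only cover statically known sizes; they sketch the arithmetic and are not MLIR verifier code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: is the starting index of the transfer inside the
// source dimension?
bool startIsInBounds(std::int64_t offset, std::int64_t dimSize) {
  assert(dimSize >= 0 && "expected a static, non-negative dimension size");
  return offset >= 0 && offset < dimSize;
}

// Hypothetical helper: do all `vectorSize` consecutive elements starting at
// `offset` lie inside the source dimension?
bool transferIsInBounds(std::int64_t offset, std::int64_t vectorSize,
                        std::int64_t dimSize) {
  assert(vectorSize >= 0 && dimSize >= 0 && "expected non-negative sizes");
  return offset >= 0 && offset + vectorSize <= dimSize;
}
```

For the example, `startIsInBounds(2, 4)` holds while `transferIsInBounds(2, 5, 4)` does not, since 2 + 5 > 4. Under the original definition this corresponds to `in_bounds = [false]`.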

This revision now requires changes to proceed. · Jul 17 2023, 11:35 PM

Putting this on hold for the moment. We may not need it.

Revision Contents

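Path	Size
mlir/include/mlir/Dialect/Vector/IR/VectorOps.td	35 lines
mlir/include/mlir/Dialect/Vector/Transforms/VectorRewritePatterns.h	2 lines
mlir/include/mlir/Dialect/Vector/Transforms/VectorTransforms.h	3 lines
mlir/include/mlir/Interfaces/VectorInterfaces.td	10 lines
mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp	37 lines
mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp	15 lines
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp	45 lines
mlir/lib/Dialect/MemRef/Transforms/FoldMemRefAliasOps.cpp	43 lines
mlir/lib/Dialect/Tensor/Transforms/FoldTensorSubsetOps.cpp	41 lines
mlir/lib/Dialect/Vector/IR/VectorOps.cpp	75 lines
mlir/lib/Dialect/Vector/Transforms/LowerVectorTransfer.cpp	44 lines
mlir/lib/Dialect/Vector/Transforms/VectorDropLeadUnitDim.cpp	14 lines
mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp	9 lines
mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp	43 lines
mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp	24 lines
mlir/test/Conversion/GPUCommon/transfer_write.mlir	2 lines
mlir/test/Conversion/VectorToGPU/vector-to-mma-ops-mma-sync.mlir	16 lines
mlir/test/Conversion/VectorToGPU/vector-to-mma-ops.mlir	22 lines
mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir	3 lines
mlir/test/Conversion/VectorToSCF/tensor-transfer-ops.mlir	4 lines
mlir/test/Conversion/VectorToSCF/unrolled-tensor-transfer-ops.mlir	12 lines
mlir/test/Conversion/VectorToSCF/vector-to-scf-mask-and-permutation-map.mlir	2 lines
mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir	56 lines
mlir/test/Dialect/Affine/SuperVectorize/vector_utils.mlir	2 lines
mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir	26 lines
mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir	4 lines
mlir/test/Dialect/Affine/SuperVectorize/vectorize_outer_loop_2d.mlir	2 lines
mlir/test/Dialect/Affine/SuperVectorize/vectorize_outer_loop_transpose_2d.mlir	8 lines
mlir/test/Dialect/Affine/SuperVectorize/vectorize_transpose_2d.mlir	8 lines
mlir/test/Dialect/Linalg/hoisting.mlir	160 lines
mlir/test/Dialect/Linalg/vectorization-masked.mlir	6 lines
mlir/test/Dialect/Linalg/vectorization.mlir	16 lines
mlir/test/Dialect/Linalg/vectorize-tensor-extract.mlir	2 lines
mlir/test/Dialect/MemRef/extract-address-computations.mlir	20 lines
mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir	14 lines
mlir/test/Dialect/Tensor/fold-tensor-subset-ops-into-vector-transfers.mlir	12 lines
mlir/test/Dialect/Tensor/fold-tensor-subset-ops.mlir	22 lines
mlir/test/Dialect/Vector/canonicalize.mlir	31 lines
mlir/test/Dialect/Vector/invalid.mlir	32 lines
mlir/test/Dialect/Vector/ops.mlir	56 lines
mlir/test/Dialect/Vector/scalar-vector-transfer-to-memref.mlir	16 lines
mlir/test/Dialect/Vector/vector-dropleadunitdim-transforms.mlir	12 lines
mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir	2 lines
mlir/test/Dialect/Vector/vector-transfer-drop-unit-dims-patterns.mlir	12 lines
mlir/test/Dialect/Vector/vector-transfer-flatten.mlir	8 lines
mlir/test/Dialect/Vector/vector-transfer-materialize-masks.mlir	33 lines
mlir/test/Dialect/Vector/vector-transfer-permutation-lowering.mlir	2 lines
mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir	60 lines
mlir/test/Dialect/Vector/vector-transfer-unroll.mlir	6 lines
mlir/test/Dialect/Vector/vector-warp-distribute.mlir	6 lines
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_coo_test.mlir	6 lines
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_sampled_matmul.mlir	2 lines
mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_transpose.mlir	4 lines
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matmul-lib.mlir	16 lines
mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-mma-2-4-f16.mlir	26 lines
mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir	20 lines
mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-2d.mlir	9 lines
mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-3d.mlir	31 lines
mlir/test/Integration/Dialect/Vector/CPU/test-transfer-to-loops.mlir	2 lines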

Diff 540433

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td

[... 1,217 lines not shown ...]	let description = [{
and/or masking.		and/or masking.

An optional SSA value `mask` may be specified to mask out elements read from		An optional SSA value `mask` may be specified to mask out elements read from
the MemRef/Tensor. The `mask` type is an `i1` vector with a shape that		the MemRef/Tensor. The `mask` type is an `i1` vector with a shape that
matches how elements are read from the MemRef/Tensor, before any		matches how elements are read from the MemRef/Tensor, before any
permutation or broadcasting. Elements whose corresponding mask element is		permutation or broadcasting. Elements whose corresponding mask element is
`0` are masked out and replaced with `padding`.		`0` are masked out and replaced with `padding`.

An optional boolean array attribute `in_bounds` specifies for every vector		An optional boolean array attribute `in_bounds` specifies for every tensor/
dimension if the transfer is guaranteed to be within the source bounds.		memref dimension if the transfer is guaranteed to be within the source
While the starting point of the transfer has to be in-bounds, accesses may		bounds. While the starting point of the transfer has to be in-bounds,
run out-of-bounds as indices increase. Broadcast dimensions must always be		accesses may run out-of-bounds as indices increase. If specified, the
in-bounds. If specified, the `in_bounds` array length has to be equal to the		`in_bounds` array length has to be equal to the rank of the source. In
vector rank. In absence of the attribute, accesses along all dimensions		absence of the attribute, accesses along all dimensions may run
(except for broadcasts) may run out-of-bounds. A `vector.transfer_read` can		out-of-bounds. A `vector.transfer_read` can be lowered to a simple load if
be lowered to a simple load if all dimensions are specified to be within		all dimensions are specified to be within bounds and no `mask` was
bounds and no `mask` was specified.		specified.

This operation is called 'read' by opposition to 'load' because the		This operation is called 'read' by opposition to 'load' because the
super-vector granularity is generally not representable with a single		super-vector granularity is generally not representable with a single
hardware register. A `vector.transfer_read` is thus a mid-level abstraction		hardware register. A `vector.transfer_read` is thus a mid-level abstraction
that supports super-vectorization with non-effecting padding for full-tile		that supports super-vectorization with non-effecting padding for full-tile
only operations.		only operations.

More precisely, let's dive deeper into the permutation_map for the following		More precisely, let's dive deeper into the permutation_map for the following
[... 216 lines not shown ...]	let description = [{
matches how elements are written into the MemRef/Tensor, after applying		matches how elements are written into the MemRef/Tensor, after applying
any permutation. Elements whose corresponding mask element is `0` are		any permutation. Elements whose corresponding mask element is `0` are
masked out.		masked out.

An optional SSA value `mask` of the same shape as the vector type may be		An optional SSA value `mask` of the same shape as the vector type may be
specified to mask out elements. Elements whose corresponding mask element		specified to mask out elements. Elements whose corresponding mask element
is `0` are masked out.		is `0` are masked out.

An optional boolean array attribute `in_bounds` specifies for every vector		An optional boolean array attribute `in_bounds` specifies for every tensor/
dimension if the transfer is guaranteed to be within the source bounds.		memref dimension if the transfer is guaranteed to be within the source
While the starting point of the transfer has to be in-bounds, accesses may		bounds. While the starting point of the transfer has to be in-bounds,
run out-of-bounds as indices increase. If specified, the `in_bounds` array		accesses may run out-of-bounds as indices increase. If specified, the
length has to be equal to the vector rank. In absence of the attribute,		`in_bounds` array length has to be equal to the rank of the source. In
accesses along all dimensions may run out-of-bounds. A		absence of the attribute, accesses along all dimensions may run
`vector.transfer_write` can be lowered to a simple store if all dimensions		out-of-bounds. A `vector.transfer_write` can be lowered to a simple store
are specified to be within bounds and no `mask` was specified.		if all dimensions are specified to be within bounds and no `mask` was
		specified.

This operation is called 'write' by opposition to 'store' because the		This operation is called 'write' by opposition to 'store' because the
super-vector granularity is generally not representable with a single		super-vector granularity is generally not representable with a single
hardware register. A `vector.transfer_write` is thus a		hardware register. A `vector.transfer_write` is thus a
mid-level abstraction that supports super-vectorization with non-effecting		mid-level abstraction that supports super-vectorization with non-effecting
padding for full-tile-only code. It is the responsibility of		padding for full-tile-only code. It is the responsibility of
`vector.transfer_write`'s implementation to ensure the memory writes are		`vector.transfer_write`'s implementation to ensure the memory writes are
valid. Different lowerings may be pertinent depending on the hardware		valid. Different lowerings may be pertinent depending on the hardware
[... 1,432 lines not shown ...]

mlir/include/mlir/Dialect/Vector/Transforms/VectorRewritePatterns.h

	[... 116 lines not shown ...]
	/// // fast path, direct cast			/// // fast path, direct cast
	/// memref.cast %A: memref<A...> to compatibleMemRefType			/// memref.cast %A: memref<A...> to compatibleMemRefType
	/// scf.yield %view : compatibleMemRefType, index, index			/// scf.yield %view : compatibleMemRefType, index, index
	/// } else {			/// } else {
	/// // slow path, not in-bounds vector.transfer or linalg.copy.			/// // slow path, not in-bounds vector.transfer or linalg.copy.
	/// memref.cast %alloc: memref<B...> to compatibleMemRefType			/// memref.cast %alloc: memref<B...> to compatibleMemRefType
	/// scf.yield %4 : compatibleMemRefType, index, index			/// scf.yield %4 : compatibleMemRefType, index, index
	// }			// }
	/// %0 = vector.transfer_read %1#0[%1#1, %1#2] {in_bounds = [true ... true]}			/// %0 = vector.transfer_read %1#0[%1#1, %1#2] {in_bounds = [true, true]}
	/// ```			/// ```
	/// where `alloc` is a top of the function alloca'ed buffer of one vector.			/// where `alloc` is a top of the function alloca'ed buffer of one vector.
	///			///
	/// Preconditions:			/// Preconditions:
	/// 1. `xferOp.permutation_map()` must be a minor identity map			/// 1. `xferOp.permutation_map()` must be a minor identity map
	/// 2. the rank of the `xferOp.memref()` and the rank of the `xferOp.vector()`			/// 2. the rank of the `xferOp.memref()` and the rank of the `xferOp.vector()`
	/// must be equal. This will be relaxed in the future but requires			/// must be equal. This will be relaxed in the future but requires
	/// rank-reducing subviews.			/// rank-reducing subviews.
	[... 174 lines not shown ...]

mlir/include/mlir/Dialect/Vector/Transforms/VectorTransforms.h

	[... 83 lines not shown ...]
	/// // fastpath, direct cast			/// // fastpath, direct cast
	/// memref.cast %A: memref<A...> to compatibleMemRefType			/// memref.cast %A: memref<A...> to compatibleMemRefType
	/// scf.yield %view : compatibleMemRefType, index, index			/// scf.yield %view : compatibleMemRefType, index, index
	/// } else {			/// } else {
	/// // slowpath, not in-bounds vector.transfer or linalg.copy.			/// // slowpath, not in-bounds vector.transfer or linalg.copy.
	/// memref.cast %alloc: memref<B...> to compatibleMemRefType			/// memref.cast %alloc: memref<B...> to compatibleMemRefType
	/// scf.yield %4 : compatibleMemRefType, index, index			/// scf.yield %4 : compatibleMemRefType, index, index
	// }			// }
	/// %0 = vector.transfer_read %1#0[%1#1, %1#2] {in_bounds = [true ...			/// %0 = vector.transfer_read %1#0[%1#1, %1#2] {in_bounds = [true, true]}
	/// true]}
	/// ```			/// ```
	/// where `alloc` is a top of the function alloca'ed buffer of one vector.			/// where `alloc` is a top of the function alloca'ed buffer of one vector.
	///			///
	/// Preconditions:			/// Preconditions:
	/// 1. `xferOp.permutation_map()` must be a minor identity map			/// 1. `xferOp.permutation_map()` must be a minor identity map
	/// 2. the rank of the `xferOp.memref()` and the rank of the			/// 2. the rank of the `xferOp.memref()` and the rank of the
	/// `xferOp.vector()` must be equal. This will be relaxed in the future but			/// `xferOp.vector()` must be equal. This will be relaxed in the future but
	/// requires rank-reducing subviews.			/// requires rank-reducing subviews.
	[... 18 lines not shown ...]

mlir/include/mlir/Interfaces/VectorInterfaces.td

[... 69 lines not shown ...]	let methods = [
InterfaceMethod<		InterfaceMethod<
/*desc=*/[{ Return `true` if dimension `dim` is in-bounds. Return `false`		/*desc=*/[{ Return `true` if dimension `dim` is in-bounds. Return `false`
otherwise. }],		otherwise. }],
/*retTy=*/"bool",		/*retTy=*/"bool",
/*methodName=*/"isDimInBounds",		/*methodName=*/"isDimInBounds",
/*args=*/(ins "unsigned":$dim),		/*args=*/(ins "unsigned":$dim),
/*methodBody=*/"",		/*methodBody=*/"",
/*defaultImplementation=*/[{		/*defaultImplementation=*/[{
return $_op.isBroadcastDim(dim)		return $_op.getInBounds()
|| ($_op.getInBounds()		&& cast<::mlir::BoolAttr>(cast<::mlir::ArrayAttr>(*$_op.getInBounds())[dim]).getValue();
&& cast<::mlir::BoolAttr>(cast<::mlir::ArrayAttr>(*$_op.getInBounds())[dim]).getValue());
}]		}]
>,		>,
InterfaceMethod<		InterfaceMethod<
/*desc=*/"Return the memref or ranked tensor operand.",		/*desc=*/"Return the memref or ranked tensor operand.",
/*retTy=*/"::mlir::Value",		/*retTy=*/"::mlir::Value",
/*methodName=*/"source",		/*methodName=*/"source",
/*args=*/(ins),		/*args=*/(ins),
/*methodBody=*/"return $_op.getSource();"		/*methodBody=*/"return $_op.getSource();"
[... 53 lines not shown ...]	let methods = [
InterfaceMethod<		InterfaceMethod<
/*desc=*/"Return a vector of all in_bounds values as booleans.",		/*desc=*/"Return a vector of all in_bounds values as booleans.",
/*retTy=*/"::llvm::SmallVector<bool>",		/*retTy=*/"::llvm::SmallVector<bool>",
/*methodName=*/"getInBoundsValues",		/*methodName=*/"getInBoundsValues",
/*args=*/(ins),		/*args=*/(ins),
/*methodBody=*/"",		/*methodBody=*/"",
/*defaultImplementation=*/[{		/*defaultImplementation=*/[{
::llvm::SmallVector<bool> inBounds;		::llvm::SmallVector<bool> inBounds;
for (int64_t i = 0, e = $_op.getTransferRank(); i < e; ++i)		for (int64_t i = 0, e = $_op.getShapedType().getRank(); i < e; ++i)
inBounds.push_back($_op.isDimInBounds(i));		inBounds.push_back($_op.isDimInBounds(i));
return inBounds;		return inBounds;
}]		}]
>,		>,
InterfaceMethod<		InterfaceMethod<
/*desc=*/"Return the ShapedType.",		/*desc=*/"Return the ShapedType.",
/*retTy=*/"::mlir::ShapedType",		/*retTy=*/"::mlir::ShapedType",
/*methodName=*/"getShapedType",		/*methodName=*/"getShapedType",
[... 57 lines not shown ...]	let methods = [
InterfaceMethod<		InterfaceMethod<
/*desc=*/[{ Returns true if at least one of the dimensions may be		/*desc=*/[{ Returns true if at least one of the dimensions may be
out-of-bounds.}],		out-of-bounds.}],
/*retTy=*/"bool",		/*retTy=*/"bool",
/*methodName=*/"hasOutOfBoundsDim",		/*methodName=*/"hasOutOfBoundsDim",
/*args=*/(ins),		/*args=*/(ins),
/*methodBody=*/"",		/*methodBody=*/"",
/*defaultImplementation=*/[{		/*defaultImplementation=*/[{
for (unsigned idx = 0, e = $_op.getTransferRank(); idx < e; ++idx)		for (unsigned idx = 0, e = $_op.getShapedType().getRank();
		idx < e; ++idx)
if (!$_op.isDimInBounds(idx))		if (!$_op.isDimInBounds(idx))
return true;		return true;
return false;		return false;
}]		}]
>,		>,
InterfaceMethod<		InterfaceMethod<
/*desc=*/[{		/*desc=*/[{
Helper function to account for the fact that `permutationMap` results and		Helper function to account for the fact that `permutationMap` results and
[... 59 lines not shown ...]
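
The new default implementations in the right-hand column can be modelled outside of MLIR roughly as follows. `TransferModel` is a hypothetical stand-in for a transfer op (source rank plus an optional `in_bounds` array), and the free functions only mirror the logic shown in the diff: an absent attribute means that no dimension is known to be in-bounds, and the loops run over the source rank rather than the transfer rank.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for a vector transfer op; not an MLIR class.
struct TransferModel {
  unsigned sourceRank;                       // Rank of the memref/tensor.
  std::optional<std::vector<bool>> inBounds; // std::nullopt: attribute absent.
};

// Mirrors the new `isDimInBounds`: no special case for broadcast dimensions.
bool isDimInBounds(const TransferModel &op, unsigned dim) {
  assert(dim < op.sourceRank && "dimension out of range");
  return op.inBounds.has_value() && (*op.inBounds)[dim];
}

// Mirrors the new `getInBoundsValues`: one value per source dimension.
std::vector<bool> getInBoundsValues(const TransferModel &op) {
  std::vector<bool> values;
  for (unsigned i = 0; i < op.sourceRank; ++i)
    values.push_back(isDimInBounds(op, i));
  return values;
}

// Mirrors the new `hasOutOfBoundsDim`.
bool hasOutOfBoundsDim(const TransferModel &op) {
  for (unsigned i = 0; i < op.sourceRank; ++i)
    if (!isDimInBounds(op, i))
      return true;
  return false;
}
```

In this model, a rank-3 source without an `in_bounds` attribute reports every dimension as potentially out-of-bounds, which matches the concern raised in the summary about non-transfer dimensions when the attribute is omitted.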

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp

[... 101 lines not shown ...]	if (!isBroadcast) {
AffineExpr d0, d1;		AffineExpr d0, d1;
bindDims(xferOp.getContext(), d0, d1);		bindDims(xferOp.getContext(), d0, d1);
Value offset = adaptor.getIndices()[*dim];		Value offset = adaptor.getIndices()[*dim];
indices[*dim] =		indices[*dim] =
affine::makeComposedAffineApply(b, loc, d0 + d1, {offset, iv});		affine::makeComposedAffineApply(b, loc, d0 + d1, {offset, iv});
}		}
}		}

		/// Calculate the in_bounds attribute of the new vector transfer op. The dropped
		/// vector transfer dimension is now in-bounds; an scf.if check was generated
		/// around the new transfer op.
		template <typename OpTy>
		static ArrayAttr getXferInBoundsAttr(OpBuilder &b, OpTy xferOp) {
		SmallVector<bool> inBounds = xferOp.getInBoundsValues();
		auto dim = unpackedDim(xferOp);
		bool isBroadcast = !dim.has_value();
		if (!isBroadcast)
		inBounds[*dim] = true;
		return b.getBoolArrayAttr(inBounds);
		}

static void maybeYieldValue(OpBuilder &b, Location loc, bool hasRetVal,		static void maybeYieldValue(OpBuilder &b, Location loc, bool hasRetVal,
Value value) {		Value value) {
if (hasRetVal) {		if (hasRetVal) {
assert(value && "Expected non-empty value");		assert(value && "Expected non-empty value");
b.create<scf::YieldOp>(loc, value);		b.create<scf::YieldOp>(loc, value);
} else {		} else {
b.create<scf::YieldOp>(loc);		b.create<scf::YieldOp>(loc);
}		}
[... 50 lines not shown ...]	static Value generateInBoundsCheck(
function_ref<Value(OpBuilder &, Location)> outOfBoundsCase = nullptr) {		function_ref<Value(OpBuilder &, Location)> outOfBoundsCase = nullptr) {
bool hasRetVal = !resultTypes.empty();		bool hasRetVal = !resultTypes.empty();
Value cond; // Condition to be built...		Value cond; // Condition to be built...

// Condition check 1: Access in-bounds?		// Condition check 1: Access in-bounds?
bool isBroadcast = !dim; // No in-bounds check for broadcasts.		bool isBroadcast = !dim; // No in-bounds check for broadcasts.
Location loc = xferOp.getLoc();		Location loc = xferOp.getLoc();
ImplicitLocOpBuilder lb(xferOp.getLoc(), b);		ImplicitLocOpBuilder lb(xferOp.getLoc(), b);
if (!xferOp.isDimInBounds(0) && !isBroadcast) {		if (!isBroadcast && !xferOp.isDimInBounds(*dim)) {
Value memrefDim =		Value memrefDim =
vector::createOrFoldDimOp(b, loc, xferOp.getSource(), *dim);		vector::createOrFoldDimOp(b, loc, xferOp.getSource(), *dim);
AffineExpr d0, d1;		AffineExpr d0, d1;
bindDims(xferOp.getContext(), d0, d1);		bindDims(xferOp.getContext(), d0, d1);
Value base = xferOp.getIndices()[*dim];		Value base = xferOp.getIndices()[*dim];
Value memrefIdx =		Value memrefIdx =
affine::makeComposedAffineApply(b, loc, d0 + d1, {base, iv});		affine::makeComposedAffineApply(b, loc, d0 + d1, {base, iv});
cond = lb.create<arith::CmpIOp>(arith::CmpIPredicate::sgt, memrefDim,		cond = lb.create<arith::CmpIOp>(arith::CmpIPredicate::sgt, memrefDim,
[... 49 lines not shown ...]	generateInBoundsCheck(
/*outOfBoundsCase=*/		/*outOfBoundsCase=*/
[&](OpBuilder &b, Location loc) {		[&](OpBuilder &b, Location loc) {
if (outOfBoundsCase)		if (outOfBoundsCase)
outOfBoundsCase(b, loc);		outOfBoundsCase(b, loc);
return Value();		return Value();
});		});
}		}

/// Given an ArrayAttr, return a copy where the first element is dropped.
static ArrayAttr dropFirstElem(OpBuilder &b, ArrayAttr attr) {
if (!attr)
return attr;
return ArrayAttr::get(b.getContext(), attr.getValue().drop_front());
}

/// Add the pass label to a vector transfer op if its rank is not the target		/// Add the pass label to a vector transfer op if its rank is not the target
/// rank.		/// rank.
template <typename OpTy>		template <typename OpTy>
static void maybeApplyPassLabel(OpBuilder &b, OpTy newXferOp,		static void maybeApplyPassLabel(OpBuilder &b, OpTy newXferOp,
unsigned targetRank) {		unsigned targetRank) {
if (newXferOp.getVectorType().getRank() > targetRank)		if (newXferOp.getVectorType().getRank() > targetRank)
newXferOp->setAttr(kPassLabel, b.getUnitAttr());		newXferOp->setAttr(kPassLabel, b.getUnitAttr());
}		}
[... 148 lines not shown ...]	static TransferReadOp rewriteOp(OpBuilder &b,
storeIndices.push_back(iv);		storeIndices.push_back(iv);

SmallVector<Value, 8> xferIndices;		SmallVector<Value, 8> xferIndices;
getXferIndices(b, xferOp, iv, xferIndices);		getXferIndices(b, xferOp, iv, xferIndices);

Location loc = xferOp.getLoc();		Location loc = xferOp.getLoc();
auto bufferType = dyn_cast<ShapedType>(buffer.getType());		auto bufferType = dyn_cast<ShapedType>(buffer.getType());
auto vecType = dyn_cast<VectorType>(bufferType.getElementType());		auto vecType = dyn_cast<VectorType>(bufferType.getElementType());
auto inBoundsAttr = dropFirstElem(b, xferOp.getInBoundsAttr());
auto newXferOp = b.create<vector::TransferReadOp>(		auto newXferOp = b.create<vector::TransferReadOp>(
loc, vecType, xferOp.getSource(), xferIndices,		loc, vecType, xferOp.getSource(), xferIndices,
AffineMapAttr::get(unpackedPermutationMap(b, xferOp)),		AffineMapAttr::get(unpackedPermutationMap(b, xferOp)),
xferOp.getPadding(), Value(), inBoundsAttr);		xferOp.getPadding(), Value(), getXferInBoundsAttr(b, xferOp));

maybeApplyPassLabel(b, newXferOp, options.targetRank);		maybeApplyPassLabel(b, newXferOp, options.targetRank);

b.create<memref::StoreOp>(loc, newXferOp.getVector(), buffer, storeIndices);		b.create<memref::StoreOp>(loc, newXferOp.getVector(), buffer, storeIndices);
return newXferOp;		return newXferOp;
}		}

/// Handle out-of-bounds accesses on the to-be-unpacked dimension: Write		/// Handle out-of-bounds accesses on the to-be-unpacked dimension: Write
[... 66 lines not shown ...]	static TransferWriteOp rewriteOp(OpBuilder &b,
getBufferIndices(xferOp, loadIndices);		getBufferIndices(xferOp, loadIndices);
loadIndices.push_back(iv);		loadIndices.push_back(iv);

SmallVector<Value, 8> xferIndices;		SmallVector<Value, 8> xferIndices;
getXferIndices(b, xferOp, iv, xferIndices);		getXferIndices(b, xferOp, iv, xferIndices);

Location loc = xferOp.getLoc();		Location loc = xferOp.getLoc();
auto vec = b.create<memref::LoadOp>(loc, buffer, loadIndices);		auto vec = b.create<memref::LoadOp>(loc, buffer, loadIndices);
auto inBoundsAttr = dropFirstElem(b, xferOp.getInBoundsAttr());
auto source = loopState.empty() ? xferOp.getSource() : loopState[0];		auto source = loopState.empty() ? xferOp.getSource() : loopState[0];
Type type = isTensorOp(xferOp) ? xferOp.getShapedType() : Type();		Type type = isTensorOp(xferOp) ? xferOp.getShapedType() : Type();
auto newXferOp = b.create<vector::TransferWriteOp>(		auto newXferOp = b.create<vector::TransferWriteOp>(
loc, type, vec, source, xferIndices,		loc, type, vec, source, xferIndices,
AffineMapAttr::get(unpackedPermutationMap(b, xferOp)), Value(),		AffineMapAttr::get(unpackedPermutationMap(b, xferOp)), Value(),
inBoundsAttr);		getXferInBoundsAttr(b, xferOp));

maybeApplyPassLabel(b, newXferOp, options.targetRank);		maybeApplyPassLabel(b, newXferOp, options.targetRank);

return newXferOp;		return newXferOp;
}		}

/// Handle out-of-bounds accesses on the to-be-unpacked dimension.		/// Handle out-of-bounds accesses on the to-be-unpacked dimension.
static Value handleOutOfBoundsDim(OpBuilder &b, TransferWriteOp xferOp,		static Value handleOutOfBoundsDim(OpBuilder &b, TransferWriteOp xferOp,
[... 411 lines not shown ...]	for (int64_t i = 0; i < dimSize; ++i) {
SmallVector<Value, 8> xferIndices;		SmallVector<Value, 8> xferIndices;
getXferIndices(b, xferOp, iv, xferIndices);		getXferIndices(b, xferOp, iv, xferIndices);

// Indices for the new vector.insert op.		// Indices for the new vector.insert op.
SmallVector<int64_t, 8> insertionIndices;		SmallVector<int64_t, 8> insertionIndices;
getInsertionIndices(xferOp, insertionIndices);		getInsertionIndices(xferOp, insertionIndices);
insertionIndices.push_back(i);		insertionIndices.push_back(i);

auto inBoundsAttr = dropFirstElem(b, xferOp.getInBoundsAttr());
auto newXferOp = b.create<vector::TransferReadOp>(		auto newXferOp = b.create<vector::TransferReadOp>(
loc, newXferVecType, xferOp.getSource(), xferIndices,		loc, newXferVecType, xferOp.getSource(), xferIndices,
AffineMapAttr::get(unpackedPermutationMap(b, xferOp)),		AffineMapAttr::get(unpackedPermutationMap(b, xferOp)),
xferOp.getPadding(), Value(), inBoundsAttr);		xferOp.getPadding(), Value(), getXferInBoundsAttr(b, xferOp));
maybeAssignMask(b, xferOp, newXferOp, i);		maybeAssignMask(b, xferOp, newXferOp, i);
return b.create<vector::InsertOp>(loc, newXferOp, vec,		return b.create<vector::InsertOp>(loc, newXferOp, vec,
insertionIndices);		insertionIndices);
},		},
/*outOfBoundsCase=*/		/*outOfBoundsCase=*/
[&](OpBuilder &b, Location loc) {		[&](OpBuilder &b, Location loc) {
// Loop through original (unmodified) vector.		// Loop through original (unmodified) vector.
return vec;		return vec;
▲ Show 20 Lines • Show All 107 Lines • ▼ Show 20 Lines	for (int64_t i = 0; i < dimSize; ++i) {

// Indices for the new vector.extract op.		// Indices for the new vector.extract op.
SmallVector<int64_t, 8> extractionIndices;		SmallVector<int64_t, 8> extractionIndices;
getExtractionIndices(xferOp, extractionIndices);		getExtractionIndices(xferOp, extractionIndices);
extractionIndices.push_back(i);		extractionIndices.push_back(i);

auto extracted =		auto extracted =
b.create<vector::ExtractOp>(loc, vec, extractionIndices);		b.create<vector::ExtractOp>(loc, vec, extractionIndices);
auto inBoundsAttr = dropFirstElem(b, xferOp.getInBoundsAttr());
auto newXferOp = b.create<vector::TransferWriteOp>(		auto newXferOp = b.create<vector::TransferWriteOp>(
loc, sourceType, extracted, source, xferIndices,		loc, sourceType, extracted, source, xferIndices,
AffineMapAttr::get(unpackedPermutationMap(b, xferOp)), Value(),		AffineMapAttr::get(unpackedPermutationMap(b, xferOp)), Value(),
inBoundsAttr);		getXferInBoundsAttr(b, xferOp));

maybeAssignMask(b, xferOp, newXferOp, i);		maybeAssignMask(b, xferOp, newXferOp, i);

return isTensorOp(xferOp) ? newXferOp->getResult(0) : Value();		return isTensorOp(xferOp) ? newXferOp->getResult(0) : Value();
},		},
/*outOfBoundsCase=*/		/*outOfBoundsCase=*/
[&](OpBuilder &b, Location loc) {		[&](OpBuilder &b, Location loc) {
return isTensorOp(xferOp) ? source : Value();		return isTensorOp(xferOp) ? source : Value();
▲ Show 20 Lines • Show All 134 Lines • ▼ Show 20 Lines
///		///
/// TODO: In some cases (no masking, etc.), LLVM::MatrixColumnMajorLoadOp		/// TODO: In some cases (no masking, etc.), LLVM::MatrixColumnMajorLoadOp
/// can be generated instead of TransferOp1dConversion. Add such a pattern		/// can be generated instead of TransferOp1dConversion. Add such a pattern
/// to ConvertVectorToLLVM.		/// to ConvertVectorToLLVM.
///		///
/// E.g.:		/// E.g.:
/// ```		/// ```
/// vector.transfer_write %vec, %A[%a, %b]		/// vector.transfer_write %vec, %A[%a, %b]
/// {permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds = [true]}		/// {permutation_map = affine_map<(d0, d1) -> (d0)>,
		/// in_bounds = [true, true]}
/// : vector<9xf32>, memref<?x?xf32>		/// : vector<9xf32>, memref<?x?xf32>
/// ```		/// ```
/// Is rewritten to approximately the following pseudo-IR:		/// Is rewritten to approximately the following pseudo-IR:
/// ```		/// ```
/// for i = 0 to 9 {		/// for i = 0 to 9 {
/// %t = vector.extractelement %vec[i] : vector<9xf32>		/// %t = vector.extractelement %vec[i] : vector<9xf32>
/// memref.store %t, %arg0[%a + i, %b] : memref<?x?xf32>		/// memref.store %t, %arg0[%a + i, %b] : memref<?x?xf32>
/// }		/// }
▲ Show 20 Lines • Show All 105 Lines • Show Last 20 Lines

mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp

Show First 20 Lines • Show All 1,195 Lines • ▼ Show 20 Lines	auto permutationMap = makePermutationMap(state.builder.getInsertionBlock(),
indices, state.vecLoopToVecDim);		indices, state.vecLoopToVecDim);
if (!permutationMap) {		if (!permutationMap) {
LLVM_DEBUG(dbgs() << "\n[early-vect]+++++ can't compute permutationMap\n");		LLVM_DEBUG(dbgs() << "\n[early-vect]+++++ can't compute permutationMap\n");
return nullptr;		return nullptr;
}		}
LLVM_DEBUG(dbgs() << "\n[early-vect]+++++ permutationMap: ");		LLVM_DEBUG(dbgs() << "\n[early-vect]+++++ permutationMap: ");
LLVM_DEBUG(permutationMap.print(dbgs()));		LLVM_DEBUG(permutationMap.print(dbgs()));

		// Non-transfer dims are in-bounds.
		SmallVector<bool> inBounds(memRefType.getRank(), true);
		for (AffineExpr expr : permutationMap.getResults())
		if (auto dimExpr = expr.dyn_cast<AffineDimExpr>())
		inBounds[dimExpr.getPosition()] = false;
auto transfer = state.builder.create<vector::TransferReadOp>(		auto transfer = state.builder.create<vector::TransferReadOp>(
loadOp.getLoc(), vectorType, loadOp.getMemRef(), indices, permutationMap);		loadOp.getLoc(), vectorType, loadOp.getMemRef(), indices, permutationMap,
		inBounds);

// Register replacement for future uses in the scope.		// Register replacement for future uses in the scope.
state.registerOpVectorReplacement(loadOp, transfer);		state.registerOpVectorReplacement(loadOp, transfer);
return transfer;		return transfer;
}		}

/// Vectorizes an affine store with the vectorization strategy in 'state' by		/// Vectorizes an affine store with the vectorization strategy in 'state' by
/// generating a 'vector.transfer_write' op with the proper permutation map		/// generating a 'vector.transfer_write' op with the proper permutation map
Show All 25 Lines	static Operation *vectorizeAffineStore(AffineStoreOp storeOp,
// Compute permutation map using the information of new vector loops.		// Compute permutation map using the information of new vector loops.
auto permutationMap = makePermutationMap(state.builder.getInsertionBlock(),		auto permutationMap = makePermutationMap(state.builder.getInsertionBlock(),
indices, state.vecLoopToVecDim);		indices, state.vecLoopToVecDim);
if (!permutationMap)		if (!permutationMap)
return nullptr;		return nullptr;
LLVM_DEBUG(dbgs() << "\n[early-vect]+++++ permutationMap: ");		LLVM_DEBUG(dbgs() << "\n[early-vect]+++++ permutationMap: ");
LLVM_DEBUG(permutationMap.print(dbgs()));		LLVM_DEBUG(permutationMap.print(dbgs()));

		// Non-transfer dims are in-bounds.
		SmallVector<bool> inBounds(memRefType.getRank(), true);
		for (AffineExpr expr : permutationMap.getResults())
		if (auto dimExpr = expr.dyn_cast<AffineDimExpr>())
		inBounds[dimExpr.getPosition()] = false;
auto transfer = state.builder.create<vector::TransferWriteOp>(		auto transfer = state.builder.create<vector::TransferWriteOp>(
storeOp.getLoc(), vectorValue, storeOp.getMemRef(), indices,		storeOp.getLoc(), vectorValue, storeOp.getMemRef(), indices,
permutationMap);		permutationMap, inBounds);
LLVM_DEBUG(dbgs() << "\n[early-vect]+++++ vectorized store: " << transfer);		LLVM_DEBUG(dbgs() << "\n[early-vect]+++++ vectorized store: " << transfer);

// Register replacement for future uses in the scope.		// Register replacement for future uses in the scope.
state.registerOpVectorReplacement(storeOp, transfer);		state.registerOpVectorReplacement(storeOp, transfer);
return transfer;		return transfer;
}		}

/// Returns true if `value` is a constant equal to the neutral element of the		/// Returns true if `value` is a constant equal to the neutral element of the
▲ Show 20 Lines • Show All 607 Lines • Show Last 20 Lines
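
Reviewer sketch of the in_bounds rule added in the two hunks above: one flag per memref dimension, starting as in-bounds, cleared for every dimension that appears as a result of the permutation map. This is a stand-alone model rather than MLIR code. `MapResults` and `computeInBounds` are hypothetical names, and an empty optional stands in for a non-dim (broadcast) result, which the `dyn_cast<AffineDimExpr>` check skips.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for the results of a permutation map: each entry is
// either the position of the source dimension it reads (the AffineDimExpr
// case) or std::nullopt for a result that is not a plain dimension.
using MapResults = std::vector<std::optional<std::size_t>>;

// One in_bounds flag per source dimension. Dimensions that do not take part
// in the transfer stay in-bounds; dimensions that do are conservatively
// marked out-of-bounds.
std::vector<bool> computeInBounds(std::size_t sourceRank,
                                  const MapResults &results) {
  std::vector<bool> inBounds(sourceRank, true);
  for (const std::optional<std::size_t> &dim : results) {
    if (!dim)
      continue; // Not a dimension expression: nothing to clear.
    assert(*dim < sourceRank && "map result refers to a missing dimension");
    inBounds[*dim] = false;
  }
  return inBounds;
}
```

Under this model, a rank-2 memref read through `(d0, d1) -> (d0)` gets `[false, true]`: only `d0` is transferred, so only `d0` may run out of bounds.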

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

Show First 20 Lines • Show All 634 Lines • ▼ Show 20 Lines	static Value buildVectorWrite(RewriterBase &rewriter, Value value,
}		}

write = state.maskOperation(rewriter, write, linalgOp, opOperandMap);		write = state.maskOperation(rewriter, write, linalgOp, opOperandMap);

// If masked, set in-bounds to true. Masking guarantees that the access will		// If masked, set in-bounds to true. Masking guarantees that the access will
// be in-bounds.		// be in-bounds.
if (auto maskOp = dyn_cast<vector::MaskingOpInterface>(write)) {		if (auto maskOp = dyn_cast<vector::MaskingOpInterface>(write)) {
auto maskedWriteOp = cast<vector::TransferWriteOp>(maskOp.getMaskableOp());		auto maskedWriteOp = cast<vector::TransferWriteOp>(maskOp.getMaskableOp());
SmallVector<bool> inBounds(maskedWriteOp.getVectorType().getRank(), true);		SmallVector<bool> inBounds(maskedWriteOp.getShapedType().getRank(), true);
maskedWriteOp.setInBoundsAttr(rewriter.getBoolArrayAttr(inBounds));		maskedWriteOp.setInBoundsAttr(rewriter.getBoolArrayAttr(inBounds));
}		}

LDBG("vectorized op: " << *write << "\n");		LDBG("vectorized op: " << *write << "\n");
if (!write->getResults().empty())		if (!write->getResults().empty())
return write->getResult(0);		return write->getResult(0);
return Value();		return Value();
}		}
▲ Show 20 Lines • Show All 426 Lines • ▼ Show 20 Lines	auto indexAs1dVector = rewriter.create<vector::ShapeCastOp>(
bvm.lookup(extractOp.getIndices()[i]));		bvm.lookup(extractOp.getIndices()[i]));
transferReadIdxs.push_back(		transferReadIdxs.push_back(
rewriter.create<vector::ExtractElementOp>(loc, indexAs1dVector, zero));		rewriter.create<vector::ExtractElementOp>(loc, indexAs1dVector, zero));
}		}

// `tensor.extract_element` is always in-bounds, hence the following holds.		// `tensor.extract_element` is always in-bounds, hence the following holds.
auto dstRank = resultType.getRank();		auto dstRank = resultType.getRank();
auto srcRank = extractOp.getTensor().getType().getRank();		auto srcRank = extractOp.getTensor().getType().getRank();
SmallVector<bool> inBounds(dstRank, true);		SmallVector<bool> inBounds(srcRank, true);

// 2a. Handle scalar broadcast access.		// 2a. Handle scalar broadcast access.
if (memAccessKind == VectorMemoryAccessKind::ScalarBroadcast) {		if (memAccessKind == VectorMemoryAccessKind::ScalarBroadcast) {
MLIRContext *ctx = rewriter.getContext();		MLIRContext *ctx = rewriter.getContext();
SmallVector<AffineExpr> exprs(dstRank, getAffineConstantExpr(0, ctx));		SmallVector<AffineExpr> exprs(dstRank, getAffineConstantExpr(0, ctx));
auto permutationMap = AffineMap::get(srcRank, 0, exprs, ctx);		auto permutationMap = AffineMap::get(srcRank, 0, exprs, ctx);

auto transferReadOp = rewriter.create<vector::TransferReadOp>(		auto transferReadOp = rewriter.create<vector::TransferReadOp>(
▲ Show 20 Lines • Show All 237 Lines • ▼ Show 20 Lines	if (linalgOp.isDpsInput(opOperand)) {
// vector shape to the output domain and back to the canonical domain.		// vector shape to the output domain and back to the canonical domain.
readMap = inversePermutation(reindexIndexingMap(indexingMap));		readMap = inversePermutation(reindexIndexingMap(indexingMap));
readType =		readType =
state.getCanonicalVecType(elemType, readMap.compose(indexingMap));		state.getCanonicalVecType(elemType, readMap.compose(indexingMap));
}		}

SmallVector<Value> indices(linalgOp.getShape(opOperand).size(), zero);		SmallVector<Value> indices(linalgOp.getShape(opOperand).size(), zero);

		// Non-transfer dims are in-bounds, all others are out-of-bounds.
		SmallVector<bool> inBounds(indices.size(), true);
		for (AffineExpr expr : readMap.getResults())
		if (auto dimExpr = expr.dyn_cast<AffineDimExpr>())
		inBounds[dimExpr.getPosition()] = false;

Operation *read = rewriter.create<vector::TransferReadOp>(		Operation *read = rewriter.create<vector::TransferReadOp>(
loc, readType, opOperand->get(), indices, readMap);		loc, readType, opOperand->get(), indices, readMap, inBounds);
read = state.maskOperation(rewriter, read, linalgOp, maskingMap);		read = state.maskOperation(rewriter, read, linalgOp, maskingMap);
Value readValue = read->getResult(0);		Value readValue = read->getResult(0);

// 3.b. If masked, set in-bounds to true. Masking guarantees that the access		// 3.b. If masked, set in-bounds to true. Masking guarantees that the access
// will be in-bounds.		// will be in-bounds.
if (auto maskOp = dyn_cast<vector::MaskingOpInterface>(read)) {		if (auto maskOp = dyn_cast<vector::MaskingOpInterface>(read)) {
SmallVector<bool> inBounds(readType.getRank(), true);		SmallVector<bool> inBounds(indices.size(), true);
cast<vector::TransferReadOp>(maskOp.getMaskableOp())		cast<vector::TransferReadOp>(maskOp.getMaskableOp())
.setInBoundsAttr(rewriter.getBoolArrayAttr(inBounds));		.setInBoundsAttr(rewriter.getBoolArrayAttr(inBounds));
}		}

// 3.c. Not all ops support 0-d vectors, extract the scalar for now.		// 3.c. Not all ops support 0-d vectors, extract the scalar for now.
// TODO: remove this.		// TODO: remove this.
if (readType.getRank() == 0)		if (readType.getRank() == 0)
readValue = rewriter.create<vector::ExtractElementOp>(loc, readValue);		readValue = rewriter.create<vector::ExtractElementOp>(loc, readValue);
▲ Show 20 Lines • Show All 565 Lines • ▼ Show 20 Lines
/// ```		/// ```
/// %0 = tensor.pad %src ... : tensor<?x?xf32> to tensor<17x5xf32>		/// %0 = tensor.pad %src ... : tensor<?x?xf32> to tensor<17x5xf32>
/// %r = vector.transfer_read %0[%c0, %c0], %cst		/// %r = vector.transfer_read %0[%c0, %c0], %cst
/// {in_bounds = [true, true]} : tensor<17x5xf32>, vector<17x5xf32>		/// {in_bounds = [true, true]} : tensor<17x5xf32>, vector<17x5xf32>
/// ```		/// ```
/// is rewritten to:		/// is rewritten to:
/// ```		/// ```
/// %r = vector.transfer_read %src[%c0, %c0], %padding		/// %r = vector.transfer_read %src[%c0, %c0], %padding
/// {in_bounds = [true, true]}		/// {in_bounds = [false, false]}
/// : tensor<?x?xf32>, vector<17x5xf32>		/// : tensor<?x?xf32>, vector<17x5xf32>
/// ```		/// ```
/// Note: By restricting this pattern to in-bounds TransferReadOps, we can be		/// Note: By restricting this pattern to in-bounds TransferReadOps, we can be
/// sure that the original padding value %cst was never used.		/// sure that the original padding value %cst was never used.
///		///
/// This rewrite is possible if:		/// This rewrite is possible if:
/// - `xferOp` has no out-of-bounds dims or mask.		/// - `xferOp` has no out-of-bounds dims or mask.
/// - Low padding is static 0.		/// - Low padding is static 0.
Show All 12 Lines	LogicalResult rewriteUser(PatternRewriter &rewriter, tensor::PadOp padOp,
auto padValue = padOp.getConstantPaddingValue();		auto padValue = padOp.getConstantPaddingValue();
if (!padValue)		if (!padValue)
return failure();		return failure();
// Padding value of existing `xferOp` is unused.		// Padding value of existing `xferOp` is unused.
if (xferOp.hasOutOfBoundsDim() || xferOp.getMask())		if (xferOp.hasOutOfBoundsDim() || xferOp.getMask())
return failure();		return failure();

rewriter.updateRootInPlace(xferOp, [&]() {		rewriter.updateRootInPlace(xferOp, [&]() {
SmallVector<bool> inBounds(xferOp.getVectorType().getRank(), false);		SmallVector<bool> inBounds(xferOp.getShapedType().getRank(), false);
xferOp->setAttr(xferOp.getInBoundsAttrName(),		xferOp->setAttr(xferOp.getInBoundsAttrName(),
rewriter.getBoolArrayAttr(inBounds));		rewriter.getBoolArrayAttr(inBounds));
xferOp.getSourceMutable().assign(padOp.getSource());		xferOp.getSourceMutable().assign(padOp.getSource());
xferOp.getPaddingMutable().assign(padValue);		xferOp.getPaddingMutable().assign(padValue);
});		});

return success();		return success();
}		}
▲ Show 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	if (!trimPadding.hasZeroOffset())
return failure();		return failure();
// trimPadding must remove the amount of padding that was added earlier.		// trimPadding must remove the amount of padding that was added earlier.
if (!hasSameTensorSize(padOp.getSource(), trimPadding))		if (!hasSameTensorSize(padOp.getSource(), trimPadding))
return failure();		return failure();

// Insert the new TransferWriteOp at position of the old TransferWriteOp.		// Insert the new TransferWriteOp at position of the old TransferWriteOp.
rewriter.setInsertionPoint(xferOp);		rewriter.setInsertionPoint(xferOp);

SmallVector<bool> inBounds(xferOp.getVectorType().getRank(), false);		SmallVector<bool> inBounds(xferOp.getShapedType().getRank(), false);
auto newXferOp = rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(		auto newXferOp = rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(
xferOp, padOp.getSource().getType(), xferOp.getVector(),		xferOp, padOp.getSource().getType(), xferOp.getVector(),
padOp.getSource(), xferOp.getIndices(), xferOp.getPermutationMapAttr(),		padOp.getSource(), xferOp.getIndices(), xferOp.getPermutationMapAttr(),
xferOp.getMask(), rewriter.getBoolArrayAttr(inBounds));		xferOp.getMask(), rewriter.getBoolArrayAttr(inBounds));
rewriter.replaceOp(trimPadding, newXferOp->getResult(0));		rewriter.replaceOp(trimPadding, newXferOp->getResult(0));

return success();		return success();
}		}
▲ Show 20 Lines • Show All 92 Lines • ▼ Show 20 Lines
/// into %dest[%a, %b, 0, 0] [1, 1, 17, 5] [1, 1, 1, 1]		/// into %dest[%a, %b, 0, 0] [1, 1, 17, 5] [1, 1, 1, 1]
/// : tensor<17x5xf32> into tensor<?x?x17x5xf32>		/// : tensor<17x5xf32> into tensor<?x?x17x5xf32>
/// ```		/// ```
/// is rewritten to:		/// is rewritten to:
/// ```		/// ```
/// %0 = vector.transfer_read %src[%c0, %c0], %padding		/// %0 = vector.transfer_read %src[%c0, %c0], %padding
/// : tensor<?x?xf32>, vector<17x5xf32>		/// : tensor<?x?xf32>, vector<17x5xf32>
/// %r = vector.transfer_write %0, %dest[%a, %b, %c0, %c0]		/// %r = vector.transfer_write %0, %dest[%a, %b, %c0, %c0]
/// {in_bounds = [true, true]} : vector<17x5xf32>, tensor<?x?x17x5xf32>		/// {in_bounds = [true, true, true, true]} : vector<17x5xf32>,
		/// tensor<?x?x17x5xf32>
/// ```		/// ```
///		///
/// This rewrite is possible if:		/// This rewrite is possible if:
/// - Low padding is static 0.		/// - Low padding is static 0.
/// - `padOp` result shape is static.		/// - `padOp` result shape is static.
/// - The entire padded tensor is inserted.		/// - The entire padded tensor is inserted.
/// (Implies that sizes of `insertOp` are all static.)		/// (Implies that sizes of `insertOp` are all static.)
/// - Only unit strides in `insertOp`.		/// - Only unit strides in `insertOp`.
▲ Show 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	LogicalResult rewriteUser(PatternRewriter &rewriter, tensor::PadOp padOp,
auto read = rewriter.create<vector::TransferReadOp>(		auto read = rewriter.create<vector::TransferReadOp>(
padOp.getLoc(), vecType, padOp.getSource(), readIndices, padValue);		padOp.getLoc(), vecType, padOp.getSource(), readIndices, padValue);

// Generate TransferWriteOp: Write to InsertSliceOp's dest tensor at		// Generate TransferWriteOp: Write to InsertSliceOp's dest tensor at
// specified offsets. Write is fully in-bounds because a InsertSliceOp's		// specified offsets. Write is fully in-bounds because a InsertSliceOp's
// source must fit into the destination at the specified offsets.		// source must fit into the destination at the specified offsets.
auto writeIndices =		auto writeIndices =
ofrToIndexValues(rewriter, padOp.getLoc(), insertOp.getMixedOffsets());		ofrToIndexValues(rewriter, padOp.getLoc(), insertOp.getMixedOffsets());
SmallVector<bool> inBounds(vecRank, true);		SmallVector<bool> inBounds(tensorRank, true);
rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(		rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(
insertOp, read, insertOp.getDest(), writeIndices,		insertOp, read, insertOp.getDest(), writeIndices,
ArrayRef<bool>{inBounds});		ArrayRef<bool>{inBounds});

return success();		return success();
}		}
};		};

▲ Show 20 Lines • Show All 112 Lines • ▼ Show 20 Lines	LogicalResult LinalgCopyVTRForwardingPattern::matchAndRewrite(

// `in` is the subview that memref.copy reads. Replace it.		// `in` is the subview that memref.copy reads. Replace it.
Value in = copyOp.getSource();		Value in = copyOp.getSource();

// memref.copy + linalg.fill can be used to create a padded local buffer.		// memref.copy + linalg.fill can be used to create a padded local buffer.
// The `masked` attribute is only valid on this padded buffer.		// The `masked` attribute is only valid on this padded buffer.
// When forwarding to vector.transfer_read, the attribute must be reset		// When forwarding to vector.transfer_read, the attribute must be reset
// conservatively.		// conservatively.

		// in_bounds is explicitly reset. Non-transfer dims are in-bounds, all others
		// are out-of-bounds.
		SmallVector<bool> inBounds(xferOp.getIndices().size(), true);
		for (AffineExpr expr : xferOp.getPermutationMap().getResults())
		if (auto dimExpr = expr.dyn_cast<AffineDimExpr>())
		inBounds[dimExpr.getPosition()] = false;
Value res = rewriter.create<vector::TransferReadOp>(		Value res = rewriter.create<vector::TransferReadOp>(
xferOp.getLoc(), xferOp.getVectorType(), in, xferOp.getIndices(),		xferOp.getLoc(), xferOp.getVectorType(), in, xferOp.getIndices(),
xferOp.getPermutationMapAttr(), xferOp.getPadding(), xferOp.getMask(),		xferOp.getPermutationMapAttr(), xferOp.getPadding(), xferOp.getMask(),
// in_bounds is explicitly reset		rewriter.getBoolArrayAttr(inBounds));
/*inBoundsAttr=*/ArrayAttr());

if (maybeFillOp)		if (maybeFillOp)
rewriter.eraseOp(maybeFillOp);		rewriter.eraseOp(maybeFillOp);
rewriter.eraseOp(copyOp);		rewriter.eraseOp(copyOp);
rewriter.replaceOp(xferOp, res);		rewriter.replaceOp(xferOp, res);

return success();		return success();
}		}
Show All 37 Lines	LogicalResult LinalgCopyVTWForwardingPattern::matchAndRewrite(
assert(isa<MemRefType>(copyOp.getTarget().getType()));		assert(isa<MemRefType>(copyOp.getTarget().getType()));
Value out = copyOp.getTarget();		Value out = copyOp.getTarget();

// Forward vector.transfer into copy.		// Forward vector.transfer into copy.
// memref.copy + linalg.fill can be used to create a padded local buffer.		// memref.copy + linalg.fill can be used to create a padded local buffer.
// The `masked` attribute is only valid on this padded buffer.		// The `masked` attribute is only valid on this padded buffer.
// When forwarding to vector.transfer_write, the attribute must be reset		// When forwarding to vector.transfer_write, the attribute must be reset
// conservatively.		// conservatively.

		// in_bounds is explicitly reset. Non-transfer dims are in-bounds, all others
		// are out-of-bounds.
		SmallVector<bool> inBounds(xferOp.getIndices().size(), true);
		for (AffineExpr expr : xferOp.getPermutationMap().getResults())
		if (auto dimExpr = expr.dyn_cast<AffineDimExpr>())
		inBounds[dimExpr.getPosition()] = false;
rewriter.create<vector::TransferWriteOp>(		rewriter.create<vector::TransferWriteOp>(
xferOp.getLoc(), xferOp.getVector(), out, xferOp.getIndices(),		xferOp.getLoc(), xferOp.getVector(), out, xferOp.getIndices(),
xferOp.getPermutationMapAttr(), xferOp.getMask(),		xferOp.getPermutationMapAttr(), xferOp.getMask(),
// in_bounds is explicitly reset		rewriter.getBoolArrayAttr(inBounds));
/*inBoundsAttr=*/ArrayAttr());

rewriter.eraseOp(copyOp);		rewriter.eraseOp(copyOp);
rewriter.eraseOp(xferOp);		rewriter.eraseOp(xferOp);

return success();		return success();
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
▲ Show 20 Lines • Show All 778 Lines • Show Last 20 Lines

mlir/lib/Dialect/MemRef/Transforms/FoldMemRefAliasOps.cpp

Show All 39 Lines
} // namespace mlir		} // namespace mlir

using namespace mlir;		using namespace mlir;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Utility functions		// Utility functions
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		/// Compute the new in_bounds attribute of a vector transfer op when folding a
		/// potentially rank-reducing alias op into a vector transfer op.
		template <typename OpTy>
		static ArrayAttr
		expandInBoundsToRank(Builder &b, OpTy xferOp, int64_t rank,
		const llvm::SmallBitVector &projectedDimensions) {
		SmallVector<bool> inBounds = xferOp.getInBoundsValues();
		SmallVector<bool> expandedInBounds;
		int64_t idx = 0;
		for (int64_t i = 0; i < rank; ++i) {
		if (projectedDimensions.test(i)) {
		// This dimension was rank-reduced. It must be in-bounds after folding.
		expandedInBounds.push_back(true);
		continue;
		}

		// Not a rank-reduced dim: take in_bounds from the xfer op.
		assert(idx < inBounds.size() && "invalid rank");
		expandedInBounds.push_back(inBounds[idx++]);
		}
		assert(idx == inBounds.size() && "invalid rank");
		return b.getBoolArrayAttr(expandedInBounds);
		}

/// Given the 'indices' of a load/store operation where the memref is a result		/// Given the 'indices' of a load/store operation where the memref is a result
/// of a expand_shape op, returns the indices w.r.t to the source memref of the		/// of a expand_shape op, returns the indices w.r.t to the source memref of the
/// expand_shape op. For example		/// expand_shape op. For example
///		///
/// %0 = ... : memref<12x42xf32>		/// %0 = ... : memref<12x42xf32>
/// %1 = memref.expand_shape %0 [[0, 1], [2]]		/// %1 = memref.expand_shape %0 [[0, 1], [2]]
/// : memref<12x42xf32> into memref<2x6x42xf32>		/// : memref<12x42xf32> into memref<2x6x42xf32>
/// %2 = load %1[%i1, %i2, %i3] : memref<2x6x42xf32		/// %2 = load %1[%i1, %i2, %i3] : memref<2x6x42xf32
▲ Show 20 Lines • Show All 343 Lines • ▼ Show 20 Lines	llvm::TypeSwitch<Operation *, void>(loadOp)
rewriter.replaceOpWithNewOp<memref::LoadOp>(		rewriter.replaceOpWithNewOp<memref::LoadOp>(
loadOp, subViewOp.getSource(), sourceIndices, op.getNontemporal());		loadOp, subViewOp.getSource(), sourceIndices, op.getNontemporal());
})		})
.Case([&](vector::LoadOp op) {		.Case([&](vector::LoadOp op) {
rewriter.replaceOpWithNewOp<vector::LoadOp>(		rewriter.replaceOpWithNewOp<vector::LoadOp>(
op, op.getType(), subViewOp.getSource(), sourceIndices);		op, op.getType(), subViewOp.getSource(), sourceIndices);
})		})
.Case([&](vector::TransferReadOp op) {		.Case([&](vector::TransferReadOp op) {
		int64_t rank = subViewOp.getSourceType().getRank();
rewriter.replaceOpWithNewOp<vector::TransferReadOp>(		rewriter.replaceOpWithNewOp<vector::TransferReadOp>(
op, op.getVectorType(), subViewOp.getSource(), sourceIndices,		op, op.getVectorType(), subViewOp.getSource(), sourceIndices,
AffineMapAttr::get(expandDimsToRank(		AffineMapAttr::get(expandDimsToRank(op.getPermutationMap(), rank,
op.getPermutationMap(), subViewOp.getSourceType().getRank(),
subViewOp.getDroppedDims())),		subViewOp.getDroppedDims())),
op.getPadding(), /*mask=*/Value(), op.getInBoundsAttr());		op.getPadding(), /*mask=*/Value(),
		expandInBoundsToRank(rewriter, op, rank,
		subViewOp.getDroppedDims()));
})		})
.Case([&](gpu::SubgroupMmaLoadMatrixOp op) {		.Case([&](gpu::SubgroupMmaLoadMatrixOp op) {
rewriter.replaceOpWithNewOp<gpu::SubgroupMmaLoadMatrixOp>(		rewriter.replaceOpWithNewOp<gpu::SubgroupMmaLoadMatrixOp>(
op, op.getType(), subViewOp.getSource(), sourceIndices,		op, op.getType(), subViewOp.getSource(), sourceIndices,
op.getLeadDimension(), op.getTransposeAttr());		op.getLeadDimension(), op.getTransposeAttr());
})		})
.Case([&](nvgpu::LdMatrixOp op) {		.Case([&](nvgpu::LdMatrixOp op) {
rewriter.replaceOpWithNewOp<nvgpu::LdMatrixOp>(		rewriter.replaceOpWithNewOp<nvgpu::LdMatrixOp>(
▲ Show 20 Lines • Show All 107 Lines • ▼ Show 20 Lines	llvm::TypeSwitch<Operation *, void>(storeOp)
op, op.getValue(), subViewOp.getSource(), sourceIndices);		op, op.getValue(), subViewOp.getSource(), sourceIndices);
})		})
.Case([&](memref::StoreOp op) {		.Case([&](memref::StoreOp op) {
rewriter.replaceOpWithNewOp<memref::StoreOp>(		rewriter.replaceOpWithNewOp<memref::StoreOp>(
op, op.getValue(), subViewOp.getSource(), sourceIndices,		op, op.getValue(), subViewOp.getSource(), sourceIndices,
op.getNontemporal());		op.getNontemporal());
})		})
.Case([&](vector::TransferWriteOp op) {		.Case([&](vector::TransferWriteOp op) {
		int64_t rank = subViewOp.getSourceType().getRank();
rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(		rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(
op, op.getValue(), subViewOp.getSource(), sourceIndices,		op, op.getValue(), subViewOp.getSource(), sourceIndices,
AffineMapAttr::get(expandDimsToRank(		AffineMapAttr::get(expandDimsToRank(op.getPermutationMap(), rank,
op.getPermutationMap(), subViewOp.getSourceType().getRank(),
subViewOp.getDroppedDims())),		subViewOp.getDroppedDims())),
op.getInBoundsAttr());		expandInBoundsToRank(rewriter, op, rank,
		subViewOp.getDroppedDims()));
})		})
.Case([&](gpu::SubgroupMmaStoreMatrixOp op) {		.Case([&](gpu::SubgroupMmaStoreMatrixOp op) {
rewriter.replaceOpWithNewOp<gpu::SubgroupMmaStoreMatrixOp>(		rewriter.replaceOpWithNewOp<gpu::SubgroupMmaStoreMatrixOp>(
op, op.getSrc(), subViewOp.getSource(), sourceIndices,		op, op.getSrc(), subViewOp.getSource(), sourceIndices,
op.getLeadDimension(), op.getTransposeAttr());		op.getLeadDimension(), op.getTransposeAttr());
})		})
.Default([](Operation *) { llvm_unreachable("unexpected operation."); });		.Default([](Operation *) { llvm_unreachable("unexpected operation."); });
return success();		return success();
▲ Show 20 Lines • Show All 168 Lines • Show Last 20 Lines
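
Reviewer sketch of what the new `expandInBoundsToRank` helper computes when a rank-reducing subview is folded into a transfer op. It mirrors the loop in the diff using only the standard library. `expandInBounds` is a hypothetical name, and the `dropped` vector is an assumed stand-in for the bit vector returned by `getDroppedDims()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone model. `dropped[i]` is true if source dimension i
// was removed by the rank-reducing op; `inBounds` holds one flag per
// remaining dimension of the original transfer op.
std::vector<bool> expandInBounds(const std::vector<bool> &inBounds,
                                 const std::vector<bool> &dropped) {
  std::vector<bool> expanded;
  expanded.reserve(dropped.size());
  std::size_t idx = 0;
  for (std::size_t i = 0; i < dropped.size(); ++i) {
    if (dropped[i]) {
      // Rank-reduced dimension: it must be in-bounds after folding.
      expanded.push_back(true);
      continue;
    }
    // Kept dimension: reuse the flag from the original transfer op.
    assert(idx < inBounds.size() && "invalid rank");
    expanded.push_back(inBounds[idx++]);
  }
  assert(idx == inBounds.size() && "invalid rank");
  return expanded;
}
```

For example, with flags `[false, true]` on the transfer op and a subview that dropped the first of three source dimensions, the model yields `[true, false, true]`.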

mlir/lib/Dialect/Tensor/Transforms/FoldTensorSubsetOps.cpp

Show First 20 Lines • Show All 78 Lines • ▼ Show 20 Lines	if (!extractOrInsertSliceOp.hasUnitStride()) {
return rewriter.notifyMatchFailure(		return rewriter.notifyMatchFailure(
xferOp, "non-1 stride insert/extract, requires keeping track of "		xferOp, "non-1 stride insert/extract, requires keeping track of "
"strides, this may result in needing to insert "		"strides, this may result in needing to insert "
"vector.insert_strided_slice/extract_strided_slice ops");		"vector.insert_strided_slice/extract_strided_slice ops");
}		}
return success();		return success();
}		}

		/// Compute the new in_bounds attribute of a vector transfer op when folding a
		/// potentially rank-reducing subset op into a vector transfer op.
		template <typename OpTy>
		static ArrayAttr
		expandInBoundsToRank(Builder &b, OpTy xferOp, int64_t rank,
		const llvm::SmallBitVector &projectedDimensions) {
		SmallVector<bool> inBounds = xferOp.getInBoundsValues();
		SmallVector<bool> expandedInBounds;
		int64_t idx = 0;
		for (int64_t i = 0; i < rank; ++i) {
		if (projectedDimensions.test(i)) {
		// This dimension was rank-reduced. It must be in-bounds after folding.
		expandedInBounds.push_back(true);
		continue;
		}

		// Not a rank-reduced dim: take in_bounds from the xfer op.
		assert(idx < inBounds.size() && "invalid rank");
		expandedInBounds.push_back(inBounds[idx++]);
		}
		assert(idx == inBounds.size() && "invalid rank");
		return b.getBoolArrayAttr(expandedInBounds);
		}

LogicalResult TransferReadOfExtractSliceOpFolder::matchAndRewrite(		LogicalResult TransferReadOfExtractSliceOpFolder::matchAndRewrite(
vector::TransferReadOp readOp, PatternRewriter &rewriter) const {		vector::TransferReadOp readOp, PatternRewriter &rewriter) const {
auto extractSliceOp =		auto extractSliceOp =
getTensorOperand(readOp).getDefiningOp<tensor::ExtractSliceOp>();		getTensorOperand(readOp).getDefiningOp<tensor::ExtractSliceOp>();
if (!extractSliceOp)		if (!extractSliceOp)
return rewriter.notifyMatchFailure(readOp, "not an extract_slice");		return rewriter.notifyMatchFailure(readOp, "not an extract_slice");

LogicalResult preconditionResult =		LogicalResult preconditionResult =
preconditionsFoldExtractOrInsertWithTransferOp(rewriter, readOp,		preconditionsFoldExtractOrInsertWithTransferOp(rewriter, readOp,
extractSliceOp);		extractSliceOp);
if (failed(preconditionResult))		if (failed(preconditionResult))
return preconditionResult;		return preconditionResult;

SmallVector<Value> indices(readOp.getIndices().begin(),		SmallVector<Value> indices(readOp.getIndices().begin(),
readOp.getIndices().end());		readOp.getIndices().end());
SmallVector<Value> sourceIndices;		SmallVector<Value> sourceIndices;
affine::resolveIndicesIntoOpWithOffsetsAndStrides(		affine::resolveIndicesIntoOpWithOffsetsAndStrides(
rewriter, readOp.getLoc(), extractSliceOp.getMixedOffsets(),		rewriter, readOp.getLoc(), extractSliceOp.getMixedOffsets(),
extractSliceOp.getMixedStrides(), extractSliceOp.getDroppedDims(),		extractSliceOp.getMixedStrides(), extractSliceOp.getDroppedDims(),
indices, sourceIndices);		indices, sourceIndices);

		int64_t expandedRank = extractSliceOp.getSourceType().getRank();
rewriter.replaceOpWithNewOp<vector::TransferReadOp>(		rewriter.replaceOpWithNewOp<vector::TransferReadOp>(
readOp, readOp.getVectorType(), extractSliceOp.getSource(), sourceIndices,		readOp, readOp.getVectorType(), extractSliceOp.getSource(), sourceIndices,
AffineMapAttr::get(expandDimsToRank(		AffineMapAttr::get(expandDimsToRank(readOp.getPermutationMap(),
readOp.getPermutationMap(), extractSliceOp.getSourceType().getRank(),		expandedRank,
extractSliceOp.getDroppedDims())),		extractSliceOp.getDroppedDims())),
readOp.getPadding(),		readOp.getPadding(),
/*mask=*/Value(), readOp.getInBoundsAttr());		/*mask=*/Value(),
		expandInBoundsToRank(rewriter, readOp, expandedRank,
		extractSliceOp.getDroppedDims()));

return success();		return success();
}		}

LogicalResult InsertSliceOfTransferWriteOpFolder::matchAndRewrite(		LogicalResult InsertSliceOfTransferWriteOpFolder::matchAndRewrite(
tensor::InsertSliceOp insertSliceOp, PatternRewriter &rewriter) const {		tensor::InsertSliceOp insertSliceOp, PatternRewriter &rewriter) const {
auto writeOp = getTensorOperand(insertSliceOp)		auto writeOp = getTensorOperand(insertSliceOp)
.template getDefiningOp<vector::TransferWriteOp>();		.template getDefiningOp<vector::TransferWriteOp>();
Show All 9 Lines	LogicalResult InsertSliceOfTransferWriteOpFolder::matchAndRewrite(
SmallVector<Value> indices(writeOp.getIndices().begin(),		SmallVector<Value> indices(writeOp.getIndices().begin(),
writeOp.getIndices().end());		writeOp.getIndices().end());
SmallVector<Value> sourceIndices;		SmallVector<Value> sourceIndices;
affine::resolveIndicesIntoOpWithOffsetsAndStrides(		affine::resolveIndicesIntoOpWithOffsetsAndStrides(
rewriter, writeOp.getLoc(), insertSliceOp.getMixedOffsets(),		rewriter, writeOp.getLoc(), insertSliceOp.getMixedOffsets(),
insertSliceOp.getMixedStrides(), insertSliceOp.getDroppedDims(), indices,		insertSliceOp.getMixedStrides(), insertSliceOp.getDroppedDims(), indices,
sourceIndices);		sourceIndices);

		int64_t expandedRank = insertSliceOp.getDestType().getRank();
rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(		rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(
insertSliceOp, writeOp.getValue(), insertSliceOp.getDest(), sourceIndices,		insertSliceOp, writeOp.getValue(), insertSliceOp.getDest(), sourceIndices,
AffineMapAttr::get(expandDimsToRank(writeOp.getPermutationMap(),		AffineMapAttr::get(expandDimsToRank(writeOp.getPermutationMap(),
insertSliceOp.getDestType().getRank(),		expandedRank,
insertSliceOp.getDroppedDims())),		insertSliceOp.getDroppedDims())),
writeOp.getInBoundsAttr());		expandInBoundsToRank(rewriter, writeOp, expandedRank,
		insertSliceOp.getDroppedDims()));

return success();		return success();
}		}

template <typename OpTy>		template <typename OpTy>
struct InsertSliceOfInsertSliceFolder : public OpRewritePattern<OpTy> {		struct InsertSliceOfInsertSliceFolder : public OpRewritePattern<OpTy> {
using OpRewritePattern<OpTy>::OpRewritePattern;		using OpRewritePattern<OpTy>::OpRewritePattern;


mlir/lib/Dialect/Vector/IR/VectorOps.cpp

Show First 20 Lines • Show All 3,458 Lines • ▼ Show 20 Lines	return op->emitOpError("requires a permutation_map with input dims of the "
"same rank as the source type");		"same rank as the source type");

if (maskType && maskType != inferredMaskType)		if (maskType && maskType != inferredMaskType)
return op->emitOpError("inferred mask type (")		return op->emitOpError("inferred mask type (")
<< inferredMaskType << ") and mask operand type (" << maskType		<< inferredMaskType << ") and mask operand type (" << maskType
<< ") don't match";		<< ") don't match";

if (inBounds) {		if (inBounds) {
if (permutationMap.getNumResults() != static_cast<int64_t>(inBounds.size()))		if (shapedType.getRank() != static_cast<int64_t>(inBounds.size()))
return op->emitOpError("expects the optional in_bounds attr of same rank "		return op->emitOpError("expects the optional in_bounds attr of same rank "
"as permutation_map results: ")		"as the source type: ")
<< AffineMapAttr::get(permutationMap)		<< shapedType.getRank()
<< " vs inBounds of size: " << inBounds.size();		<< " vs inBounds of size: " << inBounds.size();
for (unsigned int i = 0; i < permutationMap.getNumResults(); ++i)		}
if (permutationMap.getResult(i).isa<AffineConstantExpr>() &&
!llvm::cast<BoolAttr>(inBounds.getValue()[i]).getValue())		// Make sure that all non-transfer dimensions are in-bounds.
return op->emitOpError("requires broadcast dimensions to be in-bounds");		SmallVector<bool> inBoundsVals(op.getShapedType().getRank(), false);
		if (inBounds)
		inBoundsVals = extractFromIntegerArrayAttr<bool>(inBounds);
		DenseSet<int64_t> xferDims;
		for (AffineExpr expr : permutationMap.getResults()) {
		if (auto dimExpr = expr.template dyn_cast<AffineDimExpr>())
		xferDims.insert(dimExpr.getPosition());
		}
		for (int64_t i = 0, e = op.getShapedType().getRank(); i < e; ++i)
		if (!xferDims.contains(i) && !op.isDimInBounds(i)) {
		return op->emitOpError(
		"expects that all non-transfer dims are in-bounds");
}		}

return success();		return success();
}		}
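
The right-hand side of the verifier hunk above changes two rules: `in_bounds` must have one entry per source dimension (not one per permutation map result), and every source dimension that does not appear in the permutation map must be marked in-bounds. The following is a minimal standalone sketch of those two checks, not the MLIR implementation. `MapResult` and `verifyInBounds` are hypothetical names introduced here for illustration, and a missing attribute is assumed to behave like all-`false`.

```cpp
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one permutation map result: a source dimension
// position (dim expr), or std::nullopt for a broadcast (constant 0 expr).
using MapResult = std::optional<int64_t>;

// Returns an error message, or an empty string when the attribute is valid.
std::string verifyInBounds(int64_t sourceRank,
                           const std::vector<MapResult> &permutationMap,
                           const std::optional<std::vector<bool>> &inBounds) {
  // Rule 1: when present, in_bounds has one entry per source dimension.
  if (inBounds && static_cast<int64_t>(inBounds->size()) != sourceRank)
    return "expects the optional in_bounds attr of same rank as the source type";

  // Assumption: an absent attribute is treated as all-false.
  std::vector<bool> vals = inBounds.value_or(
      std::vector<bool>(static_cast<size_t>(sourceRank), false));

  // Collect the source dimensions that take part in the transfer.
  std::set<int64_t> xferDims;
  for (const MapResult &result : permutationMap)
    if (result)
      xferDims.insert(*result);

  // Rule 2: every non-transfer dimension must be in-bounds.
  for (int64_t i = 0; i < sourceRank; ++i)
    if (!xferDims.count(i) && !vals[static_cast<size_t>(i)])
      return "expects that all non-transfer dims are in-bounds";

  return "";
}
```

For example, with a rank-3 source and the map `(d0, d1, d2) -> (d2, d1)`, the sketch accepts `in_bounds = [true, true, true]` and rejects both `[true, true]` (wrong rank) and `[false, true, true]` (`d0` is a non-transfer dimension).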

static void printTransferAttrs(OpAsmPrinter &p, VectorTransferOpInterface op) {		static void printTransferAttrs(OpAsmPrinter &p, VectorTransferOpInterface op) {
SmallVector<StringRef, 3> elidedAttrs;		SmallVector<StringRef, 3> elidedAttrs;
elidedAttrs.push_back(TransferReadOp::getOperandSegmentSizeAttr());		elidedAttrs.push_back(TransferReadOp::getOperandSegmentSizeAttr());
if (op.getPermutationMap().isMinorIdentity())		if (op.getPermutationMap().isMinorIdentity())
▲ Show 20 Lines • Show All 156 Lines • ▼ Show 20 Lines	static bool isInBounds(TransferOp op, int64_t resultIdx, int64_t indicesIdx) {
int64_t sourceSize = op.getShapedType().getDimSize(indicesIdx);		int64_t sourceSize = op.getShapedType().getDimSize(indicesIdx);
int64_t vectorSize = op.getVectorType().getDimSize(resultIdx);		int64_t vectorSize = op.getVectorType().getDimSize(resultIdx);

return cstOp.value() + vectorSize <= sourceSize;		return cstOp.value() + vectorSize <= sourceSize;
}		}

template <typename TransferOp>		template <typename TransferOp>
static LogicalResult foldTransferInBoundsAttribute(TransferOp op) {		static LogicalResult foldTransferInBoundsAttribute(TransferOp op) {
// TODO: support 0-d corner case.		SmallVector<int64_t> accessedChunk = op.getTransferChunkAccessed();
// TODO: Be less conservative.
if (op.getTransferRank() == 0)		// Prepare new in_bounds values.
return failure();
AffineMap permutationMap = op.getPermutationMap();
bool changed = false;		bool changed = false;
SmallVector<bool, 4> newInBounds;		SmallVector<bool> newInBounds = op.getInBoundsValues();
newInBounds.reserve(op.getTransferRank());
for (unsigned i = 0; i < op.getTransferRank(); ++i) {		for (unsigned i = 0; i < op.getShapedType().getRank(); ++i) {
// Already marked as in-bounds, nothing to see here.		// Already marked as in-bounds, nothing to see here.
if (op.isDimInBounds(i)) {		if (newInBounds[i])
newInBounds.push_back(true);
continue;		continue;
}
// Currently out-of-bounds, check whether we can statically determine it is		// Cannot infer in_bounds for dynamic dimensions.
// inBounds.		if (op.getShapedType().isDynamicDim(i))
auto dimExpr = permutationMap.getResult(i).dyn_cast<AffineDimExpr>();		continue;
assert(dimExpr && "Broadcast dims must be in-bounds");		int64_t sourceSize = op.getShapedType().getDimSize(i);
auto inBounds =
isInBounds(op, /*resultIdx=*/i, /*indicesIdx=*/dimExpr.getPosition());		// Cannot infer in_bounds for non-constant indices.
newInBounds.push_back(inBounds);		Value index = op.getIndices()[i];
// We commit the pattern if it is "more inbounds".		std::optional<int64_t> constantOffset = getConstantIntValue(index);
		if (!constantOffset.has_value())
		continue;

		// If the dimension is not part of the transfer shape, it's like a "1"
		// dimensions that is casted away.
		int64_t vectorSize = accessedChunk[i] == 0 ? 1 : accessedChunk[i];

		bool inBounds = constantOffset.value() + vectorSize <= sourceSize;
		newInBounds[i] = inBounds;
changed |= inBounds;		changed |= inBounds;
}		}

if (!changed)		if (!changed)
return failure();		return failure();

// OpBuilder is only used as a helper to build an I64ArrayAttr.		// OpBuilder is only used as a helper to build an I64ArrayAttr.
OpBuilder b(op.getContext());		OpBuilder b(op.getContext());
op->setAttr(TransferOp::getInBoundsAttrStrName(),		op->setAttr(TransferOp::getInBoundsAttrStrName(),
b.getBoolArrayAttr(newInBounds));		b.getBoolArrayAttr(newInBounds));
return success();		return success();
}		}
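
The new `foldTransferInBoundsAttribute` on the right-hand side iterates over source dimensions rather than vector dimensions. A dimension is promoted to in-bounds only if its size is static, its index is a constant, and `index + chunk <= size`, where a dimension outside the transfer counts as a chunk of 1. The arithmetic can be sketched standalone as below. `DimInfo` and `foldInBounds` are hypothetical names that do not exist in MLIR; the struct only models the values the real code queries from the op.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified view of one source dimension of a transfer op.
struct DimInfo {
  std::optional<int64_t> size;  // std::nullopt models a dynamic dimension
  std::optional<int64_t> index; // std::nullopt models a non-constant index
  int64_t chunk;                // elements accessed; 0 if not a transfer dim
  bool inBounds;                // current in_bounds value
};

// Updates `inBounds` in place and returns true if any dimension was promoted
// from out-of-bounds to in-bounds.
bool foldInBounds(std::vector<DimInfo> &dims) {
  bool changed = false;
  for (DimInfo &dim : dims) {
    // Already marked as in-bounds, nothing to do.
    if (dim.inBounds)
      continue;
    // Cannot infer anything for dynamic sizes or non-constant indices.
    if (!dim.size || !dim.index)
      continue;
    // A dimension that is not part of the transfer behaves like a unit dim.
    int64_t vectorSize = dim.chunk == 0 ? 1 : dim.chunk;
    bool inBounds = *dim.index + vectorSize <= *dim.size;
    dim.inBounds = inBounds;
    changed |= inBounds;
  }
  return changed;
}
```

As in the diff, the fold only ever reports a change when a dimension becomes in-bounds, so a dimension that stays out-of-bounds does not trigger a rewrite.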

/// ```		/// ```
▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines
/// Store to load forwarding for transfer operations with permuation maps.		/// Store to load forwarding for transfer operations with permuation maps.
/// Even if the permutation maps are different we can still propagate the store		/// Even if the permutation maps are different we can still propagate the store
/// into the load if the size of the dimensions read and written match. Then we		/// into the load if the size of the dimensions read and written match. Then we
/// can replace the transfer_read + transfer_write by vector.broadcast and		/// can replace the transfer_read + transfer_write by vector.broadcast and
/// vector.transpose.		/// vector.transpose.
/// Example:		/// Example:
/// ```		/// ```
/// %w0 = vector.transfer_write %v0, %arg0[%c0, %c0, %c0]		/// %w0 = vector.transfer_write %v0, %arg0[%c0, %c0, %c0]
/// {in_bounds = [true, true],		/// {in_bounds = [true, true, true],
/// permutation_map = affine_map<(d0, d1, d2) -> (d2, d1)>} :		/// permutation_map = affine_map<(d0, d1, d2) -> (d2, d1)>} :
/// vector<4x1xf32>, tensor<4x4x4xf32>		/// vector<4x1xf32>, tensor<4x4x4xf32>
/// %r = vector.transfer_read %w0[%c0, %c0, %c0], %cf0		/// %r = vector.transfer_read %w0[%c0, %c0, %c0], %cf0
/// {in_bounds = [true, true, true, true],		/// {in_bounds = [true, true, true],
/// permutation_map = affine_map<(d0, d1, d2) -> (d1, 0, d2, 0)>} :		/// permutation_map = affine_map<(d0, d1, d2) -> (d1, 0, d2, 0)>} :
/// tensor<4x4x4xf32>, vector<1x100x4x5xf32>		/// tensor<4x4x4xf32>, vector<1x100x4x5xf32>
/// ```		/// ```
/// To:		/// To:
/// ```		/// ```
/// %0 = vector.broadcast %arg1 : vector<4x1xf32> to vector<100x5x4x1xf32>		/// %0 = vector.broadcast %arg1 : vector<4x1xf32> to vector<100x5x4x1xf32>
/// %r = vector.transpose %0, [3, 0, 2, 1] :		/// %r = vector.transpose %0, [3, 0, 2, 1] :
/// vector<100x5x4x1xf32> to vector<1x100x4x5xf32>		/// vector<100x5x4x1xf32> to vector<1x100x4x5xf32>

mlir/lib/Dialect/Vector/Transforms/LowerVectorTransfer.cpp

Show All 14 Lines
#include "mlir/Dialect/MemRef/IR/MemRef.h"		#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/Dialect/Tensor/IR/Tensor.h"		#include "mlir/Dialect/Tensor/IR/Tensor.h"
#include "mlir/Dialect/Vector/Transforms/LoweringPatterns.h"		#include "mlir/Dialect/Vector/Transforms/LoweringPatterns.h"
#include "mlir/Interfaces/VectorInterfaces.h"		#include "mlir/Interfaces/VectorInterfaces.h"

using namespace mlir;		using namespace mlir;
using namespace mlir::vector;		using namespace mlir::vector;

/// Transpose a vector transfer op's `in_bounds` attribute by applying reverse
/// permutation based on the given indices.
static ArrayAttr
inverseTransposeInBoundsAttr(OpBuilder &builder, ArrayAttr attr,
const SmallVector<unsigned> &permutation) {
SmallVector<bool> newInBoundsValues(permutation.size());
size_t index = 0;
for (unsigned pos : permutation)
newInBoundsValues[pos] =
cast<BoolAttr>(attr.getValue()[index++]).getValue();
return builder.getBoolArrayAttr(newInBoundsValues);
}

/// Extend the rank of a vector Value by `addedRanks` by adding outer unit		/// Extend the rank of a vector Value by `addedRanks` by adding outer unit
/// dimensions.		/// dimensions.
static Value extendVectorRank(OpBuilder &builder, Location loc, Value vec,		static Value extendVectorRank(OpBuilder &builder, Location loc, Value vec,
int64_t addedRank) {		int64_t addedRank) {
auto originalVecType = cast<VectorType>(vec.getType());		auto originalVecType = cast<VectorType>(vec.getType());
SmallVector<int64_t> newShape(addedRank, 1);		SmallVector<int64_t> newShape(addedRank, 1);
newShape.append(originalVecType.getShape().begin(),		newShape.append(originalVecType.getShape().begin(),
originalVecType.getShape().end());		originalVecType.getShape().end());
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	LogicalResult matchAndRewrite(vector::TransferReadOp op,
AffineMap newMap = permutationMap.compose(map);		AffineMap newMap = permutationMap.compose(map);
// Apply the reverse transpose to deduce the type of the transfer_read.		// Apply the reverse transpose to deduce the type of the transfer_read.
ArrayRef<int64_t> originalShape = op.getVectorType().getShape();		ArrayRef<int64_t> originalShape = op.getVectorType().getShape();
SmallVector<int64_t> newVectorShape(originalShape.size());		SmallVector<int64_t> newVectorShape(originalShape.size());
for (const auto &pos : llvm::enumerate(permutation)) {		for (const auto &pos : llvm::enumerate(permutation)) {
newVectorShape[pos.value()] = originalShape[pos.index()];		newVectorShape[pos.value()] = originalShape[pos.index()];
}		}

// Transpose in_bounds attribute.
ArrayAttr newInBoundsAttr =
op.getInBounds() ? inverseTransposeInBoundsAttr(
rewriter, op.getInBounds().value(), permutation)
: ArrayAttr();

// Generate new transfer_read operation.		// Generate new transfer_read operation.
VectorType newReadType =		VectorType newReadType =
VectorType::get(newVectorShape, op.getVectorType().getElementType());		VectorType::get(newVectorShape, op.getVectorType().getElementType());
Value newRead = rewriter.create<vector::TransferReadOp>(		Value newRead = rewriter.create<vector::TransferReadOp>(
op.getLoc(), newReadType, op.getSource(), op.getIndices(),		op.getLoc(), newReadType, op.getSource(), op.getIndices(),
AffineMapAttr::get(newMap), op.getPadding(), op.getMask(),		AffineMapAttr::get(newMap), op.getPadding(), op.getMask(),
newInBoundsAttr);		op.getInBounds() ? *op.getInBounds() : ArrayAttr());

// Transpose result of transfer_read.		// Transpose result of transfer_read.
SmallVector<int64_t> transposePerm(permutation.begin(), permutation.end());		SmallVector<int64_t> transposePerm(permutation.begin(), permutation.end());
rewriter.replaceOpWithNewOp<vector::TransposeOp>(op, newRead,		rewriter.replaceOpWithNewOp<vector::TransposeOp>(op, newRead,
transposePerm);		transposePerm);
return success();		return success();
}		}
};		};
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	LogicalResult matchAndRewrite(vector::TransferWriteOp op,
AffineMap permutationMap = inversePermutation(comp);		AffineMap permutationMap = inversePermutation(comp);
// Get positions of remaining result dims.		// Get positions of remaining result dims.
SmallVector<int64_t> indices;		SmallVector<int64_t> indices;
llvm::transform(permutationMap.getResults(), std::back_inserter(indices),		llvm::transform(permutationMap.getResults(), std::back_inserter(indices),
[](AffineExpr expr) {		[](AffineExpr expr) {
return expr.dyn_cast<AffineDimExpr>().getPosition();		return expr.dyn_cast<AffineDimExpr>().getPosition();
});		});

// Transpose in_bounds attribute.
ArrayAttr newInBoundsAttr =
op.getInBounds() ? inverseTransposeInBoundsAttr(
rewriter, op.getInBounds().value(), permutation)
: ArrayAttr();

// Generate new transfer_write operation.		// Generate new transfer_write operation.
Value newVec = rewriter.create<vector::TransposeOp>(		Value newVec = rewriter.create<vector::TransposeOp>(
op.getLoc(), op.getVector(), indices);		op.getLoc(), op.getVector(), indices);
auto newMap = AffineMap::getMinorIdentityMap(		auto newMap = AffineMap::getMinorIdentityMap(
map.getNumDims(), map.getNumResults(), rewriter.getContext());		map.getNumDims(), map.getNumResults(), rewriter.getContext());
rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(		rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(
op, newVec, op.getSource(), op.getIndices(), AffineMapAttr::get(newMap),		op, newVec, op.getSource(), op.getIndices(), AffineMapAttr::get(newMap),
op.getMask(), newInBoundsAttr);		op.getMask(), op.getInBounds() ? *op.getInBounds() : ArrayAttr());

return success();		return success();
}		}
};		};

/// Convert a transfer.write op with a map which isn't the permutation of a		/// Convert a transfer.write op with a map which isn't the permutation of a
/// minor identity into a vector.broadcast + transfer_write with permutation of		/// minor identity into a vector.broadcast + transfer_write with permutation of
/// minor identity map by adding unit dim on inner dimension. Ex:		/// minor identity map by adding unit dim on inner dimension. Ex:
▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	LogicalResult matchAndRewrite(vector::TransferWriteOp op,
// Mask: add unit dims at the end of the shape.		// Mask: add unit dims at the end of the shape.
Value newMask;		Value newMask;
if (op.getMask())		if (op.getMask())
newMask = extendMaskRank(rewriter, op.getLoc(), op.getMask(),		newMask = extendMaskRank(rewriter, op.getLoc(), op.getMask(),
missingInnerDim.size());		missingInnerDim.size());
exprs.append(map.getResults().begin(), map.getResults().end());		exprs.append(map.getResults().begin(), map.getResults().end());
AffineMap newMap =		AffineMap newMap =
AffineMap::get(map.getNumDims(), 0, exprs, op.getContext());		AffineMap::get(map.getNumDims(), 0, exprs, op.getContext());
// All the new dimensions added are inbound.
SmallVector<bool> newInBoundsValues(missingInnerDim.size(), true);
for (int64_t i = 0, e = op.getVectorType().getRank(); i < e; ++i) {
newInBoundsValues.push_back(op.isDimInBounds(i));
}
ArrayAttr newInBoundsAttr = rewriter.getBoolArrayAttr(newInBoundsValues);
rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(		rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(
op, newVec, op.getSource(), op.getIndices(), AffineMapAttr::get(newMap),		op, newVec, op.getSource(), op.getIndices(), AffineMapAttr::get(newMap),
newMask, newInBoundsAttr);		newMask, op.getInBoundsAttr());
return success();		return success();
}		}
};		};

/// Lower transfer_read op with broadcast in the leading dimensions into		/// Lower transfer_read op with broadcast in the leading dimensions into
/// transfer_read of lower rank + vector.broadcast.		/// transfer_read of lower rank + vector.broadcast.
/// Ex: vector.transfer_read ...		/// Ex: vector.transfer_read ...
/// permutation_map: (d0, d1, d2, d3) -> (0, d1, 0, d3)		/// permutation_map: (d0, d1, d2, d3) -> (0, d1, 0, d3)
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	LogicalResult matchAndRewrite(vector::TransferReadOp op,
SmallVector<int64_t> newShape = llvm::to_vector<4>(		SmallVector<int64_t> newShape = llvm::to_vector<4>(
originalVecType.getShape().take_back(reducedShapeRank));		originalVecType.getShape().take_back(reducedShapeRank));
// Vector rank cannot be zero. Handled by TransferReadToVectorLoadLowering.		// Vector rank cannot be zero. Handled by TransferReadToVectorLoadLowering.
if (newShape.empty())		if (newShape.empty())
return rewriter.notifyMatchFailure(op, "rank-reduced vector is 0-d");		return rewriter.notifyMatchFailure(op, "rank-reduced vector is 0-d");

VectorType newReadType =		VectorType newReadType =
VectorType::get(newShape, originalVecType.getElementType());		VectorType::get(newShape, originalVecType.getElementType());
ArrayAttr newInBoundsAttr =
op.getInBounds()
? rewriter.getArrayAttr(
op.getInBoundsAttr().getValue().take_back(reducedShapeRank))
: ArrayAttr();
Value newRead = rewriter.create<vector::TransferReadOp>(		Value newRead = rewriter.create<vector::TransferReadOp>(
op.getLoc(), newReadType, op.getSource(), op.getIndices(),		op.getLoc(), newReadType, op.getSource(), op.getIndices(),
AffineMapAttr::get(newMap), op.getPadding(), op.getMask(),		AffineMapAttr::get(newMap), op.getPadding(), op.getMask(),
newInBoundsAttr);		op.getInBoundsAttr());
rewriter.replaceOpWithNewOp<vector::BroadcastOp>(op, originalVecType,		rewriter.replaceOpWithNewOp<vector::BroadcastOp>(op, originalVecType,
newRead);		newRead);
return success();		return success();
}		}
};		};

} // namespace		} // namespace


mlir/lib/Dialect/Vector/Transforms/VectorDropLeadUnitDim.cpp

Show First 20 Lines • Show All 210 Lines • ▼ Show 20 Lines	LogicalResult matchAndRewrite(vector::TransferReadOp read,

AffineMap oldMap = read.getPermutationMap();		AffineMap oldMap = read.getPermutationMap();
ArrayRef<AffineExpr> newResults =		ArrayRef<AffineExpr> newResults =
oldMap.getResults().take_back(newType.getRank());		oldMap.getResults().take_back(newType.getRank());
AffineMap newMap =		AffineMap newMap =
AffineMap::get(oldMap.getNumDims(), oldMap.getNumSymbols(), newResults,		AffineMap::get(oldMap.getNumDims(), oldMap.getNumSymbols(), newResults,
rewriter.getContext());		rewriter.getContext());

ArrayAttr inBoundsAttr;
if (read.getInBounds())
inBoundsAttr = rewriter.getArrayAttr(
read.getInBoundsAttr().getValue().take_back(newType.getRank()));

auto newRead = rewriter.create<vector::TransferReadOp>(		auto newRead = rewriter.create<vector::TransferReadOp>(
read.getLoc(), newType, read.getSource(), read.getIndices(),		read.getLoc(), newType, read.getSource(), read.getIndices(),
AffineMapAttr::get(newMap), read.getPadding(), /*mask=*/Value(),		AffineMapAttr::get(newMap), read.getPadding(), /*mask=*/Value(),
inBoundsAttr);		read.getInBoundsAttr());
rewriter.replaceOpWithNewOp<vector::BroadcastOp>(read, oldType, newRead);		rewriter.replaceOpWithNewOp<vector::BroadcastOp>(read, oldType, newRead);

return success();		return success();
}		}
};		};

// Turns vector.transfer_write on vector with leading 1 dimensions into		// Turns vector.transfer_write on vector with leading 1 dimensions into
// vector.shape_cast followed by vector.transfer_write on vector without leading		// vector.shape_cast followed by vector.transfer_write on vector without leading
Show All 23 Lines	LogicalResult matchAndRewrite(vector::TransferWriteOp write,

AffineMap oldMap = write.getPermutationMap();		AffineMap oldMap = write.getPermutationMap();
ArrayRef<AffineExpr> newResults =		ArrayRef<AffineExpr> newResults =
oldMap.getResults().take_back(newType.getRank());		oldMap.getResults().take_back(newType.getRank());
AffineMap newMap =		AffineMap newMap =
AffineMap::get(oldMap.getNumDims(), oldMap.getNumSymbols(), newResults,		AffineMap::get(oldMap.getNumDims(), oldMap.getNumSymbols(), newResults,
rewriter.getContext());		rewriter.getContext());

ArrayAttr inBoundsAttr;
if (write.getInBounds())
inBoundsAttr = rewriter.getArrayAttr(
write.getInBoundsAttr().getValue().take_back(newType.getRank()));

auto newVector = rewriter.create<vector::ExtractOp>(		auto newVector = rewriter.create<vector::ExtractOp>(
write.getLoc(), write.getVector(), splatZero(dropDim));		write.getLoc(), write.getVector(), splatZero(dropDim));
rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(		rewriter.replaceOpWithNewOp<vector::TransferWriteOp>(
write, newVector, write.getSource(), write.getIndices(),		write, newVector, write.getSource(), write.getIndices(),
AffineMapAttr::get(newMap), inBoundsAttr);		AffineMapAttr::get(newMap), write.getInBoundsAttr());

return success();		return success();
}		}
};		};

} // namespace		} // namespace

LogicalResult		LogicalResult

mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp

Show First 20 Lines • Show All 517 Lines • ▼ Show 20 Lines	LogicalResult matchAndRewrite(vector::TransferReadOp transferReadOp,
SmallVector<AffineExpr, 1> dimExprs{		SmallVector<AffineExpr, 1> dimExprs{
getAffineDimExpr(firstContiguousInnerDim, rewriter.getContext())};		getAffineDimExpr(firstContiguousInnerDim, rewriter.getContext())};
auto collapsedMap =		auto collapsedMap =
AffineMap::get(collapsedRank, 0, dimExprs, rewriter.getContext());		AffineMap::get(collapsedRank, 0, dimExprs, rewriter.getContext());
VectorType flatVectorType = VectorType::get({vectorType.getNumElements()},		VectorType flatVectorType = VectorType::get({vectorType.getNumElements()},
vectorType.getElementType());		vectorType.getElementType());
vector::TransferReadOp flatRead = rewriter.create<vector::TransferReadOp>(		vector::TransferReadOp flatRead = rewriter.create<vector::TransferReadOp>(
loc, flatVectorType, collapsedSource, collapsedIndices, collapsedMap);		loc, flatVectorType, collapsedSource, collapsedIndices, collapsedMap);
flatRead.setInBoundsAttr(rewriter.getBoolArrayAttr({true}));		SmallVector<bool> newInBounds(collapsedIndices.size(), true);
		flatRead.setInBoundsAttr(rewriter.getBoolArrayAttr(newInBounds));
rewriter.replaceOpWithNewOp<vector::ShapeCastOp>(		rewriter.replaceOpWithNewOp<vector::ShapeCastOp>(
transferReadOp, cast<VectorType>(vector.getType()), flatRead);		transferReadOp, cast<VectorType>(vector.getType()), flatRead);
return success();		return success();
}		}
};		};

/// Rewrites contiguous row-major vector.transfer_write ops by inserting		/// Rewrites contiguous row-major vector.transfer_write ops by inserting
/// memref.collapse_shape on the source so that the resulting		/// memref.collapse_shape on the source so that the resulting
▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	auto collapsedMap =
AffineMap::get(collapsedRank, 0, dimExprs, rewriter.getContext());		AffineMap::get(collapsedRank, 0, dimExprs, rewriter.getContext());
VectorType flatVectorType = VectorType::get({vectorType.getNumElements()},		VectorType flatVectorType = VectorType::get({vectorType.getNumElements()},
vectorType.getElementType());		vectorType.getElementType());
Value flatVector =		Value flatVector =
rewriter.create<vector::ShapeCastOp>(loc, flatVectorType, vector);		rewriter.create<vector::ShapeCastOp>(loc, flatVectorType, vector);
vector::TransferWriteOp flatWrite =		vector::TransferWriteOp flatWrite =
rewriter.create<vector::TransferWriteOp>(		rewriter.create<vector::TransferWriteOp>(
loc, flatVector, collapsedSource, collapsedIndices, collapsedMap);		loc, flatVector, collapsedSource, collapsedIndices, collapsedMap);
flatWrite.setInBoundsAttr(rewriter.getBoolArrayAttr({true}));		SmallVector<bool> newInBounds(collapsedIndices.size(), true);
		flatWrite.setInBoundsAttr(rewriter.getBoolArrayAttr(newInBounds));
rewriter.eraseOp(transferWriteOp);		rewriter.eraseOp(transferWriteOp);
return success();		return success();
}		}
};		};

/// Base class for `vector.extract/vector.extract_element(vector.transfer_read)`		/// Base class for `vector.extract/vector.extract_element(vector.transfer_read)`
/// to `memref.load` patterns. The `match` method is shared for both		/// to `memref.load` patterns. The `match` method is shared for both
/// `vector.extract` and `vector.extract_element`.		/// `vector.extract` and `vector.extract_element`.
▲ Show 20 Lines • Show All 141 Lines • ▼ Show 20 Lines	LogicalResult matchAndRewrite(vector::TransferWriteOp xferOp,
if (!llvm::all_of(vecType.getShape(), [](int64_t sz) { return sz == 1; }))		if (!llvm::all_of(vecType.getShape(), [](int64_t sz) { return sz == 1; }))
return failure();		return failure();
// Mask not supported.		// Mask not supported.
if (xferOp.getMask())		if (xferOp.getMask())
return failure();		return failure();
// Map not supported.		// Map not supported.
if (!xferOp.getPermutationMap().isMinorIdentity())		if (!xferOp.getPermutationMap().isMinorIdentity())
return failure();		return failure();
		// Cannot rewrite out-of-bounds transfers.
		if (xferOp.hasOutOfBoundsDim())
		return failure();
// Only float and integer element types are supported.		// Only float and integer element types are supported.
Value scalar;		Value scalar;
if (vecType.getRank() == 0) {		if (vecType.getRank() == 0) {
// vector.extract does not support vector<f32> etc., so use		// vector.extract does not support vector<f32> etc., so use
// vector.extractelement instead.		// vector.extractelement instead.
scalar = rewriter.create<vector::ExtractElementOp>(xferOp.getLoc(),		scalar = rewriter.create<vector::ExtractElementOp>(xferOp.getLoc(),
xferOp.getVector());		xferOp.getVector());
} else {		} else {

mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp

Show All 34 Lines

#define DEBUG_TYPE "vector-transfer-split"		#define DEBUG_TYPE "vector-transfer-split"

using namespace mlir;		using namespace mlir;
using namespace mlir::vector;		using namespace mlir::vector;

/// Build the condition to ensure that a particular VectorTransferOpInterface		/// Build the condition to ensure that a particular VectorTransferOpInterface
/// is in-bounds.		/// is in-bounds.
static Value createInBoundsCond(RewriterBase &b,		static Value createInBoundsCondition(RewriterBase &b,
VectorTransferOpInterface xferOp) {		VectorTransferOpInterface xferOp) {
assert(xferOp.getPermutationMap().isMinorIdentity() &&		assert(xferOp.getPermutationMap().isMinorIdentity() &&
"Expected minor identity map");		"Expected minor identity map");
		SmallVector<int64_t> accessedChunk = xferOp.getTransferChunkAccessed();
Value inBoundsCond;		Value inBoundsCond;
xferOp.zipResultAndIndexing([&](int64_t resultIdx, int64_t indicesIdx) {
// Zip over the resulting vector shape and memref indices.		for (int64_t dim = 0, e = xferOp.getShapedType().getRank(); dim < e; ++dim) {
// If the dimension is known to be in-bounds, it does not participate in		// If the dimension is known to be in-bounds, it does not participate in
// the construction of `inBoundsCond`.		// the construction of `inBoundsCond`.
if (xferOp.isDimInBounds(resultIdx))		if (xferOp.isDimInBounds(dim))
return;		continue;

// Fold or create the check that `index + vector_size` <= `memref_size`.		// Fold or create the check that `index + vector_size` <= `memref_size`.
Location loc = xferOp.getLoc();		Location loc = xferOp.getLoc();
int64_t vectorSize = xferOp.getVectorType().getDimSize(resultIdx);		// If the dimension is not part of the transfer shape, it's like a "1"
		// dimensions that is casted away.
		int64_t vectorSize = accessedChunk[dim] == 0 ? 1 : accessedChunk[dim];
OpFoldResult sum = affine::makeComposedFoldedAffineApply(		OpFoldResult sum = affine::makeComposedFoldedAffineApply(
b, loc, b.getAffineDimExpr(0) + b.getAffineConstantExpr(vectorSize),		b, loc, b.getAffineDimExpr(0) + b.getAffineConstantExpr(vectorSize),
{xferOp.indices()[indicesIdx]});		{xferOp.indices()[dim]});
OpFoldResult dimSz =		OpFoldResult dimSz = memref::getMixedSize(b, loc, xferOp.source(), dim);
memref::getMixedSize(b, loc, xferOp.source(), indicesIdx);
		// Skip check if it is statically "true";
auto maybeCstSum = getConstantIntValue(sum);		auto maybeCstSum = getConstantIntValue(sum);
auto maybeCstDimSz = getConstantIntValue(dimSz);		auto maybeCstDimSz = getConstantIntValue(dimSz);
if (maybeCstSum && maybeCstDimSz && maybeCstSum <= maybeCstDimSz)		if (maybeCstSum && maybeCstDimSz && maybeCstSum <= maybeCstDimSz)
return;		continue;

		// Generate condition.
Value cond =		Value cond =
b.create<arith::CmpIOp>(loc, arith::CmpIPredicate::sle,		b.create<arith::CmpIOp>(loc, arith::CmpIPredicate::sle,
getValueOrCreateConstantIndexOp(b, loc, sum),		getValueOrCreateConstantIndexOp(b, loc, sum),
getValueOrCreateConstantIndexOp(b, loc, dimSz));		getValueOrCreateConstantIndexOp(b, loc, dimSz));

// Conjunction over all dims for which we are in-bounds.		// Conjunction over all dims for which we are in-bounds.
if (inBoundsCond)		if (inBoundsCond)
inBoundsCond = b.create<arith::AndIOp>(loc, inBoundsCond, cond);		inBoundsCond = b.create<arith::AndIOp>(loc, inBoundsCond, cond);
else		else
inBoundsCond = cond;		inBoundsCond = cond;
});		}

return inBoundsCond;		return inBoundsCond;
}		}

/// Split a vector.transfer operation into an in-bounds (i.e., no out-of-bounds		/// Split a vector.transfer operation into an in-bounds (i.e., no out-of-bounds
/// masking) fast path and a slow path.		/// masking) fast path and a slow path.
/// If `ifOp` is not null and the result is `success, the `ifOp` points to the		/// If `ifOp` is not null and the result is `success, the `ifOp` points to the
/// newly created conditional upon function return.		/// newly created conditional upon function return.
/// To accommodate for the fact that the original vector.transfer indexing may		/// To accommodate for the fact that the original vector.transfer indexing may
[11 lines not shown]
/// // fast path, direct cast		/// // fast path, direct cast
/// memref.cast %A: memref<A...> to compatibleMemRefType		/// memref.cast %A: memref<A...> to compatibleMemRefType
/// scf.yield %view : compatibleMemRefType, index, index		/// scf.yield %view : compatibleMemRefType, index, index
/// } else {		/// } else {
/// // slow path, not in-bounds vector.transfer or linalg.copy.		/// // slow path, not in-bounds vector.transfer or linalg.copy.
/// memref.cast %alloc: memref<B...> to compatibleMemRefType		/// memref.cast %alloc: memref<B...> to compatibleMemRefType
/// scf.yield %4 : compatibleMemRefType, index, index		/// scf.yield %4 : compatibleMemRefType, index, index
/// }		/// }
/// %0 = vector.transfer_read %1#0[%1#1, %1#2] {in_bounds = [true ... true]}		/// %0 = vector.transfer_read %1#0[%1#1, %1#2] {in_bounds = [true, true]}
/// ```		/// ```
/// where `alloc` is a top of the function alloca'ed buffer of one vector.		/// where `alloc` is a top of the function alloca'ed buffer of one vector.
///		///
/// Preconditions:		/// Preconditions:
/// 1. `xferOp.getPermutationMap()` must be a minor identity map		/// 1. `xferOp.getPermutationMap()` must be a minor identity map
/// 2. the rank of the `xferOp.memref()` and the rank of the `xferOp.vector()`		/// 2. the rank of the `xferOp.memref()` and the rank of the `xferOp.vector()`
/// must be equal. This will be relaxed in the future but requires		/// must be equal. This will be relaxed in the future but requires
/// rank-reducing subviews.		/// rank-reducing subviews.
[364 lines not shown]
/// // fastpath, direct cast		/// // fastpath, direct cast
/// memref.cast %A: memref<A...> to compatibleMemRefType		/// memref.cast %A: memref<A...> to compatibleMemRefType
/// scf.yield %view : compatibleMemRefType, index, index		/// scf.yield %view : compatibleMemRefType, index, index
/// } else {		/// } else {
/// // slowpath, not in-bounds vector.transfer or linalg.copy.		/// // slowpath, not in-bounds vector.transfer or linalg.copy.
/// memref.cast %alloc: memref<B...> to compatibleMemRefType		/// memref.cast %alloc: memref<B...> to compatibleMemRefType
/// scf.yield %4 : compatibleMemRefType, index, index		/// scf.yield %4 : compatibleMemRefType, index, index
/// }		/// }
/// %0 = vector.transfer_read %1#0[%1#1, %1#2] {in_bounds = [true ... true]}		/// %0 = vector.transfer_read %1#0[%1#1, %1#2] {in_bounds = [true, true]}
/// ```		/// ```
/// where `alloc` is a top of the function alloca'ed buffer of one vector.		/// where `alloc` is a top of the function alloca'ed buffer of one vector.
///		///
/// For vector.transfer_write:		/// For vector.transfer_write:
/// There are 2 conditional blocks. First a block to decide which memref and		/// There are 2 conditional blocks. First a block to decide which memref and
/// indices to use for an unmasked, inbounds write. Then a conditional block to		/// indices to use for an unmasked, inbounds write. Then a conditional block to
/// further copy a partial buffer into the final result in the slow path case.		/// further copy a partial buffer into the final result in the slow path case.
///		///
/// Example (a 2-D vector.transfer_write):		/// Example (a 2-D vector.transfer_write):
/// ```		/// ```
/// vector.transfer_write %arg, %0[...], %pad : memref<A...>, vector<...>		/// vector.transfer_write %arg, %0[...], %pad : memref<A...>, vector<...>
/// ```		/// ```
/// is transformed into:		/// is transformed into:
/// ```		/// ```
/// %1:3 = scf.if (%inBounds) {		/// %1:3 = scf.if (%inBounds) {
/// memref.cast %A: memref<A...> to compatibleMemRefType		/// memref.cast %A: memref<A...> to compatibleMemRefType
/// scf.yield %view : compatibleMemRefType, index, index		/// scf.yield %view : compatibleMemRefType, index, index
/// } else {		/// } else {
/// memref.cast %alloc: memref<B...> to compatibleMemRefType		/// memref.cast %alloc: memref<B...> to compatibleMemRefType
/// scf.yield %4 : compatibleMemRefType, index, index		/// scf.yield %4 : compatibleMemRefType, index, index
/// }		/// }
/// %0 = vector.transfer_write %arg, %1#0[%1#1, %1#2] {in_bounds = [true ...		/// %0 = vector.transfer_write %arg, %1#0[%1#1, %1#2] {in_bounds = [true,
/// true]}		/// true]}
/// scf.if (%notInBounds) {		/// scf.if (%notInBounds) {
/// // slowpath: not in-bounds vector.transfer or linalg.copy.		/// // slowpath: not in-bounds vector.transfer or linalg.copy.
/// }		/// }
/// ```		/// ```
/// where `alloc` is a top of the function alloca'ed buffer of one vector.		/// where `alloc` is a top of the function alloca'ed buffer of one vector.
///		///
/// Preconditions:		/// Preconditions:
/// 1. `xferOp.getPermutationMap()` must be a minor identity map		/// 1. `xferOp.getPermutationMap()` must be a minor identity map
/// 2. the rank of the `xferOp.source()` and the rank of the `xferOp.vector()`		/// 2. the rank of the `xferOp.source()` and the rank of the `xferOp.vector()`
/// must be equal. This will be relaxed in the future but requires		/// must be equal. This will be relaxed in the future but requires
/// rank-reducing subviews.		/// rank-reducing subviews.
LogicalResult mlir::vector::splitFullAndPartialTransfer(		LogicalResult mlir::vector::splitFullAndPartialTransfer(
RewriterBase &b, VectorTransferOpInterface xferOp,		RewriterBase &b, VectorTransferOpInterface xferOp,
VectorTransformsOptions options, scf::IfOp *ifOp) {		VectorTransformsOptions options, scf::IfOp *ifOp) {
if (options.vectorTransferSplit == VectorTransferSplit::None)		if (options.vectorTransferSplit == VectorTransferSplit::None)
return failure();		return failure();

SmallVector<bool, 4> bools(xferOp.getTransferRank(), true);		SmallVector<bool, 4> bools(xferOp.getShapedType().getRank(), true);
auto inBoundsAttr = b.getBoolArrayAttr(bools);		auto inBoundsAttr = b.getBoolArrayAttr(bools);
if (options.vectorTransferSplit == VectorTransferSplit::ForceInBounds) {		if (options.vectorTransferSplit == VectorTransferSplit::ForceInBounds) {
b.updateRootInPlace(xferOp, [&]() {		b.updateRootInPlace(xferOp, [&]() {
xferOp->setAttr(xferOp.getInBoundsAttrStrName(), inBoundsAttr);		xferOp->setAttr(xferOp.getInBoundsAttrStrName(), inBoundsAttr);
});		});
return success();		return success();
}		}

[11 lines not shown]	// ensure they aren't used in the wrong scopes further down.
if (xferWriteOp && xferWriteOp.getMask())		if (xferWriteOp && xferWriteOp.getMask())
return failure();		return failure();
if (xferReadOp && xferReadOp.getMask())		if (xferReadOp && xferReadOp.getMask())
return failure();		return failure();
}		}

RewriterBase::InsertionGuard guard(b);		RewriterBase::InsertionGuard guard(b);
b.setInsertionPoint(xferOp);		b.setInsertionPoint(xferOp);
Value inBoundsCond = createInBoundsCond(		Value inBoundsCond = createInBoundsCondition(
b, cast<VectorTransferOpInterface>(xferOp.getOperation()));		b, cast<VectorTransferOpInterface>(xferOp.getOperation()));
if (!inBoundsCond)		if (!inBoundsCond)
return failure();		return failure();

// Top of the function `alloc` for transient storage.		// Top of the function `alloc` for transient storage.
Value alloc;		Value alloc;
{		{
RewriterBase::InsertionGuard guard(b);		RewriterBase::InsertionGuard guard(b);
[114 lines not shown]
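
Review note: the doc comment for `splitFullAndPartialTransfer` describes a fast path that reads directly from the source and a slow path that goes through a one-vector scratch buffer, followed by a single unconditional in-bounds read. The sketch below models that control flow for a 1-D read over a `std::vector`. It is a simplification under stated assumptions: `readWithSplit` is a hypothetical helper rather than MLIR API, the slow path is modeled as a plain copy into a padded buffer, and the rank and index handling of the real rewrite are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Reads `n` elements of `source` starting at `offset`, padding with `pad`
// wherever the read would run past the end of `source`.
std::vector<float> readWithSplit(const std::vector<float> &source,
                                 std::size_t offset, std::size_t n,
                                 float pad) {
  const float *base = nullptr;
  std::vector<float> scratch; // plays the role of `alloc` in the doc comment

  if (offset + n <= source.size()) {
    // Fast path: the whole read is in bounds, so use the source directly.
    base = source.data() + offset;
  } else {
    // Slow path: fill the scratch buffer with padding, then copy whatever
    // part of the source is still valid into its prefix.
    scratch.assign(n, pad);
    if (offset < source.size())
      std::copy(source.begin() + static_cast<std::ptrdiff_t>(offset),
                source.end(), scratch.begin());
    base = scratch.data();
  }

  // Single unconditional read of n elements; both branches guarantee that
  // `base` points at n valid elements, which corresponds to the in-bounds
  // vector.transfer_read emitted after the scf.if.
  return std::vector<float>(base, base + n);
}
```

Reading 3 elements at offset 3 from a 4-element source takes the slow path and yields the last source element followed by two padding values.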

mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp

[986 lines not shown]	static Value buildVectorComparison(PatternRewriter &rewriter, Operation *op,
// Construct the vector comparison.		// Construct the vector comparison.
Value bound = getValueOrCreateCastToIndexLike(rewriter, loc, idxType, b);		Value bound = getValueOrCreateCastToIndexLike(rewriter, loc, idxType, b);
Value bounds =		Value bounds =
rewriter.create<vector::SplatOp>(loc, indices.getType(), bound);		rewriter.create<vector::SplatOp>(loc, indices.getType(), bound);
return rewriter.create<arith::CmpIOp>(loc, arith::CmpIPredicate::slt, indices,		return rewriter.create<arith::CmpIOp>(loc, arith::CmpIPredicate::slt, indices,
bounds);		bounds);
}		}

		/// Materialize a mask for a 1D vector transfer, where the transfer dimension
		/// is out-of-bounds.
template <typename ConcreteOp>		template <typename ConcreteOp>
struct MaterializeTransferMask : public OpRewritePattern<ConcreteOp> {		struct MaterializeTransferMask : public OpRewritePattern<ConcreteOp> {
public:		public:
explicit MaterializeTransferMask(MLIRContext *context, bool enableIndexOpt,		explicit MaterializeTransferMask(MLIRContext *context, bool enableIndexOpt,
PatternBenefit benefit = 1)		PatternBenefit benefit = 1)
: mlir::OpRewritePattern<ConcreteOp>(context, benefit),		: mlir::OpRewritePattern<ConcreteOp>(context, benefit),
force32BitVectorIndices(enableIndexOpt) {}		force32BitVectorIndices(enableIndexOpt) {}

LogicalResult matchAndRewrite(ConcreteOp xferOp,		LogicalResult matchAndRewrite(ConcreteOp xferOp,
PatternRewriter &rewriter) const override {		PatternRewriter &rewriter) const override {
if (!xferOp.hasOutOfBoundsDim())		if (xferOp.getVectorType().getRank() > 1 || xferOp.getIndices().empty())
return failure();		return failure();

if (xferOp.getVectorType().getRank() > 1 || xferOp.getIndices().empty())		// Get transfer dimension.
		assert(xferOp.getPermutationMap().getNumResults() == 1 &&
		"expected one result in AffineMap");
		AffineDimExpr dimExpr = xferOp.getPermutationMap()
		.getResult(0)
		.template dyn_cast<AffineDimExpr>();
		assert(dimExpr && "expected dimension, not broadcast");
		int64_t transferDim = dimExpr.getPosition();

		// The dimension must be out-of-bounds.
		if (xferOp.isDimInBounds(transferDim))
return failure();		return failure();

Location loc = xferOp->getLoc();		Location loc = xferOp->getLoc();
VectorType vtp = xferOp.getVectorType();		VectorType vtp = xferOp.getVectorType();

// Create the in-bounds mask with all elements between [0 .. dim - offset)		// Create the in-bounds mask with all elements between [0 .. dim - offset)
// set and [dim - offset .. vector_length) unset.		// set and [dim - offset .. vector_length) unset.
//		//
[9 lines not shown]	Value mask = rewriter.create<vector::CreateMaskOp>(
VectorType::get(vtp.getShape(), rewriter.getI1Type(),		VectorType::get(vtp.getShape(), rewriter.getI1Type(),
vtp.getScalableDims()),		vtp.getScalableDims()),
b);		b);
if (xferOp.getMask()) {		if (xferOp.getMask()) {
// Intersect the in-bounds with the mask specified as an op parameter.		// Intersect the in-bounds with the mask specified as an op parameter.
mask = rewriter.create<arith::AndIOp>(loc, mask, xferOp.getMask());		mask = rewriter.create<arith::AndIOp>(loc, mask, xferOp.getMask());
}		}

		// The masked dimension is now in-bounds.
		SmallVector<bool> inBounds = xferOp.getInBoundsValues();
		assert(!inBounds[transferDim] &&
		"expected that dimension was out-of-bounds");
		inBounds[transferDim] = true;

rewriter.updateRootInPlace(xferOp, [&]() {		rewriter.updateRootInPlace(xferOp, [&]() {
xferOp.getMaskMutable().assign(mask);		xferOp.getMaskMutable().assign(mask);
xferOp.setInBoundsAttr(rewriter.getBoolArrayAttr({true}));		xferOp.setInBoundsAttr(rewriter.getBoolArrayAttr(inBounds));
});		});

return success();		return success();
}		}

private:		private:
const bool force32BitVectorIndices;		const bool force32BitVectorIndices;
};		};
[340 lines not shown]
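
Review note: `MaterializeTransferMask` builds a mask whose lanes in `[0 .. dim - offset)` are set and whose remaining lanes are unset, then intersects it with any mask already on the op. The sketch below reproduces that computation on `std::vector<bool>`. `makeInBoundsMask` and `intersectMasks` are hypothetical names, and clamping the number of set lanes to `[0, vectorLength]` is an assumption of this sketch rather than something the diff shows.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds a 1-D mask with lanes [0, dimSize - offset) set and the rest unset.
// Assumes vectorLength >= 0. The number of set lanes is clamped so that an
// offset past the end gives an all-false mask and a fully in-bounds access
// gives an all-true mask.
std::vector<bool> makeInBoundsMask(int64_t dimSize, int64_t offset,
                                   int64_t vectorLength) {
  int64_t numSet =
      std::max<int64_t>(0, std::min<int64_t>(dimSize - offset, vectorLength));
  std::vector<bool> mask(static_cast<std::size_t>(vectorLength), false);
  for (int64_t i = 0; i < numSet; ++i)
    mask[static_cast<std::size_t>(i)] = true;
  return mask;
}

// Lane-wise AND of two masks, as done with arith.andi when the op already
// carries a mask. Only the common prefix is combined if the lengths differ.
std::vector<bool> intersectMasks(const std::vector<bool> &a,
                                 const std::vector<bool> &b) {
  std::vector<bool> result(std::min(a.size(), b.size()));
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] = a[i] && b[i];
  return result;
}
```

With `dimSize = 10`, `offset = 7` and a vector length of 4, the first three lanes are set and the last one is unset.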

mlir/test/Conversion/GPUCommon/transfer_write.mlir

	// RUN: mlir-opt %s --gpu-to-llvm='use-opaque-pointers=1' | FileCheck %s			// RUN: mlir-opt %s --gpu-to-llvm='use-opaque-pointers=1' | FileCheck %s

	func.func @warp_extract(%arg0: index, %arg1: memref<1024x1024xf32>, %arg2: index, %arg3: vector<1xf32>) {			func.func @warp_extract(%arg0: index, %arg1: memref<1024x1024xf32>, %arg2: index, %arg3: vector<1xf32>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.warp_execute_on_lane_0(%arg0)[32] {			vector.warp_execute_on_lane_0(%arg0)[32] {
	// CHECK:%[[val:[0-9]+]] = llvm.extractelement			// CHECK:%[[val:[0-9]+]] = llvm.extractelement
	// CHECK:%[[base:[0-9]+]] = llvm.extractvalue			// CHECK:%[[base:[0-9]+]] = llvm.extractvalue
	// CHECK:%[[ptr:[0-9]+]] = llvm.getelementptr %[[base]]			// CHECK:%[[ptr:[0-9]+]] = llvm.getelementptr %[[base]]
	// CHECK:llvm.store %[[val]], %[[ptr]]			// CHECK:llvm.store %[[val]], %[[ptr]]
	vector.transfer_write %arg3, %arg1[%c0, %c0] {in_bounds = [true]} : vector<1xf32>, memref<1024x1024xf32>			vector.transfer_write %arg3, %arg1[%c0, %c0] {in_bounds = [true, true]} : vector<1xf32>, memref<1024x1024xf32>
	}			}
	return			return
	}			}

mlir/test/Conversion/VectorToGPU/vector-to-mma-ops-mma-sync.mlir

[270 lines not shown]	func.func @multi_dim_m16n8k16_fp16_row_row_row(%arg0: memref<4x32x1x32xf16, #gpu.address_space<workgroup>>, %arg1: memref<4x1x32x32xf16, #gpu.address_space<workgroup>>, %arg2: memref<1x32x40xf16, #gpu.address_space<workgroup>>) {

// CHECK-DAG: [[c0:%.+]] = arith.constant 0 : index		// CHECK-DAG: [[c0:%.+]] = arith.constant 0 : index
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%cst = arith.constant 0.000000e+00 : f16		%cst = arith.constant 0.000000e+00 : f16

// CHECK-DAG: [[m_coord:%.+]] = affine.apply [[$strided_map]]		// CHECK-DAG: [[m_coord:%.+]] = affine.apply [[$strided_map]]
// CHECK-DAG: [[k_coord:%.+]] = affine.apply [[$contiguous_map]]		// CHECK-DAG: [[k_coord:%.+]] = affine.apply [[$contiguous_map]]
// CHECK: [[fragmentA:%.+]] = nvgpu.ldmatrix %arg0[[[c0]], [[m_coord]], [[c0]], [[k_coord]]] {numTiles = 4 : i32, transpose = false}		// CHECK: [[fragmentA:%.+]] = nvgpu.ldmatrix %arg0[[[c0]], [[m_coord]], [[c0]], [[k_coord]]] {numTiles = 4 : i32, transpose = false}
%A = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true], permutation_map = #map_a} : memref<4x32x1x32xf16, #gpu.address_space<workgroup>>, vector<16x16xf16>		%A = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true, true], permutation_map = #map_a} : memref<4x32x1x32xf16, #gpu.address_space<workgroup>>, vector<16x16xf16>

// CHECK-DAG: [[n_coord:%.+]] = affine.apply [[$contiguous_map]]		// CHECK-DAG: [[n_coord:%.+]] = affine.apply [[$contiguous_map]]
// CHECK-DAG: [[k_coord:%.+]] = affine.apply [[$strided_map]]		// CHECK-DAG: [[k_coord:%.+]] = affine.apply [[$strided_map]]
// CHECK-DAG: [[fragmentB:%.+]] = nvgpu.ldmatrix %arg1[[[c0]], [[c0]], [[k_coord]], [[n_coord]]] {numTiles = 4 : i32, transpose = true}		// CHECK-DAG: [[fragmentB:%.+]] = nvgpu.ldmatrix %arg1[[[c0]], [[c0]], [[k_coord]], [[n_coord]]] {numTiles = 4 : i32, transpose = true}
%B = vector.transfer_read %arg1[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true], permutation_map = #map_b} : memref<4x1x32x32xf16, #gpu.address_space<workgroup>>, vector<16x16xf16>		%B = vector.transfer_read %arg1[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true, true], permutation_map = #map_b} : memref<4x1x32x32xf16, #gpu.address_space<workgroup>>, vector<16x16xf16>

// CHECK-DAG: [[m_coord:%.+]] = affine.apply [[$strided_map]]		// CHECK-DAG: [[m_coord:%.+]] = affine.apply [[$strided_map]]
// CHECK-DAG: [[n_coord:%.+]] = affine.apply [[$contiguous_map]]		// CHECK-DAG: [[n_coord:%.+]] = affine.apply [[$contiguous_map]]
// CHECK-DAG: [[fragmentC:%.*]] = nvgpu.ldmatrix %arg2[[[c0]], [[m_coord]], [[n_coord]]] {numTiles = 4 : i32, transpose = false}		// CHECK-DAG: [[fragmentC:%.*]] = nvgpu.ldmatrix %arg2[[[c0]], [[m_coord]], [[n_coord]]] {numTiles = 4 : i32, transpose = false}
%C = vector.transfer_read %arg2[%c0, %c0, %c0], %cst {in_bounds = [true, true]} : memref<1x32x40xf16, #gpu.address_space<workgroup>>, vector<16x16xf16>		%C = vector.transfer_read %arg2[%c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<1x32x40xf16, #gpu.address_space<workgroup>>, vector<16x16xf16>

// CHECK-DAG: [[fragmentB0:%.+]] = vector.extract_strided_slice [[fragmentB]] {offsets = [0, 0], sizes = [2, 2], strides = [1, 1]} : vector<4x2xf16> to vector<2x2xf16>		// CHECK-DAG: [[fragmentB0:%.+]] = vector.extract_strided_slice [[fragmentB]] {offsets = [0, 0], sizes = [2, 2], strides = [1, 1]} : vector<4x2xf16> to vector<2x2xf16>
// CHECK-DAG: [[fragmentC0:%.+]] = vector.extract_strided_slice [[fragmentC]] {offsets = [0, 0], sizes = [2, 2], strides = [1, 1]} : vector<4x2xf16> to vector<2x2xf16>		// CHECK-DAG: [[fragmentC0:%.+]] = vector.extract_strided_slice [[fragmentC]] {offsets = [0, 0], sizes = [2, 2], strides = [1, 1]} : vector<4x2xf16> to vector<2x2xf16>
// CHECK: nvgpu.mma.sync([[fragmentA]], [[fragmentB0]], [[fragmentC0]]) {mmaShape = [16, 8, 16]} : (vector<4x2xf16>, vector<2x2xf16>, vector<2x2xf16>) -> vector<2x2xf16>		// CHECK: nvgpu.mma.sync([[fragmentA]], [[fragmentB0]], [[fragmentC0]]) {mmaShape = [16, 8, 16]} : (vector<4x2xf16>, vector<2x2xf16>, vector<2x2xf16>) -> vector<2x2xf16>
%B0 = vector.extract_strided_slice %B {offsets = [0, 0], sizes = [8, 16], strides = [1, 1]} : vector<16x16xf16> to vector<8x16xf16>		%B0 = vector.extract_strided_slice %B {offsets = [0, 0], sizes = [8, 16], strides = [1, 1]} : vector<16x16xf16> to vector<8x16xf16>
%C0 = vector.extract_strided_slice %C {offsets = [0, 0], sizes = [16, 8], strides = [1, 1]} : vector<16x16xf16> to vector<16x8xf16>		%C0 = vector.extract_strided_slice %C {offsets = [0, 0], sizes = [16, 8], strides = [1, 1]} : vector<16x16xf16> to vector<16x8xf16>
%D0 = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B0, %C0 : vector<16x16xf16>, vector<8x16xf16> into vector<16x8xf16>		%D0 = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B0, %C0 : vector<16x16xf16>, vector<8x16xf16> into vector<16x8xf16>
vector.transfer_write %D0, %arg2[%c0, %c0, %c0] {in_bounds = [true, true]} : vector<16x8xf16>, memref<1x32x40xf16, #gpu.address_space<workgroup>>		vector.transfer_write %D0, %arg2[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<16x8xf16>, memref<1x32x40xf16, #gpu.address_space<workgroup>>

return		return
}		}

// -----		// -----

// CHECK-DAG: [[$strided_map:#.+]] = affine_map<()[s0] -> (s0 mod 16)>		// CHECK-DAG: [[$strided_map:#.+]] = affine_map<()[s0] -> (s0 mod 16)>
// CHECK-DAG: [[$contiguous_map:#.+]] = affine_map<()[s0] -> ((s0 floordiv 16) * 8)>		// CHECK-DAG: [[$contiguous_map:#.+]] = affine_map<()[s0] -> ((s0 floordiv 16) * 8)>

#map0 = affine_map<(d0, d1, d2) -> (d2, d1)>		#map0 = affine_map<(d0, d1, d2) -> (d2, d1)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d2)>		#map1 = affine_map<(d0, d1, d2) -> (d0, d2)>
#map2 = affine_map<(d0, d1, d2) -> (d1, d2)>		#map2 = affine_map<(d0, d1, d2) -> (d1, d2)>
#map3 = affine_map<(d0, d1, d2) -> (d0, d1)>		#map3 = affine_map<(d0, d1, d2) -> (d0, d1)>

// CHECK-LABEL: func @batch_m16n8k16_fp16_row_row_row		// CHECK-LABEL: func @batch_m16n8k16_fp16_row_row_row
func.func @batch_m16n8k16_fp16_row_row_row(%arg0: memref<2x20x20xf16, #gpu.address_space<workgroup>>, %arg1: memref<2x20x20xf16, #gpu.address_space<workgroup>>, %arg2: memref<2x20x20xf16, #gpu.address_space<workgroup>>) {		func.func @batch_m16n8k16_fp16_row_row_row(%arg0: memref<2x20x20xf16, #gpu.address_space<workgroup>>, %arg1: memref<2x20x20xf16, #gpu.address_space<workgroup>>, %arg2: memref<2x20x20xf16, #gpu.address_space<workgroup>>) {
%cst_0 = arith.constant dense<0.000000e+00> : vector<20x20xf16>		%cst_0 = arith.constant dense<0.000000e+00> : vector<20x20xf16>
// CHECK: [[C0:%.+]] = arith.constant 0 : index		// CHECK: [[C0:%.+]] = arith.constant 0 : index
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%cst = arith.constant 0.000000e+00 : f16		%cst = arith.constant 0.000000e+00 : f16

// CHECK-DAG: [[m_coord:%.+]] = affine.apply [[$strided_map]]		// CHECK-DAG: [[m_coord:%.+]] = affine.apply [[$strided_map]]
// CHECK-DAG: [[k_coord:%.+]] = affine.apply [[$contiguous_map]]		// CHECK-DAG: [[k_coord:%.+]] = affine.apply [[$contiguous_map]]
// CHECK: nvgpu.ldmatrix %arg0[[[C0]], [[m_coord]], [[k_coord]]] {numTiles = 4 : i32, transpose = false} : memref<2x20x20xf16, #gpu.address_space<workgroup>> -> vector<4x2xf16>		// CHECK: nvgpu.ldmatrix %arg0[[[C0]], [[m_coord]], [[k_coord]]] {numTiles = 4 : i32, transpose = false} : memref<2x20x20xf16, #gpu.address_space<workgroup>> -> vector<4x2xf16>
%A = vector.transfer_read %arg0[%c0, %c0, %c0], %cst {in_bounds = [true, true]} : memref<2x20x20xf16, #gpu.address_space<workgroup>>, vector<16x16xf16>		%A = vector.transfer_read %arg0[%c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<2x20x20xf16, #gpu.address_space<workgroup>>, vector<16x16xf16>

// CHECK-DAG: [[n_coord:%.+]] = affine.apply [[$contiguous_map]]		// CHECK-DAG: [[n_coord:%.+]] = affine.apply [[$contiguous_map]]
// CHECK-DAG: [[k_coord:%.+]] = affine.apply [[$strided_map]]		// CHECK-DAG: [[k_coord:%.+]] = affine.apply [[$strided_map]]
// CHECK: nvgpu.ldmatrix %arg1[[[C0]], [[k_coord]], [[n_coord]]] {numTiles = 2 : i32, transpose = true} : memref<2x20x20xf16, #gpu.address_space<workgroup>> -> vector<2x2xf16>		// CHECK: nvgpu.ldmatrix %arg1[[[C0]], [[k_coord]], [[n_coord]]] {numTiles = 2 : i32, transpose = true} : memref<2x20x20xf16, #gpu.address_space<workgroup>> -> vector<2x2xf16>
%B = vector.transfer_read %arg1[%c0, %c0, %c0], %cst {permutation_map = #map0, in_bounds = [true, true]} : memref<2x20x20xf16, #gpu.address_space<workgroup>>, vector<8x16xf16>		%B = vector.transfer_read %arg1[%c0, %c0, %c0], %cst {permutation_map = #map0, in_bounds = [true, true, true]} : memref<2x20x20xf16, #gpu.address_space<workgroup>>, vector<8x16xf16>

// CHECK-DAG: [[m_coord:%.+]] = affine.apply [[$strided_map]]		// CHECK-DAG: [[m_coord:%.+]] = affine.apply [[$strided_map]]
// CHECK-DAG: [[n_coord:%.+]] = affine.apply [[$contiguous_map]]		// CHECK-DAG: [[n_coord:%.+]] = affine.apply [[$contiguous_map]]
// CHECK: nvgpu.ldmatrix %arg2[[[C0]], [[m_coord]], [[n_coord]]] {numTiles = 2 : i32, transpose = false} : memref<2x20x20xf16, #gpu.address_space<workgroup>> -> vector<2x2xf16>		// CHECK: nvgpu.ldmatrix %arg2[[[C0]], [[m_coord]], [[n_coord]]] {numTiles = 2 : i32, transpose = false} : memref<2x20x20xf16, #gpu.address_space<workgroup>> -> vector<2x2xf16>
%C = vector.transfer_read %arg2[%c0, %c0, %c0], %cst {in_bounds = [true, true]} : memref<2x20x20xf16, #gpu.address_space<workgroup>>, vector<16x8xf16>		%C = vector.transfer_read %arg2[%c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<2x20x20xf16, #gpu.address_space<workgroup>>, vector<16x8xf16>
%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B, %C : vector<16x16xf16>, vector<8x16xf16> into vector<16x8xf16>		%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B, %C : vector<16x16xf16>, vector<8x16xf16> into vector<16x8xf16>
vector.transfer_write %D, %arg2[%c0, %c0, %c0] {in_bounds = [true, true]} : vector<16x8xf16>, memref<2x20x20xf16, #gpu.address_space<workgroup>>		vector.transfer_write %D, %arg2[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<16x8xf16>, memref<2x20x20xf16, #gpu.address_space<workgroup>>
return		return
}		}

// -----		// -----

//#########################################################		//#########################################################
// FP16 row-col-row		// FP16 row-col-row
//#########################################################		//#########################################################
[373 lines not shown]

mlir/test/Conversion/VectorToGPU/vector-to-mma-ops.mlir

[166 lines not shown]	func.func @matmul_fused_broadcast(%arg0: memref<16x16xf16>, %arg1: memref<16x16xf16>,
%arg2: memref<16x16xf16>, %arg3: memref<16x16x16x16xf16>) {		%arg2: memref<16x16xf16>, %arg3: memref<16x16x16x16xf16>) {
%cst_0 = arith.constant dense<0.000000e+00> : vector<16x16xf16>		%cst_0 = arith.constant dense<0.000000e+00> : vector<16x16xf16>
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%cst = arith.constant 0.000000e+00 : f16		%cst = arith.constant 0.000000e+00 : f16
%A = vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true, true]} : memref<16x16xf16>, vector<16x16xf16>		%A = vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true, true]} : memref<16x16xf16>, vector<16x16xf16>
%B = vector.transfer_read %arg1[%c0, %c0], %cst {permutation_map = #map0, in_bounds = [true, true]} : memref<16x16xf16>, vector<16x16xf16>		%B = vector.transfer_read %arg1[%c0, %c0], %cst {permutation_map = #map0, in_bounds = [true, true]} : memref<16x16xf16>, vector<16x16xf16>
%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B, %cst_0 : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>		%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B, %cst_0 : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>
%E = vector.transfer_read %arg3[%c0, %c0, %c0, %c0], %cst		%E = vector.transfer_read %arg3[%c0, %c0, %c0, %c0], %cst
{in_bounds = [true, true], permutation_map = affine_map<(d0, d1, d2, d3)->(0, d3)>}		{in_bounds = [true, true, true, true], permutation_map = affine_map<(d0, d1, d2, d3)->(0, d3)>}
: memref<16x16x16x16xf16>, vector<16x16xf16>		: memref<16x16x16x16xf16>, vector<16x16xf16>
%F = arith.divf %D, %E : vector<16x16xf16>		%F = arith.divf %D, %E : vector<16x16xf16>
vector.transfer_write %F, %arg2[%c0, %c0] {in_bounds = [true, true]} : vector<16x16xf16>, memref<16x16xf16>		vector.transfer_write %F, %arg2[%c0, %c0] {in_bounds = [true, true]} : vector<16x16xf16>, memref<16x16xf16>
return		return
}		}

// -----		// -----

[10 lines not shown]
// CHECK-DAG: %[[B:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%[[C0]]] {leadDimension = 0 : index} : memref<16xf16> -> !gpu.mma_matrix<16x16xf16, "BOp">		// CHECK-DAG: %[[B:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%[[C0]]] {leadDimension = 0 : index} : memref<16xf16> -> !gpu.mma_matrix<16x16xf16, "BOp">
// CHECK-DAG: %[[C:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%[[C0]], %[[C0]], %[[C0]]] {leadDimension = 16 : index} : memref<2x16x16xf16> -> !gpu.mma_matrix<16x16xf16, "COp">		// CHECK-DAG: %[[C:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%[[C0]], %[[C0]], %[[C0]]] {leadDimension = 16 : index} : memref<2x16x16xf16> -> !gpu.mma_matrix<16x16xf16, "COp">
// CHECK: %[[D:.+]] = gpu.subgroup_mma_compute %[[A]], %[[B]], %[[C]] : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp"> -> !gpu.mma_matrix<16x16xf16, "COp">		// CHECK: %[[D:.+]] = gpu.subgroup_mma_compute %[[A]], %[[B]], %[[C]] : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp"> -> !gpu.mma_matrix<16x16xf16, "COp">
// CHECK: gpu.subgroup_mma_store_matrix %[[D]], %{{.*}}[%[[C0]], %[[C0]], %[[C0]]] {leadDimension = 16 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<2x16x16xf16>		// CHECK: gpu.subgroup_mma_store_matrix %[[D]], %{{.*}}[%[[C0]], %[[C0]], %[[C0]]] {leadDimension = 16 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<2x16x16xf16>
func.func @matmul_3Dmemref(%arg0: memref<2x16x16xf16>, %arg1: memref<16xf16>, %arg2: memref<2x16x16xf16>) {		func.func @matmul_3Dmemref(%arg0: memref<2x16x16xf16>, %arg1: memref<16xf16>, %arg2: memref<2x16x16xf16>) {
%cst_0 = arith.constant dense<0.000000e+00> : vector<16x16xf16>		%cst_0 = arith.constant dense<0.000000e+00> : vector<16x16xf16>
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%cst = arith.constant 0.000000e+00 : f16		%cst = arith.constant 0.000000e+00 : f16
%A = vector.transfer_read %arg0[%c0, %c0, %c0], %cst {in_bounds = [true, true]} : memref<2x16x16xf16>, vector<16x16xf16>		%A = vector.transfer_read %arg0[%c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<2x16x16xf16>, vector<16x16xf16>
%B = vector.transfer_read %arg1[%c0], %cst {permutation_map = #map4, in_bounds = [true, true]} : memref<16xf16>, vector<16x16xf16>		%B = vector.transfer_read %arg1[%c0], %cst {permutation_map = #map4, in_bounds = [true]} : memref<16xf16>, vector<16x16xf16>
%C = vector.transfer_read %arg2[%c0, %c0, %c0], %cst {in_bounds = [true, true]} : memref<2x16x16xf16>, vector<16x16xf16>		%C = vector.transfer_read %arg2[%c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<2x16x16xf16>, vector<16x16xf16>
%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B, %C : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>		%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B, %C : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>
vector.transfer_write %D, %arg2[%c0, %c0, %c0] {in_bounds = [true, true]} : vector<16x16xf16>, memref<2x16x16xf16>		vector.transfer_write %D, %arg2[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<16x16xf16>, memref<2x16x16xf16>
return		return
}		}

// -----		// -----

#map0 = affine_map<(d0, d1) -> (d1, d0)>		#map0 = affine_map<(d0, d1) -> (d1, d0)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d2)>		#map1 = affine_map<(d0, d1, d2) -> (d0, d2)>
#map2 = affine_map<(d0, d1, d2) -> (d1, d2)>		#map2 = affine_map<(d0, d1, d2) -> (d1, d2)>
#map3 = affine_map<(d0, d1, d2) -> (d0, d1)>		#map3 = affine_map<(d0, d1, d2) -> (d0, d1)>
#map4 = affine_map<(d0) -> (d0, 0)>		#map4 = affine_map<(d0) -> (d0, 0)>
#map5 = affine_map<(d0, d1) -> (d0, d1)>		#map5 = affine_map<(d0, d1) -> (d0, d1)>

// CHECK-LABEL: func @matmul_memref_strided		// CHECK-LABEL: func @matmul_memref_strided
// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
// CHECK-DAG: %[[A:.+]] = gpu.subgroup_mma_load_matrix %{{.}}[%[[C0]], %[[C0]], %[[C0]]] {leadDimension = 32 : index} : memref<2x16x16xf16, #{{.}}> -> !gpu.mma_matrix<16x16xf16, "AOp">		// CHECK-DAG: %[[A:.+]] = gpu.subgroup_mma_load_matrix %{{.}}[%[[C0]], %[[C0]], %[[C0]]] {leadDimension = 32 : index} : memref<2x16x16xf16, #{{.}}> -> !gpu.mma_matrix<16x16xf16, "AOp">
// CHECK-DAG: %[[B:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%[[C0]]] {leadDimension = 0 : index} : memref<16xf16> -> !gpu.mma_matrix<16x16xf16, "BOp">		// CHECK-DAG: %[[B:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%[[C0]]] {leadDimension = 0 : index} : memref<16xf16> -> !gpu.mma_matrix<16x16xf16, "BOp">
// CHECK-DAG: %[[C:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%[[C0]], %[[C0]], %[[C0]]] {leadDimension = 16 : index} : memref<2x16x16xf16> -> !gpu.mma_matrix<16x16xf16, "COp">		// CHECK-DAG: %[[C:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%[[C0]], %[[C0]], %[[C0]]] {leadDimension = 16 : index} : memref<2x16x16xf16> -> !gpu.mma_matrix<16x16xf16, "COp">
// CHECK: %[[D:.+]] = gpu.subgroup_mma_compute %[[A]], %[[B]], %[[C]] : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp"> -> !gpu.mma_matrix<16x16xf16, "COp">		// CHECK: %[[D:.+]] = gpu.subgroup_mma_compute %[[A]], %[[B]], %[[C]] : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp"> -> !gpu.mma_matrix<16x16xf16, "COp">
// CHECK: gpu.subgroup_mma_store_matrix %[[D]], %{{.*}}[%[[C0]], %[[C0]], %[[C0]]] {leadDimension = 16 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<2x16x16xf16>		// CHECK: gpu.subgroup_mma_store_matrix %[[D]], %{{.*}}[%[[C0]], %[[C0]], %[[C0]]] {leadDimension = 16 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<2x16x16xf16>
func.func @matmul_memref_strided(%arg0: memref<2x16x16xf16, affine_map<(d0, d1, d2) -> (d0 * 512 + d1 * 32 + d2)>>, %arg1: memref<16xf16>, %arg2: memref<2x16x16xf16>) {		func.func @matmul_memref_strided(%arg0: memref<2x16x16xf16, affine_map<(d0, d1, d2) -> (d0 * 512 + d1 * 32 + d2)>>, %arg1: memref<16xf16>, %arg2: memref<2x16x16xf16>) {
%cst_0 = arith.constant dense<0.000000e+00> : vector<16x16xf16>		%cst_0 = arith.constant dense<0.000000e+00> : vector<16x16xf16>
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%cst = arith.constant 0.000000e+00 : f16		%cst = arith.constant 0.000000e+00 : f16
%A = vector.transfer_read %arg0[%c0, %c0, %c0], %cst {in_bounds = [true, true]} : memref<2x16x16xf16, affine_map<(d0, d1, d2) -> (d0 * 512 + d1 * 32 + d2)>>, vector<16x16xf16>		%A = vector.transfer_read %arg0[%c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<2x16x16xf16, affine_map<(d0, d1, d2) -> (d0 * 512 + d1 * 32 + d2)>>, vector<16x16xf16>
%B = vector.transfer_read %arg1[%c0], %cst {permutation_map = #map4, in_bounds = [true, true]} : memref<16xf16>, vector<16x16xf16>		%B = vector.transfer_read %arg1[%c0], %cst {permutation_map = #map4, in_bounds = [true]} : memref<16xf16>, vector<16x16xf16>
%C = vector.transfer_read %arg2[%c0, %c0, %c0], %cst {in_bounds = [true, true]} : memref<2x16x16xf16>, vector<16x16xf16>		%C = vector.transfer_read %arg2[%c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<2x16x16xf16>, vector<16x16xf16>
%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B, %C : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>		%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B, %C : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>
vector.transfer_write %D, %arg2[%c0, %c0, %c0] {in_bounds = [true, true]} : vector<16x16xf16>, memref<2x16x16xf16>		vector.transfer_write %D, %arg2[%c0, %c0, %c0] {in_bounds = [true, true, true]} : vector<16x16xf16>, memref<2x16x16xf16>
return		return
}		}

// -----		// -----

#map0 = affine_map<(d0, d1) -> (d1, d0)>		#map0 = affine_map<(d0, d1) -> (d1, d0)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d2)>		#map1 = affine_map<(d0, d1, d2) -> (d0, d2)>
#map2 = affine_map<(d0, d1, d2) -> (d1, d2)>		#map2 = affine_map<(d0, d1, d2) -> (d1, d2)>
[... 33 lines hidden ...]
// CHECK-DAG: %[[B:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%{{.*}}] {leadDimension = 0 : index} : memref<16xf16> -> !gpu.mma_matrix<16x16xf16, "BOp">		// CHECK-DAG: %[[B:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%{{.*}}] {leadDimension = 0 : index} : memref<16xf16> -> !gpu.mma_matrix<16x16xf16, "BOp">
// CHECK-DAG: %[[C:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%{{.*}}, %{{.*}}] {leadDimension = 16 : index} : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "COp">		// CHECK-DAG: %[[C:.+]] = gpu.subgroup_mma_load_matrix %{{.*}}[%{{.*}}, %{{.*}}] {leadDimension = 16 : index} : memref<16x16xf16> -> !gpu.mma_matrix<16x16xf16, "COp">
// CHECK: %[[D:.+]] = gpu.subgroup_mma_compute %[[A]], %[[B]], %[[C]] : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp"> -> !gpu.mma_matrix<16x16xf16, "COp">		// CHECK: %[[D:.+]] = gpu.subgroup_mma_compute %[[A]], %[[B]], %[[C]] : !gpu.mma_matrix<16x16xf16, "AOp">, !gpu.mma_matrix<16x16xf16, "BOp"> -> !gpu.mma_matrix<16x16xf16, "COp">
// CHECK: gpu.subgroup_mma_store_matrix %[[D]], %{{.*}}[%{{.*}}, %{{.*}}] {leadDimension = 16 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<16x16xf16>		// CHECK: gpu.subgroup_mma_store_matrix %[[D]], %{{.*}}[%{{.*}}, %{{.*}}] {leadDimension = 16 : index} : !gpu.mma_matrix<16x16xf16, "COp">, memref<16x16xf16>
func.func @matmul_transposed_broadcasted_1d(%arg0: memref<16xf16>, %arg1: memref<16xf16>, %arg2: memref<16x16xf16>) {		func.func @matmul_transposed_broadcasted_1d(%arg0: memref<16xf16>, %arg1: memref<16xf16>, %arg2: memref<16x16xf16>) {
%cst_0 = arith.constant dense<0.000000e+00> : vector<16x16xf16>		%cst_0 = arith.constant dense<0.000000e+00> : vector<16x16xf16>
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%cst = arith.constant 0.000000e+00 : f16		%cst = arith.constant 0.000000e+00 : f16
%A = vector.transfer_read %arg0[%c0], %cst {in_bounds = [true, true], permutation_map = affine_map<(d0) -> (d0, 0)>} : memref<16xf16>, vector<16x16xf16>		%A = vector.transfer_read %arg0[%c0], %cst {in_bounds = [true], permutation_map = affine_map<(d0) -> (d0, 0)>} : memref<16xf16>, vector<16x16xf16>
%B = vector.transfer_read %arg1[%c0], %cst {in_bounds = [true, true], permutation_map = affine_map<(d0) -> (d0, 0)>} : memref<16xf16>, vector<16x16xf16>		%B = vector.transfer_read %arg1[%c0], %cst {in_bounds = [true], permutation_map = affine_map<(d0) -> (d0, 0)>} : memref<16xf16>, vector<16x16xf16>
%C = vector.transfer_read %arg2[%c0, %c0], %cst {in_bounds = [true, true]} : memref<16x16xf16>, vector<16x16xf16>		%C = vector.transfer_read %arg2[%c0, %c0], %cst {in_bounds = [true, true]} : memref<16x16xf16>, vector<16x16xf16>
%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B, %C : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>		%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %A, %B, %C : vector<16x16xf16>, vector<16x16xf16> into vector<16x16xf16>
vector.transfer_write %D, %arg2[%c0, %c0] {in_bounds = [true, true]} : vector<16x16xf16>, memref<16x16xf16>		vector.transfer_write %D, %arg2[%c0, %c0] {in_bounds = [true, true]} : vector<16x16xf16>, memref<16x16xf16>
return		return
}		}

// -----		// -----

[... 139 lines hidden ...]	func.func @matmul_mixed_signedness_int8(%arg0: memref<16x32xi8>, %arg1: memref<16x32xi8>, %arg2: memref<16x16xi32>) {
%Br = vector.transfer_read %arg1[%c0, %c0], %cst_i8 {permutation_map = #map0, in_bounds = [true, true]} : memref<16x32xi8>, vector<16x32xi8>		%Br = vector.transfer_read %arg1[%c0, %c0], %cst_i8 {permutation_map = #map0, in_bounds = [true, true]} : memref<16x32xi8>, vector<16x32xi8>
%C = vector.transfer_read %arg2[%c0, %c0], %cst_i32 {in_bounds = [true, true]} : memref<16x16xi32>, vector<16x16xi32>		%C = vector.transfer_read %arg2[%c0, %c0], %cst_i32 {in_bounds = [true, true]} : memref<16x16xi32>, vector<16x16xi32>
%Ae = arith.extui %Ar : vector<16x32xi8> to vector<16x32xi32>		%Ae = arith.extui %Ar : vector<16x32xi8> to vector<16x32xi32>
%Be = arith.extsi %Br : vector<16x32xi8> to vector<16x32xi32>		%Be = arith.extsi %Br : vector<16x32xi8> to vector<16x32xi32>
%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %Ae, %Be, %C : vector<16x32xi32>, vector<16x32xi32> into vector<16x16xi32>		%D = vector.contract {indexing_maps = [#map1, #map2, #map3], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %Ae, %Be, %C : vector<16x32xi32>, vector<16x32xi32> into vector<16x16xi32>
vector.transfer_write %D, %arg2[%c0, %c0] {in_bounds = [true, true]} : vector<16x16xi32>, memref<16x16xi32>		vector.transfer_write %D, %arg2[%c0, %c0] {in_bounds = [true, true]} : vector<16x16xi32>, memref<16x16xi32>
return		return
}		}
No newline at end of file		No newline at end of file

mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir

	[... 1,710 lines hidden ...]
	// CHECK: llvm.intr.masked.store %[[loaded]], %{{.*}}, %{{.*}} {alignment = 8 : i32} :			// CHECK: llvm.intr.masked.store %[[loaded]], %{{.*}}, %{{.*}} {alignment = 8 : i32} :
	// CHECK-SAME: vector<17xi64>, vector<17xi1> into !llvm.ptr			// CHECK-SAME: vector<17xi64>, vector<17xi1> into !llvm.ptr

	// -----			// -----

	func.func @transfer_read_2d_to_1d(%A : memref<?x?xf32>, %base0: index, %base1: index) -> vector<17xf32> {			func.func @transfer_read_2d_to_1d(%A : memref<?x?xf32>, %base0: index, %base1: index) -> vector<17xf32> {
	%f7 = arith.constant 7.0: f32			%f7 = arith.constant 7.0: f32
	%f = vector.transfer_read %A[%base0, %base1], %f7			%f = vector.transfer_read %A[%base0, %base1], %f7
	{permutation_map = affine_map<(d0, d1) -> (d1)>} :			{permutation_map = affine_map<(d0, d1) -> (d1)>,
				in_bounds = [true, false]} :
	memref<?x?xf32>, vector<17xf32>			memref<?x?xf32>, vector<17xf32>
	return %f: vector<17xf32>			return %f: vector<17xf32>
	}			}
	// CHECK-LABEL: func @transfer_read_2d_to_1d			// CHECK-LABEL: func @transfer_read_2d_to_1d
	// CHECK-SAME: %[[BASE_0:[a-zA-Z0-9]*]]: index, %[[BASE_1:[a-zA-Z0-9]*]]: index) -> vector<17xf32>			// CHECK-SAME: %[[BASE_0:[a-zA-Z0-9]*]]: index, %[[BASE_1:[a-zA-Z0-9]*]]: index) -> vector<17xf32>
	// CHECK: %[[c1:.*]] = arith.constant 1 : index			// CHECK: %[[c1:.*]] = arith.constant 1 : index
	// CHECK: %[[DIM:.*]] = memref.dim %{{.*}}, %[[c1]] : memref<?x?xf32>			// CHECK: %[[DIM:.*]] = memref.dim %{{.*}}, %[[c1]] : memref<?x?xf32>
	//			//
	[... 548 lines hidden ...]

mlir/test/Conversion/VectorToSCF/tensor-transfer-ops.mlir

	// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(convert-vector-to-scf{lower-tensors=true}))" -split-input-file -allow-unregistered-dialect | FileCheck %s			// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(convert-vector-to-scf{lower-tensors=true}))" -split-input-file -allow-unregistered-dialect | FileCheck %s

	// CHECK-LABEL: func @transfer_read_2d(			// CHECK-LABEL: func @transfer_read_2d(
	// CHECK: %[[ALLOC:.*]] = memref.alloca() : memref<vector<4x9xf32>>			// CHECK: %[[ALLOC:.*]] = memref.alloca() : memref<vector<4x9xf32>>
	// CHECK: %[[CASTED:.*]] = vector.type_cast %[[ALLOC]] : memref<vector<4x9xf32>> to memref<4xvector<9xf32>>			// CHECK: %[[CASTED:.*]] = vector.type_cast %[[ALLOC]] : memref<vector<4x9xf32>> to memref<4xvector<9xf32>>
	// CHECK: scf.for {{.*}} {			// CHECK: scf.for {{.*}} {
	// CHECK: %[[READ:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %cst {in_bounds = [true]} : tensor<?x?xf32>, vector<9xf32>			// CHECK: %[[READ:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %cst {in_bounds = [true, true]} : tensor<?x?xf32>, vector<9xf32>
	// CHECK: memref.store %[[READ]], %[[CASTED]][%{{.*}}] : memref<4xvector<9xf32>>			// CHECK: memref.store %[[READ]], %[[CASTED]][%{{.*}}] : memref<4xvector<9xf32>>
	// CHECK: }			// CHECK: }
	// CHECK: %[[LOADED:.*]] = memref.load %[[ALLOC]][] : memref<vector<4x9xf32>>			// CHECK: %[[LOADED:.*]] = memref.load %[[ALLOC]][] : memref<vector<4x9xf32>>
	// CHECK: return %[[LOADED]] : vector<4x9xf32>			// CHECK: return %[[LOADED]] : vector<4x9xf32>
	func.func @transfer_read_2d(%A : tensor<?x?xf32>, %base1 : index, %base2 : index)			func.func @transfer_read_2d(%A : tensor<?x?xf32>, %base1 : index, %base2 : index)
	-> (vector<4x9xf32>){			-> (vector<4x9xf32>){
	%p = arith.constant -42.0: f32			%p = arith.constant -42.0: f32
	%f = vector.transfer_read %A[%base1, %base2], %p {in_bounds = [true, true]}			%f = vector.transfer_read %A[%base1, %base2], %p {in_bounds = [true, true]}
	: tensor<?x?xf32>, vector<4x9xf32>			: tensor<?x?xf32>, vector<4x9xf32>
	return %f : vector<4x9xf32>			return %f : vector<4x9xf32>
	}			}

	// -----			// -----

	// CHECK-LABEL: func @transfer_write_2d(			// CHECK-LABEL: func @transfer_write_2d(
	// CHECK: %[[ALLOC:.*]] = memref.alloca() : memref<vector<2x3xf32>>			// CHECK: %[[ALLOC:.*]] = memref.alloca() : memref<vector<2x3xf32>>
	// CHECK: memref.store {{.*}}, %[[ALLOC]][] : memref<vector<2x3xf32>>			// CHECK: memref.store {{.*}}, %[[ALLOC]][] : memref<vector<2x3xf32>>
	// CHECK: %[[CASTED:.*]] = vector.type_cast %[[ALLOC]] : memref<vector<2x3xf32>> to memref<2xvector<3xf32>>			// CHECK: %[[CASTED:.*]] = vector.type_cast %[[ALLOC]] : memref<vector<2x3xf32>> to memref<2xvector<3xf32>>
	// CHECK: %[[RESULT:.*]] = scf.for {{.*}} iter_args(%[[STATE:.*]] = %{{.*}}) -> (tensor<?x?xf32>) {			// CHECK: %[[RESULT:.*]] = scf.for {{.*}} iter_args(%[[STATE:.*]] = %{{.*}}) -> (tensor<?x?xf32>) {
	// CHECK: %[[LOADED:.*]] = memref.load %[[CASTED]][%{{.*}}] : memref<2xvector<3xf32>>			// CHECK: %[[LOADED:.*]] = memref.load %[[CASTED]][%{{.*}}] : memref<2xvector<3xf32>>
	// CHECK: %[[WRITE:.*]] = vector.transfer_write %[[LOADED]], %[[STATE]][{{.*}}] {in_bounds = [true]} : vector<3xf32>, tensor<?x?xf32>			// CHECK: %[[WRITE:.*]] = vector.transfer_write %[[LOADED]], %[[STATE]][{{.*}}] {in_bounds = [true, true]} : vector<3xf32>, tensor<?x?xf32>
	// CHECK: scf.yield %[[WRITE]] : tensor<?x?xf32>			// CHECK: scf.yield %[[WRITE]] : tensor<?x?xf32>
	// CHECK: }			// CHECK: }
	// CHECK: return %[[RESULT]] : tensor<?x?xf32>			// CHECK: return %[[RESULT]] : tensor<?x?xf32>
	func.func @transfer_write_2d(%A : tensor<?x?xf32>, %vec : vector<2x3xf32>,			func.func @transfer_write_2d(%A : tensor<?x?xf32>, %vec : vector<2x3xf32>,
	%base1 : index, %base2 : index) -> (tensor<?x?xf32>) {			%base1 : index, %base2 : index) -> (tensor<?x?xf32>) {
	%t = vector.transfer_write %vec, %A[%base1, %base2] {in_bounds = [true, true]}			%t = vector.transfer_write %vec, %A[%base1, %base2] {in_bounds = [true, true]}
	: vector<2x3xf32>, tensor<?x?xf32>			: vector<2x3xf32>, tensor<?x?xf32>
	return %t : tensor<?x?xf32>			return %t : tensor<?x?xf32>
	}			}

mlir/test/Conversion/VectorToSCF/unrolled-tensor-transfer-ops.mlir

	// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(convert-vector-to-scf{full-unroll=true lower-tensors=true}))" -split-input-file -allow-unregistered-dialect | FileCheck %s			// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(convert-vector-to-scf{full-unroll=true lower-tensors=true}))" -split-input-file -allow-unregistered-dialect | FileCheck %s

	// CHECK-LABEL: func @transfer_read_2d(			// CHECK-LABEL: func @transfer_read_2d(
	// CHECK: %[[V_INIT:.*]] = arith.constant dense<-4.200000e+01> : vector<4x9xf32>			// CHECK: %[[V_INIT:.*]] = arith.constant dense<-4.200000e+01> : vector<4x9xf32>
	// CHECK: %[[V0:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{.*}} {in_bounds = [true]} : tensor<?x?xf32>, vector<9xf32>			// CHECK: %[[V0:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{.*}} {in_bounds = [true, true]} : tensor<?x?xf32>, vector<9xf32>
	// CHECK: %[[I0:.*]] = vector.insert %[[V0]], %[[V_INIT]] [0] : vector<9xf32> into vector<4x9xf32>			// CHECK: %[[I0:.*]] = vector.insert %[[V0]], %[[V_INIT]] [0] : vector<9xf32> into vector<4x9xf32>
	// CHECK: %[[V1:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{.*}} {in_bounds = [true]} : tensor<?x?xf32>, vector<9xf32>			// CHECK: %[[V1:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{.*}} {in_bounds = [true, true]} : tensor<?x?xf32>, vector<9xf32>
	// CHECK: %[[I1:.*]] = vector.insert %[[V1]], %[[I0]] [1] : vector<9xf32> into vector<4x9xf32>			// CHECK: %[[I1:.*]] = vector.insert %[[V1]], %[[I0]] [1] : vector<9xf32> into vector<4x9xf32>
	// CHECK: %[[V2:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{.*}} {in_bounds = [true]} : tensor<?x?xf32>, vector<9xf32>			// CHECK: %[[V2:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{.*}} {in_bounds = [true, true]} : tensor<?x?xf32>, vector<9xf32>
	// CHECK: %[[I2:.*]] = vector.insert %[[V2]], %[[I1]] [2] : vector<9xf32> into vector<4x9xf32>			// CHECK: %[[I2:.*]] = vector.insert %[[V2]], %[[I1]] [2] : vector<9xf32> into vector<4x9xf32>
	// CHECK: %[[V3:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{.*}} {in_bounds = [true]} : tensor<?x?xf32>, vector<9xf32>			// CHECK: %[[V3:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{.*}} {in_bounds = [true, true]} : tensor<?x?xf32>, vector<9xf32>
	// CHECK: %[[I3:.*]] = vector.insert %[[V3]], %[[I2]] [3] : vector<9xf32> into vector<4x9xf32>			// CHECK: %[[I3:.*]] = vector.insert %[[V3]], %[[I2]] [3] : vector<9xf32> into vector<4x9xf32>
	// CHECK: return %[[I3]] : vector<4x9xf32>			// CHECK: return %[[I3]] : vector<4x9xf32>
	func.func @transfer_read_2d(%A : tensor<?x?xf32>, %base1 : index, %base2 : index)			func.func @transfer_read_2d(%A : tensor<?x?xf32>, %base1 : index, %base2 : index)
	-> (vector<4x9xf32>){			-> (vector<4x9xf32>){
	%p = arith.constant -42.0: f32			%p = arith.constant -42.0: f32
	%f = vector.transfer_read %A[%base1, %base2], %p {in_bounds = [true, true]}			%f = vector.transfer_read %A[%base1, %base2], %p {in_bounds = [true, true]}
	: tensor<?x?xf32>, vector<4x9xf32>			: tensor<?x?xf32>, vector<4x9xf32>
	return %f : vector<4x9xf32>			return %f : vector<4x9xf32>
	}			}

	// -----			// -----

	// CHECK-LABEL: func @transfer_write_2d(			// CHECK-LABEL: func @transfer_write_2d(
	// CHECK: %[[V0:.*]] = vector.extract %{{.*}}[0] : vector<2x3xf32>			// CHECK: %[[V0:.*]] = vector.extract %{{.*}}[0] : vector<2x3xf32>
	// CHECK: %[[T0:.*]] = vector.transfer_write %[[V0]], %{{.*}}[{{.*}}] {in_bounds = [true]} : vector<3xf32>, tensor<?x?xf32>			// CHECK: %[[T0:.*]] = vector.transfer_write %[[V0]], %{{.*}}[{{.*}}] {in_bounds = [true, true]} : vector<3xf32>, tensor<?x?xf32>
	// CHECK: %[[V1:.*]] = vector.extract %{{.*}}[1] : vector<2x3xf32>			// CHECK: %[[V1:.*]] = vector.extract %{{.*}}[1] : vector<2x3xf32>
	// CHECK: %[[T1:.*]] = vector.transfer_write %[[V1]], %[[T0]][{{.*}}] {in_bounds = [true]} : vector<3xf32>, tensor<?x?xf32>			// CHECK: %[[T1:.*]] = vector.transfer_write %[[V1]], %[[T0]][{{.*}}] {in_bounds = [true, true]} : vector<3xf32>, tensor<?x?xf32>
	// CHECK: return %[[T1]] : tensor<?x?xf32>			// CHECK: return %[[T1]] : tensor<?x?xf32>
	func.func @transfer_write_2d(%A : tensor<?x?xf32>, %vec : vector<2x3xf32>,			func.func @transfer_write_2d(%A : tensor<?x?xf32>, %vec : vector<2x3xf32>,
	%base1 : index, %base2 : index) -> (tensor<?x?xf32>) {			%base1 : index, %base2 : index) -> (tensor<?x?xf32>) {
	%t = vector.transfer_write %vec, %A[%base1, %base2] {in_bounds = [true, true]}			%t = vector.transfer_write %vec, %A[%base1, %base2] {in_bounds = [true, true]}
	: vector<2x3xf32>, tensor<?x?xf32>			: vector<2x3xf32>, tensor<?x?xf32>
	return %t : tensor<?x?xf32>			return %t : tensor<?x?xf32>
	}			}

mlir/test/Conversion/VectorToSCF/vector-to-scf-mask-and-permutation-map.mlir

	// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(convert-vector-to-scf))" -split-input-file | FileCheck %s			// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(convert-vector-to-scf))" -split-input-file | FileCheck %s

	// Ensure that the permutation map is lowered (by inserting a transpose op)			// Ensure that the permutation map is lowered (by inserting a transpose op)
	// before lowering the vector.transfer_read.			// before lowering the vector.transfer_read.

	// CHECK-LABEL: func @transfer_read_2d_mask_transposed(			// CHECK-LABEL: func @transfer_read_2d_mask_transposed(
	// CHECK-DAG: %[[PADDING:.*]] = arith.constant dense<-4.200000e+01> : vector<9xf32>			// CHECK-DAG: %[[PADDING:.*]] = arith.constant dense<-4.200000e+01> : vector<9xf32>
	// CHECK-DAG: %[[MASK:.*]] = arith.constant dense<{{.*}}> : vector<4x9xi1>			// CHECK-DAG: %[[MASK:.*]] = arith.constant dense<{{.*}}> : vector<4x9xi1>
	// CHECK: %[[MASK_MEM:.*]] = memref.alloca() : memref<vector<4x9xi1>>			// CHECK: %[[MASK_MEM:.*]] = memref.alloca() : memref<vector<4x9xi1>>
	// CHECK: memref.store %[[MASK]], %[[MASK_MEM]][] : memref<vector<4x9xi1>>			// CHECK: memref.store %[[MASK]], %[[MASK_MEM]][] : memref<vector<4x9xi1>>
	// CHECK: %[[MASK_CASTED:.*]] = vector.type_cast %[[MASK_MEM]] : memref<vector<4x9xi1>> to memref<4xvector<9xi1>>			// CHECK: %[[MASK_CASTED:.*]] = vector.type_cast %[[MASK_MEM]] : memref<vector<4x9xi1>> to memref<4xvector<9xi1>>
	// CHECK: scf.for {{.*}} {			// CHECK: scf.for {{.*}} {
	// CHECK: scf.if {{.*}} {			// CHECK: scf.if {{.*}} {
	// CHECK: %[[MASK_LOADED:.*]] = memref.load %[[MASK_CASTED]][%{{.*}}] : memref<4xvector<9xi1>>			// CHECK: %[[MASK_LOADED:.*]] = memref.load %[[MASK_CASTED]][%{{.*}}] : memref<4xvector<9xi1>>
	// CHECK: %[[READ:.*]] = vector.transfer_read %{{.*}}, %{{.*}}, %[[MASK_LOADED]] : memref<?x?xf32>, vector<9xf32>			// CHECK: %[[READ:.*]] = vector.transfer_read %{{.*}}, %{{.*}}, %[[MASK_LOADED]] {in_bounds = [true, false]} : memref<?x?xf32>, vector<9xf32>
	// CHECK: memref.store %[[READ]], %{{.*}} : memref<4xvector<9xf32>>			// CHECK: memref.store %[[READ]], %{{.*}} : memref<4xvector<9xf32>>
	// CHECK: }			// CHECK: }
	// CHECK: }			// CHECK: }
	// CHECK: %[[RESULT:.*]] = memref.load %{{.*}} : memref<vector<4x9xf32>>			// CHECK: %[[RESULT:.*]] = memref.load %{{.*}} : memref<vector<4x9xf32>>
	// CHECK: %[[RESULT_T:.*]] = vector.transpose %[[RESULT]], [1, 0] : vector<4x9xf32> to vector<9x4xf32>			// CHECK: %[[RESULT_T:.*]] = vector.transpose %[[RESULT]], [1, 0] : vector<4x9xf32> to vector<9x4xf32>
	// CHECK: return %[[RESULT_T]] : vector<9x4xf32>			// CHECK: return %[[RESULT_T]] : vector<9x4xf32>

	// Vector load with mask + transpose.			// Vector load with mask + transpose.
	[... 12 lines hidden ...]

mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir

[... 20 lines hidden ...]
// -----		// -----

// CHECK-LABEL: func @materialize_read_1d() {		// CHECK-LABEL: func @materialize_read_1d() {
func.func @materialize_read_1d() {		func.func @materialize_read_1d() {
%f0 = arith.constant 0.0: f32		%f0 = arith.constant 0.0: f32
%A = memref.alloc () : memref<7x42xf32>		%A = memref.alloc () : memref<7x42xf32>
affine.for %i0 = 0 to 7 step 4 {		affine.for %i0 = 0 to 7 step 4 {
affine.for %i1 = 0 to 42 step 4 {		affine.for %i1 = 0 to 42 step 4 {
%f1 = vector.transfer_read %A[%i0, %i1], %f0 {permutation_map = affine_map<(d0, d1) -> (d0)>} : memref<7x42xf32>, vector<4xf32>		%f1 = vector.transfer_read %A[%i0, %i1], %f0 {in_bounds = [false, true], permutation_map = affine_map<(d0, d1) -> (d0)>} : memref<7x42xf32>, vector<4xf32>
%ip1 = affine.apply affine_map<(d0) -> (d0 + 1)> (%i1)		%ip1 = affine.apply affine_map<(d0) -> (d0 + 1)> (%i1)
%f2 = vector.transfer_read %A[%i0, %ip1], %f0 {permutation_map = affine_map<(d0, d1) -> (d0)>} : memref<7x42xf32>, vector<4xf32>		%f2 = vector.transfer_read %A[%i0, %ip1], %f0 {in_bounds = [false, true],permutation_map = affine_map<(d0, d1) -> (d0)>} : memref<7x42xf32>, vector<4xf32>
%ip2 = affine.apply affine_map<(d0) -> (d0 + 2)> (%i1)		%ip2 = affine.apply affine_map<(d0) -> (d0 + 2)> (%i1)
%f3 = vector.transfer_read %A[%i0, %ip2], %f0 {permutation_map = affine_map<(d0, d1) -> (d0)>} : memref<7x42xf32>, vector<4xf32>		%f3 = vector.transfer_read %A[%i0, %ip2], %f0 {in_bounds = [false, true], permutation_map = affine_map<(d0, d1) -> (d0)>} : memref<7x42xf32>, vector<4xf32>
%ip3 = affine.apply affine_map<(d0) -> (d0 + 3)> (%i1)		%ip3 = affine.apply affine_map<(d0) -> (d0 + 3)> (%i1)
%f4 = vector.transfer_read %A[%i0, %ip3], %f0 {permutation_map = affine_map<(d0, d1) -> (d0)>} : memref<7x42xf32>, vector<4xf32>		%f4 = vector.transfer_read %A[%i0, %ip3], %f0 {in_bounds = [false, true], permutation_map = affine_map<(d0, d1) -> (d0)>} : memref<7x42xf32>, vector<4xf32>
// Both accesses in the load must be clipped otherwise %i1 + 2 and %i1 + 3 will go out of bounds.		// Both accesses in the load must be clipped otherwise %i1 + 2 and %i1 + 3 will go out of bounds.
// CHECK: scf.if		// CHECK: scf.if
// CHECK-NEXT: memref.load		// CHECK-NEXT: memref.load
// CHECK-NEXT: vector.insertelement		// CHECK-NEXT: vector.insertelement
// CHECK-NEXT: scf.yield		// CHECK-NEXT: scf.yield
// CHECK-NEXT: else		// CHECK-NEXT: else
// CHECK-NEXT: scf.yield		// CHECK-NEXT: scf.yield
// Add a dummy use to prevent dead code elimination from removing transfer		// Add a dummy use to prevent dead code elimination from removing transfer
[... 10 lines hidden ...]
func.func @materialize_read_1d_partially_specialized(%dyn1 : index, %dyn2 : index, %dyn4 : index) {		func.func @materialize_read_1d_partially_specialized(%dyn1 : index, %dyn2 : index, %dyn4 : index) {
%f0 = arith.constant 0.0: f32		%f0 = arith.constant 0.0: f32
%A = memref.alloc (%dyn1, %dyn2, %dyn4) : memref<7x?x?x42x?xf32>		%A = memref.alloc (%dyn1, %dyn2, %dyn4) : memref<7x?x?x42x?xf32>
affine.for %i0 = 0 to 7 {		affine.for %i0 = 0 to 7 {
affine.for %i1 = 0 to %dyn1 {		affine.for %i1 = 0 to %dyn1 {
affine.for %i2 = 0 to %dyn2 {		affine.for %i2 = 0 to %dyn2 {
affine.for %i3 = 0 to 42 step 2 {		affine.for %i3 = 0 to 42 step 2 {
affine.for %i4 = 0 to %dyn4 {		affine.for %i4 = 0 to %dyn4 {
%f1 = vector.transfer_read %A[%i0, %i1, %i2, %i3, %i4], %f0 {permutation_map = affine_map<(d0, d1, d2, d3, d4) -> (d3)>} : memref<7x?x?x42x?xf32>, vector<4xf32>		%f1 = vector.transfer_read %A[%i0, %i1, %i2, %i3, %i4], %f0 {in_bounds = [true, true, true, false, true], permutation_map = affine_map<(d0, d1, d2, d3, d4) -> (d3)>} : memref<7x?x?x42x?xf32>, vector<4xf32>
%i3p1 = affine.apply affine_map<(d0) -> (d0 + 1)> (%i3)		%i3p1 = affine.apply affine_map<(d0) -> (d0 + 1)> (%i3)
%f2 = vector.transfer_read %A[%i0, %i1, %i2, %i3p1, %i4], %f0 {permutation_map = affine_map<(d0, d1, d2, d3, d4) -> (d3)>} : memref<7x?x?x42x?xf32>, vector<4xf32>		%f2 = vector.transfer_read %A[%i0, %i1, %i2, %i3p1, %i4], %f0 {in_bounds = [true, true, true, false, true], permutation_map = affine_map<(d0, d1, d2, d3, d4) -> (d3)>} : memref<7x?x?x42x?xf32>, vector<4xf32>
// Add a dummy use to prevent dead code elimination from removing		// Add a dummy use to prevent dead code elimination from removing
// transfer read ops.		// transfer read ops.
"dummy_use"(%f1, %f2) : (vector<4xf32>, vector<4xf32>) -> ()		"dummy_use"(%f1, %f2) : (vector<4xf32>, vector<4xf32>) -> ()
}		}
}		}
}		}
}		}
}		}
[... 54 lines hidden ...]	func.func @materialize_read(%M: index, %N: index, %O: index, %P: index) {
// Check that I0 + I4 (of size 3) read from first index load(L0, ...) and write into last index store(..., I4)		// Check that I0 + I4 (of size 3) read from first index load(L0, ...) and write into last index store(..., I4)
// Check that I3 + I6 (of size 5) read from last index load(..., L3) and write into first index store(I6, ...)		// Check that I3 + I6 (of size 5) read from last index load(..., L3) and write into first index store(I6, ...)
// Other dimensions are just accessed with I1, I2 resp.		// Other dimensions are just accessed with I1, I2 resp.
%A = memref.alloc (%M, %N, %O, %P) : memref<?x?x?x?xf32, 0>		%A = memref.alloc (%M, %N, %O, %P) : memref<?x?x?x?xf32, 0>
affine.for %i0 = 0 to %M step 3 {		affine.for %i0 = 0 to %M step 3 {
affine.for %i1 = 0 to %N {		affine.for %i1 = 0 to %N {
affine.for %i2 = 0 to %O {		affine.for %i2 = 0 to %O {
affine.for %i3 = 0 to %P step 5 {		affine.for %i3 = 0 to %P step 5 {
%f = vector.transfer_read %A[%i0, %i1, %i2, %i3], %f0 {permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, 0, d0)>} : memref<?x?x?x?xf32>, vector<5x4x3xf32>		%f = vector.transfer_read %A[%i0, %i1, %i2, %i3], %f0 {in_bounds = [false, true, true, false], permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, 0, d0)>} : memref<?x?x?x?xf32>, vector<5x4x3xf32>
// Add a dummy use to prevent dead code elimination from removing		// Add a dummy use to prevent dead code elimination from removing
// transfer read ops.		// transfer read ops.
"dummy_use"(%f) : (vector<5x4x3xf32>) -> ()		"dummy_use"(%f) : (vector<5x4x3xf32>) -> ()
}		}
}		}
}		}
}		}
return		return
[... 24 lines hidden ...]	func.func @materialize_write(%M: index, %N: index, %O: index, %P: index) {
// CHECK: %[[VECTOR_VIEW2:.*]] = vector.type_cast %[[VECTOR_VIEW1]] : memref<3xvector<4x1x5xf32>> to memref<3x4xvector<1x5xf32>>		// CHECK: %[[VECTOR_VIEW2:.*]] = vector.type_cast %[[VECTOR_VIEW1]] : memref<3xvector<4x1x5xf32>> to memref<3x4xvector<1x5xf32>>
// CHECK: scf.for %[[I5:.*]] = %[[C0]] to %[[C4]] step %[[C1]] {		// CHECK: scf.for %[[I5:.*]] = %[[C0]] to %[[C4]] step %[[C1]] {
// CHECK: scf.if		// CHECK: scf.if
// CHECK: %[[S1:.*]] = affine.apply #[[$ADD]](%[[I1]], %[[I5]])		// CHECK: %[[S1:.*]] = affine.apply #[[$ADD]](%[[I1]], %[[I5]])
// CHECK: %[[VECTOR_VIEW3:.*]] = vector.type_cast %[[VECTOR_VIEW2]] : memref<3x4xvector<1x5xf32>> to memref<3x4x1xvector<5xf32>>		// CHECK: %[[VECTOR_VIEW3:.*]] = vector.type_cast %[[VECTOR_VIEW2]] : memref<3x4xvector<1x5xf32>> to memref<3x4x1xvector<5xf32>>
// CHECK: scf.for %[[I6:.*]] = %[[C0]] to %[[C1]] step %[[C1]] {		// CHECK: scf.for %[[I6:.*]] = %[[C0]] to %[[C1]] step %[[C1]] {
// CHECK: %[[S0:.*]] = affine.apply #[[$ADD]](%[[I2]], %[[I6]])		// CHECK: %[[S0:.*]] = affine.apply #[[$ADD]](%[[I2]], %[[I6]])
// CHECK: %[[VEC:.*]] = memref.load %[[VECTOR_VIEW3]][%[[I4]], %[[I5]], %[[I6]]] : memref<3x4x1xvector<5xf32>>		// CHECK: %[[VEC:.*]] = memref.load %[[VECTOR_VIEW3]][%[[I4]], %[[I5]], %[[I6]]] : memref<3x4x1xvector<5xf32>>
// CHECK: vector.transfer_write %[[VEC]], %{{.*}}[%[[S3]], %[[S1]], %[[S0]], %[[I3]]] : vector<5xf32>, memref<?x?x?x?xf32>		// CHECK: vector.transfer_write %[[VEC]], %{{.*}}[%[[S3]], %[[S1]], %[[S0]], %[[I3]]] {in_bounds = [true, true, true, false]} : vector<5xf32>, memref<?x?x?x?xf32>
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: return		// CHECK: return

// Check that I0 + I4 (of size 3) read from last index load(..., I4) and write into first index store(S0, ...)		// Check that I0 + I4 (of size 3) read from last index load(..., I4) and write into first index store(S0, ...)
// Check that I1 + I5 (of size 4) read from second index load(..., I5, ...) and write into second index store(..., S1, ...)		// Check that I1 + I5 (of size 4) read from second index load(..., I5, ...) and write into second index store(..., S1, ...)
// Check that I3 + I6 (of size 5) read from first index load(I6, ...) and write into last index store(..., S3)		// Check that I3 + I6 (of size 5) read from first index load(I6, ...) and write into last index store(..., S3)
// Other dimension is just accessed with I2.		// Other dimension is just accessed with I2.
%A = memref.alloc (%M, %N, %O, %P) : memref<?x?x?x?xf32, 0>		%A = memref.alloc (%M, %N, %O, %P) : memref<?x?x?x?xf32, 0>
%f1 = arith.constant dense<1.000000e+00> : vector<5x4x3xf32>		%f1 = arith.constant dense<1.000000e+00> : vector<5x4x3xf32>
affine.for %i0 = 0 to %M step 3 {		affine.for %i0 = 0 to %M step 3 {
affine.for %i1 = 0 to %N step 4 {		affine.for %i1 = 0 to %N step 4 {
affine.for %i2 = 0 to %O {		affine.for %i2 = 0 to %O {
affine.for %i3 = 0 to %P step 5 {		affine.for %i3 = 0 to %P step 5 {
vector.transfer_write %f1, %A[%i0, %i1, %i2, %i3] {permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, d1, d0)>} : vector<5x4x3xf32>, memref<?x?x?x?xf32>		vector.transfer_write %f1, %A[%i0, %i1, %i2, %i3] {in_bounds = [false, false, true, false], permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, d1, d0)>} : vector<5x4x3xf32>, memref<?x?x?x?xf32>
}		}
}		}
}		}
}		}
return		return
}		}

// -----		// -----
[... 21 lines hidden ...]	func.func @transfer_read_progressive(%A : memref<?x?xf32>, %base: index) -> vector<3x15xf32> {
// CHECK-DAG: %[[splat:.*]] = arith.constant dense<7.000000e+00> : vector<15xf32>		// CHECK-DAG: %[[splat:.*]] = arith.constant dense<7.000000e+00> : vector<15xf32>
// CHECK-DAG: %[[alloc:.*]] = memref.alloca() : memref<vector<3x15xf32>>		// CHECK-DAG: %[[alloc:.*]] = memref.alloca() : memref<vector<3x15xf32>>
// CHECK: %[[alloc_casted:.*]] = vector.type_cast %[[alloc]] : memref<vector<3x15xf32>> to memref<3xvector<15xf32>>		// CHECK: %[[alloc_casted:.*]] = vector.type_cast %[[alloc]] : memref<vector<3x15xf32>> to memref<3xvector<15xf32>>
// CHECK: scf.for %[[I:.*]] = %[[C0]] to %[[C3]]		// CHECK: scf.for %[[I:.*]] = %[[C0]] to %[[C3]]
// CHECK: %[[dim:.*]] = memref.dim %[[A]], %[[C0]] : memref<?x?xf32>		// CHECK: %[[dim:.*]] = memref.dim %[[A]], %[[C0]] : memref<?x?xf32>
// CHECK: %[[add:.*]] = affine.apply #[[$MAP0]](%[[I]])[%[[base]]]		// CHECK: %[[add:.*]] = affine.apply #[[$MAP0]](%[[I]])[%[[base]]]
// CHECK: %[[cond1:.*]] = arith.cmpi sgt, %[[dim]], %[[add]] : index		// CHECK: %[[cond1:.*]] = arith.cmpi sgt, %[[dim]], %[[add]] : index
// CHECK: scf.if %[[cond1]] {		// CHECK: scf.if %[[cond1]] {
// CHECK: %[[vec_1d:.*]] = vector.transfer_read %[[A]][%{{.*}}, %[[base]]], %[[C7]] : memref<?x?xf32>, vector<15xf32>		// CHECK: %[[vec_1d:.*]] = vector.transfer_read %[[A]][%{{.*}}, %[[base]]], %[[C7]] {in_bounds = [true, false]} : memref<?x?xf32>, vector<15xf32>
// CHECK: memref.store %[[vec_1d]], %[[alloc_casted]][%[[I]]] : memref<3xvector<15xf32>>		// CHECK: memref.store %[[vec_1d]], %[[alloc_casted]][%[[I]]] : memref<3xvector<15xf32>>
// CHECK: } else {		// CHECK: } else {
// CHECK: store %[[splat]], %[[alloc_casted]][%[[I]]] : memref<3xvector<15xf32>>		// CHECK: store %[[splat]], %[[alloc_casted]][%[[I]]] : memref<3xvector<15xf32>>
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: %[[cst:.*]] = memref.load %[[alloc]][] : memref<vector<3x15xf32>>		// CHECK: %[[cst:.*]] = memref.load %[[alloc]][] : memref<vector<3x15xf32>>

// FULL-UNROLL-DAG: %[[C7:.*]] = arith.constant 7.000000e+00 : f32		// FULL-UNROLL-DAG: %[[C7:.*]] = arith.constant 7.000000e+00 : f32
// FULL-UNROLL-DAG: %[[VEC0:.*]] = arith.constant dense<7.000000e+00> : vector<3x15xf32>		// FULL-UNROLL-DAG: %[[VEC0:.*]] = arith.constant dense<7.000000e+00> : vector<3x15xf32>
// FULL-UNROLL-DAG: %[[C0:.*]] = arith.constant 0 : index		// FULL-UNROLL-DAG: %[[C0:.*]] = arith.constant 0 : index
// FULL-UNROLL: %[[DIM:.*]] = memref.dim %[[A]], %[[C0]] : memref<?x?xf32>		// FULL-UNROLL: %[[DIM:.*]] = memref.dim %[[A]], %[[C0]] : memref<?x?xf32>
// FULL-UNROLL: cmpi sgt, %[[DIM]], %[[base]] : index		// FULL-UNROLL: cmpi sgt, %[[DIM]], %[[base]] : index
// FULL-UNROLL: %[[VEC1:.*]] = scf.if %{{.*}} -> (vector<3x15xf32>) {		// FULL-UNROLL: %[[VEC1:.*]] = scf.if %{{.*}} -> (vector<3x15xf32>) {
// FULL-UNROLL: vector.transfer_read %[[A]][%[[base]], %[[base]]], %[[C7]] : memref<?x?xf32>, vector<15xf32>		// FULL-UNROLL: vector.transfer_read %[[A]][%[[base]], %[[base]]], %[[C7]] {in_bounds = [true, false]} : memref<?x?xf32>, vector<15xf32>
// FULL-UNROLL: vector.insert %{{.*}}, %[[VEC0]] [0] : vector<15xf32> into vector<3x15xf32>		// FULL-UNROLL: vector.insert %{{.*}}, %[[VEC0]] [0] : vector<15xf32> into vector<3x15xf32>
// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>		// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>
// FULL-UNROLL: } else {		// FULL-UNROLL: } else {
// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>		// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>
// FULL-UNROLL: }		// FULL-UNROLL: }
// FULL-UNROLL: affine.apply #[[$MAP1]]()[%[[base]]]		// FULL-UNROLL: affine.apply #[[$MAP1]]()[%[[base]]]
// FULL-UNROLL: cmpi sgt, %{{.*}}, %{{.*}} : index		// FULL-UNROLL: cmpi sgt, %{{.*}}, %{{.*}} : index
// FULL-UNROLL: %[[VEC2:.*]] = scf.if %{{.*}} -> (vector<3x15xf32>) {		// FULL-UNROLL: %[[VEC2:.*]] = scf.if %{{.*}} -> (vector<3x15xf32>) {
// FULL-UNROLL: vector.transfer_read %[[A]][%{{.*}}, %[[base]]], %[[C7]] : memref<?x?xf32>, vector<15xf32>		// FULL-UNROLL: vector.transfer_read %[[A]][%{{.*}}, %[[base]]], %[[C7]] {in_bounds = [true, false]} : memref<?x?xf32>, vector<15xf32>
// FULL-UNROLL: vector.insert %{{.*}}, %[[VEC1]] [1] : vector<15xf32> into vector<3x15xf32>		// FULL-UNROLL: vector.insert %{{.*}}, %[[VEC1]] [1] : vector<15xf32> into vector<3x15xf32>
// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>		// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>
// FULL-UNROLL: } else {		// FULL-UNROLL: } else {
// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>		// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>
// FULL-UNROLL: }		// FULL-UNROLL: }
// FULL-UNROLL: affine.apply #[[$MAP2]]()[%[[base]]]		// FULL-UNROLL: affine.apply #[[$MAP2]]()[%[[base]]]
// FULL-UNROLL: cmpi sgt, %{{.*}}, %{{.*}} : index		// FULL-UNROLL: cmpi sgt, %{{.*}}, %{{.*}} : index
// FULL-UNROLL: %[[VEC3:.*]] = scf.if %{{.*}} -> (vector<3x15xf32>) {		// FULL-UNROLL: %[[VEC3:.*]] = scf.if %{{.*}} -> (vector<3x15xf32>) {
// FULL-UNROLL: vector.transfer_read %[[A]][%{{.*}}, %[[base]]], %[[C7]] : memref<?x?xf32>, vector<15xf32>		// FULL-UNROLL: vector.transfer_read %[[A]][%{{.*}}, %[[base]]], %[[C7]] {in_bounds = [true, false]} : memref<?x?xf32>, vector<15xf32>
// FULL-UNROLL: vector.insert %{{.*}}, %[[VEC2]] [2] : vector<15xf32> into vector<3x15xf32>		// FULL-UNROLL: vector.insert %{{.*}}, %[[VEC2]] [2] : vector<15xf32> into vector<3x15xf32>
// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>		// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>
// FULL-UNROLL: } else {		// FULL-UNROLL: } else {
// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>		// FULL-UNROLL: scf.yield %{{.*}} : vector<3x15xf32>
// FULL-UNROLL: }		// FULL-UNROLL: }

%f = vector.transfer_read %A[%base, %base], %f7 :		%f = vector.transfer_read %A[%base, %base], %f7 :
memref<?x?xf32>, vector<3x15xf32>		memref<?x?xf32>, vector<3x15xf32>
func.func @transfer_write_progressive(%A : memref<?x?xf32>, %base: index, %vec: vector<3x15xf32>) {
// CHECK: memref.store %[[vec]], %[[alloc]][] : memref<vector<3x15xf32>>		// CHECK: memref.store %[[vec]], %[[alloc]][] : memref<vector<3x15xf32>>
// CHECK: %[[vmemref:.*]] = vector.type_cast %[[alloc]] : memref<vector<3x15xf32>> to memref<3xvector<15xf32>>		// CHECK: %[[vmemref:.*]] = vector.type_cast %[[alloc]] : memref<vector<3x15xf32>> to memref<3xvector<15xf32>>
// CHECK: scf.for %[[I:.*]] = %[[C0]] to %[[C3]]		// CHECK: scf.for %[[I:.*]] = %[[C0]] to %[[C3]]
// CHECK: %[[dim:.*]] = memref.dim %[[A]], %[[C0]] : memref<?x?xf32>		// CHECK: %[[dim:.*]] = memref.dim %[[A]], %[[C0]] : memref<?x?xf32>
// CHECK: %[[add:.*]] = affine.apply #[[$MAP0]](%[[I]])[%[[base]]]		// CHECK: %[[add:.*]] = affine.apply #[[$MAP0]](%[[I]])[%[[base]]]
// CHECK: %[[cmp:.*]] = arith.cmpi sgt, %[[dim]], %[[add]] : index		// CHECK: %[[cmp:.*]] = arith.cmpi sgt, %[[dim]], %[[add]] : index
// CHECK: scf.if %[[cmp]] {		// CHECK: scf.if %[[cmp]] {
// CHECK: %[[vec_1d:.*]] = memref.load %[[vmemref]][%[[I]]] : memref<3xvector<15xf32>>		// CHECK: %[[vec_1d:.*]] = memref.load %[[vmemref]][%[[I]]] : memref<3xvector<15xf32>>
// CHECK: vector.transfer_write %[[vec_1d]], %[[A]][{{.*}}, %[[base]]] : vector<15xf32>, memref<?x?xf32>		// CHECK: vector.transfer_write %[[vec_1d]], %[[A]][{{.*}}, %[[base]]] {in_bounds = [true, false]} : vector<15xf32>, memref<?x?xf32>
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }

// FULL-UNROLL: %[[C0:.*]] = arith.constant 0 : index		// FULL-UNROLL: %[[C0:.*]] = arith.constant 0 : index
// FULL-UNROLL: %[[DIM:.*]] = memref.dim %[[A]], %[[C0]] : memref<?x?xf32>		// FULL-UNROLL: %[[DIM:.*]] = memref.dim %[[A]], %[[C0]] : memref<?x?xf32>
// FULL-UNROLL: %[[CMP0:.*]] = arith.cmpi sgt, %[[DIM]], %[[base]] : index		// FULL-UNROLL: %[[CMP0:.*]] = arith.cmpi sgt, %[[DIM]], %[[base]] : index
// FULL-UNROLL: scf.if %[[CMP0]] {		// FULL-UNROLL: scf.if %[[CMP0]] {
// FULL-UNROLL: %[[V0:.*]] = vector.extract %[[vec]][0] : vector<3x15xf32>		// FULL-UNROLL: %[[V0:.*]] = vector.extract %[[vec]][0] : vector<3x15xf32>
// FULL-UNROLL: vector.transfer_write %[[V0]], %[[A]][%[[base]], %[[base]]] : vector<15xf32>, memref<?x?xf32>		// FULL-UNROLL: vector.transfer_write %[[V0]], %[[A]][%[[base]], %[[base]]] {in_bounds = [true, false]} : vector<15xf32>, memref<?x?xf32>
// FULL-UNROLL: }		// FULL-UNROLL: }
// FULL-UNROLL: %[[I1:.*]] = affine.apply #[[$MAP1]]()[%[[base]]]		// FULL-UNROLL: %[[I1:.*]] = affine.apply #[[$MAP1]]()[%[[base]]]
// FULL-UNROLL: %[[CMP1:.*]] = arith.cmpi sgt, %{{.*}}, %[[I1]] : index		// FULL-UNROLL: %[[CMP1:.*]] = arith.cmpi sgt, %{{.*}}, %[[I1]] : index
// FULL-UNROLL: scf.if %[[CMP1]] {		// FULL-UNROLL: scf.if %[[CMP1]] {
// FULL-UNROLL: %[[V1:.*]] = vector.extract %[[vec]][1] : vector<3x15xf32>		// FULL-UNROLL: %[[V1:.*]] = vector.extract %[[vec]][1] : vector<3x15xf32>
// FULL-UNROLL: vector.transfer_write %[[V1]], %[[A]][%{{.*}}, %[[base]]] : vector<15xf32>, memref<?x?xf32>		// FULL-UNROLL: vector.transfer_write %[[V1]], %[[A]][%{{.*}}, %[[base]]] {in_bounds = [true, false]} : vector<15xf32>, memref<?x?xf32>
// FULL-UNROLL: }		// FULL-UNROLL: }
// FULL-UNROLL: %[[I2:.*]] = affine.apply #[[$MAP2]]()[%[[base]]]		// FULL-UNROLL: %[[I2:.*]] = affine.apply #[[$MAP2]]()[%[[base]]]
// FULL-UNROLL: %[[CMP2:.*]] = arith.cmpi sgt, %{{.*}}, %[[I2]] : index		// FULL-UNROLL: %[[CMP2:.*]] = arith.cmpi sgt, %{{.*}}, %[[I2]] : index
// FULL-UNROLL: scf.if %[[CMP2]] {		// FULL-UNROLL: scf.if %[[CMP2]] {
// FULL-UNROLL: %[[V2:.*]] = vector.extract %[[vec]][2] : vector<3x15xf32>		// FULL-UNROLL: %[[V2:.*]] = vector.extract %[[vec]][2] : vector<3x15xf32>
// FULL-UNROLL: vector.transfer_write %[[V2]], %[[A]][%{{.*}}, %[[base]]] : vector<15xf32>, memref<?x?xf32>		// FULL-UNROLL: vector.transfer_write %[[V2]], %[[A]][%{{.*}}, %[[base]]] {in_bounds = [true, false]} : vector<15xf32>, memref<?x?xf32>
// FULL-UNROLL: }		// FULL-UNROLL: }

vector.transfer_write %vec, %A[%base, %base] :		vector.transfer_write %vec, %A[%base, %base] :
vector<3x15xf32>, memref<?x?xf32>		vector<3x15xf32>, memref<?x?xf32>
return		return
}		}

// -----		// -----
func.func @transfer_write_progressive_inbounds(%A : memref<?x?xf32>, %base: index, %vec: vector<3x15xf32>) {
// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[C3:.*]] = arith.constant 3 : index		// CHECK-DAG: %[[C3:.*]] = arith.constant 3 : index
// CHECK: %[[alloc:.*]] = memref.alloca() : memref<vector<3x15xf32>>		// CHECK: %[[alloc:.*]] = memref.alloca() : memref<vector<3x15xf32>>
// CHECK-NEXT: memref.store %[[vec]], %[[alloc]][] : memref<vector<3x15xf32>>		// CHECK-NEXT: memref.store %[[vec]], %[[alloc]][] : memref<vector<3x15xf32>>
// CHECK-NEXT: %[[vmemref:.*]] = vector.type_cast %[[alloc]] : memref<vector<3x15xf32>> to memref<3xvector<15xf32>>		// CHECK-NEXT: %[[vmemref:.*]] = vector.type_cast %[[alloc]] : memref<vector<3x15xf32>> to memref<3xvector<15xf32>>
// CHECK-NEXT: scf.for %[[I:.*]] = %[[C0]] to %[[C3]]		// CHECK-NEXT: scf.for %[[I:.*]] = %[[C0]] to %[[C3]]
// CHECK-NEXT: %[[add:.*]] = affine.apply #[[$MAP0]](%[[I]])[%[[base]]]		// CHECK-NEXT: %[[add:.*]] = affine.apply #[[$MAP0]](%[[I]])[%[[base]]]
// CHECK-NEXT: %[[vec_1d:.*]] = memref.load %[[vmemref]][%[[I]]] : memref<3xvector<15xf32>>		// CHECK-NEXT: %[[vec_1d:.*]] = memref.load %[[vmemref]][%[[I]]] : memref<3xvector<15xf32>>
// CHECK-NEXT: vector.transfer_write %[[vec_1d]], %[[A]][%[[add]], %[[base]]] {in_bounds = [true]} : vector<15xf32>, memref<?x?xf32>		// CHECK-NEXT: vector.transfer_write %[[vec_1d]], %[[A]][%[[add]], %[[base]]] {in_bounds = [true, true]} : vector<15xf32>, memref<?x?xf32>

// FULL-UNROLL: %[[VEC0:.*]] = vector.extract %[[vec]][0] : vector<3x15xf32>		// FULL-UNROLL: %[[VEC0:.*]] = vector.extract %[[vec]][0] : vector<3x15xf32>
// FULL-UNROLL: vector.transfer_write %[[VEC0]], %[[A]][%[[base]], %[[base]]] {in_bounds = [true]} : vector<15xf32>, memref<?x?xf32>		// FULL-UNROLL: vector.transfer_write %[[VEC0]], %[[A]][%[[base]], %[[base]]] {in_bounds = [true, true]} : vector<15xf32>, memref<?x?xf32>
// FULL-UNROLL: %[[I1:.*]] = affine.apply #[[$MAP1]]()[%[[base]]]		// FULL-UNROLL: %[[I1:.*]] = affine.apply #[[$MAP1]]()[%[[base]]]
// FULL-UNROLL: %[[VEC1:.*]] = vector.extract %[[vec]][1] : vector<3x15xf32>		// FULL-UNROLL: %[[VEC1:.*]] = vector.extract %[[vec]][1] : vector<3x15xf32>
// FULL-UNROLL: vector.transfer_write %2, %[[A]][%[[I1]], %[[base]]] {in_bounds = [true]} : vector<15xf32>, memref<?x?xf32>		// FULL-UNROLL: vector.transfer_write %2, %[[A]][%[[I1]], %[[base]]] {in_bounds = [true, true]} : vector<15xf32>, memref<?x?xf32>
// FULL-UNROLL: %[[I2:.*]] = affine.apply #[[$MAP2]]()[%[[base]]]		// FULL-UNROLL: %[[I2:.*]] = affine.apply #[[$MAP2]]()[%[[base]]]
// FULL-UNROLL: %[[VEC2:.*]] = vector.extract %[[vec]][2] : vector<3x15xf32>		// FULL-UNROLL: %[[VEC2:.*]] = vector.extract %[[vec]][2] : vector<3x15xf32>
// FULL-UNROLL: vector.transfer_write %[[VEC2:.*]], %[[A]][%[[I2]], %[[base]]] {in_bounds = [true]} : vector<15xf32>, memref<?x?xf32>		// FULL-UNROLL: vector.transfer_write %[[VEC2:.*]], %[[A]][%[[I2]], %[[base]]] {in_bounds = [true, true]} : vector<15xf32>, memref<?x?xf32>
vector.transfer_write %vec, %A[%base, %base] {in_bounds = [true, true]} :		vector.transfer_write %vec, %A[%base, %base] {in_bounds = [true, true]} :
vector<3x15xf32>, memref<?x?xf32>		vector<3x15xf32>, memref<?x?xf32>
return		return
}		}

// -----		// -----

// FULL-UNROLL-LABEL: transfer_read_simple		// FULL-UNROLL-LABEL: transfer_read_simple
func.func @transfer_read_simple(%A : memref<2x2xf32>) -> vector<2x2xf32> {
%0 = vector.transfer_read %A[%c0, %c0], %f0 : memref<2x2xf32>, vector<2x2xf32>		%0 = vector.transfer_read %A[%c0, %c0], %f0 : memref<2x2xf32>, vector<2x2xf32>
return %0 : vector<2x2xf32>		return %0 : vector<2x2xf32>
}		}

func.func @transfer_read_minor_identity(%A : memref<?x?x?x?xf32>) -> vector<3x3xf32> {		func.func @transfer_read_minor_identity(%A : memref<?x?x?x?xf32>) -> vector<3x3xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%f0 = arith.constant 0.0 : f32		%f0 = arith.constant 0.0 : f32
%0 = vector.transfer_read %A[%c0, %c0, %c0, %c0], %f0		%0 = vector.transfer_read %A[%c0, %c0, %c0, %c0], %f0
{ permutation_map = affine_map<(d0, d1, d2, d3) -> (d2, d3)> }		{ permutation_map = affine_map<(d0, d1, d2, d3) -> (d2, d3)>,
		in_bounds = [true, true, false, false] }
: memref<?x?x?x?xf32>, vector<3x3xf32>		: memref<?x?x?x?xf32>, vector<3x3xf32>
return %0 : vector<3x3xf32>		return %0 : vector<3x3xf32>
}		}

// CHECK-LABEL: transfer_read_minor_identity(		// CHECK-LABEL: transfer_read_minor_identity(
// CHECK-SAME: %[[A:.*]]: memref<?x?x?x?xf32>) -> vector<3x3xf32>		// CHECK-SAME: %[[A:.*]]: memref<?x?x?x?xf32>) -> vector<3x3xf32>
// CHECK-DAG: %[[c0:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[c0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[c1:.*]] = arith.constant 1 : index		// CHECK-DAG: %[[c1:.*]] = arith.constant 1 : index
// CHECK-DAG: %[[c2:.*]] = arith.constant 2 : index		// CHECK-DAG: %[[c2:.*]] = arith.constant 2 : index
// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index		// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index
// CHECK-DAG: %[[f0:.*]] = arith.constant 0.000000e+00 : f32		// CHECK-DAG: %[[f0:.*]] = arith.constant 0.000000e+00 : f32
// CHECK-DAG: %[[cst0:.*]] = arith.constant dense<0.000000e+00> : vector<3xf32>		// CHECK-DAG: %[[cst0:.*]] = arith.constant dense<0.000000e+00> : vector<3xf32>
// CHECK: %[[m:.*]] = memref.alloca() : memref<vector<3x3xf32>>		// CHECK: %[[m:.*]] = memref.alloca() : memref<vector<3x3xf32>>
// CHECK: %[[cast:.*]] = vector.type_cast %[[m]] : memref<vector<3x3xf32>> to memref<3xvector<3xf32>>		// CHECK: %[[cast:.*]] = vector.type_cast %[[m]] : memref<vector<3x3xf32>> to memref<3xvector<3xf32>>
// CHECK: scf.for %[[arg1:.*]] = %[[c0]] to %[[c3]]		// CHECK: scf.for %[[arg1:.*]] = %[[c0]] to %[[c3]]
// CHECK: %[[d:.*]] = memref.dim %[[A]], %[[c2]] : memref<?x?x?x?xf32>		// CHECK: %[[d:.*]] = memref.dim %[[A]], %[[c2]] : memref<?x?x?x?xf32>
// CHECK: %[[cmp:.*]] = arith.cmpi sgt, %[[d]], %[[arg1]] : index		// CHECK: %[[cmp:.*]] = arith.cmpi sgt, %[[d]], %[[arg1]] : index
// CHECK: scf.if %[[cmp]] {		// CHECK: scf.if %[[cmp]] {
// CHECK: %[[tr:.*]] = vector.transfer_read %[[A]][%c0, %c0, %[[arg1]], %c0], %[[f0]] : memref<?x?x?x?xf32>, vector<3xf32>		// CHECK: %[[tr:.*]] = vector.transfer_read %[[A]][%c0, %c0, %[[arg1]], %c0], %[[f0]] {in_bounds = [true, true, true, false]} : memref<?x?x?x?xf32>, vector<3xf32>
// CHECK: memref.store %[[tr]], %[[cast]][%[[arg1]]] : memref<3xvector<3xf32>>		// CHECK: memref.store %[[tr]], %[[cast]][%[[arg1]]] : memref<3xvector<3xf32>>
// CHECK: } else {		// CHECK: } else {
// CHECK: memref.store %[[cst0]], %[[cast]][%[[arg1]]] : memref<3xvector<3xf32>>		// CHECK: memref.store %[[cst0]], %[[cast]][%[[arg1]]] : memref<3xvector<3xf32>>
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: %[[ret:.*]] = memref.load %[[m]][] : memref<vector<3x3xf32>>		// CHECK: %[[ret:.*]] = memref.load %[[m]][] : memref<vector<3x3xf32>>
// CHECK: return %[[ret]] : vector<3x3xf32>		// CHECK: return %[[ret]] : vector<3x3xf32>

func.func @transfer_write_minor_identity(%A : vector<3x3xf32>, %B : memref<?x?x?x?xf32>) {		func.func @transfer_write_minor_identity(%A : vector<3x3xf32>, %B : memref<?x?x?x?xf32>) {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%f0 = arith.constant 0.0 : f32		%f0 = arith.constant 0.0 : f32
vector.transfer_write %A, %B[%c0, %c0, %c0, %c0]		vector.transfer_write %A, %B[%c0, %c0, %c0, %c0]
{ permutation_map = affine_map<(d0, d1, d2, d3) -> (d2, d3)> }		{ permutation_map = affine_map<(d0, d1, d2, d3) -> (d2, d3)>,
		in_bounds = [true, true, false, false] }
: vector<3x3xf32>, memref<?x?x?x?xf32>		: vector<3x3xf32>, memref<?x?x?x?xf32>
return		return
}		}

// CHECK-LABEL: transfer_write_minor_identity(		// CHECK-LABEL: transfer_write_minor_identity(
// CHECK-SAME: %[[A:.*]]: vector<3x3xf32>,		// CHECK-SAME: %[[A:.*]]: vector<3x3xf32>,
// CHECK-SAME: %[[B:.*]]: memref<?x?x?x?xf32>)		// CHECK-SAME: %[[B:.*]]: memref<?x?x?x?xf32>)
// CHECK-DAG: %[[c0:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[c0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[c1:.*]] = arith.constant 1 : index		// CHECK-DAG: %[[c1:.*]] = arith.constant 1 : index
// CHECK-DAG: %[[c2:.*]] = arith.constant 2 : index		// CHECK-DAG: %[[c2:.*]] = arith.constant 2 : index
// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index		// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index
// CHECK: %[[m:.*]] = memref.alloca() : memref<vector<3x3xf32>>		// CHECK: %[[m:.*]] = memref.alloca() : memref<vector<3x3xf32>>
// CHECK: memref.store %[[A]], %[[m]][] : memref<vector<3x3xf32>>		// CHECK: memref.store %[[A]], %[[m]][] : memref<vector<3x3xf32>>
// CHECK: %[[cast:.*]] = vector.type_cast %[[m]] : memref<vector<3x3xf32>> to memref<3xvector<3xf32>>		// CHECK: %[[cast:.*]] = vector.type_cast %[[m]] : memref<vector<3x3xf32>> to memref<3xvector<3xf32>>
// CHECK: scf.for %[[arg2:.*]] = %[[c0]] to %[[c3]]		// CHECK: scf.for %[[arg2:.*]] = %[[c0]] to %[[c3]]
// CHECK: %[[d:.*]] = memref.dim %[[B]], %[[c2]] : memref<?x?x?x?xf32>		// CHECK: %[[d:.*]] = memref.dim %[[B]], %[[c2]] : memref<?x?x?x?xf32>
// CHECK: %[[cmp:.*]] = arith.cmpi sgt, %[[d]], %[[arg2]] : index		// CHECK: %[[cmp:.*]] = arith.cmpi sgt, %[[d]], %[[arg2]] : index
// CHECK: scf.if %[[cmp]] {		// CHECK: scf.if %[[cmp]] {
// CHECK: %[[tmp:.*]] = memref.load %[[cast]][%[[arg2]]] : memref<3xvector<3xf32>>		// CHECK: %[[tmp:.*]] = memref.load %[[cast]][%[[arg2]]] : memref<3xvector<3xf32>>
// CHECK: vector.transfer_write %[[tmp]], %[[B]][%[[c0]], %[[c0]], %[[arg2]], %[[c0]]] : vector<3xf32>, memref<?x?x?x?xf32>		// CHECK: vector.transfer_write %[[tmp]], %[[B]][%[[c0]], %[[c0]], %[[arg2]], %[[c0]]] {in_bounds = [true, true, true, false]} : vector<3xf32>, memref<?x?x?x?xf32>
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
// CHECK: return		// CHECK: return


// -----		// -----

func.func @transfer_read_strided(%A : memref<8x4xf32, affine_map<(d0, d1) -> (d0 + d1 * 8)>>) -> vector<4xf32> {		func.func @transfer_read_strided(%A : memref<8x4xf32, affine_map<(d0, d1) -> (d0 + d1 * 8)>>) -> vector<4xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%f0 = arith.constant 0.0 : f32		%f0 = arith.constant 0.0 : f32
%0 = vector.transfer_read %A[%c0, %c0], %f0		%0 = vector.transfer_read %A[%c0, %c0], %f0 {in_bounds = [true, false]}
: memref<8x4xf32, affine_map<(d0, d1) -> (d0 + d1 * 8)>>, vector<4xf32>		: memref<8x4xf32, affine_map<(d0, d1) -> (d0 + d1 * 8)>>, vector<4xf32>
return %0 : vector<4xf32>		return %0 : vector<4xf32>
}		}

// CHECK-LABEL: transfer_read_strided(		// CHECK-LABEL: transfer_read_strided(
// CHECK: scf.for		// CHECK: scf.for
// CHECK: memref.load		// CHECK: memref.load

func.func @transfer_write_strided(%A : vector<4xf32>, %B : memref<8x4xf32, affine_map<(d0, d1) -> (d0 + d1 * 8)>>) {		func.func @transfer_write_strided(%A : vector<4xf32>, %B : memref<8x4xf32, affine_map<(d0, d1) -> (d0 + d1 * 8)>>) {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
vector.transfer_write %A, %B[%c0, %c0] :		vector.transfer_write %A, %B[%c0, %c0] {in_bounds = [true, false]} :
vector<4xf32>, memref<8x4xf32, affine_map<(d0, d1) -> (d0 + d1 * 8)>>		vector<4xf32>, memref<8x4xf32, affine_map<(d0, d1) -> (d0 + d1 * 8)>>
return		return
}		}

// CHECK-LABEL: transfer_write_strided(		// CHECK-LABEL: transfer_write_strided(
// CHECK: scf.for		// CHECK: scf.for
// CHECK: store		// CHECK: store


mlir/test/Dialect/Affine/SuperVectorize/vector_utils.mlir

func.func @double_loop_nest(%a: memref<20x30xf32>, %b: memref<20xf32>) {

return		return
}		}

// VECNEST: affine.for %{{.*}} = 0 to 20 step 4 {		// VECNEST: affine.for %{{.*}} = 0 to 20 step 4 {
// VECNEST: vector.transfer_read		// VECNEST: vector.transfer_read
// VECNEST-NEXT: affine.for %{{.*}} = 0 to 30 {		// VECNEST-NEXT: affine.for %{{.*}} = 0 to 30 {
// VECNEST: vector.transfer_read		// VECNEST: vector.transfer_read
// VECNEST-NEXT: vector.transfer_write %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}] {permutation_map = #{{.*}}}		// VECNEST-NEXT: vector.transfer_write %{{.*}}, %{{.*}}[%{{.*}}, %{{.*}}] {in_bounds = [false, true], permutation_map = #{{.*}}}
// VECNEST-NEXT: }		// VECNEST-NEXT: }
// VECNEST-NEXT: vector.transfer_write		// VECNEST-NEXT: vector.transfer_write
// VECNEST: }		// VECNEST: }

mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir

// CHECK-DAG: [[ARG_P:%[0-9a-zA-Z_]+]] = memref.dim %{{.*}}, %[[C2]] : memref<?x?x?xf32>
%M = memref.dim %A, %c0 : memref<?x?xf32>		%M = memref.dim %A, %c0 : memref<?x?xf32>
%N = memref.dim %A, %c1 : memref<?x?xf32>		%N = memref.dim %A, %c1 : memref<?x?xf32>
%P = memref.dim %B, %c2 : memref<?x?x?xf32>		%P = memref.dim %B, %c2 : memref<?x?x?xf32>

// CHECK: for {{.*}} step 128		// CHECK: for {{.*}} step 128
// CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%[[C0]])		// CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%[[C0]])
// CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%[[C0]])		// CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%[[C0]])
// CHECK-NEXT: %{{.*}} = arith.constant 0.0{{.*}}: f32		// CHECK-NEXT: %{{.*}} = arith.constant 0.0{{.*}}: f32
// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>		// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true, true], permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i0 = 0 to %M { // vectorized due to scalar -> vector		affine.for %i0 = 0 to %M { // vectorized due to scalar -> vector
%a0 = affine.load %A[%c0, %c0] : memref<?x?xf32>		%a0 = affine.load %A[%c0, %c0] : memref<?x?xf32>
}		}
return		return
}		}

// -----		// -----

// CHECK-DAG: [[ARG_P:%[0-9a-zA-Z_]+]] = memref.dim %{{.*}}, %[[C2]] : memref<?x?x?xf32>
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index		%c2 = arith.constant 2 : index
%M = memref.dim %A, %c0 : memref<?x?xf32>		%M = memref.dim %A, %c0 : memref<?x?xf32>
%N = memref.dim %A, %c1 : memref<?x?xf32>		%N = memref.dim %A, %c1 : memref<?x?xf32>
%P = memref.dim %B, %c2 : memref<?x?x?xf32>		%P = memref.dim %B, %c2 : memref<?x?x?xf32>

// CHECK:for [[IV3:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128		// CHECK:for [[IV3:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128
// CHECK-NEXT: %[[CST:.*]] = arith.constant 0.0{{.*}}: f32		// CHECK-NEXT: %[[CST:.*]] = arith.constant 0.0{{.*}}: f32
// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %[[CST]] : memref<?x?xf32>, vector<128xf32>		// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %[[CST]] {in_bounds = [true, false]} : memref<?x?xf32>, vector<128xf32>
affine.for %i3 = 0 to %M { // vectorized		affine.for %i3 = 0 to %M { // vectorized
%a3 = affine.load %A[%c0, %i3] : memref<?x?xf32>		%a3 = affine.load %A[%c0, %i3] : memref<?x?xf32>
}		}
return		return
}		}

// -----		// -----

// CHECK-DAG: [[ARG_P:%[0-9a-zA-Z_]+]] = memref.dim %arg1, %[[C2]] : memref<?x?x?xf32>
%N = memref.dim %A, %c1 : memref<?x?xf32>		%N = memref.dim %A, %c1 : memref<?x?xf32>
%P = memref.dim %B, %c2 : memref<?x?x?xf32>		%P = memref.dim %B, %c2 : memref<?x?x?xf32>

// CHECK:for [[IV8:%[0-9a-zA-Z_]+]] = 0 to [[ARG_M]] step 128		// CHECK:for [[IV8:%[0-9a-zA-Z_]+]] = 0 to [[ARG_M]] step 128
// CHECK-NEXT: for [[IV9:%[0-9a-zA-Z_]*]] = 0 to [[ARG_N]] {		// CHECK-NEXT: for [[IV9:%[0-9a-zA-Z_]*]] = 0 to [[ARG_N]] {
// CHECK-NEXT: %[[APP9_0:[0-9a-zA-Z_]+]] = affine.apply {{.*}}([[IV9]], [[IV8]])		// CHECK-NEXT: %[[APP9_0:[0-9a-zA-Z_]+]] = affine.apply {{.*}}([[IV9]], [[IV8]])
// CHECK-NEXT: %[[APP9_1:[0-9a-zA-Z_]+]] = affine.apply {{.*}}([[IV9]], [[IV8]])		// CHECK-NEXT: %[[APP9_1:[0-9a-zA-Z_]+]] = affine.apply {{.*}}([[IV9]], [[IV8]])
// CHECK-NEXT: %[[CST:.*]] = arith.constant 0.0{{.*}}: f32		// CHECK-NEXT: %[[CST:.*]] = arith.constant 0.0{{.*}}: f32
// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%[[APP9_0]], %[[APP9_1]]], %[[CST]] : memref<?x?xf32>, vector<128xf32>		// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%[[APP9_0]], %[[APP9_1]]], %[[CST]] {in_bounds = [true, false]} : memref<?x?xf32>, vector<128xf32>
affine.for %i8 = 0 to %M { // vectorized		affine.for %i8 = 0 to %M { // vectorized
affine.for %i9 = 0 to %N {		affine.for %i9 = 0 to %N {
%a9 = affine.load %A[%i9, %i8 + %i9] : memref<?x?xf32>		%a9 = affine.load %A[%i9, %i8 + %i9] : memref<?x?xf32>
}		}
}		}
return		return
}		}

// -----		// -----

// CHECK-LABEL: func @vector_add_2d		// CHECK-LABEL: func @vector_add_2d
func.func @vector_add_2d(%M : index, %N : index) -> f32 {		func.func @vector_add_2d(%M : index, %N : index) -> f32 {
%A = memref.alloc (%M, %N) : memref<?x?xf32, 0>		%A = memref.alloc (%M, %N) : memref<?x?xf32, 0>
%B = memref.alloc (%M, %N) : memref<?x?xf32, 0>		%B = memref.alloc (%M, %N) : memref<?x?xf32, 0>
%C = memref.alloc (%M, %N) : memref<?x?xf32, 0>		%C = memref.alloc (%M, %N) : memref<?x?xf32, 0>
%f1 = arith.constant 1.0 : f32		%f1 = arith.constant 1.0 : f32
%f2 = arith.constant 2.0 : f32		%f2 = arith.constant 2.0 : f32
affine.for %i0 = 0 to %M {		affine.for %i0 = 0 to %M {
affine.for %i1 = 0 to %N {		affine.for %i1 = 0 to %N {
// CHECK: %[[C1:.*]] = arith.constant dense<1.000000e+00> : vector<128xf32>		// CHECK: %[[C1:.*]] = arith.constant dense<1.000000e+00> : vector<128xf32>
// CHECK: vector.transfer_write %[[C1]], {{.*}} : vector<128xf32>, memref<?x?xf32>		// CHECK: vector.transfer_write %[[C1]], {{.*}} {in_bounds = [true, false]} : vector<128xf32>, memref<?x?xf32>
// non-scoped %f1		// non-scoped %f1
affine.store %f1, %A[%i0, %i1] : memref<?x?xf32, 0>		affine.store %f1, %A[%i0, %i1] : memref<?x?xf32, 0>
}		}
}		}
affine.for %i2 = 0 to %M {		affine.for %i2 = 0 to %M {
affine.for %i3 = 0 to %N {		affine.for %i3 = 0 to %N {
// CHECK: %[[C3:.*]] = arith.constant dense<2.000000e+00> : vector<128xf32>		// CHECK: %[[C3:.*]] = arith.constant dense<2.000000e+00> : vector<128xf32>
// CHECK: vector.transfer_write %[[C3]], {{.*}} : vector<128xf32>, memref<?x?xf32>		// CHECK: vector.transfer_write %[[C3]], {{.*}} {in_bounds = [true, false]} : vector<128xf32>, memref<?x?xf32>
// non-scoped %f2		// non-scoped %f2
affine.store %f2, %B[%i2, %i3] : memref<?x?xf32, 0>		affine.store %f2, %B[%i2, %i3] : memref<?x?xf32, 0>
}		}
}		}
affine.for %i4 = 0 to %M {		affine.for %i4 = 0 to %M {
affine.for %i5 = 0 to %N {		affine.for %i5 = 0 to %N {
// CHECK: %[[SPLAT2:.*]] = arith.constant dense<2.000000e+00> : vector<128xf32>		// CHECK: %[[SPLAT2:.*]] = arith.constant dense<2.000000e+00> : vector<128xf32>
// CHECK: %[[SPLAT1:.*]] = arith.constant dense<1.000000e+00> : vector<128xf32>		// CHECK: %[[SPLAT1:.*]] = arith.constant dense<1.000000e+00> : vector<128xf32>
// CHECK: %[[A5:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{[a-zA-Z0-9_]*}} : memref<?x?xf32>, vector<128xf32>		// CHECK: %[[A5:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{[a-zA-Z0-9_]*}} {in_bounds = [true, false]} : memref<?x?xf32>, vector<128xf32>
// CHECK: %[[B5:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{[a-zA-Z0-9_]*}} : memref<?x?xf32>, vector<128xf32>		// CHECK: %[[B5:.*]] = vector.transfer_read %{{.*}}[{{.*}}], %{{[a-zA-Z0-9_]*}} {in_bounds = [true, false]} : memref<?x?xf32>, vector<128xf32>
// CHECK: %[[S5:.*]] = arith.addf %[[A5]], %[[B5]] : vector<128xf32>		// CHECK: %[[S5:.*]] = arith.addf %[[A5]], %[[B5]] : vector<128xf32>
// CHECK: %[[S6:.*]] = arith.addf %[[S5]], %[[SPLAT1]] : vector<128xf32>		// CHECK: %[[S6:.*]] = arith.addf %[[S5]], %[[SPLAT1]] : vector<128xf32>
// CHECK: %[[S7:.*]] = arith.addf %[[S5]], %[[SPLAT2]] : vector<128xf32>		// CHECK: %[[S7:.*]] = arith.addf %[[S5]], %[[SPLAT2]] : vector<128xf32>
// CHECK: %[[S8:.*]] = arith.addf %[[S7]], %[[S6]] : vector<128xf32>		// CHECK: %[[S8:.*]] = arith.addf %[[S7]], %[[S6]] : vector<128xf32>
// CHECK: vector.transfer_write %[[S8]], {{.*}} : vector<128xf32>, memref<?x?xf32>		// CHECK: vector.transfer_write %[[S8]], {{.*}} {in_bounds = [true, false]} : vector<128xf32>, memref<?x?xf32>
%a5 = affine.load %A[%i4, %i5] : memref<?x?xf32, 0>		%a5 = affine.load %A[%i4, %i5] : memref<?x?xf32, 0>
%b5 = affine.load %B[%i4, %i5] : memref<?x?xf32, 0>		%b5 = affine.load %B[%i4, %i5] : memref<?x?xf32, 0>
%s5 = arith.addf %a5, %b5 : f32		%s5 = arith.addf %a5, %b5 : f32
// non-scoped %f1		// non-scoped %f1
%s6 = arith.addf %s5, %f1 : f32		%s6 = arith.addf %s5, %f1 : f32
// non-scoped %f2		// non-scoped %f2
%s7 = arith.addf %s5, %f2 : f32		%s7 = arith.addf %s5, %f2 : f32
// diamond dependency.		// diamond dependency.
// CHECK-LABEL: func @vec_constant_with_two_users		// CHECK-LABEL: func @vec_constant_with_two_users
func.func @vec_constant_with_two_users(%M : index, %N : index) -> (f32, f32) {		func.func @vec_constant_with_two_users(%M : index, %N : index) -> (f32, f32) {
%A = memref.alloc (%M, %N) : memref<?x?xf32, 0>		%A = memref.alloc (%M, %N) : memref<?x?xf32, 0>
%B = memref.alloc (%M) : memref<?xf32, 0>		%B = memref.alloc (%M) : memref<?xf32, 0>
%f1 = arith.constant 1.0 : f32		%f1 = arith.constant 1.0 : f32
affine.for %i0 = 0 to %M { // vectorized		affine.for %i0 = 0 to %M { // vectorized
// CHECK: %[[C1:.*]] = arith.constant dense<1.000000e+00> : vector<128xf32>		// CHECK: %[[C1:.*]] = arith.constant dense<1.000000e+00> : vector<128xf32>
// CHECK-NEXT: affine.for		// CHECK-NEXT: affine.for
// CHECK-NEXT: vector.transfer_write %[[C1]], {{.*}} : vector<128xf32>, memref<?x?xf32>		// CHECK-NEXT: vector.transfer_write %[[C1]], {{.*}} {in_bounds = [true, false]} : vector<128xf32>, memref<?x?xf32>
affine.for %i1 = 0 to %N {		affine.for %i1 = 0 to %N {
affine.store %f1, %A[%i1, %i0] : memref<?x?xf32, 0>		affine.store %f1, %A[%i1, %i0] : memref<?x?xf32, 0>
}		}
// CHECK: vector.transfer_write %[[C1]], {{.*}} : vector<128xf32>, memref<?xf32>		// CHECK: vector.transfer_write %[[C1]], {{.*}} : vector<128xf32>, memref<?xf32>
affine.store %f1, %B[%i0] : memref<?xf32, 0>		affine.store %f1, %B[%i0] : memref<?xf32, 0>
}		}
%c12 = arith.constant 12 : index		%c12 = arith.constant 12 : index
%res1 = affine.load %A[%c12, %c12] : memref<?x?xf32, 0>		%res1 = affine.load %A[%c12, %c12] : memref<?x?xf32, 0>
%res2 = affine.load %B[%c12] : memref<?xf32, 0>		%res2 = affine.load %B[%c12] : memref<?xf32, 0>
return %res1, %res2 : f32, f32		return %res1, %res2 : f32, f32
}		}

// -----		// -----

// CHECK-LABEL: func @vec_block_arg		// CHECK-LABEL: func @vec_block_arg
func.func @vec_block_arg(%A : memref<32x512xi32>) {		func.func @vec_block_arg(%A : memref<32x512xi32>) {
// CHECK: affine.for %[[IV0:[0-9a-zA-Z_]+]] = 0 to 512 step 128 {		// CHECK: affine.for %[[IV0:[0-9a-zA-Z_]+]] = 0 to 512 step 128 {
// CHECK-NEXT: affine.for %[[IV1:[0-9a-zA-Z_]+]] = 0 to 32 {		// CHECK-NEXT: affine.for %[[IV1:[0-9a-zA-Z_]+]] = 0 to 32 {
// CHECK-NEXT: %[[BROADCAST:.*]] = vector.broadcast %[[IV1]] : index to vector<128xindex>		// CHECK-NEXT: %[[BROADCAST:.*]] = vector.broadcast %[[IV1]] : index to vector<128xindex>
// CHECK-NEXT: %[[CAST:.*]] = arith.index_cast %[[BROADCAST]] : vector<128xindex> to vector<128xi32>		// CHECK-NEXT: %[[CAST:.*]] = arith.index_cast %[[BROADCAST]] : vector<128xindex> to vector<128xi32>
// CHECK-NEXT: vector.transfer_write %[[CAST]], {{.*}}[%[[IV1]], %[[IV0]]] : vector<128xi32>, memref<32x512xi32>		// CHECK-NEXT: vector.transfer_write %[[CAST]], {{.*}}[%[[IV1]], %[[IV0]]] {in_bounds = [true, false]} : vector<128xi32>, memref<32x512xi32>
affine.for %i = 0 to 512 { // vectorized		affine.for %i = 0 to 512 { // vectorized
affine.for %j = 0 to 32 {		affine.for %j = 0 to 32 {
%idx = arith.index_cast %j : index to i32		%idx = arith.index_cast %j : index to i32
affine.store %idx, %A[%j, %i] : memref<32x512xi32>		affine.store %idx, %A[%j, %i] : memref<32x512xi32>
}		}
}		}
return		return
}		}
// CHECK-DAG: [[ARG_P:%[0-9a-zA-Z_]+]] = memref.dim %{{.*}}, %[[C2]] : memref<?x?x?xf32>
%c2 = arith.constant 2 : index		%c2 = arith.constant 2 : index
%M = memref.dim %A, %c0 : memref<?x?xf32>		%M = memref.dim %A, %c0 : memref<?x?xf32>
%N = memref.dim %A, %c1 : memref<?x?xf32>		%N = memref.dim %A, %c1 : memref<?x?xf32>
%P = memref.dim %B, %c2 : memref<?x?x?xf32>		%P = memref.dim %B, %c2 : memref<?x?x?xf32>

// CHECK:for [[IV4:%[0-9a-zA-Z_]+]] = 0 to [[ARG_M]] step 128 {		// CHECK:for [[IV4:%[0-9a-zA-Z_]+]] = 0 to [[ARG_M]] step 128 {
// CHECK-NEXT: for [[IV5:%[0-9a-zA-Z_]*]] = 0 to [[ARG_N]] {		// CHECK-NEXT: for [[IV5:%[0-9a-zA-Z_]*]] = 0 to [[ARG_N]] {
// CHECK-NEXT: %{{.*}} = arith.constant 0.0{{.*}}: f32		// CHECK-NEXT: %{{.*}} = arith.constant 0.0{{.*}}: f32
// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{[a-zA-Z0-9_]*}} : memref<?x?xf32>, vector<128xf32>		// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{[a-zA-Z0-9_]*}} {in_bounds = [true, false]} : memref<?x?xf32>, vector<128xf32>
affine.for %i4 = 0 to %M { // vectorized		affine.for %i4 = 0 to %M { // vectorized
affine.for %i5 = 0 to %N { // not vectorized, would vectorize with --test-fastest-varying=1		affine.for %i5 = 0 to %N { // not vectorized, would vectorize with --test-fastest-varying=1
%a5 = affine.load %A[%i5, %i4] : memref<?x?xf32>		%a5 = affine.load %A[%i5, %i4] : memref<?x?xf32>
}		}
}		}
return		return
}		}

// CHECK-DAG: [[ARG_P:%[0-9a-zA-Z_]+]] = memref.dim %{{.*}}, %[[C2]] : memref<?x?x?xf32>
%N = memref.dim %A, %c1 : memref<?x?xf32>		%N = memref.dim %A, %c1 : memref<?x?xf32>
%P = memref.dim %B, %c2 : memref<?x?x?xf32>		%P = memref.dim %B, %c2 : memref<?x?x?xf32>

// CHECK: affine.for %{{.*}}{{[0-9a-zA-Z_]*}} = 0 to %{{[0-9a-zA-Z_]*}} {		// CHECK: affine.for %{{.*}}{{[0-9a-zA-Z_]*}} = 0 to %{{[0-9a-zA-Z_]*}} {
// CHECK: for [[IV18:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128		// CHECK: for [[IV18:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128
// CHECK: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})		// CHECK: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
// CHECK: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})		// CHECK: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
// CHECK: %{{.*}} = arith.constant 0.0{{.*}}: f32		// CHECK: %{{.*}} = arith.constant 0.0{{.*}}: f32
// CHECK: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>		// CHECK: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true, true], permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %{{.*}} in DFS post-order prevents vectorizing %{{.*}}		affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %{{.*}} in DFS post-order prevents vectorizing %{{.*}}
affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector		affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector
%a18 = affine.load %A[%c0, %c0] : memref<?x?xf32>		%a18 = affine.load %A[%c0, %c0] : memref<?x?xf32>
}		}
}		}
return		return
}		}

// CHECK-DAG: [[ARG_P:%[0-9a-zA-Z_]+]] = memref.dim %{{.*}}, %[[C2]] : memref<?x?x?xf32>
%N = memref.dim %A, %c1 : memref<?x?xf32>		%N = memref.dim %A, %c1 : memref<?x?xf32>
%P = memref.dim %B, %c2 : memref<?x?x?xf32>		%P = memref.dim %B, %c2 : memref<?x?x?xf32>

// CHECK: affine.for %{{.*}}{{[0-9a-zA-Z_]*}} = 0 to %{{[0-9a-zA-Z_]*}} {		// CHECK: affine.for %{{.*}}{{[0-9a-zA-Z_]*}} = 0 to %{{[0-9a-zA-Z_]*}} {
// CHECK: for [[IV18:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128		// CHECK: for [[IV18:%[a-zA-Z0-9]+]] = 0 to [[ARG_M]] step 128
// CHECK: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})		// CHECK: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
// CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})		// CHECK-NEXT: %{{.*}} = affine.apply #[[$map_id1]](%{{.*}})
// CHECK-NEXT: %{{.*}} = arith.constant 0.0{{.*}}: f32		// CHECK-NEXT: %{{.*}} = arith.constant 0.0{{.*}}: f32
// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>		// CHECK-NEXT: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true, true], permutation_map = #[[$map_proj_d0d1_0]]} : memref<?x?xf32>, vector<128xf32>
affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %i18 in DFS post-order prevents vectorizing %{{.*}}		affine.for %i17 = 0 to %M { // not vectorized, the 1-D pattern that matched %i18 in DFS post-order prevents vectorizing %{{.*}}
affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector		affine.for %i18 = 0 to %M { // vectorized due to scalar -> vector
%a18 = affine.load %A[%c0, %c0] : memref<?x?xf32>		%a18 = affine.load %A[%c0, %c0] : memref<?x?xf32>
}		}
}		}
return		return
}		}


mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir

affine.for %i0 = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%M) {
affine.for %i1 = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%N) {		affine.for %i1 = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%N) {
%cst = arith.constant 0.000000e+00 : f32		%cst = arith.constant 0.000000e+00 : f32
affine.store %cst, %arg2[%i0, %i1] : memref<?x?xf32>		affine.store %cst, %arg2[%i0, %i1] : memref<?x?xf32>
}		}
}		}
// VECT: affine.for %[[I2:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[M]]) step 4 {		// VECT: affine.for %[[I2:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[M]]) step 4 {
// VECT-NEXT: affine.for %[[I3:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[N]]) step 8 {		// VECT-NEXT: affine.for %[[I3:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[N]]) step 8 {
// VECT-NEXT: affine.for %[[I4:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[K]]) {		// VECT-NEXT: affine.for %[[I4:.*]] = #[[$map_id1]](%[[C0]]) to #[[$map_id1]](%[[K]]) {
// VECT: %[[A:.*]] = vector.transfer_read %{{.*}}[%[[I4]], %[[I3]]], %{{.*}} {permutation_map = #[[$map_proj_d0d1_zerod1]]} : memref<?x?xf32>, vector<4x8xf32>		// VECT: %[[A:.*]] = vector.transfer_read %{{.*}}[%[[I4]], %[[I3]]], %{{.*}} {in_bounds = [true, false], permutation_map = #[[$map_proj_d0d1_zerod1]]} : memref<?x?xf32>, vector<4x8xf32>
// VECT: %[[B:.*]] = vector.transfer_read %{{.*}}[%[[I2]], %[[I4]]], %{{.*}} {permutation_map = #[[$map_proj_d0d1_d0zero]]} : memref<?x?xf32>, vector<4x8xf32>		// VECT: %[[B:.*]] = vector.transfer_read %{{.*}}[%[[I2]], %[[I4]]], %{{.*}} {in_bounds = [false, true], permutation_map = #[[$map_proj_d0d1_d0zero]]} : memref<?x?xf32>, vector<4x8xf32>
// VECT-NEXT: %[[C:.*]] = arith.mulf %[[B]], %[[A]] : vector<4x8xf32>		// VECT-NEXT: %[[C:.*]] = arith.mulf %[[B]], %[[A]] : vector<4x8xf32>
// VECT: %[[D:.*]] = vector.transfer_read %{{.*}}[%[[I2]], %[[I3]]], %{{.*}} : memref<?x?xf32>, vector<4x8xf32>		// VECT: %[[D:.*]] = vector.transfer_read %{{.*}}[%[[I2]], %[[I3]]], %{{.*}} : memref<?x?xf32>, vector<4x8xf32>
// VECT-NEXT: %[[E:.*]] = arith.addf %[[D]], %[[C]] : vector<4x8xf32>		// VECT-NEXT: %[[E:.*]] = arith.addf %[[D]], %[[C]] : vector<4x8xf32>
// VECT: vector.transfer_write %[[E]], %{{.*}}[%[[I2]], %[[I3]]] : vector<4x8xf32>, memref<?x?xf32>		// VECT: vector.transfer_write %[[E]], %{{.*}}[%[[I2]], %[[I3]]] : vector<4x8xf32>, memref<?x?xf32>
affine.for %i2 = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%M) {		affine.for %i2 = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%M) {
affine.for %i3 = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%N) {		affine.for %i3 = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%N) {
affine.for %i4 = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%K) {		affine.for %i4 = affine_map<(d0) -> (d0)>(%c0) to affine_map<(d0) -> (d0)>(%K) {
%6 = affine.load %arg1[%i4, %i3] : memref<?x?xf32>		%6 = affine.load %arg1[%i4, %i3] : memref<?x?xf32>

mlir/test/Dialect/Affine/SuperVectorize/vectorize_outer_loop_2d.mlir

	// RUN: mlir-opt %s -affine-super-vectorize="virtual-vector-size=32,256 test-fastest-varying=2,0" | FileCheck %s			// RUN: mlir-opt %s -affine-super-vectorize="virtual-vector-size=32,256 test-fastest-varying=2,0" | FileCheck %s

	// Permutation maps used in vectorization.			// Permutation maps used in vectorization.
	// CHECK: #[[map_proj_d0d1d2_d0d2:map[0-9]*]] = affine_map<(d0, d1, d2) -> (d0, d2)>			// CHECK: #[[map_proj_d0d1d2_d0d2:map[0-9]*]] = affine_map<(d0, d1, d2) -> (d0, d2)>

	func.func @vec2d(%A : memref<?x?x?xf32>) {			func.func @vec2d(%A : memref<?x?x?xf32>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%c1 = arith.constant 1 : index			%c1 = arith.constant 1 : index
	%c2 = arith.constant 2 : index			%c2 = arith.constant 2 : index
	%M = memref.dim %A, %c0 : memref<?x?x?xf32>			%M = memref.dim %A, %c0 : memref<?x?x?xf32>
	%N = memref.dim %A, %c1 : memref<?x?x?xf32>			%N = memref.dim %A, %c1 : memref<?x?x?xf32>
	%P = memref.dim %A, %c2 : memref<?x?x?xf32>			%P = memref.dim %A, %c2 : memref<?x?x?xf32>
	// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 32			// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 32
	// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {			// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {
	// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256			// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256
	// CHECK: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[map_proj_d0d1d2_d0d2]]} : memref<?x?x?xf32>, vector<32x256xf32>			// CHECK: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [false, true, false], permutation_map = #[[map_proj_d0d1d2_d0d2]]} : memref<?x?x?xf32>, vector<32x256xf32>
	affine.for %i0 = 0 to %M {			affine.for %i0 = 0 to %M {
	affine.for %i1 = 0 to %N {			affine.for %i1 = 0 to %N {
	affine.for %i2 = 0 to %P {			affine.for %i2 = 0 to %P {
	%a2 = affine.load %A[%i0, %i1, %i2] : memref<?x?x?xf32>			%a2 = affine.load %A[%i0, %i1, %i2] : memref<?x?x?xf32>
	}			}
	}			}
	}			}
	// CHECK: for {{.*}} = 0 to %{{.*}} {			// CHECK: for {{.*}} = 0 to %{{.*}} {

mlir/test/Dialect/Affine/SuperVectorize/vectorize_outer_loop_transpose_2d.mlir

affine.for %i1 = 0 to %N {
affine.for %i2 = 0 to %P {		affine.for %i2 = 0 to %P {
%a2 = affine.load %A[%i0, %i1, %i2] : memref<?x?x?xf32>		%a2 = affine.load %A[%i0, %i1, %i2] : memref<?x?x?xf32>
}		}
}		}
}		}
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 32		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 32
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {
// CHECK: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>		// CHECK: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [false, true, false], permutation_map = #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
affine.for %i3 = 0 to %M {		affine.for %i3 = 0 to %M {
affine.for %i4 = 0 to %N {		affine.for %i4 = 0 to %N {
affine.for %i5 = 0 to %P {		affine.for %i5 = 0 to %P {
%a5 = affine.load %A[%i4, %i5, %i3] : memref<?x?x?xf32>		%a5 = affine.load %A[%i4, %i5, %i3] : memref<?x?x?xf32>
}		}
}		}
}		}
return		return
}		}

func.func @vec2d_imperfectly_nested(%A : memref<?x?x?xf32>) {		func.func @vec2d_imperfectly_nested(%A : memref<?x?x?xf32>) {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index		%c2 = arith.constant 2 : index
%0 = memref.dim %A, %c0 : memref<?x?x?xf32>		%0 = memref.dim %A, %c0 : memref<?x?x?xf32>
%1 = memref.dim %A, %c1 : memref<?x?x?xf32>		%1 = memref.dim %A, %c1 : memref<?x?x?xf32>
%2 = memref.dim %A, %c2 : memref<?x?x?xf32>		%2 = memref.dim %A, %c2 : memref<?x?x?xf32>
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 32 {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 32 {
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256 {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256 {
// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>		// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [false, true, false], permutation_map = #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256 {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256 {
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {
// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>		// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [false, true, false], permutation_map = #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {
// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>		// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [false, true, false], permutation_map = #[[map_proj_d0d1d2_d2d0]]} : memref<?x?x?xf32>, vector<32x256xf32>
affine.for %i0 = 0 to %0 {		affine.for %i0 = 0 to %0 {
affine.for %i1 = 0 to %1 {		affine.for %i1 = 0 to %1 {
affine.for %i2 = 0 to %2 {		affine.for %i2 = 0 to %2 {
%a2 = affine.load %A[%i2, %i1, %i0] : memref<?x?x?xf32>		%a2 = affine.load %A[%i2, %i1, %i0] : memref<?x?x?xf32>
}		}
}		}
affine.for %i3 = 0 to %1 {		affine.for %i3 = 0 to %1 {
affine.for %i4 = 0 to %2 {		affine.for %i4 = 0 to %2 {

mlir/test/Dialect/Affine/SuperVectorize/vectorize_transpose_2d.mlir

affine.for %i1 = 0 to %N {
affine.for %i2 = 0 to %P {		affine.for %i2 = 0 to %P {
%a2 = affine.load %A[%i0, %i1, %i2] : memref<?x?x?xf32>		%a2 = affine.load %A[%i0, %i1, %i2] : memref<?x?x?xf32>
}		}
}		}
}		}
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 32		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 32
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256
// CHECK: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>		// CHECK: {{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true, false, false], permutation_map = #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
affine.for %i3 = 0 to %M {		affine.for %i3 = 0 to %M {
affine.for %i4 = 0 to %N {		affine.for %i4 = 0 to %N {
affine.for %i5 = 0 to %P {		affine.for %i5 = 0 to %P {
%a5 = affine.load %A[%i4, %i5, %i3] : memref<?x?x?xf32>		%a5 = affine.load %A[%i4, %i5, %i3] : memref<?x?x?xf32>
}		}
}		}
}		}
return		return
}		}

func.func @vec2d_imperfectly_nested(%A : memref<?x?x?xf32>) {		func.func @vec2d_imperfectly_nested(%A : memref<?x?x?xf32>) {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index		%c2 = arith.constant 2 : index
%0 = memref.dim %A, %c0 : memref<?x?x?xf32>		%0 = memref.dim %A, %c0 : memref<?x?x?xf32>
%1 = memref.dim %A, %c1 : memref<?x?x?xf32>		%1 = memref.dim %A, %c1 : memref<?x?x?xf32>
%2 = memref.dim %A, %c2 : memref<?x?x?xf32>		%2 = memref.dim %A, %c2 : memref<?x?x?xf32>
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 32 {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 32 {
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256 {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256 {
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {
// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>		// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true, false, false], permutation_map = #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} {
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256 {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256 {
// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>		// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true, false, false], permutation_map = #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256 {		// CHECK: affine.for %{{.*}} = 0 to %{{.*}} step 256 {
// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {permutation_map = #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>		// CHECK: %{{.*}} = vector.transfer_read %{{.*}}[%{{.*}}, %{{.*}}, %{{.*}}], %{{.*}} {in_bounds = [true, false, false], permutation_map = #[[map_proj_d0d1d2_d2d1]]} : memref<?x?x?xf32>, vector<32x256xf32>
affine.for %i0 = 0 to %0 {		affine.for %i0 = 0 to %0 {
affine.for %i1 = 0 to %1 {		affine.for %i1 = 0 to %1 {
affine.for %i2 = 0 to %2 {		affine.for %i2 = 0 to %2 {
%a2 = affine.load %A[%i2, %i1, %i0] : memref<?x?x?xf32>		%a2 = affine.load %A[%i2, %i1, %i0] : memref<?x?x?xf32>
}		}
}		}
affine.for %i3 = 0 to %1 {		affine.for %i3 = 0 to %1 {
affine.for %i4 = 0 to %2 {		affine.for %i4 = 0 to %2 {

mlir/test/Dialect/Linalg/hoisting.mlir

// CHECK: vector.transfer_write %{{.*}} : vector<2xf32>, memref<?x?xf32>		// CHECK: vector.transfer_write %{{.*}} : vector<2xf32>, memref<?x?xf32>
// CHECK: "unrelated_use"(%[[MEMREF0]]) : (memref<?x?xf32>) -> ()		// CHECK: "unrelated_use"(%[[MEMREF0]]) : (memref<?x?xf32>) -> ()
// CHECK: scf.yield {{.*}} : vector<1xf32>		// CHECK: scf.yield {{.*}} : vector<1xf32>
// CHECK: }		// CHECK: }
// CHECK: vector.transfer_write %{{.*}} : vector<1xf32>, memref<?x?xf32>		// CHECK: vector.transfer_write %{{.*}} : vector<1xf32>, memref<?x?xf32>
// CHECK: "unrelated_use"(%[[MEMREF1]]) : (memref<?x?xf32>) -> ()		// CHECK: "unrelated_use"(%[[MEMREF1]]) : (memref<?x?xf32>) -> ()
scf.for %i = %lb to %ub step %step {		scf.for %i = %lb to %ub step %step {
scf.for %j = %lb to %ub step %step {		scf.for %j = %lb to %ub step %step {
%r0 = vector.transfer_read %memref1[%c0, %c0], %cst: memref<?x?xf32>, vector<1xf32>		%r0 = vector.transfer_read %memref1[%c0, %c0], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<1xf32>
%r1 = vector.transfer_read %memref0[%i, %i], %cst: memref<?x?xf32>, vector<2xf32>		%r1 = vector.transfer_read %memref0[%i, %i], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<2xf32>
%r2 = vector.transfer_read %memref2[%c0, %c0], %cst: memref<?x?xf32>, vector<3xf32>		%r2 = vector.transfer_read %memref2[%c0, %c0], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<3xf32>
%r3 = vector.transfer_read %memref3[%c0, %c0], %cst: memref<?x?xf32>, vector<4xf32>		%r3 = vector.transfer_read %memref3[%c0, %c0], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<4xf32>
"some_crippling_use"(%memref4) : (memref<?x?xf32>) -> ()		"some_crippling_use"(%memref4) : (memref<?x?xf32>) -> ()
%r4 = vector.transfer_read %memref4[%c0, %c0], %cst: memref<?x?xf32>, vector<5xf32>		%r4 = vector.transfer_read %memref4[%c0, %c0], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<5xf32>
%r5 = vector.transfer_read %memref5[%c0, %c0], %cst: memref<?x?xf32>, vector<6xf32>		%r5 = vector.transfer_read %memref5[%c0, %c0], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<6xf32>
"some_crippling_use"(%memref5) : (memref<?x?xf32>) -> ()		"some_crippling_use"(%memref5) : (memref<?x?xf32>) -> ()
%u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>		%u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>
%u1 = "some_use"(%r1) : (vector<2xf32>) -> vector<2xf32>		%u1 = "some_use"(%r1) : (vector<2xf32>) -> vector<2xf32>
%u2 = "some_use"(%memref2, %r2) : (memref<?x?xf32>, vector<3xf32>) -> vector<3xf32>		%u2 = "some_use"(%memref2, %r2) : (memref<?x?xf32>, vector<3xf32>) -> vector<3xf32>
%u3 = "some_use"(%r3) : (vector<4xf32>) -> vector<4xf32>		%u3 = "some_use"(%r3) : (vector<4xf32>) -> vector<4xf32>
%u4 = "some_use"(%r4) : (vector<5xf32>) -> vector<5xf32>		%u4 = "some_use"(%r4) : (vector<5xf32>) -> vector<5xf32>
%u5 = "some_use"(%r5) : (vector<6xf32>) -> vector<6xf32>		%u5 = "some_use"(%r5) : (vector<6xf32>) -> vector<6xf32>
vector.transfer_write %u0, %memref1[%c0, %c0] : vector<1xf32>, memref<?x?xf32>		vector.transfer_write %u0, %memref1[%c0, %c0] {in_bounds = [true, false]} : vector<1xf32>, memref<?x?xf32>
vector.transfer_write %u1, %memref0[%i, %i] : vector<2xf32>, memref<?x?xf32>		vector.transfer_write %u1, %memref0[%i, %i] {in_bounds = [true, false]} : vector<2xf32>, memref<?x?xf32>
vector.transfer_write %u2, %memref2[%c0, %c0] : vector<3xf32>, memref<?x?xf32>		vector.transfer_write %u2, %memref2[%c0, %c0] {in_bounds = [true, false]} : vector<3xf32>, memref<?x?xf32>
vector.transfer_write %u3, %memref3[%c0, %c0] : vector<4xf32>, memref<?x?xf32>		vector.transfer_write %u3, %memref3[%c0, %c0] {in_bounds = [true, false]} : vector<4xf32>, memref<?x?xf32>
vector.transfer_write %u4, %memref4[%c0, %c0] : vector<5xf32>, memref<?x?xf32>		vector.transfer_write %u4, %memref4[%c0, %c0] {in_bounds = [true, false]} : vector<5xf32>, memref<?x?xf32>
vector.transfer_write %u5, %memref5[%c0, %c0] : vector<6xf32>, memref<?x?xf32>		vector.transfer_write %u5, %memref5[%c0, %c0] {in_bounds = [true, false]} : vector<6xf32>, memref<?x?xf32>
"some_crippling_use"(%memref3) : (memref<?x?xf32>) -> ()		"some_crippling_use"(%memref3) : (memref<?x?xf32>) -> ()
}		}
"unrelated_use"(%memref0) : (memref<?x?xf32>) -> ()		"unrelated_use"(%memref0) : (memref<?x?xf32>) -> ()
}		}
"unrelated_use"(%memref1) : (memref<?x?xf32>) -> ()		"unrelated_use"(%memref1) : (memref<?x?xf32>) -> ()
return		return
}		}

// CHECK: scf.yield {{.*}} : vector<3xf32>, vector<3xf32>, vector<4xf32>, vector<4xf32>		// CHECK: scf.yield {{.*}} : vector<3xf32>, vector<3xf32>, vector<4xf32>, vector<4xf32>
// CHECK: }		// CHECK: }
// CHECK: vector.transfer_write %{{.*}}, %[[MEMREF3]]{{.*}} : vector<4xf32>, memref<?x?xf32>		// CHECK: vector.transfer_write %{{.*}}, %[[MEMREF3]]{{.*}} : vector<4xf32>, memref<?x?xf32>
// CHECK: vector.transfer_write %{{.*}}, %[[MEMREF3]]{{.*}} : vector<4xf32>, memref<?x?xf32>		// CHECK: vector.transfer_write %{{.*}}, %[[MEMREF3]]{{.*}} : vector<4xf32>, memref<?x?xf32>
// CHECK: vector.transfer_write %{{.*}}, %[[MEMREF2]]{{.*}} : vector<3xf32>, memref<?x?xf32>		// CHECK: vector.transfer_write %{{.*}}, %[[MEMREF2]]{{.*}} : vector<3xf32>, memref<?x?xf32>
// CHECK: vector.transfer_write %{{.*}}, %[[MEMREF2]]{{.*}} : vector<3xf32>, memref<?x?xf32>		// CHECK: vector.transfer_write %{{.*}}, %[[MEMREF2]]{{.*}} : vector<3xf32>, memref<?x?xf32>
scf.for %i = %lb to %ub step %step {		scf.for %i = %lb to %ub step %step {
scf.for %j = %lb to %ub step %step {		scf.for %j = %lb to %ub step %step {
%r00 = vector.transfer_read %memref1[%c0, %c0], %cst: memref<?x?xf32>, vector<2xf32>		%r00 = vector.transfer_read %memref1[%c0, %c0], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<2xf32>
%r01 = vector.transfer_read %memref1[%c0, %c1], %cst: memref<?x?xf32>, vector<2xf32>		%r01 = vector.transfer_read %memref1[%c0, %c1], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<2xf32>
%r20 = vector.transfer_read %memref2[%c0, %c0], %cst: memref<?x?xf32>, vector<3xf32>		%r20 = vector.transfer_read %memref2[%c0, %c0], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<3xf32>
%r21 = vector.transfer_read %memref2[%c0, %c3], %cst: memref<?x?xf32>, vector<3xf32>		%r21 = vector.transfer_read %memref2[%c0, %c3], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<3xf32>
%r30 = vector.transfer_read %memref3[%c0, %random_index], %cst: memref<?x?xf32>, vector<4xf32>		%r30 = vector.transfer_read %memref3[%c0, %random_index], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<4xf32>
%r31 = vector.transfer_read %memref3[%c1, %random_index], %cst: memref<?x?xf32>, vector<4xf32>		%r31 = vector.transfer_read %memref3[%c1, %random_index], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<4xf32>
%r10 = vector.transfer_read %memref0[%i, %i], %cst: memref<?x?xf32>, vector<2xf32>		%r10 = vector.transfer_read %memref0[%i, %i], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<2xf32>
%r11 = vector.transfer_read %memref0[%random_index, %random_index], %cst: memref<?x?xf32>, vector<2xf32>		%r11 = vector.transfer_read %memref0[%random_index, %random_index], %cst {in_bounds = [true, false]} : memref<?x?xf32>, vector<2xf32>
%u00 = "some_use"(%r00) : (vector<2xf32>) -> vector<2xf32>		%u00 = "some_use"(%r00) : (vector<2xf32>) -> vector<2xf32>
%u01 = "some_use"(%r01) : (vector<2xf32>) -> vector<2xf32>		%u01 = "some_use"(%r01) : (vector<2xf32>) -> vector<2xf32>
%u20 = "some_use"(%r20) : (vector<3xf32>) -> vector<3xf32>		%u20 = "some_use"(%r20) : (vector<3xf32>) -> vector<3xf32>
%u21 = "some_use"(%r21) : (vector<3xf32>) -> vector<3xf32>		%u21 = "some_use"(%r21) : (vector<3xf32>) -> vector<3xf32>
%u30 = "some_use"(%r30) : (vector<4xf32>) -> vector<4xf32>		%u30 = "some_use"(%r30) : (vector<4xf32>) -> vector<4xf32>
%u31 = "some_use"(%r31) : (vector<4xf32>) -> vector<4xf32>		%u31 = "some_use"(%r31) : (vector<4xf32>) -> vector<4xf32>
%u10 = "some_use"(%r10) : (vector<2xf32>) -> vector<2xf32>		%u10 = "some_use"(%r10) : (vector<2xf32>) -> vector<2xf32>
%u11 = "some_use"(%r11) : (vector<2xf32>) -> vector<2xf32>		%u11 = "some_use"(%r11) : (vector<2xf32>) -> vector<2xf32>
vector.transfer_write %u00, %memref1[%c0, %c0] : vector<2xf32>, memref<?x?xf32>		vector.transfer_write %u00, %memref1[%c0, %c0] {in_bounds = [true, false]} : vector<2xf32>, memref<?x?xf32>
vector.transfer_write %u01, %memref1[%c0, %c1] : vector<2xf32>, memref<?x?xf32>		vector.transfer_write %u01, %memref1[%c0, %c1] {in_bounds = [true, false]} : vector<2xf32>, memref<?x?xf32>
vector.transfer_write %u20, %memref2[%c0, %c0] : vector<3xf32>, memref<?x?xf32>		vector.transfer_write %u20, %memref2[%c0, %c0] {in_bounds = [true, false]} : vector<3xf32>, memref<?x?xf32>
vector.transfer_write %u21, %memref2[%c0, %c3] : vector<3xf32>, memref<?x?xf32>		vector.transfer_write %u21, %memref2[%c0, %c3] {in_bounds = [true, false]} : vector<3xf32>, memref<?x?xf32>
vector.transfer_write %u30, %memref3[%c0, %random_index] : vector<4xf32>, memref<?x?xf32>		vector.transfer_write %u30, %memref3[%c0, %random_index] {in_bounds = [true, false]} : vector<4xf32>, memref<?x?xf32>
vector.transfer_write %u31, %memref3[%c1, %random_index] : vector<4xf32>, memref<?x?xf32>		vector.transfer_write %u31, %memref3[%c1, %random_index] {in_bounds = [true, false]} : vector<4xf32>, memref<?x?xf32>
vector.transfer_write %u10, %memref0[%i, %i] : vector<2xf32>, memref<?x?xf32>		vector.transfer_write %u10, %memref0[%i, %i] {in_bounds = [true, false]} : vector<2xf32>, memref<?x?xf32>
vector.transfer_write %u11, %memref0[%random_index, %random_index] : vector<2xf32>, memref<?x?xf32>		vector.transfer_write %u11, %memref0[%random_index, %random_index] {in_bounds = [true, false]} : vector<2xf32>, memref<?x?xf32>
}		}
}		}
return		return
}		}

transform.sequence failures(propagate) {		transform.sequence failures(propagate) {
^bb1(%arg1: !transform.any_op):		^bb1(%arg1: !transform.any_op):
%0 = transform.structured.match ops{["func.func"]} in %arg1		%0 = transform.structured.match ops{["func.func"]} in %arg1
: (!transform.any_op) -> !transform.any_op		: (!transform.any_op) -> !transform.any_op
transform.structured.hoist_redundant_vector_transfers %0		transform.structured.hoist_redundant_vector_transfers %0
: (!transform.any_op) -> !transform.any_op		: (!transform.any_op) -> !transform.any_op
}		}

// -----		// -----

// CHECK-LABEL: func @hoist_vector_transfer_pairs_in_affine_loops(		// CHECK-LABEL: func @hoist_vector_transfer_pairs_in_affine_loops(
// CHECK-SAME: %[[MEMREF0:[a-zA-Z0-9]+]]: memref<64x64xi32>,		// CHECK-SAME: %[[MEMREF0:[a-zA-Z0-9]+]]: memref<64x64xi32>,
// CHECK-SAME: %[[MEMREF1:[a-zA-Z0-9]+]]: memref<64x64xi32>,		// CHECK-SAME: %[[MEMREF1:[a-zA-Z0-9]+]]: memref<64x64xi32>,
// CHECK-SAME: %[[MEMREF2:[a-zA-Z0-9]+]]: memref<64x64xi32>) {		// CHECK-SAME: %[[MEMREF2:[a-zA-Z0-9]+]]: memref<64x64xi32>) {
// CHECK: %[[C0:.*]] = arith.constant 0 : i32		// CHECK: %[[C0:.*]] = arith.constant 0 : i32
// CHECK: affine.for %[[I:.*]] = 0 to 64 {		// CHECK: affine.for %[[I:.*]] = 0 to 64 {
// CHECK: affine.for %[[J:.*]] = 0 to 64 step 16 {		// CHECK: affine.for %[[J:.*]] = 0 to 64 step 16 {
// CHECK: %[[R0:.*]] = vector.transfer_read %[[MEMREF2]][%[[I]], %[[J]]], %[[C0]] : memref<64x64xi32>, vector<16xi32>		// CHECK: %[[R0:.*]] = vector.transfer_read %[[MEMREF2]][%[[I]], %[[J]]], %[[C0]] {{.*}} : memref<64x64xi32>, vector<16xi32>
// CHECK: %[[R:.*]] = affine.for %[[K:.*]] = 0 to 64 iter_args(%[[ACC:.*]] = %[[R0]]) -> (vector<16xi32>) {		// CHECK: %[[R:.*]] = affine.for %[[K:.*]] = 0 to 64 iter_args(%[[ACC:.*]] = %[[R0]]) -> (vector<16xi32>) {
// CHECK: %[[AV:.*]] = vector.transfer_read %[[MEMREF0]][%[[I]], %[[K]]], %[[C0]] {{.*}}: memref<64x64xi32>, vector<16xi32>		// CHECK: %[[AV:.*]] = vector.transfer_read %[[MEMREF0]][%[[I]], %[[K]]], %[[C0]] {{.*}}: memref<64x64xi32>, vector<16xi32>
// CHECK: %[[BV:.*]] = vector.transfer_read %[[MEMREF1]][%[[K]], %[[J]]], %[[C0]] {{.*}}: memref<64x64xi32>, vector<16xi32>		// CHECK: %[[BV:.*]] = vector.transfer_read %[[MEMREF1]][%[[K]], %[[J]]], %[[C0]] {{.*}}: memref<64x64xi32>, vector<16xi32>
// CHECK: %[[T0:.*]] = arith.muli %[[AV]], %[[BV]] : vector<16xi32>		// CHECK: %[[T0:.*]] = arith.muli %[[AV]], %[[BV]] : vector<16xi32>
// CHECK: %[[T1:.*]] = arith.addi %[[ACC]], %[[T0]] : vector<16xi32>		// CHECK: %[[T1:.*]] = arith.addi %[[ACC]], %[[T0]] : vector<16xi32>
// CHECK: affine.yield %[[T1]] : vector<16xi32>		// CHECK: affine.yield %[[T1]] : vector<16xi32>
// CHECK: }		// CHECK: }
// CHECK: vector.transfer_write %[[R]], %[[MEMREF2]][%[[I]], %[[J]]] : vector<16xi32>, memref<64x64xi32>		// CHECK: vector.transfer_write %[[R]], %[[MEMREF2]][%[[I]], %[[J]]] {{.*}} : vector<16xi32>, memref<64x64xi32>
// CHECK: }		// CHECK: }
// CHECK: }		// CHECK: }
func.func @hoist_vector_transfer_pairs_in_affine_loops(%memref0: memref<64x64xi32>, %memref1: memref<64x64xi32>, %memref2: memref<64x64xi32>) {		func.func @hoist_vector_transfer_pairs_in_affine_loops(%memref0: memref<64x64xi32>, %memref1: memref<64x64xi32>, %memref2: memref<64x64xi32>) {
%c0_i32 = arith.constant 0 : i32		%c0_i32 = arith.constant 0 : i32
affine.for %arg3 = 0 to 64 {		affine.for %arg3 = 0 to 64 {
affine.for %arg4 = 0 to 64 step 16 {		affine.for %arg4 = 0 to 64 step 16 {
affine.for %arg5 = 0 to 64 {		affine.for %arg5 = 0 to 64 {
%0 = vector.transfer_read %memref0[%arg3, %arg5], %c0_i32 {permutation_map = affine_map<(d0, d1) -> (0)>} : memref<64x64xi32>, vector<16xi32>		%0 = vector.transfer_read %memref0[%arg3, %arg5], %c0_i32 {permutation_map = affine_map<(d0, d1) -> (0)>, in_bounds = [true, true]} : memref<64x64xi32>, vector<16xi32>
%1 = vector.transfer_read %memref1[%arg5, %arg4], %c0_i32 : memref<64x64xi32>, vector<16xi32>		%1 = vector.transfer_read %memref1[%arg5, %arg4], %c0_i32 {in_bounds = [true, false]} : memref<64x64xi32>, vector<16xi32>
%2 = vector.transfer_read %memref2[%arg3, %arg4], %c0_i32 : memref<64x64xi32>, vector<16xi32>		%2 = vector.transfer_read %memref2[%arg3, %arg4], %c0_i32 {in_bounds = [true, false]} : memref<64x64xi32>, vector<16xi32>
%3 = arith.muli %0, %1 : vector<16xi32>		%3 = arith.muli %0, %1 : vector<16xi32>
%4 = arith.addi %2, %3 : vector<16xi32>		%4 = arith.addi %2, %3 : vector<16xi32>
vector.transfer_write %4, %memref2[%arg3, %arg4] : vector<16xi32>, memref<64x64xi32>		vector.transfer_write %4, %memref2[%arg3, %arg4] {in_bounds = [true, false]} : vector<16xi32>, memref<64x64xi32>
}		}
}		}
}		}
return		return
}		}

transform.sequence failures(propagate) {		transform.sequence failures(propagate) {
^bb1(%arg1: !transform.any_op):		^bb1(%arg1: !transform.any_op):
[46 lines not shown]
%arg3 = %tensor3, %arg4 = %tensor4, %arg5 = %tensor5)		%arg3 = %tensor3, %arg4 = %tensor4, %arg5 = %tensor5)
-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,		-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,
tensor<?x?xf32>, tensor<?x?xf32>) {		tensor<?x?xf32>, tensor<?x?xf32>) {
%1:6 = scf.for %j = %lb to %ub step %step		%1:6 = scf.for %j = %lb to %ub step %step
iter_args(%arg6 = %arg0, %arg7 = %arg1, %arg8 = %arg2,		iter_args(%arg6 = %arg0, %arg7 = %arg1, %arg8 = %arg2,
%arg9 = %arg3, %arg10 = %arg4, %arg11 = %arg5)		%arg9 = %arg3, %arg10 = %arg4, %arg11 = %arg5)
-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,		-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,
tensor<?x?xf32>, tensor<?x?xf32>) {		tensor<?x?xf32>, tensor<?x?xf32>) {
%r0 = vector.transfer_read %arg7[%c0, %c0], %cst: tensor<?x?xf32>, vector<1xf32>		%r0 = vector.transfer_read %arg7[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<1xf32>
%r1 = vector.transfer_read %arg6[%i, %i], %cst: tensor<?x?xf32>, vector<2xf32>		%r1 = vector.transfer_read %arg6[%i, %i], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<2xf32>
%r3 = vector.transfer_read %arg9[%c0, %c0], %cst: tensor<?x?xf32>, vector<4xf32>		%r3 = vector.transfer_read %arg9[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<4xf32>
"some_crippling_use"(%arg10) : (tensor<?x?xf32>) -> ()		"some_crippling_use"(%arg10) : (tensor<?x?xf32>) -> ()
%r4 = vector.transfer_read %arg10[%c0, %c0], %cst: tensor<?x?xf32>, vector<5xf32>		%r4 = vector.transfer_read %arg10[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<5xf32>
%r5 = vector.transfer_read %arg11[%c0, %c0], %cst: tensor<?x?xf32>, vector<6xf32>		%r5 = vector.transfer_read %arg11[%c0, %c0], %cst{in_bounds = [true, false]} : tensor<?x?xf32>, vector<6xf32>
"some_crippling_use"(%arg11) : (tensor<?x?xf32>) -> ()		"some_crippling_use"(%arg11) : (tensor<?x?xf32>) -> ()
%u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>		%u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>
%u1 = "some_use"(%r1) : (vector<2xf32>) -> vector<2xf32>		%u1 = "some_use"(%r1) : (vector<2xf32>) -> vector<2xf32>
%u2 = "some_use"(%arg8) : (tensor<?x?xf32>) -> vector<3xf32>		%u2 = "some_use"(%arg8) : (tensor<?x?xf32>) -> vector<3xf32>
%u3 = "some_use"(%r3) : (vector<4xf32>) -> vector<4xf32>		%u3 = "some_use"(%r3) : (vector<4xf32>) -> vector<4xf32>
%u4 = "some_use"(%r4) : (vector<5xf32>) -> vector<5xf32>		%u4 = "some_use"(%r4) : (vector<5xf32>) -> vector<5xf32>
%u5 = "some_use"(%r5) : (vector<6xf32>) -> vector<6xf32>		%u5 = "some_use"(%r5) : (vector<6xf32>) -> vector<6xf32>
%w1 = vector.transfer_write %u0, %arg7[%c0, %c0] : vector<1xf32>, tensor<?x?xf32>		%w1 = vector.transfer_write %u0, %arg7[%c0, %c0] {in_bounds = [true, false]} : vector<1xf32>, tensor<?x?xf32>
%w0 = vector.transfer_write %u1, %arg6[%i, %i] : vector<2xf32>, tensor<?x?xf32>		%w0 = vector.transfer_write %u1, %arg6[%i, %i] {in_bounds = [true, false]} : vector<2xf32>, tensor<?x?xf32>
%w2 = vector.transfer_write %u2, %arg8[%c0, %c0] : vector<3xf32>, tensor<?x?xf32>		%w2 = vector.transfer_write %u2, %arg8[%c0, %c0] {in_bounds = [true, false]} : vector<3xf32>, tensor<?x?xf32>
%w3 = vector.transfer_write %u3, %arg9[%c0, %c0] : vector<4xf32>, tensor<?x?xf32>		%w3 = vector.transfer_write %u3, %arg9[%c0, %c0] {in_bounds = [true, false]} : vector<4xf32>, tensor<?x?xf32>
%w4 = vector.transfer_write %u4, %arg10[%c0, %c0] : vector<5xf32>, tensor<?x?xf32>		%w4 = vector.transfer_write %u4, %arg10[%c0, %c0] {in_bounds = [true, false]} : vector<5xf32>, tensor<?x?xf32>
%w5 = vector.transfer_write %u5, %arg11[%c0, %c0] : vector<6xf32>, tensor<?x?xf32>		%w5 = vector.transfer_write %u5, %arg11[%c0, %c0] {in_bounds = [true, false]} : vector<6xf32>, tensor<?x?xf32>
"some_crippling_use"(%w3) : (tensor<?x?xf32>) -> ()		"some_crippling_use"(%w3) : (tensor<?x?xf32>) -> ()
scf.yield %w0, %w1, %w2, %w3, %w4, %w5 :		scf.yield %w0, %w1, %w2, %w3, %w4, %w5 :
tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,		tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,
tensor<?x?xf32>, tensor<?x?xf32>		tensor<?x?xf32>, tensor<?x?xf32>
}		}
scf.yield %1#0, %1#1, %1#2, %1#3, %1#4, %1#5 :		scf.yield %1#0, %1#1, %1#2, %1#3, %1#4, %1#5 :
tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,		tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,
tensor<?x?xf32>, tensor<?x?xf32>		tensor<?x?xf32>, tensor<?x?xf32>
[62 lines not shown]
%0:4 = scf.for %i = %lb to %ub step %step		%0:4 = scf.for %i = %lb to %ub step %step
iter_args(%arg0 = %tensor0, %arg1 = %tensor1, %arg2 = %tensor2,		iter_args(%arg0 = %tensor0, %arg1 = %tensor1, %arg2 = %tensor2,
%arg3 = %tensor3)		%arg3 = %tensor3)
-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>) {		-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>) {
%1:4 = scf.for %j = %lb to %ub step %step		%1:4 = scf.for %j = %lb to %ub step %step
iter_args(%arg4 = %arg0, %arg5 = %arg1, %arg6 = %arg2,		iter_args(%arg4 = %arg0, %arg5 = %arg1, %arg6 = %arg2,
%arg7 = %arg3)		%arg7 = %arg3)
-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>) {		-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>) {
%r00 = vector.transfer_read %arg5[%c0, %c0], %cst: tensor<?x?xf32>, vector<2xf32>		%r00 = vector.transfer_read %arg5[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<2xf32>
%r01 = vector.transfer_read %arg5[%c0, %c1], %cst: tensor<?x?xf32>, vector<2xf32>		%r01 = vector.transfer_read %arg5[%c0, %c1], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<2xf32>
%r20 = vector.transfer_read %arg6[%c0, %c0], %cst: tensor<?x?xf32>, vector<3xf32>		%r20 = vector.transfer_read %arg6[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<3xf32>
%r21 = vector.transfer_read %arg6[%c0, %c3], %cst: tensor<?x?xf32>, vector<3xf32>		%r21 = vector.transfer_read %arg6[%c0, %c3], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<3xf32>
%r30 = vector.transfer_read %arg7[%c0, %random_index], %cst: tensor<?x?xf32>, vector<4xf32>		%r30 = vector.transfer_read %arg7[%c0, %random_index], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<4xf32>
%r31 = vector.transfer_read %arg7[%c1, %random_index], %cst: tensor<?x?xf32>, vector<4xf32>		%r31 = vector.transfer_read %arg7[%c1, %random_index], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<4xf32>
%r10 = vector.transfer_read %arg4[%i, %i], %cst: tensor<?x?xf32>, vector<2xf32>		%r10 = vector.transfer_read %arg4[%i, %i], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<2xf32>
%r11 = vector.transfer_read %arg4[%random_index, %random_index], %cst: tensor<?x?xf32>, vector<2xf32>		%r11 = vector.transfer_read %arg4[%random_index, %random_index], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<2xf32>
%u00 = "some_use"(%r00) : (vector<2xf32>) -> vector<2xf32>		%u00 = "some_use"(%r00) : (vector<2xf32>) -> vector<2xf32>
%u01 = "some_use"(%r01) : (vector<2xf32>) -> vector<2xf32>		%u01 = "some_use"(%r01) : (vector<2xf32>) -> vector<2xf32>
%u20 = "some_use"(%r20) : (vector<3xf32>) -> vector<3xf32>		%u20 = "some_use"(%r20) : (vector<3xf32>) -> vector<3xf32>
%u21 = "some_use"(%r21) : (vector<3xf32>) -> vector<3xf32>		%u21 = "some_use"(%r21) : (vector<3xf32>) -> vector<3xf32>
%u30 = "some_use"(%r30) : (vector<4xf32>) -> vector<4xf32>		%u30 = "some_use"(%r30) : (vector<4xf32>) -> vector<4xf32>
%u31 = "some_use"(%r31) : (vector<4xf32>) -> vector<4xf32>		%u31 = "some_use"(%r31) : (vector<4xf32>) -> vector<4xf32>
%u10 = "some_use"(%r10) : (vector<2xf32>) -> vector<2xf32>		%u10 = "some_use"(%r10) : (vector<2xf32>) -> vector<2xf32>
%u11 = "some_use"(%r11) : (vector<2xf32>) -> vector<2xf32>		%u11 = "some_use"(%r11) : (vector<2xf32>) -> vector<2xf32>
%w10 = vector.transfer_write %u00, %arg5[%c0, %c0] : vector<2xf32>, tensor<?x?xf32>		%w10 = vector.transfer_write %u00, %arg5[%c0, %c0] {in_bounds = [true, false]} : vector<2xf32>, tensor<?x?xf32>
%w11 = vector.transfer_write %u01, %w10[%c0, %c1] : vector<2xf32>, tensor<?x?xf32>		%w11 = vector.transfer_write %u01, %w10[%c0, %c1] {in_bounds = [true, false]} : vector<2xf32>, tensor<?x?xf32>
%w20 = vector.transfer_write %u20, %arg6[%c0, %c0] : vector<3xf32>, tensor<?x?xf32>		%w20 = vector.transfer_write %u20, %arg6[%c0, %c0] {in_bounds = [true, false]} : vector<3xf32>, tensor<?x?xf32>
%w21 = vector.transfer_write %u21, %w20[%c0, %c3] : vector<3xf32>, tensor<?x?xf32>		%w21 = vector.transfer_write %u21, %w20[%c0, %c3] {in_bounds = [true, false]} : vector<3xf32>, tensor<?x?xf32>
%w30 = vector.transfer_write %u30, %arg7[%c0, %random_index] : vector<4xf32>, tensor<?x?xf32>		%w30 = vector.transfer_write %u30, %arg7[%c0, %random_index] {in_bounds = [true, false]} : vector<4xf32>, tensor<?x?xf32>
%w31 = vector.transfer_write %u31, %w30[%c1, %random_index] : vector<4xf32>, tensor<?x?xf32>		%w31 = vector.transfer_write %u31, %w30[%c1, %random_index] {in_bounds = [true, false]} : vector<4xf32>, tensor<?x?xf32>
%w00 = vector.transfer_write %u10, %arg4[%i, %i] : vector<2xf32>, tensor<?x?xf32>		%w00 = vector.transfer_write %u10, %arg4[%i, %i] {in_bounds = [true, false]} : vector<2xf32>, tensor<?x?xf32>
%w01 = vector.transfer_write %u11, %w00[%random_index, %random_index] : vector<2xf32>, tensor<?x?xf32>		%w01 = vector.transfer_write %u11, %w00[%random_index, %random_index] {in_bounds = [true, false]} : vector<2xf32>, tensor<?x?xf32>
scf.yield %w01, %w11, %w21, %w31 : tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>		scf.yield %w01, %w11, %w21, %w31 : tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>
}		}
scf.yield %1#0, %1#1, %1#2, %1#3 : tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>		scf.yield %1#0, %1#1, %1#2, %1#3 : tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>
}		}
return %0#0, %0#1, %0#2, %0#3 : tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>		return %0#0, %0#1, %0#2, %0#3 : tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>
}		}

transform.sequence failures(propagate) {		transform.sequence failures(propagate) {
[43 lines not shown]
// CHECK-SAME: %[[V0_ARG_L2:[0-9a-zA-Z]+]] = %[[V0]]		// CHECK-SAME: %[[V0_ARG_L2:[0-9a-zA-Z]+]] = %[[V0]]
// CHECK-SAME: ) ->		// CHECK-SAME: ) ->
// CHECK-SAME: (tensor<?x?xf32>, tensor<?x?xf32>, vector<1xf32>		// CHECK-SAME: (tensor<?x?xf32>, tensor<?x?xf32>, vector<1xf32>
%1:3 = scf.for %j = %lb to %ub step %step		%1:3 = scf.for %j = %lb to %ub step %step
iter_args(%arg6 = %arg0, %arg7 = %arg1, %arg8 = %arg2)		iter_args(%arg6 = %arg0, %arg7 = %arg1, %arg8 = %arg2)
-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>) {		-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>) {
// Hoists.		// Hoists.
%st0 = tensor.extract_slice %arg6[%i, %i][%step, %step][1, 1] : tensor<?x?xf32> to tensor<?x?xf32>		%st0 = tensor.extract_slice %arg6[%i, %i][%step, %step][1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
%r0 = vector.transfer_read %st0[%c0, %c0], %cst: tensor<?x?xf32>, vector<1xf32>		%r0 = vector.transfer_read %st0[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<1xf32>

// CHECK: %[[ST1:.*]] = tensor.extract_slice %[[TENSOR1_ARG_L2]][%[[J]],{{.*}}: tensor<?x?xf32> to tensor<?x?xf32>		// CHECK: %[[ST1:.*]] = tensor.extract_slice %[[TENSOR1_ARG_L2]][%[[J]],{{.*}}: tensor<?x?xf32> to tensor<?x?xf32>
// CHECK: %[[V1:.*]] = vector.transfer_read %[[ST1]]{{.*}} : tensor<?x?xf32>, vector<2xf32>		// CHECK: %[[V1:.*]] = vector.transfer_read %[[ST1]]{{.*}} : tensor<?x?xf32>, vector<2xf32>
// Does not hoist (slice depends on %j)		// Does not hoist (slice depends on %j)
%st1 = tensor.extract_slice %arg7[%j, %c0][%step, %step][1, 1] : tensor<?x?xf32> to tensor<?x?xf32>		%st1 = tensor.extract_slice %arg7[%j, %c0][%step, %step][1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
%r1 = vector.transfer_read %st1[%c0, %c0], %cst: tensor<?x?xf32>, vector<2xf32>		%r1 = vector.transfer_read %st1[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<2xf32>

// CHECK: %[[ST2:.*]] = tensor.extract_slice %[[TENSOR2_ARG_L2]][%[[I]],{{.*}}: tensor<?x?xf32> to tensor<?x?xf32>		// CHECK: %[[ST2:.*]] = tensor.extract_slice %[[TENSOR2_ARG_L2]][%[[I]],{{.*}}: tensor<?x?xf32> to tensor<?x?xf32>
// CHECK: %[[V2:.*]] = vector.transfer_read %[[ST2]]{{.*}} : tensor<?x?xf32>, vector<3xf32>		// CHECK: %[[V2:.*]] = vector.transfer_read %[[ST2]]{{.*}} : tensor<?x?xf32>, vector<3xf32>
// Does not hoist, 2 slice %arg8.		// Does not hoist, 2 slice %arg8.
%st2 = tensor.extract_slice %arg8[%i, %c0][%step, %step][1, 1] : tensor<?x?xf32> to tensor<?x?xf32>		%st2 = tensor.extract_slice %arg8[%i, %c0][%step, %step][1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
%r2 = vector.transfer_read %st2[%c0, %c0], %cst: tensor<?x?xf32>, vector<3xf32>		%r2 = vector.transfer_read %st2[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<3xf32>

// CHECK: %[[U0:.*]] = "some_use"(%[[V0_ARG_L2]]) : (vector<1xf32>) -> vector<1xf32>		// CHECK: %[[U0:.*]] = "some_use"(%[[V0_ARG_L2]]) : (vector<1xf32>) -> vector<1xf32>
// CHECK: %[[U1:.*]] = "some_use"(%[[V1]]) : (vector<2xf32>) -> vector<2xf32>		// CHECK: %[[U1:.*]] = "some_use"(%[[V1]]) : (vector<2xf32>) -> vector<2xf32>
// CHECK: %[[U2:.*]] = "some_use"(%[[V2]]) : (vector<3xf32>) -> vector<3xf32>		// CHECK: %[[U2:.*]] = "some_use"(%[[V2]]) : (vector<3xf32>) -> vector<3xf32>
%u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>		%u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>
%u1 = "some_use"(%r1) : (vector<2xf32>) -> vector<2xf32>		%u1 = "some_use"(%r1) : (vector<2xf32>) -> vector<2xf32>
%u2 = "some_use"(%r2) : (vector<3xf32>) -> vector<3xf32>		%u2 = "some_use"(%r2) : (vector<3xf32>) -> vector<3xf32>

// Hoists		// Hoists
%w0 = vector.transfer_write %u0, %st0[%c0, %c0] : vector<1xf32>, tensor<?x?xf32>		%w0 = vector.transfer_write %u0, %st0[%c0, %c0] {in_bounds = [true, false]} : vector<1xf32>, tensor<?x?xf32>

// CHECK-DAG: %[[STI1:.*]] = vector.transfer_write %[[U1]], %{{.*}} : vector<2xf32>, tensor<?x?xf32>		// CHECK-DAG: %[[STI1:.*]] = vector.transfer_write %[[U1]], %{{.*}} : vector<2xf32>, tensor<?x?xf32>
// Does not hoist (associated slice depends on %j).		// Does not hoist (associated slice depends on %j).
%w1 = vector.transfer_write %u1, %st1[%i, %i] : vector<2xf32>, tensor<?x?xf32>		%w1 = vector.transfer_write %u1, %st1[%i, %i] {in_bounds = [true, false]} : vector<2xf32>, tensor<?x?xf32>

// CHECK-DAG: %[[STI2:.*]] = vector.transfer_write %[[U2]], %{{.*}} : vector<3xf32>, tensor<?x?xf32>		// CHECK-DAG: %[[STI2:.*]] = vector.transfer_write %[[U2]], %{{.*}} : vector<3xf32>, tensor<?x?xf32>
// Does not hoist, 2 slice / insert_slice for %arg8.		// Does not hoist, 2 slice / insert_slice for %arg8.
%w2 = vector.transfer_write %u2, %st2[%c0, %c0] : vector<3xf32>, tensor<?x?xf32>		%w2 = vector.transfer_write %u2, %st2[%c0, %c0] {in_bounds = [true, false]} : vector<3xf32>, tensor<?x?xf32>

// Hoists.		// Hoists.
%sti0 = tensor.insert_slice %w0 into %arg6[%i, %i][%step, %step][1, 1] : tensor<?x?xf32> into tensor<?x?xf32>		%sti0 = tensor.insert_slice %w0 into %arg6[%i, %i][%step, %step][1, 1] : tensor<?x?xf32> into tensor<?x?xf32>

// CHECK-DAG: tensor.insert_slice %[[STI1]] into %[[TENSOR1_ARG_L2]][%[[J]],{{.*}}: tensor<?x?xf32> into tensor<?x?xf32>		// CHECK-DAG: tensor.insert_slice %[[STI1]] into %[[TENSOR1_ARG_L2]][%[[J]],{{.*}}: tensor<?x?xf32> into tensor<?x?xf32>
// Does not hoist (depends on %j).		// Does not hoist (depends on %j).
%sti1 = tensor.insert_slice %w1 into %arg7[%j, %c0][%step, %step][1, 1] : tensor<?x?xf32> into tensor<?x?xf32>		%sti1 = tensor.insert_slice %w1 into %arg7[%j, %c0][%step, %step][1, 1] : tensor<?x?xf32> into tensor<?x?xf32>

[39 lines not shown]
// CHECK-DAG: %[[C3:.*]] = arith.constant 3 : index		// CHECK-DAG: %[[C3:.*]] = arith.constant 3 : index
// CHECK-DAG: %[[R0:.*]] = vector.transfer_read %[[T]][%[[C0]], %[[C0]]], %{{.*}} : tensor<?x?xf32>, vector<2xf32>		// CHECK-DAG: %[[R0:.*]] = vector.transfer_read %[[T]][%[[C0]], %[[C0]]], %{{.*}} : tensor<?x?xf32>, vector<2xf32>
// CHECK-DAG: %[[R1:.*]] = vector.transfer_read %[[T]][%[[C0]], %[[C3]]], %{{.*}} : tensor<?x?xf32>, vector<2xf32>		// CHECK-DAG: %[[R1:.*]] = vector.transfer_read %[[T]][%[[C0]], %[[C3]]], %{{.*}} : tensor<?x?xf32>, vector<2xf32>
// CHECK: %[[F:.*]]:2 = scf.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[R3:.*]] = %[[R1:.*]], %[[R2:.*]] = %[[R0]]) -> (vector<2xf32>, vector<2xf32>) {		// CHECK: %[[F:.*]]:2 = scf.for %{{.*}} = %{{.*}} to %{{.*}} step %{{.*}} iter_args(%[[R3:.*]] = %[[R1:.*]], %[[R2:.*]] = %[[R0]]) -> (vector<2xf32>, vector<2xf32>) {
// CHECK: %[[R4:.*]] = "some_use"(%[[R2]]) : (vector<2xf32>) -> vector<2xf32>		// CHECK: %[[R4:.*]] = "some_use"(%[[R2]]) : (vector<2xf32>) -> vector<2xf32>
// CHECK: %[[R5:.*]] = "some_use"(%[[R3]]) : (vector<2xf32>) -> vector<2xf32>		// CHECK: %[[R5:.*]] = "some_use"(%[[R3]]) : (vector<2xf32>) -> vector<2xf32>
// CHECK: scf.yield %[[R5]], %[[R4]] : vector<2xf32>, vector<2xf32>		// CHECK: scf.yield %[[R5]], %[[R4]] : vector<2xf32>, vector<2xf32>
// CHECK: }		// CHECK: }
// CHECK: %[[W0:.*]] = vector.transfer_write %[[F]]#1, %[[T]][%[[C0]], %[[C0]]] : vector<2xf32>, tensor<?x?xf32>		// CHECK: %[[W0:.*]] = vector.transfer_write %[[F]]#1, %[[T]][%[[C0]], %[[C0]]] {{.*}} : vector<2xf32>, tensor<?x?xf32>
// CHECK: %[[W1:.*]] = vector.transfer_write %[[F]]#0, %[[W0]][%[[C0]], %[[C3]]] : vector<2xf32>, tensor<?x?xf32>		// CHECK: %[[W1:.*]] = vector.transfer_write %[[F]]#0, %[[W0]][%[[C0]], %[[C3]]] {{.*}} : vector<2xf32>, tensor<?x?xf32>
// CHECK: return %[[W1]] : tensor<?x?xf32>		// CHECK: return %[[W1]] : tensor<?x?xf32>
func.func @hoist_vector_transfer_write_pairs_disjoint_tensor(		func.func @hoist_vector_transfer_write_pairs_disjoint_tensor(
%tensor: tensor<?x?xf32>,		%tensor: tensor<?x?xf32>,
%val: index, %lb : index, %ub : index, %step: index) ->		%val: index, %lb : index, %ub : index, %step: index) ->
(tensor<?x?xf32>) {		(tensor<?x?xf32>) {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%c3 = arith.constant 3 : index		%c3 = arith.constant 3 : index
%cst = arith.constant 0.0 : f32		%cst = arith.constant 0.0 : f32
%1 = scf.for %j = %lb to %ub step %step iter_args(%arg5 = %tensor)		%1 = scf.for %j = %lb to %ub step %step iter_args(%arg5 = %tensor)
-> (tensor<?x?xf32>) {		-> (tensor<?x?xf32>) {
%r00 = vector.transfer_read %arg5[%c0, %c0], %cst: tensor<?x?xf32>, vector<2xf32>		%r00 = vector.transfer_read %arg5[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<2xf32>
%u00 = "some_use"(%r00) : (vector<2xf32>) -> vector<2xf32>		%u00 = "some_use"(%r00) : (vector<2xf32>) -> vector<2xf32>
%w10 = vector.transfer_write %u00, %arg5[%c0, %c0] : vector<2xf32>, tensor<?x?xf32>		%w10 = vector.transfer_write %u00, %arg5[%c0, %c0] {in_bounds = [true, false]} : vector<2xf32>, tensor<?x?xf32>

// Hoist by properly bypassing the disjoint write %w10.		// Hoist by properly bypassing the disjoint write %w10.
%r01 = vector.transfer_read %w10[%c0, %c3], %cst: tensor<?x?xf32>, vector<2xf32>		%r01 = vector.transfer_read %w10[%c0, %c3], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<2xf32>
%u01 = "some_use"(%r01) : (vector<2xf32>) -> vector<2xf32>		%u01 = "some_use"(%r01) : (vector<2xf32>) -> vector<2xf32>
%w11 = vector.transfer_write %u01, %w10[%c0, %c3] : vector<2xf32>, tensor<?x?xf32>		%w11 = vector.transfer_write %u01, %w10[%c0, %c3] {in_bounds = [true, false]} : vector<2xf32>, tensor<?x?xf32>
scf.yield %w11 : tensor<?x?xf32>		scf.yield %w11 : tensor<?x?xf32>
}		}
return %1 : tensor<?x?xf32>		return %1 : tensor<?x?xf32>
}		}

transform.sequence failures(propagate) {		transform.sequence failures(propagate) {
^bb1(%arg1: !transform.any_op):		^bb1(%arg1: !transform.any_op):
%0 = transform.structured.match ops{["func.func"]} in %arg1		%0 = transform.structured.match ops{["func.func"]} in %arg1
[37 lines not shown]
// CHECK-SAME: %[[V0_ARG_L2:[0-9a-zA-Z]+]] = %[[V0]]		// CHECK-SAME: %[[V0_ARG_L2:[0-9a-zA-Z]+]] = %[[V0]]
// CHECK-SAME: ) ->		// CHECK-SAME: ) ->
// CHECK-SAME: (tensor<200x200xf32>, tensor<300x300xf32>, vector<1xf32>		// CHECK-SAME: (tensor<200x200xf32>, tensor<300x300xf32>, vector<1xf32>
%1:3 = scf.for %j = %lb to %ub step %step		%1:3 = scf.for %j = %lb to %ub step %step
iter_args(%arg6 = %arg0, %arg7 = %arg1, %arg8 = %arg2)		iter_args(%arg6 = %arg0, %arg7 = %arg1, %arg8 = %arg2)
-> (tensor<100x100xf32>, tensor<200x200xf32>, tensor<300x300xf32>) {		-> (tensor<100x100xf32>, tensor<200x200xf32>, tensor<300x300xf32>) {
// Hoists.		// Hoists.
%st0 = tensor.extract_slice %arg6[%i, %i][%step, %step][1, 1] : tensor<100x100xf32> to tensor<?x?xf32>		%st0 = tensor.extract_slice %arg6[%i, %i][%step, %step][1, 1] : tensor<100x100xf32> to tensor<?x?xf32>
%r0 = vector.transfer_read %st0[%c0, %c0], %cst: tensor<?x?xf32>, vector<1xf32>		%r0 = vector.transfer_read %st0[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<1xf32>

// CHECK: %[[ST1:.*]] = tensor.extract_slice %[[TENSOR1_ARG_L2]][%[[J]],{{.*}}: tensor<200x200xf32> to tensor<?x?xf32>		// CHECK: %[[ST1:.*]] = tensor.extract_slice %[[TENSOR1_ARG_L2]][%[[J]],{{.*}}: tensor<200x200xf32> to tensor<?x?xf32>
// CHECK: %[[V1:.*]] = vector.transfer_read %[[ST1]]{{.*}} : tensor<?x?xf32>, vector<2xf32>		// CHECK: %[[V1:.*]] = vector.transfer_read %[[ST1]]{{.*}} : tensor<?x?xf32>, vector<2xf32>
// Does not hoist (slice depends on %j)		// Does not hoist (slice depends on %j)
%st1 = tensor.extract_slice %arg7[%j, %c0][%step, %step][1, 1] : tensor<200x200xf32> to tensor<?x?xf32>		%st1 = tensor.extract_slice %arg7[%j, %c0][%step, %step][1, 1] : tensor<200x200xf32> to tensor<?x?xf32>
%r1 = vector.transfer_read %st1[%c0, %c0], %cst: tensor<?x?xf32>, vector<2xf32>		%r1 = vector.transfer_read %st1[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<2xf32>

// CHECK: %[[ST2:.*]] = tensor.extract_slice %[[TENSOR2_ARG_L2]][%[[I]],{{.*}}: tensor<300x300xf32> to tensor<?x?xf32>		// CHECK: %[[ST2:.*]] = tensor.extract_slice %[[TENSOR2_ARG_L2]][%[[I]],{{.*}}: tensor<300x300xf32> to tensor<?x?xf32>
// CHECK: %[[V2:.*]] = vector.transfer_read %[[ST2]]{{.*}} : tensor<?x?xf32>, vector<3xf32>		// CHECK: %[[V2:.*]] = vector.transfer_read %[[ST2]]{{.*}} : tensor<?x?xf32>, vector<3xf32>
// Does not hoist, 2 slice %arg8.		// Does not hoist, 2 slice %arg8.
%st2 = tensor.extract_slice %arg8[%i, %c0][%step, %step][1, 1] : tensor<300x300xf32> to tensor<?x?xf32>		%st2 = tensor.extract_slice %arg8[%i, %c0][%step, %step][1, 1] : tensor<300x300xf32> to tensor<?x?xf32>
%r2 = vector.transfer_read %st2[%c0, %c0], %cst: tensor<?x?xf32>, vector<3xf32>		%r2 = vector.transfer_read %st2[%c0, %c0], %cst {in_bounds = [true, false]} : tensor<?x?xf32>, vector<3xf32>

// CHECK: %[[U0:.*]] = "some_use"(%[[V0_ARG_L2]]) : (vector<1xf32>) -> vector<1xf32>		// CHECK: %[[U0:.*]] = "some_use"(%[[V0_ARG_L2]]) : (vector<1xf32>) -> vector<1xf32>
// CHECK: %[[U1:.*]] = "some_use"(%[[V1]]) : (vector<2xf32>) -> vector<2xf32>		// CHECK: %[[U1:.*]] = "some_use"(%[[V1]]) : (vector<2xf32>) -> vector<2xf32>
// CHECK: %[[U2:.*]] = "some_use"(%[[V2]]) : (vector<3xf32>) -> vector<3xf32>		// CHECK: %[[U2:.*]] = "some_use"(%[[V2]]) : (vector<3xf32>) -> vector<3xf32>
%u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>		%u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>
%u1 = "some_use"(%r1) : (vector<2xf32>) -> vector<2xf32>		%u1 = "some_use"(%r1) : (vector<2xf32>) -> vector<2xf32>
%u2 = "some_use"(%r2) : (vector<3xf32>) -> vector<3xf32>		%u2 = "some_use"(%r2) : (vector<3xf32>) -> vector<3xf32>

// Hoists		// Hoists
%w0 = vector.transfer_write %u0, %st0[%c0, %c0] : vector<1xf32>, tensor<?x?xf32>		%w0 = vector.transfer_write %u0, %st0[%c0, %c0] {in_bounds = [true, false]} : vector<1xf32>, tensor<?x?xf32>

// CHECK-DAG: %[[STI1:.*]] = vector.transfer_write %[[U1]], %{{.*}} : vector<2xf32>, tensor<?x?xf32>		// CHECK-DAG: %[[STI1:.*]] = vector.transfer_write %[[U1]], %{{.*}} : vector<2xf32>, tensor<?x?xf32>
// Does not hoist (associated slice depends on %j).		// Does not hoist (associated slice depends on %j).
%w1 = vector.transfer_write %u1, %st1[%i, %i] : vector<2xf32>, tensor<?x?xf32>		%w1 = vector.transfer_write %u1, %st1[%i, %i] {in_bounds = [true, false]} : vector<2xf32>, tensor<?x?xf32>

// CHECK-DAG: %[[STI2:.*]] = vector.transfer_write %[[U2]], %{{.*}} : vector<3xf32>, tensor<?x?xf32>		// CHECK-DAG: %[[STI2:.*]] = vector.transfer_write %[[U2]], %{{.*}} : vector<3xf32>, tensor<?x?xf32>
// Does not hoist, 2 slice / insert_slice for %arg8.		// Does not hoist, 2 slice / insert_slice for %arg8.
%w2 = vector.transfer_write %u2, %st2[%c0, %c0] : vector<3xf32>, tensor<?x?xf32>		%w2 = vector.transfer_write %u2, %st2[%c0, %c0] {in_bounds = [true, false]} : vector<3xf32>, tensor<?x?xf32>

// Hoists.		// Hoists.
%sti0 = tensor.insert_slice %w0 into %arg6[%i, %i][%step, %step][1, 1] : tensor<?x?xf32> into tensor<100x100xf32>		%sti0 = tensor.insert_slice %w0 into %arg6[%i, %i][%step, %step][1, 1] : tensor<?x?xf32> into tensor<100x100xf32>

// CHECK-DAG: tensor.insert_slice %[[STI1]] into %[[TENSOR1_ARG_L2]][%[[J]],{{.*}}: tensor<?x?xf32> into tensor<200x200xf32>		// CHECK-DAG: tensor.insert_slice %[[STI1]] into %[[TENSOR1_ARG_L2]][%[[J]],{{.*}}: tensor<?x?xf32> into tensor<200x200xf32>
// Does not hoist (depends on %j).		// Does not hoist (depends on %j).
%sti1 = tensor.insert_slice %w1 into %arg7[%j, %c0][%step, %step][1, 1] : tensor<?x?xf32> into tensor<200x200xf32>		%sti1 = tensor.insert_slice %w1 into %arg7[%j, %c0][%step, %step][1, 1] : tensor<?x?xf32> into tensor<200x200xf32>

[89 lines not shown]
// CHECK: }		// CHECK: }
func.func @non_matching_transfers(%m: memref<6x1x7x32xf32>) {		func.func @non_matching_transfers(%m: memref<6x1x7x32xf32>) {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%c1024 = arith.constant 1024 : index		%c1024 = arith.constant 1024 : index
%c128 = arith.constant 128 : index		%c128 = arith.constant 128 : index
%cst = arith.constant dense<5.5> : vector<6x7x32xf32>		%cst = arith.constant dense<5.5> : vector<6x7x32xf32>
%cst_0 = arith.constant 0.0 : f32		%cst_0 = arith.constant 0.0 : f32
scf.for %iv = %c0 to %c1024 step %c128 {		scf.for %iv = %c0 to %c1024 step %c128 {
%read = vector.transfer_read %m[%c0, %c0, %c0, %c0], %cst_0 {in_bounds = [true, true, true], permutation_map = affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>} : memref<6x1x7x32xf32>, vector<6x7x32xf32>		%read = vector.transfer_read %m[%c0, %c0, %c0, %c0], %cst_0 {in_bounds = [true, true, true, true], permutation_map = affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>} : memref<6x1x7x32xf32>, vector<6x7x32xf32>
%added = arith.addf %read, %cst : vector<6x7x32xf32>		%added = arith.addf %read, %cst : vector<6x7x32xf32>
%bc = vector.broadcast %added : vector<6x7x32xf32> to vector<1x6x7x32xf32>		%bc = vector.broadcast %added : vector<6x7x32xf32> to vector<1x6x7x32xf32>
%tr = vector.transpose %bc, [1, 0, 2, 3] : vector<1x6x7x32xf32> to vector<6x1x7x32xf32>		%tr = vector.transpose %bc, [1, 0, 2, 3] : vector<1x6x7x32xf32> to vector<6x1x7x32xf32>
vector.transfer_write %tr, %m[%c0, %c0, %c0, %c0] {in_bounds = [true, true, true, true]} : vector<6x1x7x32xf32>, memref<6x1x7x32xf32>		vector.transfer_write %tr, %m[%c0, %c0, %c0, %c0] {in_bounds = [true, true, true, true]} : vector<6x1x7x32xf32>, memref<6x1x7x32xf32>
}		}
return		return
}		}

transform.sequence failures(propagate) {		transform.sequence failures(propagate) {
^bb1(%arg1: !transform.any_op):		^bb1(%arg1: !transform.any_op):
%0 = transform.structured.match ops{["func.func"]} in %arg1		%0 = transform.structured.match ops{["func.func"]} in %arg1
: (!transform.any_op) -> !transform.any_op		: (!transform.any_op) -> !transform.any_op
transform.structured.hoist_redundant_vector_transfers %0		transform.structured.hoist_redundant_vector_transfers %0
: (!transform.any_op) -> !transform.any_op		: (!transform.any_op) -> !transform.any_op
}		}

mlir/test/Dialect/Linalg/vectorization-masked.mlir

[47 lines not shown]
linalg.yield %0 : f32		linalg.yield %0 : f32
} -> tensor<?xf32>		} -> tensor<?xf32>
return %0 : tensor<?xf32>		return %0 : tensor<?xf32>
}		}

// CHECK-LABEL: @vectorize_dynamic_1d_broadcast		// CHECK-LABEL: @vectorize_dynamic_1d_broadcast
// CHECK: %[[VAL_3:.*]] = arith.constant 0 : index		// CHECK: %[[VAL_3:.*]] = arith.constant 0 : index
// CHECK: %[[VAL_4:.*]] = tensor.dim %{{.*}}, %[[VAL_3]] : tensor<?xf32>		// CHECK: %[[VAL_4:.*]] = tensor.dim %{{.*}}, %[[VAL_3]] : tensor<?xf32>
// CHECK: %[[VAL_7:.*]] = vector.transfer_read %{{.*}} {permutation_map = #{{.*}}} : tensor<?xf32>, vector<4xf32>		// CHECK: %[[VAL_7:.*]] = vector.transfer_read %{{.*}} {in_bounds = [true], permutation_map = #{{.*}}} : tensor<?xf32>, vector<4xf32>
// CHECK: %[[VAL_9:.*]] = vector.create_mask %[[VAL_4]] : vector<4xi1>		// CHECK: %[[VAL_9:.*]] = vector.create_mask %[[VAL_4]] : vector<4xi1>
// CHECK: %[[VAL_10:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>		// CHECK: %[[VAL_10:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>
// CHECK: %[[VAL_12:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>		// CHECK: %[[VAL_12:.*]] = vector.mask %[[VAL_9]] { vector.transfer_read %{{.*}} {in_bounds = [true]} : tensor<?xf32>, vector<4xf32> } : vector<4xi1> -> vector<4xf32>
// CHECK: %[[VAL_13:.*]] = arith.addf %[[VAL_7]], %[[VAL_10]] : vector<4xf32>		// CHECK: %[[VAL_13:.*]] = arith.addf %[[VAL_7]], %[[VAL_10]] : vector<4xf32>
// CHECK: %[[VAL_14:.*]] = vector.mask %{{.*}} { vector.transfer_write %[[VAL_13]], {{.*}} {in_bounds = [true]} : vector<4xf32>, tensor<?xf32> } : vector<4xi1> -> tensor<?xf32>		// CHECK: %[[VAL_14:.*]] = vector.mask %{{.*}} { vector.transfer_write %[[VAL_13]], {{.*}} {in_bounds = [true]} : vector<4xf32>, tensor<?xf32> } : vector<4xi1> -> tensor<?xf32>

transform.sequence failures(propagate) {		transform.sequence failures(propagate) {
^bb1(%arg1: !transform.any_op):		^bb1(%arg1: !transform.any_op):
▲ Show 20 Lines • Show All 394 Lines • ▼ Show 20 Lines
// CHECK-DAG: %[[VAL_4:.*]] = memref.dim %[[VAL_0]], %[[VAL_3]] : memref<?x?xf32>		// CHECK-DAG: %[[VAL_4:.*]] = memref.dim %[[VAL_0]], %[[VAL_3]] : memref<?x?xf32>
// CHECK-DAG: %[[VAL_5:.*]] = arith.constant 1 : index		// CHECK-DAG: %[[VAL_5:.*]] = arith.constant 1 : index
// CHECK-DAG: %[[VAL_6:.*]] = memref.dim %[[VAL_1]], %[[VAL_5]] : memref<?x?xf32>		// CHECK-DAG: %[[VAL_6:.*]] = memref.dim %[[VAL_1]], %[[VAL_5]] : memref<?x?xf32>
// CHECK-DAG: %[[VAL_7:.*]] = arith.constant 1 : index		// CHECK-DAG: %[[VAL_7:.*]] = arith.constant 1 : index
// CHECK-DAG: %[[VAL_8:.*]] = memref.dim %[[VAL_0]], %[[VAL_7]] : memref<?x?xf32>		// CHECK-DAG: %[[VAL_8:.*]] = memref.dim %[[VAL_0]], %[[VAL_7]] : memref<?x?xf32>
// CHECK-DAG: %[[VAL_9:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[VAL_9:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[VAL_10:.*]] = arith.constant 0.000000e+00 : f32		// CHECK-DAG: %[[VAL_10:.*]] = arith.constant 0.000000e+00 : f32
// CHECK: %[[VAL_11:.*]] = vector.create_mask %[[VAL_4]], %[[VAL_8]] : vector<8x4xi1>		// CHECK: %[[VAL_11:.*]] = vector.create_mask %[[VAL_4]], %[[VAL_8]] : vector<8x4xi1>
// CHECK: %[[VAL_12:.*]] = vector.mask %[[VAL_11]] { vector.transfer_read %[[VAL_0]]{{\[}}%[[VAL_9]], %[[VAL_9]]], %[[VAL_10]] {in_bounds = [true, true, true], permutation_map = #map} : memref<?x?xf32>, vector<8x16x4xf32> } : vector<8x4xi1> -> vector<8x16x4xf32>		// CHECK: %[[VAL_12:.*]] = vector.mask %[[VAL_11]] { vector.transfer_read %[[VAL_0]]{{\[}}%[[VAL_9]], %[[VAL_9]]], %[[VAL_10]] {in_bounds = [true, true], permutation_map = #map} : memref<?x?xf32>, vector<8x16x4xf32> } : vector<8x4xi1> -> vector<8x16x4xf32>
// CHECK: %[[VAL_13:.*]] = arith.constant 0.000000e+00 : f32		// CHECK: %[[VAL_13:.*]] = arith.constant 0.000000e+00 : f32
// CHECK: %[[VAL_14:.*]] = vector.create_mask %[[VAL_8]], %[[VAL_6]] : vector<4x16xi1>		// CHECK: %[[VAL_14:.*]] = vector.create_mask %[[VAL_8]], %[[VAL_6]] : vector<4x16xi1>
// CHECK: %[[VAL_15:.*]] = vector.mask %[[VAL_14]] { vector.transfer_read %[[VAL_1]]{{\[}}%[[VAL_9]], %[[VAL_9]]], %[[VAL_13]] {in_bounds = [true, true, true], permutation_map = #map1} : memref<?x?xf32>, vector<8x16x4xf32> } : vector<4x16xi1> -> vector<8x16x4xf32>		// CHECK: %[[VAL_15:.*]] = vector.mask %[[VAL_14]] { vector.transfer_read %[[VAL_1]]{{\[}}%[[VAL_9]], %[[VAL_9]]], %[[VAL_13]] {in_bounds = [true, true], permutation_map = #map1} : memref<?x?xf32>, vector<8x16x4xf32> } : vector<4x16xi1> -> vector<8x16x4xf32>
// CHECK: %[[VAL_16:.*]] = arith.constant 0.000000e+00 : f32		// CHECK: %[[VAL_16:.*]] = arith.constant 0.000000e+00 : f32
// CHECK: %[[VAL_17:.*]] = vector.create_mask %[[VAL_4]], %[[VAL_6]] : vector<8x16xi1>		// CHECK: %[[VAL_17:.*]] = vector.create_mask %[[VAL_4]], %[[VAL_6]] : vector<8x16xi1>
// CHECK: %[[VAL_18:.*]] = vector.mask %[[VAL_17]] { vector.transfer_read %[[VAL_2]]{{\[}}%[[VAL_9]], %[[VAL_9]]], %[[VAL_16]] {in_bounds = [true, true]} : memref<?x?xf32>, vector<8x16xf32> } : vector<8x16xi1> -> vector<8x16xf32>		// CHECK: %[[VAL_18:.*]] = vector.mask %[[VAL_17]] { vector.transfer_read %[[VAL_2]]{{\[}}%[[VAL_9]], %[[VAL_9]]], %[[VAL_16]] {in_bounds = [true, true]} : memref<?x?xf32>, vector<8x16xf32> } : vector<8x16xi1> -> vector<8x16xf32>
// CHECK: %[[VAL_19:.*]] = arith.mulf %[[VAL_12]], %[[VAL_15]] : vector<8x16x4xf32>		// CHECK: %[[VAL_19:.*]] = arith.mulf %[[VAL_12]], %[[VAL_15]] : vector<8x16x4xf32>
// CHECK: %[[VAL_20:.*]] = vector.create_mask %[[VAL_4]], %[[VAL_6]], %[[VAL_8]] : vector<8x16x4xi1>		// CHECK: %[[VAL_20:.*]] = vector.create_mask %[[VAL_4]], %[[VAL_6]], %[[VAL_8]] : vector<8x16x4xi1>
// CHECK: %[[VAL_21:.*]] = vector.mask %[[VAL_20]] { vector.multi_reduction <add>, %[[VAL_19]], %[[VAL_18]] [2] : vector<8x16x4xf32> to vector<8x16xf32> } : vector<8x16x4xi1> -> vector<8x16xf32>		// CHECK: %[[VAL_21:.*]] = vector.mask %[[VAL_20]] { vector.multi_reduction <add>, %[[VAL_19]], %[[VAL_18]] [2] : vector<8x16x4xf32> to vector<8x16xf32> } : vector<8x16x4xi1> -> vector<8x16xf32>
// CHECK: %[[VAL_22:.*]] = arith.constant 0 : index		// CHECK: %[[VAL_22:.*]] = arith.constant 0 : index
// CHECK: vector.mask %[[VAL_17]] { vector.transfer_write %[[VAL_21]], %[[VAL_2]]{{\[}}%[[VAL_22]], %[[VAL_22]]] {in_bounds = [true, true]} : vector<8x16xf32>, memref<?x?xf32> } : vector<8x16xi1>		// CHECK: vector.mask %[[VAL_17]] { vector.transfer_write %[[VAL_21]], %[[VAL_2]]{{\[}}%[[VAL_22]], %[[VAL_22]]] {in_bounds = [true, true]} : vector<8x16xf32>, memref<?x?xf32> } : vector<8x16xi1>

mlir/test/Dialect/Linalg/vectorization.mlir


// CHECK-DAG: #[[$MAP0:.*]] = affine_map<(d0, d1) -> (d0, 0, 0, d1)>		// CHECK-DAG: #[[$MAP0:.*]] = affine_map<(d0, d1) -> (d0, 0, 0, d1)>
// CHECK-DAG: #[[$MAP1:.*]] = affine_map<(d0) -> (d0, 0, 0, 0)>		// CHECK-DAG: #[[$MAP1:.*]] = affine_map<(d0) -> (d0, 0, 0, 0)>
// CHECK-DAG: #[[$MAP2:.*]] = affine_map<(d0) -> (0, 0, d0, 0)>		// CHECK-DAG: #[[$MAP2:.*]] = affine_map<(d0) -> (0, 0, d0, 0)>
// CHECK-DAG: #[[$MAP3:.*]] = affine_map<(d0, d1) -> (d1, 0, d0, 0)>		// CHECK-DAG: #[[$MAP3:.*]] = affine_map<(d0, d1) -> (d1, 0, d0, 0)>
// CHECK: func @generic_vectorize_broadcast_transpose		// CHECK: func @generic_vectorize_broadcast_transpose
// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[CF:.*]] = arith.constant 0.000000e+00 : f32		// CHECK-DAG: %[[CF:.*]] = arith.constant 0.000000e+00 : f32
// CHECK: %[[V0:.*]] = vector.transfer_read %{{.*}}[%[[C0]], %[[C0]]], %[[CF]] {in_bounds = [true, true, true, true], permutation_map = #[[$MAP0]]} : memref<4x4xf32>, vector<4x4x4x4xf32>		// CHECK: %[[V0:.*]] = vector.transfer_read %{{.*}}[%[[C0]], %[[C0]]], %[[CF]] {in_bounds = [true, true], permutation_map = #[[$MAP0]]} : memref<4x4xf32>, vector<4x4x4x4xf32>
// CHECK: %[[V1:.*]] = vector.transfer_read %{{.*}}[%[[C0]]], %[[CF]] {in_bounds = [true, true, true, true], permutation_map = #[[$MAP1]]} : memref<4xf32>, vector<4x4x4x4xf32>		// CHECK: %[[V1:.*]] = vector.transfer_read %{{.*}}[%[[C0]]], %[[CF]] {in_bounds = [true], permutation_map = #[[$MAP1]]} : memref<4xf32>, vector<4x4x4x4xf32>
// CHECK: %[[V2:.*]] = vector.transfer_read %{{.*}}[%[[C0]]], %[[CF]] {in_bounds = [true, true, true, true], permutation_map = #[[$MAP2]]} : memref<4xf32>, vector<4x4x4x4xf32>		// CHECK: %[[V2:.*]] = vector.transfer_read %{{.*}}[%[[C0]]], %[[CF]] {in_bounds = [true], permutation_map = #[[$MAP2]]} : memref<4xf32>, vector<4x4x4x4xf32>
// CHECK: %[[V3:.*]] = vector.transfer_read %{{.*}}[%[[C0]], %[[C0]]], %[[CF]] {in_bounds = [true, true, true, true], permutation_map = #[[$MAP3]]} : memref<4x4xf32>, vector<4x4x4x4xf32>		// CHECK: %[[V3:.*]] = vector.transfer_read %{{.*}}[%[[C0]], %[[C0]]], %[[CF]] {in_bounds = [true, true], permutation_map = #[[$MAP3]]} : memref<4x4xf32>, vector<4x4x4x4xf32>
// CHECK: %[[SUB:.*]] = arith.subf %[[V0]], %[[V1]] : vector<4x4x4x4xf32>		// CHECK: %[[SUB:.*]] = arith.subf %[[V0]], %[[V1]] : vector<4x4x4x4xf32>
// CHECK: %[[ADD0:.*]] = arith.addf %[[V2]], %[[SUB]] : vector<4x4x4x4xf32>		// CHECK: %[[ADD0:.*]] = arith.addf %[[V2]], %[[SUB]] : vector<4x4x4x4xf32>
// CHECK: %[[ADD1:.*]] = arith.addf %[[V3]], %[[ADD0]] : vector<4x4x4x4xf32>		// CHECK: %[[ADD1:.*]] = arith.addf %[[V3]], %[[ADD0]] : vector<4x4x4x4xf32>
// CHECK: vector.transfer_write %[[ADD1]], {{.*}} : vector<4x4x4x4xf32>, memref<4x4x4x4xf32>		// CHECK: vector.transfer_write %[[ADD1]], {{.*}} : vector<4x4x4x4xf32>, memref<4x4x4x4xf32>
func.func @generic_vectorize_broadcast_transpose(		func.func @generic_vectorize_broadcast_transpose(
%A: memref<4xf32>, %B: memref<4x4xf32>, %C: memref<4x4x4x4xf32>) {		%A: memref<4xf32>, %B: memref<4x4xf32>, %C: memref<4x4x4x4xf32>) {
linalg.generic {		linalg.generic {
indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3)>,		indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d3)>,
Show All 32 Lines	#matmul_trait = {
],		],
iterator_types = ["parallel", "parallel", "parallel", "parallel"]		iterator_types = ["parallel", "parallel", "parallel", "parallel"]
}		}

// CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0, d1) -> (d1, d0, 0, 0)>		// CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0, d1) -> (d1, d0, 0, 0)>
// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0, d1) -> (0, d1, 0, d0)>		// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0, d1) -> (0, d1, 0, d0)>
// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0, d1, d2, d3) -> (d2, d1, d3, d0)>		// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0, d1, d2, d3) -> (d2, d1, d3, d0)>
// CHECK: func @vectorization_transpose		// CHECK: func @vectorization_transpose
// CHECK: vector.transfer_read {{.*}}{in_bounds = [true, true, true, true], permutation_map = #[[MAP0]]} : memref<14x7xf32>, vector<7x14x8x16xf32>		// CHECK: vector.transfer_read {{.*}}{in_bounds = [true, true], permutation_map = #[[MAP0]]} : memref<14x7xf32>, vector<7x14x8x16xf32>
// CHECK: vector.transfer_read {{.*}}{in_bounds = [true, true, true, true], permutation_map = #[[MAP1]]} : memref<16x14xf32>, vector<7x14x8x16xf32>		// CHECK: vector.transfer_read {{.*}}{in_bounds = [true, true], permutation_map = #[[MAP1]]} : memref<16x14xf32>, vector<7x14x8x16xf32>
// CHECK: vector.transfer_read {{.*}}{in_bounds = [true, true, true, true], permutation_map = #[[MAP2]]} : memref<16x14x7x8xf32>, vector<7x14x8x16xf32>		// CHECK: vector.transfer_read {{.*}}{in_bounds = [true, true, true, true], permutation_map = #[[MAP2]]} : memref<16x14x7x8xf32>, vector<7x14x8x16xf32>
// CHECK: arith.addf {{.*}} : vector<7x14x8x16xf32>		// CHECK: arith.addf {{.*}} : vector<7x14x8x16xf32>
// CHECK: arith.addf {{.*}} : vector<7x14x8x16xf32>		// CHECK: arith.addf {{.*}} : vector<7x14x8x16xf32>
// CHECK: vector.transfer_write {{.*}} : vector<7x14x8x16xf32>, memref<7x14x8x16xf32>		// CHECK: vector.transfer_write {{.*}} : vector<7x14x8x16xf32>, memref<7x14x8x16xf32>
func.func @vectorization_transpose(%A: memref<14x7xf32>, %B: memref<16x14xf32>,		func.func @vectorization_transpose(%A: memref<14x7xf32>, %B: memref<16x14xf32>,
%C: memref<16x14x7x8xf32>, %D: memref<7x14x8x16xf32>) {		%C: memref<16x14x7x8xf32>, %D: memref<7x14x8x16xf32>) {
linalg.generic #matmul_trait		linalg.generic #matmul_trait
ins(%A, %B, %C : memref<14x7xf32>, memref<16x14xf32>, memref<16x14x7x8xf32>)		ins(%A, %B, %C : memref<14x7xf32>, memref<16x14xf32>, memref<16x14x7x8xf32>)
▲ Show 20 Lines • Show All 399 Lines • ▼ Show 20 Lines
// CHECK-DAG: #[[$M1:.*]] = affine_map<(d0, d1) -> (d1, d0, 0, 0)>		// CHECK-DAG: #[[$M1:.*]] = affine_map<(d0, d1) -> (d1, d0, 0, 0)>
// CHECK-DAG: #[[$M2:.*]] = affine_map<(d0, d1) -> (0, 0, d1, d0)>		// CHECK-DAG: #[[$M2:.*]] = affine_map<(d0, d1) -> (0, 0, d1, d0)>
// CHECK-DAG: #[[$M3:.*]] = affine_map<(d0, d1) -> (d1, d0)>		// CHECK-DAG: #[[$M3:.*]] = affine_map<(d0, d1) -> (d1, d0)>

// CHECK-LABEL: func @sum_exp_2		// CHECK-LABEL: func @sum_exp_2
func.func @sum_exp_2(%input: tensor<3x2xf32>, %input_2: tensor<5x4xf32>, %output: tensor<5x2xf32>)		func.func @sum_exp_2(%input: tensor<3x2xf32>, %input_2: tensor<5x4xf32>, %output: tensor<5x2xf32>)
-> tensor<5x2xf32>		-> tensor<5x2xf32>
{		{
// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true, true, true], permutation_map = #[[$M1]]} : tensor<3x2xf32>, vector<2x3x4x5xf32>		// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true], permutation_map = #[[$M1]]} : tensor<3x2xf32>, vector<2x3x4x5xf32>
// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true, true, true], permutation_map = #[[$M2]]} : tensor<5x4xf32>, vector<2x3x4x5xf32>		// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true], permutation_map = #[[$M2]]} : tensor<5x4xf32>, vector<2x3x4x5xf32>
// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true], permutation_map = #[[$M3]]} : tensor<5x2xf32>, vector<2x5xf32>		// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true], permutation_map = #[[$M3]]} : tensor<5x2xf32>, vector<2x5xf32>
// CHECK: math.exp {{.*}} : vector<2x3x4x5xf32>		// CHECK: math.exp {{.*}} : vector<2x3x4x5xf32>
// CHECK: math.exp {{.*}} : vector<2x3x4x5xf32>		// CHECK: math.exp {{.*}} : vector<2x3x4x5xf32>
// CHECK: addf {{.*}} : vector<2x3x4x5xf32>		// CHECK: addf {{.*}} : vector<2x3x4x5xf32>
// CHECK: vector.multi_reduction <add>, {{.*}}, %{{.*}} [1, 2] : vector<2x3x4x5xf32> to vector<2x5xf32>		// CHECK: vector.multi_reduction <add>, {{.*}}, %{{.*}} [1, 2] : vector<2x3x4x5xf32> to vector<2x5xf32>
// CHECK: vector.transfer_write {{.*}} {in_bounds = [true, true], permutation_map = #[[$M3]]} : vector<2x5xf32>, tensor<5x2xf32>		// CHECK: vector.transfer_write {{.*}} {in_bounds = [true, true], permutation_map = #[[$M3]]} : vector<2x5xf32>, tensor<5x2xf32>
// CHECK: return {{.*}} : tensor<5x2xf32>		// CHECK: return {{.*}} : tensor<5x2xf32>
%0 = linalg.generic {		%0 = linalg.generic {

mlir/test/Dialect/Linalg/vectorize-tensor-extract.mlir

	// CHECK: %[[VAL_12:.*]] = vector.broadcast %[[VAL_11]] : index to vector<1x4xindex>			// CHECK: %[[VAL_12:.*]] = vector.broadcast %[[VAL_11]] : index to vector<1x4xindex>
	// CHECK: %[[VAL_13:.*]] = vector.broadcast %[[VAL_3]] : index to vector<4xindex>			// CHECK: %[[VAL_13:.*]] = vector.broadcast %[[VAL_3]] : index to vector<4xindex>
	// CHECK: %[[VAL_14:.*]] = arith.addi %[[VAL_13]], %[[VAL_6]] : vector<4xindex>			// CHECK: %[[VAL_14:.*]] = arith.addi %[[VAL_13]], %[[VAL_6]] : vector<4xindex>
	// CHECK: %[[VAL_15:.*]] = vector.broadcast %[[VAL_4]] : index to vector<4xindex>			// CHECK: %[[VAL_15:.*]] = vector.broadcast %[[VAL_4]] : index to vector<4xindex>
	// CHECK: %[[VAL_16:.*]] = arith.addi %[[VAL_14]], %[[VAL_15]] : vector<4xindex>			// CHECK: %[[VAL_16:.*]] = arith.addi %[[VAL_14]], %[[VAL_15]] : vector<4xindex>
	// CHECK: %[[VAL_17:.*]] = vector.shape_cast %[[VAL_12]] : vector<1x4xindex> to vector<4xindex>			// CHECK: %[[VAL_17:.*]] = vector.shape_cast %[[VAL_12]] : vector<1x4xindex> to vector<4xindex>
	// CHECK: %[[VAL_18:.*]] = vector.extractelement %[[VAL_17]]{{\[}}%[[VAL_7]] : i32] : vector<4xindex>			// CHECK: %[[VAL_18:.*]] = vector.extractelement %[[VAL_17]]{{\[}}%[[VAL_7]] : i32] : vector<4xindex>
	// CHECK: %[[VAL_19:.*]] = vector.extractelement %[[VAL_16]]{{\[}}%[[VAL_7]] : i32] : vector<4xindex>			// CHECK: %[[VAL_19:.*]] = vector.extractelement %[[VAL_16]]{{\[}}%[[VAL_7]] : i32] : vector<4xindex>
	// CHECK: %[[VAL_20:.*]] = vector.transfer_read %[[VAL_0]]{{\[}}%[[VAL_18]], %[[VAL_10]], %[[VAL_19]]], %[[VAL_8]] {in_bounds = [true, true]} : tensor<45x80x16xf32>, vector<1x4xf32>			// CHECK: %[[VAL_20:.*]] = vector.transfer_read %[[VAL_0]]{{\[}}%[[VAL_18]], %[[VAL_10]], %[[VAL_19]]], %[[VAL_8]] {in_bounds = [true, true, true]} : tensor<45x80x16xf32>, vector<1x4xf32>
	// CHECK: %[[VAL_21:.*]] = vector.transfer_write %[[VAL_20]], %[[VAL_5]]{{\[}}%[[VAL_9]], %[[VAL_9]]] {in_bounds = [true, true]} : vector<1x4xf32>, tensor<1x4xf32>			// CHECK: %[[VAL_21:.*]] = vector.transfer_write %[[VAL_20]], %[[VAL_5]]{{\[}}%[[VAL_9]], %[[VAL_9]]] {in_bounds = [true, true]} : vector<1x4xf32>, tensor<1x4xf32>
	// CHECK: return %[[VAL_21]] : tensor<1x4xf32>			// CHECK: return %[[VAL_21]] : tensor<1x4xf32>
	// CHECK: }			// CHECK: }

	transform.sequence failures(propagate) {			transform.sequence failures(propagate) {
	^bb1(%arg1: !transform.any_op):			^bb1(%arg1: !transform.any_op):
	%0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op			%0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op
	%1 = get_parent_op %0 {isolated_from_above} : (!transform.any_op) -> !transform.any_op			%1 = get_parent_op %0 {isolated_from_above} : (!transform.any_op) -> !transform.any_op

mlir/test/Dialect/MemRef/extract-address-computations.mlir

	// CHECK-SAME: %[[DYN_OFFSET2:[^:]*]]: index)			// CHECK-SAME: %[[DYN_OFFSET2:[^:]*]]: index)
	// CHECK-DAG: {{.*}}, {{.*}}, %[[DYN_SIZES:.*]]:3, {{.*}} = memref.extract_strided_metadata %[[BASE]]			// CHECK-DAG: {{.*}}, {{.*}}, %[[DYN_SIZES:.*]]:3, {{.*}} = memref.extract_strided_metadata %[[BASE]]
	// CHECK-DAG: %[[DYN_SIZE0:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#0, %[[DYN_OFFSET0]]]			// CHECK-DAG: %[[DYN_SIZE0:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#0, %[[DYN_OFFSET0]]]
	// CHECK-DAG: %[[DYN_SIZE1:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#1, %[[DYN_OFFSET1]]]			// CHECK-DAG: %[[DYN_SIZE1:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#1, %[[DYN_OFFSET1]]]
	// CHECK-DAG: %[[DYN_SIZE2:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#2, %[[DYN_OFFSET2]]]			// CHECK-DAG: %[[DYN_SIZE2:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#2, %[[DYN_OFFSET2]]]
	// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
	// CHECK-DAG: %[[CF0:.*]] = arith.constant 0.0{{0*e\+00}} : f16			// CHECK-DAG: %[[CF0:.*]] = arith.constant 0.0{{0*e\+00}} : f16
	// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16> to memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>			// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16> to memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>
	// CHECK: %[[LOADED_VAL:.*]] = vector.transfer_read %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]], %[[CF0]] {permutation_map = #[[$PERMUTATION_MAP]]} : memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>, vector<4x2xf16>			// CHECK: %[[LOADED_VAL:.*]] = vector.transfer_read %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]], %[[CF0]] {in_bounds = [false, true, false], permutation_map = #[[$PERMUTATION_MAP]]} : memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>, vector<4x2xf16>
	// CHECK: return %[[LOADED_VAL]] : vector<4x2xf16>			// CHECK: return %[[LOADED_VAL]] : vector<4x2xf16>
	func.func @test_transfer_read_op(%base : memref<?x?x?xf16>,			func.func @test_transfer_read_op(%base : memref<?x?x?xf16>,
	%offset0 : index, %offset1: index, %offset2: index)			%offset0 : index, %offset1: index, %offset2: index)
	-> vector<4x2xf16> {			-> vector<4x2xf16> {
	%cf0 = arith.constant 0.0 : f16			%cf0 = arith.constant 0.0 : f16
	%loaded_val = vector.transfer_read %base[%offset0, %offset1, %offset2], %cf0 { permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : memref<?x?x?xf16>, vector<4x2xf16>			%loaded_val = vector.transfer_read %base[%offset0, %offset1, %offset2], %cf0 { in_bounds = [false, true, false], permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : memref<?x?x?xf16>, vector<4x2xf16>
	return %loaded_val : vector<4x2xf16>			return %loaded_val : vector<4x2xf16>
	}			}

	transform.sequence failures(propagate) {			transform.sequence failures(propagate) {
	^bb1(%arg1: !transform.any_op):			^bb1(%arg1: !transform.any_op):
	%0 = transform.structured.match ops{["func.func"]} in %arg1 : (!transform.any_op) -> !transform.any_op			%0 = transform.structured.match ops{["func.func"]} in %arg1 : (!transform.any_op) -> !transform.any_op
	transform.apply_patterns to %0 {			transform.apply_patterns to %0 {
	transform.apply_patterns.memref.extract_address_computations			transform.apply_patterns.memref.extract_address_computations
	} : !transform.any_op			} : !transform.any_op
	}			}

	// -----			// -----

	// Same as test_transfer_read_op but with tensors.			// Same as test_transfer_read_op but with tensors.
	// Right now this rewrite is not supported but we still shouldn't choke on it.			// Right now this rewrite is not supported but we still shouldn't choke on it.

	// CHECK: #[[$PERMUTATION_MAP:.*]] = affine_map<(d0, d1, d2) -> (d2, d0)>			// CHECK: #[[$PERMUTATION_MAP:.*]] = affine_map<(d0, d1, d2) -> (d2, d0)>
	// CHECK-LABEL: @test_transfer_read_op_with_tensor(			// CHECK-LABEL: @test_transfer_read_op_with_tensor(
	// CHECK-SAME: %[[BASE:[^:]*]]: tensor<{{[^,]*}}>,			// CHECK-SAME: %[[BASE:[^:]*]]: tensor<{{[^,]*}}>,
	// CHECK-SAME: %[[DYN_OFFSET0:[^:]*]]: index,			// CHECK-SAME: %[[DYN_OFFSET0:[^:]*]]: index,
	// CHECK-SAME: %[[DYN_OFFSET1:[^:]*]]: index,			// CHECK-SAME: %[[DYN_OFFSET1:[^:]*]]: index,
	// CHECK-SAME: %[[DYN_OFFSET2:[^:]*]]: index)			// CHECK-SAME: %[[DYN_OFFSET2:[^:]*]]: index)
	// CHECK: %[[CF0:.*]] = arith.constant 0.0{{0*e\+00}} : f16			// CHECK: %[[CF0:.*]] = arith.constant 0.0{{0*e\+00}} : f16
	// CHECK: %[[LOADED_VAL:.*]] = vector.transfer_read %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]], %[[CF0]] {permutation_map = #[[$PERMUTATION_MAP]]} : tensor<?x?x?xf16>, vector<4x2xf16>			// CHECK: %[[LOADED_VAL:.*]] = vector.transfer_read %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]], %[[CF0]] {in_bounds = [false, true, false], permutation_map = #[[$PERMUTATION_MAP]]} : tensor<?x?x?xf16>, vector<4x2xf16>
	// CHECK: return %[[LOADED_VAL]] : vector<4x2xf16>			// CHECK: return %[[LOADED_VAL]] : vector<4x2xf16>
	func.func @test_transfer_read_op_with_tensor(%base : tensor<?x?x?xf16>,			func.func @test_transfer_read_op_with_tensor(%base : tensor<?x?x?xf16>,
	%offset0 : index, %offset1: index, %offset2: index)			%offset0 : index, %offset1: index, %offset2: index)
	-> vector<4x2xf16> {			-> vector<4x2xf16> {
	%cf0 = arith.constant 0.0 : f16			%cf0 = arith.constant 0.0 : f16
	%loaded_val = vector.transfer_read %base[%offset0, %offset1, %offset2], %cf0 { permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : tensor<?x?x?xf16>, vector<4x2xf16>			%loaded_val = vector.transfer_read %base[%offset0, %offset1, %offset2], %cf0 { in_bounds = [false, true, false], permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : tensor<?x?x?xf16>, vector<4x2xf16>
	return %loaded_val : vector<4x2xf16>			return %loaded_val : vector<4x2xf16>
	}			}

	transform.sequence failures(propagate) {			transform.sequence failures(propagate) {
	^bb1(%arg1: !transform.any_op):			^bb1(%arg1: !transform.any_op):
	%0 = transform.structured.match ops{["func.func"]} in %arg1 : (!transform.any_op) -> !transform.any_op			%0 = transform.structured.match ops{["func.func"]} in %arg1 : (!transform.any_op) -> !transform.any_op
	transform.apply_patterns to %0 {			transform.apply_patterns to %0 {
	transform.apply_patterns.memref.extract_address_computations			transform.apply_patterns.memref.extract_address_computations
	Show All 14 Lines
	// CHECK-SAME: %[[DYN_OFFSET2:[^:]*]]: index)			// CHECK-SAME: %[[DYN_OFFSET2:[^:]*]]: index)
	// CHECK-DAG: {{.*}}, {{.*}}, %[[DYN_SIZES:.*]]:3, {{.*}} = memref.extract_strided_metadata %[[BASE]]			// CHECK-DAG: {{.*}}, {{.*}}, %[[DYN_SIZES:.*]]:3, {{.*}} = memref.extract_strided_metadata %[[BASE]]
	// CHECK-DAG: %[[DYN_SIZE0:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#0, %[[DYN_OFFSET0]]]			// CHECK-DAG: %[[DYN_SIZE0:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#0, %[[DYN_OFFSET0]]]
	// CHECK-DAG: %[[DYN_SIZE1:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#1, %[[DYN_OFFSET1]]]			// CHECK-DAG: %[[DYN_SIZE1:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#1, %[[DYN_OFFSET1]]]
	// CHECK-DAG: %[[DYN_SIZE2:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#2, %[[DYN_OFFSET2]]]			// CHECK-DAG: %[[DYN_SIZE2:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#2, %[[DYN_OFFSET2]]]
	// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
	// CHECK-DAG: %[[VCF0:.*]] = arith.constant dense<0.0{{0*e\+00}}> : vector<4x2xf16>			// CHECK-DAG: %[[VCF0:.*]] = arith.constant dense<0.0{{0*e\+00}}> : vector<4x2xf16>
	// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16> to memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>			// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16> to memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>
	// CHECK: vector.transfer_write %[[VCF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {permutation_map = #[[$PERMUTATION_MAP]]} : vector<4x2xf16>, memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>			// CHECK: vector.transfer_write %[[VCF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {in_bounds = [false, true, false], permutation_map = #[[$PERMUTATION_MAP]]} : vector<4x2xf16>, memref<?x?x?xf16, strided<[?, ?, 1], offset: ?>>
	// CHECK: return			// CHECK: return
	func.func @test_transfer_write_op(%base : memref<?x?x?xf16>,			func.func @test_transfer_write_op(%base : memref<?x?x?xf16>,
	%offset0 : index, %offset1: index, %offset2: index) {			%offset0 : index, %offset1: index, %offset2: index) {
	%vcf0 = arith.constant dense<0.000000e+00> : vector<4x2xf16>			%vcf0 = arith.constant dense<0.000000e+00> : vector<4x2xf16>
	vector.transfer_write %vcf0, %base[%offset0, %offset1, %offset2] { permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : vector<4x2xf16>, memref<?x?x?xf16>			vector.transfer_write %vcf0, %base[%offset0, %offset1, %offset2] { in_bounds = [false, true, false], permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : vector<4x2xf16>, memref<?x?x?xf16>
	return			return
	}			}

	transform.sequence failures(propagate) {			transform.sequence failures(propagate) {
	^bb1(%arg1: !transform.any_op):			^bb1(%arg1: !transform.any_op):
	%0 = transform.structured.match ops{["func.func"]} in %arg1 : (!transform.any_op) -> !transform.any_op			%0 = transform.structured.match ops{["func.func"]} in %arg1 : (!transform.any_op) -> !transform.any_op
	transform.apply_patterns to %0 {			transform.apply_patterns to %0 {
	transform.apply_patterns.memref.extract_address_computations			transform.apply_patterns.memref.extract_address_computations
	Show All 15 Lines
	// CHECK-SAME: %[[DYN_OFFSET2:[^:]*]]: index)			// CHECK-SAME: %[[DYN_OFFSET2:[^:]*]]: index)
	// CHECK-DAG: {{.*}}, {{.*}}, %[[DYN_SIZES:.*]]:3, {{.*}} = memref.extract_strided_metadata %[[BASE]]			// CHECK-DAG: {{.*}}, {{.*}}, %[[DYN_SIZES:.*]]:3, {{.*}} = memref.extract_strided_metadata %[[BASE]]
	// CHECK-DAG: %[[DYN_SIZE0:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#0, %[[DYN_OFFSET0]]]			// CHECK-DAG: %[[DYN_SIZE0:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#0, %[[DYN_OFFSET0]]]
	// CHECK-DAG: %[[DYN_SIZE1:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#1, %[[DYN_OFFSET1]]]			// CHECK-DAG: %[[DYN_SIZE1:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#1, %[[DYN_OFFSET1]]]
	// CHECK-DAG: %[[DYN_SIZE2:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#2, %[[DYN_OFFSET2]]]			// CHECK-DAG: %[[DYN_SIZE2:.*]] = affine.apply #[[$A_MINUS_B_MAP]]()[%[[DYN_SIZES]]#2, %[[DYN_OFFSET2]]]
	// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
	// CHECK-DAG: %[[VCF0:.*]] = arith.constant dense<0.0{{0*e\+00}}> : vector<4x2xf16>			// CHECK-DAG: %[[VCF0:.*]] = arith.constant dense<0.0{{0*e\+00}}> : vector<4x2xf16>
	// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>> to memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>			// CHECK-DAG: %[[SUBVIEW:.*]] = memref.subview %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] [%[[DYN_SIZE0]], %[[DYN_SIZE1]], %[[DYN_SIZE2]]] [1, 1, 1] : memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>> to memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>
	// CHECK: vector.transfer_write %[[VCF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {permutation_map = #[[$PERMUTATION_MAP]]} : vector<4x2xf16>, memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>			// CHECK: vector.transfer_write %[[VCF0]], %[[SUBVIEW]][%[[C0]], %[[C0]], %[[C0]]] {in_bounds = [false, true, false], permutation_map = #[[$PERMUTATION_MAP]]} : vector<4x2xf16>, memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>
	// CHECK: return			// CHECK: return
	func.func @test_transfer_write_op_with_strides(%base : memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>,			func.func @test_transfer_write_op_with_strides(%base : memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>,
	%offset0 : index, %offset1: index, %offset2: index) {			%offset0 : index, %offset1: index, %offset2: index) {
	%vcf0 = arith.constant dense<0.000000e+00> : vector<4x2xf16>			%vcf0 = arith.constant dense<0.000000e+00> : vector<4x2xf16>
	vector.transfer_write %vcf0, %base[%offset0, %offset1, %offset2] { permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : vector<4x2xf16>, memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>			vector.transfer_write %vcf0, %base[%offset0, %offset1, %offset2] { in_bounds = [false, true, false], permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : vector<4x2xf16>, memref<?x?x?xf16, strided<[329, 26, 12], offset: ?>>
	return			return
	}			}

	transform.sequence failures(propagate) {			transform.sequence failures(propagate) {
	^bb1(%arg1: !transform.any_op):			^bb1(%arg1: !transform.any_op):
	%0 = transform.structured.match ops{["func.func"]} in %arg1 : (!transform.any_op) -> !transform.any_op			%0 = transform.structured.match ops{["func.func"]} in %arg1 : (!transform.any_op) -> !transform.any_op
	transform.apply_patterns to %0 {			transform.apply_patterns to %0 {
	transform.apply_patterns.memref.extract_address_computations			transform.apply_patterns.memref.extract_address_computations
	} : !transform.any_op			} : !transform.any_op
	}			}

	// -----			// -----

	// Same as test_transfer_write_op but with tensors.			// Same as test_transfer_write_op but with tensors.
	// Right now this rewrite is not supported but we still shouldn't choke on it.			// Right now this rewrite is not supported but we still shouldn't choke on it.

	// CHECK: #[[$PERMUTATION_MAP:.*]] = affine_map<(d0, d1, d2) -> (d2, d0)>			// CHECK: #[[$PERMUTATION_MAP:.*]] = affine_map<(d0, d1, d2) -> (d2, d0)>
	// CHECK-LABEL: @test_transfer_write_op_with_tensor(			// CHECK-LABEL: @test_transfer_write_op_with_tensor(
	// CHECK-SAME: %[[BASE:[^:]*]]: tensor<{{[^,]*}}>,			// CHECK-SAME: %[[BASE:[^:]*]]: tensor<{{[^,]*}}>,
	// CHECK-SAME: %[[DYN_OFFSET0:[^:]*]]: index,			// CHECK-SAME: %[[DYN_OFFSET0:[^:]*]]: index,
	// CHECK-SAME: %[[DYN_OFFSET1:[^:]*]]: index,			// CHECK-SAME: %[[DYN_OFFSET1:[^:]*]]: index,
	// CHECK-SAME: %[[DYN_OFFSET2:[^:]*]]: index)			// CHECK-SAME: %[[DYN_OFFSET2:[^:]*]]: index)
	// CHECK-DAG: %[[VCF0:.*]] = arith.constant dense<0.0{{0*e\+00}}> : vector<4x2xf16>			// CHECK-DAG: %[[VCF0:.*]] = arith.constant dense<0.0{{0*e\+00}}> : vector<4x2xf16>
	// CHECK: %[[RES:.*]] = vector.transfer_write %[[VCF0]], %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] {permutation_map = #[[$PERMUTATION_MAP]]} : vector<4x2xf16>, tensor<?x?x?xf16>			// CHECK: %[[RES:.*]] = vector.transfer_write %[[VCF0]], %[[BASE]][%[[DYN_OFFSET0]], %[[DYN_OFFSET1]], %[[DYN_OFFSET2]]] {in_bounds = [false, true, false], permutation_map = #[[$PERMUTATION_MAP]]} : vector<4x2xf16>, tensor<?x?x?xf16>
	// CHECK: return %[[RES]] : tensor<?x?x?xf16>			// CHECK: return %[[RES]] : tensor<?x?x?xf16>
	func.func @test_transfer_write_op_with_tensor(%base : tensor<?x?x?xf16>,			func.func @test_transfer_write_op_with_tensor(%base : tensor<?x?x?xf16>,
	%offset0 : index, %offset1: index, %offset2: index) -> tensor<?x?x?xf16> {			%offset0 : index, %offset1: index, %offset2: index) -> tensor<?x?x?xf16> {
	%vcf0 = arith.constant dense<0.000000e+00> : vector<4x2xf16>			%vcf0 = arith.constant dense<0.000000e+00> : vector<4x2xf16>
	%res = vector.transfer_write %vcf0, %base[%offset0, %offset1, %offset2] { permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : vector<4x2xf16>, tensor<?x?x?xf16>			%res = vector.transfer_write %vcf0, %base[%offset0, %offset1, %offset2] { in_bounds = [false, true, false], permutation_map = affine_map<(d0,d1,d2) -> (d2,d0)> } : vector<4x2xf16>, tensor<?x?x?xf16>
	return %res : tensor<?x?x?xf16>			return %res : tensor<?x?x?xf16>
	}			}

	transform.sequence failures(propagate) {			transform.sequence failures(propagate) {
	^bb1(%arg1: !transform.any_op):			^bb1(%arg1: !transform.any_op):
	%0 = transform.structured.match ops{["func.func"]} in %arg1 : (!transform.any_op) -> !transform.any_op			%0 = transform.structured.match ops{["func.func"]} in %arg1 : (!transform.any_op) -> !transform.any_op
	transform.apply_patterns to %0 {			transform.apply_patterns to %0 {
	transform.apply_patterns.memref.extract_address_computations			transform.apply_patterns.memref.extract_address_computations
	} : !transform.any_op			} : !transform.any_op
	}			}
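
The updated `@test_transfer_write_op_with_tensor` above gains `in_bounds = [false, true, false]` for a rank-3 tensor written through `(d0, d1, d2) -> (d2, d0)`. Here is a minimal Python sketch of one rule consistent with that expectation, assuming `in_bounds` now carries one entry per source dimension and that dimensions the permutation map does not reference are marked `true`. The helper name `expand_in_bounds` is hypothetical, and the rule is inferred only from the tests in this diff, not from the patch's C++ code. Broadcast results in the permutation map are not modeled.

```python
from typing import List, Optional, Sequence


def expand_in_bounds(
    vector_in_bounds: Optional[Sequence[bool]],
    permutation_results: Sequence[int],
    source_rank: int,
) -> List[bool]:
    """Hypothetical helper: build a source-rank in_bounds list.

    permutation_results[i] is the source dimension that vector dimension i
    maps to, e.g. [2, 0] for (d0, d1, d2) -> (d2, d0).
    """
    vector_rank = len(permutation_results)
    if vector_in_bounds is None:
        # Assumption: a missing attribute means every vector dim is out of bounds.
        vector_in_bounds = [False] * vector_rank
    if len(vector_in_bounds) != vector_rank:
        raise ValueError("expected one in_bounds entry per vector dimension")

    # Assumption: source dims not referenced by the map are treated as in bounds.
    expanded = [True] * source_rank
    for vec_dim, src_dim in enumerate(permutation_results):
        expanded[src_dim] = bool(vector_in_bounds[vec_dim])
    return expanded


# Mirrors the test above: no old attribute, map -> (d2, d0), rank-3 tensor.
print(expand_in_bounds(None, [2, 0], 3))
```

Under the same assumptions, a 1-D transfer with `in_bounds = [true]` on a rank-2 source (minor identity map, so `permutation_results = [1]`) expands to two `true` entries, which matches the `[true]` to `[true, true]` rewrites elsewhere in this diff.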

mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir

	// CHECK: vector.transfer_read %[[MEM]][%[[SZ0]], %[[SZ1]]]			// CHECK: vector.transfer_read %[[MEM]][%[[SZ0]], %[[SZ1]]]

	// -----			// -----

	func.func @fold_subview_with_transfer_read(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index, %arg5 : index, %arg6 : index) -> vector<4xf32> {			func.func @fold_subview_with_transfer_read(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index, %arg5 : index, %arg6 : index) -> vector<4xf32> {
	%f1 = arith.constant 1.0 : f32			%f1 = arith.constant 1.0 : f32

	%0 = memref.subview %arg0[%arg1, %arg2][4, 4][%arg5, %arg6] : memref<12x32xf32> to memref<4x4xf32, strided<[?, ?], offset: ?>>			%0 = memref.subview %arg0[%arg1, %arg2][4, 4][%arg5, %arg6] : memref<12x32xf32> to memref<4x4xf32, strided<[?, ?], offset: ?>>
	%1 = vector.transfer_read %0[%arg3, %arg4], %f1 {in_bounds = [true]} : memref<4x4xf32, strided<[?, ?], offset: ?>>, vector<4xf32>			%1 = vector.transfer_read %0[%arg3, %arg4], %f1 {in_bounds = [true, true]} : memref<4x4xf32, strided<[?, ?], offset: ?>>, vector<4xf32>
	return %1 : vector<4xf32>			return %1 : vector<4xf32>
	}			}
	// CHECK: func @fold_subview_with_transfer_read			// CHECK: func @fold_subview_with_transfer_read
	// Can't fold this atm since we don't emit the proper vector.extract_strided_slice.			// Can't fold this atm since we don't emit the proper vector.extract_strided_slice.
	// CHECK: memref.subview			// CHECK: memref.subview

	// -----			// -----

	// CHECK-SAME: %[[V:[a-zA-Z0-9_]+]]: vector<f32>			// CHECK-SAME: %[[V:[a-zA-Z0-9_]+]]: vector<f32>
	// CHECK: vector.transfer_write %[[V]], %[[MEM]][%[[SZ0]], %[[SZ1]]]			// CHECK: vector.transfer_write %[[V]], %[[MEM]][%[[SZ0]], %[[SZ1]]]

	// -----			// -----

	func.func @fold_static_stride_subview_with_transfer_write(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index, %arg5: index, %arg6 : index, %arg7 : vector<4xf32>) {			func.func @fold_static_stride_subview_with_transfer_write(%arg0 : memref<12x32xf32>, %arg1 : index, %arg2 : index, %arg3 : index, %arg4 : index, %arg5: index, %arg6 : index, %arg7 : vector<4xf32>) {
	%0 = memref.subview %arg0[%arg1, %arg2][4, 4][%arg5, %arg6] :			%0 = memref.subview %arg0[%arg1, %arg2][4, 4][%arg5, %arg6] :
	memref<12x32xf32> to memref<4x4xf32, strided<[?, ?], offset: ?>>			memref<12x32xf32> to memref<4x4xf32, strided<[?, ?], offset: ?>>
	vector.transfer_write %arg7, %0[%arg3, %arg4] {in_bounds = [true]} : vector<4xf32>, memref<4x4xf32, strided<[?, ?], offset: ?>>			vector.transfer_write %arg7, %0[%arg3, %arg4] {in_bounds = [true, true]} : vector<4xf32>, memref<4x4xf32, strided<[?, ?], offset: ?>>
	return			return
	}			}
	// CHECK: func @fold_static_stride_subview_with_transfer_write			// CHECK: func @fold_static_stride_subview_with_transfer_write
	// Can't fold this atm since we don't emit the proper vector.extract_strided_slice.			// Can't fold this atm since we don't emit the proper vector.extract_strided_slice.
	// CHECK: memref.subview			// CHECK: memref.subview

	// -----			// -----

	func.func @fold_vector_transfer_read_with_rank_reduced_subview(			func.func @fold_vector_transfer_read_with_rank_reduced_subview(
	%arg0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>,			%arg0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>,
	%arg1: index, %arg2 : index, %arg3 : index, %arg4: index, %arg5 : index,			%arg1: index, %arg2 : index, %arg3 : index, %arg4: index, %arg5 : index,
	%arg6 : index) -> vector<4xf32> {			%arg6 : index) -> vector<4xf32> {
	%cst = arith.constant 0.0 : f32			%cst = arith.constant 0.0 : f32
	%0 = memref.subview %arg0[0, %arg1, %arg2] [1, %arg3, %arg4] [1, 1, 1]			%0 = memref.subview %arg0[0, %arg1, %arg2] [1, %arg3, %arg4] [1, 1, 1]
	: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to			: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to
	memref<?x?xf32, strided<[?, ?], offset: ?>>			memref<?x?xf32, strided<[?, ?], offset: ?>>
	%1 = vector.transfer_read %0[%arg5, %arg6], %cst {in_bounds = [true]}			%1 = vector.transfer_read %0[%arg5, %arg6], %cst {in_bounds = [true, true]}
	: memref<?x?xf32, strided<[?, ?], offset: ?>>, vector<4xf32>			: memref<?x?xf32, strided<[?, ?], offset: ?>>, vector<4xf32>
	return %1 : vector<4xf32>			return %1 : vector<4xf32>
	}			}
	// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>			// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
	// CHECK: func @fold_vector_transfer_read_with_rank_reduced_subview			// CHECK: func @fold_vector_transfer_read_with_rank_reduced_subview
	// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>			// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
	// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: index
	// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: index
	func.func @fold_vector_transfer_write_with_rank_reduced_subview(			func.func @fold_vector_transfer_write_with_rank_reduced_subview(
	%arg0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>,			%arg0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>,
	%arg1 : vector<4xf32>, %arg2: index, %arg3 : index, %arg4 : index,			%arg1 : vector<4xf32>, %arg2: index, %arg3 : index, %arg4 : index,
	%arg5: index, %arg6 : index, %arg7 : index) {			%arg5: index, %arg6 : index, %arg7 : index) {
	%cst = arith.constant 0.0 : f32			%cst = arith.constant 0.0 : f32
	%0 = memref.subview %arg0[0, %arg2, %arg3] [1, %arg4, %arg5] [1, 1, 1]			%0 = memref.subview %arg0[0, %arg2, %arg3] [1, %arg4, %arg5] [1, 1, 1]
	: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to			: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to
	memref<?x?xf32, strided<[?, ?], offset: ?>>			memref<?x?xf32, strided<[?, ?], offset: ?>>
	vector.transfer_write %arg1, %0[%arg6, %arg7] {in_bounds = [true]}			vector.transfer_write %arg1, %0[%arg6, %arg7] {in_bounds = [true, true]}
	: vector<4xf32>, memref<?x?xf32, strided<[?, ?], offset: ?>>			: vector<4xf32>, memref<?x?xf32, strided<[?, ?], offset: ?>>
	return			return
	}			}
	// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>			// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
	// CHECK: func @fold_vector_transfer_write_with_rank_reduced_subview			// CHECK: func @fold_vector_transfer_write_with_rank_reduced_subview
	// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>			// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
	// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: vector<4xf32>			// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: vector<4xf32>
	// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: index
	// CHECK-SAME: %[[ARG3:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG3:[a-zA-Z0-9]+]]: index
	// CHECK-SAME: %[[ARG4:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG4:[a-zA-Z0-9]+]]: index
	// CHECK-SAME: %[[ARG5:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG5:[a-zA-Z0-9]+]]: index
	// CHECK-SAME: %[[ARG6:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG6:[a-zA-Z0-9]+]]: index
	// CHECK-SAME: %[[ARG7:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG7:[a-zA-Z0-9]+]]: index
	// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
	// CHECK-DAG: %[[IDX0:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG6]]]			// CHECK-DAG: %[[IDX0:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG6]]]
	// CHECK-DAG: %[[IDX1:.+]] = affine.apply #[[MAP1]]()[%[[ARG3]], %[[ARG7]]]			// CHECK-DAG: %[[IDX1:.+]] = affine.apply #[[MAP1]]()[%[[ARG3]], %[[ARG7]]]
	// CHECK-DAG: vector.transfer_write %[[ARG1]], %[[ARG0]][%[[C0]], %[[IDX0]], %[[IDX1]]] {in_bounds = [true]} : vector<4xf32>, memref<?x?x?xf32			// CHECK-DAG: vector.transfer_write %[[ARG1]], %[[ARG0]][%[[C0]], %[[IDX0]], %[[IDX1]]] {in_bounds = [true, true, true]} : vector<4xf32>, memref<?x?x?xf32

	// -----			// -----

	func.func @fold_vector_transfer_write_with_inner_rank_reduced_subview(			func.func @fold_vector_transfer_write_with_inner_rank_reduced_subview(
	%arg0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>,			%arg0 : memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>,
	%arg1 : vector<4xf32>, %arg2: index, %arg3 : index, %arg4 : index,			%arg1 : vector<4xf32>, %arg2: index, %arg3 : index, %arg4 : index,
	%arg5: index, %arg6 : index, %arg7 : index) {			%arg5: index, %arg6 : index, %arg7 : index) {
	%cst = arith.constant 0.0 : f32			%cst = arith.constant 0.0 : f32
	%0 = memref.subview %arg0[%arg2, %arg3, 0] [%arg4, %arg5, 1] [1, 1, 1]			%0 = memref.subview %arg0[%arg2, %arg3, 0] [%arg4, %arg5, 1] [1, 1, 1]
	: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to			: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>> to
	memref<?x?xf32, strided<[?, ?], offset: ?>>			memref<?x?xf32, strided<[?, ?], offset: ?>>
	vector.transfer_write %arg1, %0[%arg6, %arg7] {in_bounds = [true]}			vector.transfer_write %arg1, %0[%arg6, %arg7] {in_bounds = [true, true]}
	: vector<4xf32>, memref<?x?xf32, strided<[?, ?], offset: ?>>			: vector<4xf32>, memref<?x?xf32, strided<[?, ?], offset: ?>>
	return			return
	}			}
	// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>			// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
	// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0, d1, d2) -> (d1)>			// CHECK-DAG: #[[MAP2:.+]] = affine_map<(d0, d1, d2) -> (d1)>
	// CHECK: func @fold_vector_transfer_write_with_inner_rank_reduced_subview			// CHECK: func @fold_vector_transfer_write_with_inner_rank_reduced_subview
	// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>			// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]: memref<?x?x?xf32, strided<[?, ?, ?], offset: ?>>
	// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: vector<4xf32>			// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: vector<4xf32>
	// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: index
	// CHECK-SAME: %[[ARG3:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG3:[a-zA-Z0-9]+]]: index
	// CHECK-SAME: %[[ARG4:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG4:[a-zA-Z0-9]+]]: index
	// CHECK-SAME: %[[ARG5:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG5:[a-zA-Z0-9]+]]: index
	// CHECK-SAME: %[[ARG6:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG6:[a-zA-Z0-9]+]]: index
	// CHECK-SAME: %[[ARG7:[a-zA-Z0-9]+]]: index			// CHECK-SAME: %[[ARG7:[a-zA-Z0-9]+]]: index
	// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
	// CHECK-DAG: %[[IDX0:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG6]]]			// CHECK-DAG: %[[IDX0:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG6]]]
	// CHECK-DAG: %[[IDX1:.+]] = affine.apply #[[MAP1]]()[%[[ARG3]], %[[ARG7]]]			// CHECK-DAG: %[[IDX1:.+]] = affine.apply #[[MAP1]]()[%[[ARG3]], %[[ARG7]]]
	// CHECK-DAG: vector.transfer_write %[[ARG1]], %[[ARG0]][%[[IDX0]], %[[IDX1]], %[[C0]]]			// CHECK-DAG: vector.transfer_write %[[ARG1]], %[[ARG0]][%[[IDX0]], %[[IDX1]], %[[C0]]]
	// CHECK-SAME: {in_bounds = [true], permutation_map = #[[MAP2]]} : vector<4xf32>, memref<?x?x?xf32			// CHECK-SAME: {in_bounds = [true, true, true], permutation_map = #[[MAP2]]} : vector<4xf32>, memref<?x?x?xf32

	// -----			// -----

	// Test with affine.load/store ops. We only do a basic test here since the			// Test with affine.load/store ops. We only do a basic test here since the
	// logic is identical to that with memref.load/store ops. The same affine.apply			// logic is identical to that with memref.load/store ops. The same affine.apply
	// ops would be generated.			// ops would be generated.

	// CHECK-LABEL: func @fold_static_stride_subview_with_affine_load_store			// CHECK-LABEL: func @fold_static_stride_subview_with_affine_load_store

mlir/test/Dialect/Tensor/fold-tensor-subset-ops-into-vector-transfers.mlir

func.func @transfer_read_of_extract_slice(%t : tensor<?x?xf32>, %s1 : index, %s2 : index) -> vector<5x6xf32> {
%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true, true]} : tensor<10x?xf32>, vector<5x6xf32>		%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true, true]} : tensor<10x?xf32>, vector<5x6xf32>
return %1 : vector<5x6xf32>		return %1 : vector<5x6xf32>
}		}

// CHECK-LABEL: func @transfer_read_of_extract_slice_1d(		// CHECK-LABEL: func @transfer_read_of_extract_slice_1d(
// CHECK-SAME: %[[t:.*]]: tensor<?x?xf32>, %[[s1:.*]]: index, %[[s2:.*]]: index		// CHECK-SAME: %[[t:.*]]: tensor<?x?xf32>, %[[s1:.*]]: index, %[[s2:.*]]: index
// CHECK-DAG: %[[c8:.*]] = arith.constant 8 : index		// CHECK-DAG: %[[c8:.*]] = arith.constant 8 : index
// CHECK: %[[add:.*]] = affine.apply #[[$map]]()[%[[s1]]]		// CHECK: %[[add:.*]] = affine.apply #[[$map]]()[%[[s1]]]
// CHECK: %[[r:.*]] = vector.transfer_read %[[t]][%[[c8]], %[[add]]], %{{.*}} {in_bounds = [true]} : tensor<?x?xf32>, vector<6xf32>		// CHECK: %[[r:.*]] = vector.transfer_read %[[t]][%[[c8]], %[[add]]], %{{.*}} {in_bounds = [true, true]} : tensor<?x?xf32>, vector<6xf32>
// CHECK: return %[[r]]		// CHECK: return %[[r]]
func.func @transfer_read_of_extract_slice_1d(%t : tensor<?x?xf32>, %s1 : index, %s2 : index) -> vector<6xf32> {		func.func @transfer_read_of_extract_slice_1d(%t : tensor<?x?xf32>, %s1 : index, %s2 : index) -> vector<6xf32> {
%c3 = arith.constant 3 : index		%c3 = arith.constant 3 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%cst = arith.constant 0.0 : f32		%cst = arith.constant 0.0 : f32
%0 = tensor.extract_slice %t[5, %s1] [10, %s2] [1, 1] : tensor<?x?xf32> to tensor<10x?xf32>		%0 = tensor.extract_slice %t[5, %s1] [10, %s2] [1, 1] : tensor<?x?xf32> to tensor<10x?xf32>
%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true]} : tensor<10x?xf32>, vector<6xf32>		%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true, true]} : tensor<10x?xf32>, vector<6xf32>
return %1 : vector<6xf32>		return %1 : vector<6xf32>
}		}

// CHECK-LABEL: func @transfer_read_of_extract_slice_rank_reducing(		// CHECK-LABEL: func @transfer_read_of_extract_slice_rank_reducing(
// CHECK-SAME: %[[t:.*]]: tensor<?x?x?xf32>, %[[s1:.*]]: index, %[[s2:.*]]: index		// CHECK-SAME: %[[t:.*]]: tensor<?x?x?xf32>, %[[s1:.*]]: index, %[[s2:.*]]: index
// CHECK-DAG: %[[c5:.*]] = arith.constant 5 : index		// CHECK-DAG: %[[c5:.*]] = arith.constant 5 : index
// CHECK-DAG: %[[c10:.*]] = arith.constant 10 : index		// CHECK-DAG: %[[c10:.*]] = arith.constant 10 : index
// CHECK: %[[add:.*]] = affine.apply #[[$map1]]()[%[[s1]]]		// CHECK: %[[add:.*]] = affine.apply #[[$map1]]()[%[[s1]]]
// CHECK: %[[r:.*]] = vector.transfer_read %[[t]][%[[c5]], %[[add]], %[[c10]]], %{{.*}} {in_bounds = [true, true]} : tensor<?x?x?xf32>, vector<5x6xf32>		// CHECK: %[[r:.*]] = vector.transfer_read %[[t]][%[[c5]], %[[add]], %[[c10]]], %{{.*}} {in_bounds = [true, true, true]} : tensor<?x?x?xf32>, vector<5x6xf32>
// CHECK: return %[[r]]		// CHECK: return %[[r]]
func.func @transfer_read_of_extract_slice_rank_reducing(%t : tensor<?x?x?xf32>, %s1 : index, %s2 : index) -> vector<5x6xf32> {		func.func @transfer_read_of_extract_slice_rank_reducing(%t : tensor<?x?x?xf32>, %s1 : index, %s2 : index) -> vector<5x6xf32> {
%c3 = arith.constant 3 : index		%c3 = arith.constant 3 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%cst = arith.constant 0.0 : f32		%cst = arith.constant 0.0 : f32
%0 = tensor.extract_slice %t[5, %s1, 6] [1, %s2, 12] [1, 1, 1] : tensor<?x?x?xf32> to tensor<?x12xf32>		%0 = tensor.extract_slice %t[5, %s1, 6] [1, %s2, 12] [1, 1, 1] : tensor<?x?x?xf32> to tensor<?x12xf32>
%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true, true]} : tensor<?x12xf32>, vector<5x6xf32>		%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true, true]} : tensor<?x12xf32>, vector<5x6xf32>
return %1 : vector<5x6xf32>		return %1 : vector<5x6xf32>
}		}

// CHECK-LABEL: func @transfer_read_of_extract_slice_non_leading_rank_reduction(		// CHECK-LABEL: func @transfer_read_of_extract_slice_non_leading_rank_reduction(
// CHECK-SAME: %[[t:.*]]: tensor<?x?x?xf32>, %[[s1:.*]]: index, %[[s2:.*]]: index		// CHECK-SAME: %[[t:.*]]: tensor<?x?x?xf32>, %[[s1:.*]]: index, %[[s2:.*]]: index
// CHECK-DAG: %[[c8:.*]] = arith.constant 8 : index		// CHECK-DAG: %[[c8:.*]] = arith.constant 8 : index
// CHECK-DAG: %[[c10:.*]] = arith.constant 10 : index		// CHECK-DAG: %[[c10:.*]] = arith.constant 10 : index
// CHECK: %[[r:.*]] = vector.transfer_read %[[t]][%[[c8]], %[[s1]], %[[c10]]], %{{.*}} {in_bounds = [true, true], permutation_map = #[[$map2]]} : tensor<?x?x?xf32>, vector<5x6xf32>		// CHECK: %[[r:.*]] = vector.transfer_read %[[t]][%[[c8]], %[[s1]], %[[c10]]], %{{.*}} {in_bounds = [true, true, true], permutation_map = #[[$map2]]} : tensor<?x?x?xf32>, vector<5x6xf32>
// CHECK: return %[[r]]		// CHECK: return %[[r]]
func.func @transfer_read_of_extract_slice_non_leading_rank_reduction(%t : tensor<?x?x?xf32>, %s1 : index, %s2 : index) -> vector<5x6xf32> {		func.func @transfer_read_of_extract_slice_non_leading_rank_reduction(%t : tensor<?x?x?xf32>, %s1 : index, %s2 : index) -> vector<5x6xf32> {
%c3 = arith.constant 3 : index		%c3 = arith.constant 3 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%cst = arith.constant 0.0 : f32		%cst = arith.constant 0.0 : f32
%0 = tensor.extract_slice %t[5, %s1, 6] [%s2, 1, 12] [1, 1, 1] : tensor<?x?x?xf32> to tensor<?x12xf32>		%0 = tensor.extract_slice %t[5, %s1, 6] [%s2, 1, 12] [1, 1, 1] : tensor<?x?x?xf32> to tensor<?x12xf32>
%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true, true]} : tensor<?x12xf32>, vector<5x6xf32>		%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true, true]} : tensor<?x12xf32>, vector<5x6xf32>
return %1 : vector<5x6xf32>		return %1 : vector<5x6xf32>
func.func @insert_slice_of_transfer_write(%t1 : tensor<?x12xf32>, %v : vector<5x6xf32>, %s : index, %t2 : tensor<5x6xf32>) -> tensor<?x12xf32> {
%1 = tensor.insert_slice %0 into %t1[3, %s] [5, 6] [1, 1] : tensor<5x6xf32> into tensor<?x12xf32>		%1 = tensor.insert_slice %0 into %t1[3, %s] [5, 6] [1, 1] : tensor<5x6xf32> into tensor<?x12xf32>
return %1 : tensor<?x12xf32>		return %1 : tensor<?x12xf32>
}		}

// CHECK-LABEL: func @insert_slice_of_transfer_write_non_leading_rank_reduction(		// CHECK-LABEL: func @insert_slice_of_transfer_write_non_leading_rank_reduction(
// CHECK-SAME: %[[t1:.*]]: tensor<?x?x12xf32>, %[[v:.*]]: vector<5x6xf32>, %[[s:.*]]: index		// CHECK-SAME: %[[t1:.*]]: tensor<?x?x12xf32>, %[[v:.*]]: vector<5x6xf32>, %[[s:.*]]: index
// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index		// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index
// CHECK-DAG: %[[c4:.*]] = arith.constant 4 : index		// CHECK-DAG: %[[c4:.*]] = arith.constant 4 : index
// CHECK: %[[r:.*]] = vector.transfer_write %[[v]], %[[t1]][%[[c4]], %[[c3]], %[[s]]] {in_bounds = [true, true], permutation_map = #[[$map2]]} : vector<5x6xf32>, tensor<?x?x12xf32>		// CHECK: %[[r:.*]] = vector.transfer_write %[[v]], %[[t1]][%[[c4]], %[[c3]], %[[s]]] {in_bounds = [true, true, true], permutation_map = #[[$map2]]} : vector<5x6xf32>, tensor<?x?x12xf32>
func.func @insert_slice_of_transfer_write_non_leading_rank_reduction(%t1 : tensor<?x?x12xf32>, %v : vector<5x6xf32>, %s : index, %t2 : tensor<5x6xf32>) -> tensor<?x?x12xf32> {		func.func @insert_slice_of_transfer_write_non_leading_rank_reduction(%t1 : tensor<?x?x12xf32>, %v : vector<5x6xf32>, %s : index, %t2 : tensor<5x6xf32>) -> tensor<?x?x12xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%0 = vector.transfer_write %v, %t2[%c0, %c0] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<5x6xf32>		%0 = vector.transfer_write %v, %t2[%c0, %c0] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<5x6xf32>
%1 = tensor.insert_slice %0 into %t1[4, 3, %s] [5, 1, 6] [1, 1, 1] : tensor<5x6xf32> into tensor<?x?x12xf32>		%1 = tensor.insert_slice %0 into %t1[4, 3, %s] [5, 1, 6] [1, 1, 1] : tensor<5x6xf32> into tensor<?x?x12xf32>
return %1 : tensor<?x?x12xf32>		return %1 : tensor<?x?x12xf32>
}		}

// CHECK-LABEL: func @insert_slice_of_transfer_write_rank_extending(		// CHECK-LABEL: func @insert_slice_of_transfer_write_rank_extending(
// CHECK-SAME: %[[t1:.*]]: tensor<?x?x12xf32>, %[[v:.*]]: vector<5x6xf32>, %[[s:.*]]: index		// CHECK-SAME: %[[t1:.*]]: tensor<?x?x12xf32>, %[[v:.*]]: vector<5x6xf32>, %[[s:.*]]: index
// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index		// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index
// CHECK-DAG: %[[c4:.*]] = arith.constant 4 : index		// CHECK-DAG: %[[c4:.*]] = arith.constant 4 : index
// CHECK: %[[r:.*]] = vector.transfer_write %[[v]], %[[t1]][%[[c4]], %[[c3]], %[[s]]] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<?x?x12xf32>		// CHECK: %[[r:.*]] = vector.transfer_write %[[v]], %[[t1]][%[[c4]], %[[c3]], %[[s]]] {in_bounds = [true, true, true]} : vector<5x6xf32>, tensor<?x?x12xf32>
// CHECK: return %[[r]]		// CHECK: return %[[r]]
func.func @insert_slice_of_transfer_write_rank_extending(%t1 : tensor<?x?x12xf32>, %v : vector<5x6xf32>, %s : index, %t2 : tensor<5x6xf32>) -> tensor<?x?x12xf32> {		func.func @insert_slice_of_transfer_write_rank_extending(%t1 : tensor<?x?x12xf32>, %v : vector<5x6xf32>, %s : index, %t2 : tensor<5x6xf32>) -> tensor<?x?x12xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%0 = vector.transfer_write %v, %t2[%c0, %c0] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<5x6xf32>		%0 = vector.transfer_write %v, %t2[%c0, %c0] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<5x6xf32>
%1 = tensor.insert_slice %0 into %t1[4, 3, %s] [1, 5, 6] [1, 1, 1] : tensor<5x6xf32> into tensor<?x?x12xf32>		%1 = tensor.insert_slice %0 into %t1[4, 3, %s] [1, 5, 6] [1, 1, 1] : tensor<5x6xf32> into tensor<?x?x12xf32>
return %1 : tensor<?x?x12xf32>		return %1 : tensor<?x?x12xf32>
}		}
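
The CHECK lines in this file expect the transfer indices to be folded into the slice offsets: `[%c3, %c4]` read from a slice at `[5, %s1]` becomes `[8, %s1 + 4]`, and a dimension dropped by a rank-reducing slice keeps the bare slice offset. The following Python sketch shows that arithmetic under the assumption of unit strides, which is the case for every folded test shown here. The helper name `fold_slice_indices` is hypothetical and is not taken from the patch.

```python
from typing import List, Sequence, Set


def fold_slice_indices(
    slice_offsets: Sequence[int],
    dropped_dims: Set[int],
    transfer_indices: Sequence[int],
) -> List[int]:
    """Hypothetical helper: indices into the original source after folding.

    slice_offsets has one entry per source dimension. dropped_dims holds the
    source dimensions removed by a rank-reducing slice. transfer_indices has
    one entry per remaining (kept) dimension. Unit strides are assumed.
    """
    kept = [d for d in range(len(slice_offsets)) if d not in dropped_dims]
    if len(kept) != len(transfer_indices):
        raise ValueError("expected one transfer index per kept dimension")

    # Dropped dims keep the slice offset; kept dims add the transfer index.
    folded = list(slice_offsets)
    for index, dim in zip(transfer_indices, kept):
        folded[dim] = slice_offsets[dim] + index
    return folded


# Mirrors @transfer_read_of_extract_slice_rank_reducing with %s1 = 7:
# offsets [5, %s1, 6], leading dim dropped, transfer indices [3, 4].
print(fold_slice_indices([5, 7, 6], {0}, [3, 4]))
```

With the middle dimension dropped instead, as in the `non_leading_rank_reduction` test, the same inputs give `[8, 7, 10]`, matching the expected `[%c8, %s1, %c10]`.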

mlir/test/Dialect/Tensor/fold-tensor-subset-ops.mlir

// RUN: mlir-opt -fold-tensor-subset-ops -split-input-file --allow-unregistered-dialect %s | FileCheck %s		// RUN: mlir-opt -fold-tensor-subset-ops -split-input-file --allow-unregistered-dialect %s | FileCheck %s

func.func @fold_vector_transfer_read_with_rank_reduced_extract_slice(		func.func @fold_vector_transfer_read_with_rank_reduced_extract_slice(
%arg0 : tensor<?x?x?xf32>,		%arg0 : tensor<?x?x?xf32>,
%arg1: index, %arg2 : index, %arg3 : index, %arg4: index, %arg5 : index,		%arg1: index, %arg2 : index, %arg3 : index, %arg4: index, %arg5 : index,
%arg6 : index) -> vector<4xf32> {		%arg6 : index) -> vector<4xf32> {
%cst = arith.constant 0.0 : f32		%cst = arith.constant 0.0 : f32
%0 = tensor.extract_slice %arg0[0, %arg1, %arg2] [1, %arg3, %arg4] [1, 1, 1]		%0 = tensor.extract_slice %arg0[0, %arg1, %arg2] [1, %arg3, %arg4] [1, 1, 1]
: tensor<?x?x?xf32> to		: tensor<?x?x?xf32> to
tensor<?x?xf32>		tensor<?x?xf32>
%1 = vector.transfer_read %0[%arg5, %arg6], %cst {in_bounds = [true]}		%1 = vector.transfer_read %0[%arg5, %arg6], %cst {in_bounds = [true, true]}
: tensor<?x?xf32>, vector<4xf32>		: tensor<?x?xf32>, vector<4xf32>
return %1 : vector<4xf32>		return %1 : vector<4xf32>
}		}
// CHECK-DAG: #[[$MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>		// CHECK-DAG: #[[$MAP1:.+]] = affine_map<()[s0, s1] -> (s0 + s1)>
// CHECK: func @fold_vector_transfer_read_with_rank_reduced_extract_slice		// CHECK: func @fold_vector_transfer_read_with_rank_reduced_extract_slice
// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]: tensor<?x?x?xf32>		// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]: tensor<?x?x?xf32>
// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: index		// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: index
// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: index		// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: index
func.func @transfer_read_from_rank_reducing_extract_slice_failure(
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index		%c2 = arith.constant 2 : index
%f0 = arith.constant 0.000000e+00 : f32		%f0 = arith.constant 0.000000e+00 : f32

// Can't fold this atm since we don't emit the proper vector.extract_strided_slice.		// Can't fold this atm since we don't emit the proper vector.extract_strided_slice.
// CHECK: tensor.extract_slice		// CHECK: tensor.extract_slice
%0 = tensor.extract_slice %src[0, %i1, %i2, %i3] [1, 4, 1, 4] [2, 3, 4, 5] : tensor<1x8x8x8xf32> to tensor<1x4x4xf32>		%0 = tensor.extract_slice %src[0, %i1, %i2, %i3] [1, 4, 1, 4] [2, 3, 4, 5] : tensor<1x8x8x8xf32> to tensor<1x4x4xf32>
%1 = vector.transfer_read %0[%c1, %i4, %c2], %f0 {in_bounds = [true]} : tensor<1x4x4xf32>, vector<4xf32>		%1 = vector.transfer_read %0[%c1, %i4, %c2], %f0 {in_bounds = [true, true, true]} : tensor<1x4x4xf32>, vector<4xf32>
return %1 : vector<4xf32>		return %1 : vector<4xf32>
}		}

// -----		// -----

// CHECK-DAG: #[[$ADD_4:.+]] = affine_map<()[s0] -> (s0 + 4)>		// CHECK-DAG: #[[$ADD_4:.+]] = affine_map<()[s0] -> (s0 + 4)>

// CHECK-LABEL: func @transfer_read_of_extract_slice(		// CHECK-LABEL: func @transfer_read_of_extract_slice(
// -----		// -----

// CHECK-DAG: #[[$ADD_4:.+]] = affine_map<()[s0] -> (s0 + 4)>		// CHECK-DAG: #[[$ADD_4:.+]] = affine_map<()[s0] -> (s0 + 4)>

// CHECK-LABEL: func @transfer_read_of_extract_slice(		// CHECK-LABEL: func @transfer_read_of_extract_slice(
// CHECK-SAME: %[[t:.*]]: tensor<?x?xf32>, %[[s1:.*]]: index, %[[s2:.*]]: index		// CHECK-SAME: %[[t:.*]]: tensor<?x?xf32>, %[[s1:.*]]: index, %[[s2:.*]]: index
// CHECK-DAG: %[[c8:.*]] = arith.constant 8 : index		// CHECK-DAG: %[[c8:.*]] = arith.constant 8 : index
// CHECK: %[[add:.*]] = affine.apply #[[$ADD_4]]()[%[[s1]]]		// CHECK: %[[add:.*]] = affine.apply #[[$ADD_4]]()[%[[s1]]]
// CHECK: %[[r:.*]] = vector.transfer_read %[[t]][%[[c8]], %[[add]]], %{{.*}} {in_bounds = [true]} : tensor<?x?xf32>, vector<6xf32>		// CHECK: %[[r:.*]] = vector.transfer_read %[[t]][%[[c8]], %[[add]]], %{{.*}} {in_bounds = [true, true]} : tensor<?x?xf32>, vector<6xf32>
// CHECK: return %[[r]]		// CHECK: return %[[r]]
func.func @transfer_read_of_extract_slice(%t : tensor<?x?xf32>, %s1 : index, %s2 : index) -> vector<6xf32> {		func.func @transfer_read_of_extract_slice(%t : tensor<?x?xf32>, %s1 : index, %s2 : index) -> vector<6xf32> {
%c3 = arith.constant 3 : index		%c3 = arith.constant 3 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%cst = arith.constant 0.0 : f32		%cst = arith.constant 0.0 : f32
%0 = tensor.extract_slice %t[5, %s1] [10, %s2] [1, 1] : tensor<?x?xf32> to tensor<10x?xf32>		%0 = tensor.extract_slice %t[5, %s1] [10, %s2] [1, 1] : tensor<?x?xf32> to tensor<10x?xf32>
%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true]} : tensor<10x?xf32>, vector<6xf32>		%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true, true]} : tensor<10x?xf32>, vector<6xf32>
return %1 : vector<6xf32>		return %1 : vector<6xf32>
}		}

// -----		// -----

// CHECK-DAG: #[[$ADD_3:.+]] = affine_map<()[s0] -> (s0 + 3)>		// CHECK-DAG: #[[$ADD_3:.+]] = affine_map<()[s0] -> (s0 + 3)>

// CHECK-LABEL: func @transfer_read_of_extract_slice_rank_reducing(		// CHECK-LABEL: func @transfer_read_of_extract_slice_rank_reducing(
// CHECK-SAME: %[[t:.*]]: tensor<?x?x?xf32>, %[[s1:.*]]: index, %[[s2:.*]]: index		// CHECK-SAME: %[[t:.*]]: tensor<?x?x?xf32>, %[[s1:.*]]: index, %[[s2:.*]]: index
// CHECK-DAG: %[[c5:.*]] = arith.constant 5 : index		// CHECK-DAG: %[[c5:.*]] = arith.constant 5 : index
// CHECK-DAG: %[[c10:.*]] = arith.constant 10 : index		// CHECK-DAG: %[[c10:.*]] = arith.constant 10 : index
// CHECK: %[[add:.*]] = affine.apply #[[$ADD_3]]()[%[[s1]]]		// CHECK: %[[add:.*]] = affine.apply #[[$ADD_3]]()[%[[s1]]]
// CHECK: %[[r:.*]] = vector.transfer_read %[[t]][%[[c5]], %[[add]], %[[c10]]], %{{.*}} {in_bounds = [true, true]} : tensor<?x?x?xf32>, vector<5x6xf32>		// CHECK: %[[r:.*]] = vector.transfer_read %[[t]][%[[c5]], %[[add]], %[[c10]]], %{{.*}} {in_bounds = [true, true, true]} : tensor<?x?x?xf32>, vector<5x6xf32>
// CHECK: return %[[r]]		// CHECK: return %[[r]]
func.func @transfer_read_of_extract_slice_rank_reducing(%t : tensor<?x?x?xf32>, %s1 : index, %s2 : index) -> vector<5x6xf32> {		func.func @transfer_read_of_extract_slice_rank_reducing(%t : tensor<?x?x?xf32>, %s1 : index, %s2 : index) -> vector<5x6xf32> {
%c3 = arith.constant 3 : index		%c3 = arith.constant 3 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%cst = arith.constant 0.0 : f32		%cst = arith.constant 0.0 : f32
%0 = tensor.extract_slice %t[5, %s1, 6] [1, %s2, 12] [1, 1, 1] : tensor<?x?x?xf32> to tensor<?x12xf32>		%0 = tensor.extract_slice %t[5, %s1, 6] [1, %s2, 12] [1, 1, 1] : tensor<?x?x?xf32> to tensor<?x12xf32>
%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true, true]} : tensor<?x12xf32>, vector<5x6xf32>		%1 = vector.transfer_read %0[%c3, %c4], %cst {in_bounds = [true, true]} : tensor<?x12xf32>, vector<5x6xf32>
return %1 : vector<5x6xf32>		return %1 : vector<5x6xf32>
func.func @fold_vector_transfer_write_with_rank_reduced_insert_slice(
%arg5: index, %arg6 : index, %arg7 : index,		%arg5: index, %arg6 : index, %arg7 : index,
%st : tensor<?x?xf32>) -> tensor<?x?x?xf32> {		%st : tensor<?x?xf32>) -> tensor<?x?x?xf32> {
%cst = arith.constant 0.0 : f32		%cst = arith.constant 0.0 : f32

// CHECK-NOT: insert_slice		// CHECK-NOT: insert_slice
// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
// CHECK-DAG: %[[IDX0:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG6]]]		// CHECK-DAG: %[[IDX0:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG6]]]
// CHECK-DAG: %[[IDX1:.+]] = affine.apply #[[MAP1]]()[%[[ARG3]], %[[ARG7]]]		// CHECK-DAG: %[[IDX1:.+]] = affine.apply #[[MAP1]]()[%[[ARG3]], %[[ARG7]]]
// CHECK-DAG: vector.transfer_write %[[ARG1]], %[[ARG0]][%[[C0]], %[[IDX0]], %[[IDX1]]] {in_bounds = [true]} : vector<4xf32>, tensor<?x?x?xf32		// CHECK-DAG: vector.transfer_write %[[ARG1]], %[[ARG0]][%[[C0]], %[[IDX0]], %[[IDX1]]] {in_bounds = [true, true, true]} : vector<4xf32>, tensor<?x?x?xf32
%0 = vector.transfer_write %arg1, %st[%arg6, %arg7] {in_bounds = [true]}		%0 = vector.transfer_write %arg1, %st[%arg6, %arg7] {in_bounds = [true, true]}
: vector<4xf32>, tensor<?x?xf32>		: vector<4xf32>, tensor<?x?xf32>
%1 = tensor.insert_slice %0 into %arg0[0, %arg2, %arg3] [1, %arg4, %arg5] [1, 1, 1]		%1 = tensor.insert_slice %0 into %arg0[0, %arg2, %arg3] [1, %arg4, %arg5] [1, 1, 1]
: tensor<?x?xf32> into tensor<?x?x?xf32>		: tensor<?x?xf32> into tensor<?x?x?xf32>
return %1 : tensor<?x?x?xf32>		return %1 : tensor<?x?x?xf32>
}		}

// -----		// -----

func.func @fold_vector_transfer_write_with_inner_rank_reduced_insert_slice(
%st : tensor<?x?xf32>) -> tensor<?x?x?xf32> {		%st : tensor<?x?xf32>) -> tensor<?x?x?xf32> {
%cst = arith.constant 0.0 : f32		%cst = arith.constant 0.0 : f32

// CHECK-NOT: insert_slice		// CHECK-NOT: insert_slice
// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index		// CHECK-DAG: %[[C0:.+]] = arith.constant 0 : index
// CHECK-DAG: %[[IDX0:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG6]]]		// CHECK-DAG: %[[IDX0:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG6]]]
// CHECK-DAG: %[[IDX1:.+]] = affine.apply #[[MAP1]]()[%[[ARG3]], %[[ARG7]]]		// CHECK-DAG: %[[IDX1:.+]] = affine.apply #[[MAP1]]()[%[[ARG3]], %[[ARG7]]]
// CHECK-DAG: vector.transfer_write %[[ARG1]], %[[ARG0]][%[[IDX0]], %[[IDX1]], %[[C0]]]		// CHECK-DAG: vector.transfer_write %[[ARG1]], %[[ARG0]][%[[IDX0]], %[[IDX1]], %[[C0]]]
// CHECK-SAME: {in_bounds = [true], permutation_map = #[[MAP2]]} : vector<4xf32>, tensor<?x?x?xf32		// CHECK-SAME: {in_bounds = [true, true, true], permutation_map = #[[MAP2]]} : vector<4xf32>, tensor<?x?x?xf32
%0 = vector.transfer_write %arg1, %st[%arg6, %arg7] {in_bounds = [true]}		%0 = vector.transfer_write %arg1, %st[%arg6, %arg7] {in_bounds = [true, true]}
: vector<4xf32>, tensor<?x?xf32>		: vector<4xf32>, tensor<?x?xf32>
%1 = tensor.insert_slice %0 into %arg0[%arg2, %arg3, 0] [%arg4, %arg5, 1] [1, 1, 1]		%1 = tensor.insert_slice %0 into %arg0[%arg2, %arg3, 0] [%arg4, %arg5, 1] [1, 1, 1]
: tensor<?x?xf32> into tensor<?x?x?xf32>		: tensor<?x?xf32> into tensor<?x?x?xf32>
return %1 : tensor<?x?x?xf32>		return %1 : tensor<?x?x?xf32>
}		}

// -----		// -----

func.func @insert_slice_of_transfer_write_swappy_rank_extending(
%t1 : tensor<?x?x12xf32>, %v : vector<5x6xf32>,		%t1 : tensor<?x?x12xf32>, %v : vector<5x6xf32>,
%s : index, %t2 : tensor<5x6xf32>) -> tensor<?x?x12xf32> {		%s : index, %t2 : tensor<5x6xf32>) -> tensor<?x?x12xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index

// CHECK-NOT: insert_slice		// CHECK-NOT: insert_slice
// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index		// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index
// CHECK-DAG: %[[c4:.*]] = arith.constant 4 : index		// CHECK-DAG: %[[c4:.*]] = arith.constant 4 : index
// CHECK: %[[r:.*]] = vector.transfer_write %[[v]], %[[t1]][%[[c4]], %[[c3]], %[[s]]]		// CHECK: %[[r:.*]] = vector.transfer_write %[[v]], %[[t1]][%[[c4]], %[[c3]], %[[s]]]
// CHECK-SAME: {in_bounds = [true, true], permutation_map = #[[$d0d2]]} : vector<5x6xf32>, tensor<?x?x12xf32>		// CHECK-SAME: {in_bounds = [true, true, true], permutation_map = #[[$d0d2]]} : vector<5x6xf32>, tensor<?x?x12xf32>
// CHECK: return %[[r]]		// CHECK: return %[[r]]
%0 = vector.transfer_write %v, %t2[%c0, %c0] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<5x6xf32>		%0 = vector.transfer_write %v, %t2[%c0, %c0] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<5x6xf32>
%1 = tensor.insert_slice %0 into %t1[4, 3, %s] [5, 1, 6] [1, 1, 1] : tensor<5x6xf32> into tensor<?x?x12xf32>		%1 = tensor.insert_slice %0 into %t1[4, 3, %s] [5, 1, 6] [1, 1, 1] : tensor<5x6xf32> into tensor<?x?x12xf32>
return %1 : tensor<?x?x12xf32>		return %1 : tensor<?x?x12xf32>
}		}

// -----		// -----

// CHECK-LABEL: func @insert_slice_of_transfer_write_rank_extending(		// CHECK-LABEL: func @insert_slice_of_transfer_write_rank_extending(
// CHECK-SAME: %[[t1:.*]]: tensor<?x?x12xf32>, %[[v:.*]]: vector<5x6xf32>, %[[s:.*]]: index		// CHECK-SAME: %[[t1:.*]]: tensor<?x?x12xf32>, %[[v:.*]]: vector<5x6xf32>, %[[s:.*]]: index
// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index		// CHECK-DAG: %[[c3:.*]] = arith.constant 3 : index
// CHECK-DAG: %[[c4:.*]] = arith.constant 4 : index		// CHECK-DAG: %[[c4:.*]] = arith.constant 4 : index
// CHECK: %[[r:.*]] = vector.transfer_write %[[v]], %[[t1]][%[[c4]], %[[c3]], %[[s]]] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<?x?x12xf32>		// CHECK: %[[r:.*]] = vector.transfer_write %[[v]], %[[t1]][%[[c4]], %[[c3]], %[[s]]] {in_bounds = [true, true, true]} : vector<5x6xf32>, tensor<?x?x12xf32>
// CHECK: return %[[r]]		// CHECK: return %[[r]]
func.func @insert_slice_of_transfer_write_rank_extending(%t1 : tensor<?x?x12xf32>, %v : vector<5x6xf32>, %s : index, %t2 : tensor<5x6xf32>) -> tensor<?x?x12xf32> {		func.func @insert_slice_of_transfer_write_rank_extending(%t1 : tensor<?x?x12xf32>, %v : vector<5x6xf32>, %s : index, %t2 : tensor<5x6xf32>) -> tensor<?x?x12xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%0 = vector.transfer_write %v, %t2[%c0, %c0] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<5x6xf32>		%0 = vector.transfer_write %v, %t2[%c0, %c0] {in_bounds = [true, true]} : vector<5x6xf32>, tensor<5x6xf32>
%1 = tensor.insert_slice %0 into %t1[4, 3, %s] [1, 5, 6] [1, 1, 1] : tensor<5x6xf32> into tensor<?x?x12xf32>		%1 = tensor.insert_slice %0 into %t1[4, 3, %s] [1, 5, 6] [1, 1, 1] : tensor<5x6xf32> into tensor<?x?x12xf32>
return %1 : tensor<?x?x12xf32>		return %1 : tensor<?x?x12xf32>
}		}


mlir/test/Dialect/Vector/canonicalize.mlir

func.func @canonicalize_broadcast_shapecast(%arg0: vector<3xf32>) -> vector<8x3xf32> {
%0 = vector.broadcast %arg0 : vector<3xf32> to vector<2x4x3xf32>		%0 = vector.broadcast %arg0 : vector<3xf32> to vector<2x4x3xf32>
%1 = vector.shape_cast %0 : vector<2x4x3xf32> to vector<8x3xf32>		%1 = vector.shape_cast %0 : vector<2x4x3xf32> to vector<8x3xf32>
return %1 : vector<8x3xf32>		return %1 : vector<8x3xf32>
}		}

// -----		// -----

// CHECK-LABEL: fold_vector_transfers		// CHECK-LABEL: fold_vector_transfers
func.func @fold_vector_transfers(%A: memref<?x8xf32>) -> (vector<4x8xf32>, vector<4x9xf32>) {		func.func @fold_vector_transfers(%A: memref<?x8xf32>,
		%B: memref<9x8xf32>,
		%idx: index)
		-> (vector<4x8xf32>, vector<4x9xf32>, vector<1x2xf32>, vector<5xf32>)
		{
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%f0 = arith.constant 0.0 : f32		%f0 = arith.constant 0.0 : f32

// CHECK: vector.transfer_read %{{.*}} {in_bounds = [false, true]}		// CHECK: vector.transfer_read %{{.*}} {in_bounds = [true, true]}
%1 = vector.transfer_read %A[%c0, %c0], %f0 : memref<?x8xf32>, vector<4x8xf32>		%1 = vector.transfer_read %A[%c0, %c0], %f0 {in_bounds = [true, true]} : memref<?x8xf32>, vector<4x8xf32>

// CHECK: vector.transfer_write %{{.*}} {in_bounds = [false, true]}		// CHECK: vector.transfer_write %{{.*}} {in_bounds = [false, true]}
vector.transfer_write %1, %A[%c0, %c0] : vector<4x8xf32>, memref<?x8xf32>		vector.transfer_write %1, %A[%c0, %c0] : vector<4x8xf32>, memref<?x8xf32>

// Both dims may be out-of-bounds, attribute is elided.		// Both dims may be out-of-bounds, attribute is elided.
// CHECK: vector.transfer_read %{{.*}}		// CHECK: vector.transfer_read %{{.*}}
// CHECK-NOT: in_bounds		// CHECK-NOT: in_bounds
%2 = vector.transfer_read %A[%c0, %c0], %f0 : memref<?x8xf32>, vector<4x9xf32>		%2 = vector.transfer_read %A[%c0, %c0], %f0 : memref<?x8xf32>, vector<4x9xf32>

// Both dims may be out-of-bounds, attribute is elided.		// Both dims may be out-of-bounds, attribute is elided.
// CHECK: vector.transfer_write %{{.*}}		// CHECK: vector.transfer_write %{{.*}}
// CHECK-NOT: in_bounds		// CHECK-NOT: in_bounds
vector.transfer_write %2, %A[%c0, %c0] : vector<4x9xf32>, memref<?x8xf32>		vector.transfer_write %2, %A[%c0, %c0] : vector<4x9xf32>, memref<?x8xf32>

		// Cannot infer in_bounds for non-constant offsets or dynamic dim sizes.
		// CHECK: vector.transfer_read {{.*}}
		// CHECK-NOT: in_bounds
		%3 = vector.transfer_read %A[%c0, %idx], %f0 : memref<?x8xf32>, vector<1x2xf32>

		// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true]}
		%4 = vector.transfer_read %B[%c0, %c0], %f0 {in_bounds = [true, false]} : memref<9x8xf32>, vector<5xf32>

// CHECK: return		// CHECK: return
return %1, %2 : vector<4x8xf32>, vector<4x9xf32>		return %1, %2, %3, %4
		: vector<4x8xf32>, vector<4x9xf32>, vector<1x2xf32>, vector<5xf32>
}		}

// -----		// -----

// CHECK-LABEL: bitcast_folding		// CHECK-LABEL: bitcast_folding
// CHECK-SAME: %[[A:.*]]: vector<4x8xf32>		// CHECK-SAME: %[[A:.*]]: vector<4x8xf32>
// CHECK-SAME: %[[B:.*]]: vector<2xi32>		// CHECK-SAME: %[[B:.*]]: vector<2xi32>
// CHECK: return %[[A]], %[[B]] : vector<4x8xf32>, vector<2xi32>		// CHECK: return %[[A]], %[[B]] : vector<4x8xf32>, vector<2xi32>
// CHECK: %[[T:.*]] = vector.transpose %[[B]], [1, 2, 0] : vector<6x4x2xf32> to vector<4x2x6xf32>		// CHECK: %[[T:.*]] = vector.transpose %[[B]], [1, 2, 0] : vector<6x4x2xf32> to vector<4x2x6xf32>
// CHECK: return %[[T]] : vector<4x2x6xf32>		// CHECK: return %[[T]] : vector<4x2x6xf32>
func.func @store_to_load_tensor_broadcast(%arg0 : tensor<4x4xf32>,		func.func @store_to_load_tensor_broadcast(%arg0 : tensor<4x4xf32>,
%v0 : vector<4x2xf32>) -> vector<4x2x6xf32> {		%v0 : vector<4x2xf32>) -> vector<4x2x6xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%cf0 = arith.constant 0.0 : f32		%cf0 = arith.constant 0.0 : f32
%w0 = vector.transfer_write %v0, %arg0[%c0, %c0] {in_bounds = [true, true]} :		%w0 = vector.transfer_write %v0, %arg0[%c0, %c0] {in_bounds = [true, true]} :
vector<4x2xf32>, tensor<4x4xf32>		vector<4x2xf32>, tensor<4x4xf32>
%0 = vector.transfer_read %w0[%c0, %c0], %cf0 {in_bounds = [true, true, true],		%0 = vector.transfer_read %w0[%c0, %c0], %cf0 {in_bounds = [true, true],
permutation_map = affine_map<(d0, d1) -> (d0, d1, 0)>} :		permutation_map = affine_map<(d0, d1) -> (d0, d1, 0)>} :
tensor<4x4xf32>, vector<4x2x6xf32>		tensor<4x4xf32>, vector<4x2x6xf32>
return %0 : vector<4x2x6xf32>		return %0 : vector<4x2x6xf32>
}		}

// -----		// -----

// CHECK-LABEL: func @store_to_load_tensor_perm_broadcast		// CHECK-LABEL: func @store_to_load_tensor_perm_broadcast
// CHECK-SAME: (%[[ARG:.*]]: tensor<4x4x4xf32>, %[[V0:.*]]: vector<4x1xf32>)		// CHECK-SAME: (%[[ARG:.*]]: tensor<4x4x4xf32>, %[[V0:.*]]: vector<4x1xf32>)
// CHECK: %[[B:.*]] = vector.broadcast %[[V0]] : vector<4x1xf32> to vector<100x5x4x1xf32>		// CHECK: %[[B:.*]] = vector.broadcast %[[V0]] : vector<4x1xf32> to vector<100x5x4x1xf32>
// CHECK: %[[T:.*]] = vector.transpose %[[B]], [3, 0, 2, 1] : vector<100x5x4x1xf32> to vector<1x100x4x5xf32>		// CHECK: %[[T:.*]] = vector.transpose %[[B]], [3, 0, 2, 1] : vector<100x5x4x1xf32> to vector<1x100x4x5xf32>
// CHECK: return %[[T]] : vector<1x100x4x5xf32>		// CHECK: return %[[T]] : vector<1x100x4x5xf32>
func.func @store_to_load_tensor_perm_broadcast(%arg0 : tensor<4x4x4xf32>,		func.func @store_to_load_tensor_perm_broadcast(%arg0 : tensor<4x4x4xf32>,
%v0 : vector<4x1xf32>) -> vector<1x100x4x5xf32> {		%v0 : vector<4x1xf32>) -> vector<1x100x4x5xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%cf0 = arith.constant 0.0 : f32		%cf0 = arith.constant 0.0 : f32
%w0 = vector.transfer_write %v0, %arg0[%c0, %c0, %c0] {in_bounds = [true, true],		%w0 = vector.transfer_write %v0, %arg0[%c0, %c0, %c0] {in_bounds = [true, true, true],
permutation_map = affine_map<(d0, d1, d2) -> (d2, d1)>} :		permutation_map = affine_map<(d0, d1, d2) -> (d2, d1)>} :
vector<4x1xf32>, tensor<4x4x4xf32>		vector<4x1xf32>, tensor<4x4x4xf32>
%0 = vector.transfer_read %w0[%c0, %c0, %c0], %cf0 {in_bounds = [true, true, true, true],		%0 = vector.transfer_read %w0[%c0, %c0, %c0], %cf0 {in_bounds = [true, true, true],
permutation_map = affine_map<(d0, d1, d2) -> (d1, 0, d2, 0)>} :		permutation_map = affine_map<(d0, d1, d2) -> (d1, 0, d2, 0)>} :
tensor<4x4x4xf32>, vector<1x100x4x5xf32>		tensor<4x4x4xf32>, vector<1x100x4x5xf32>
return %0 : vector<1x100x4x5xf32>		return %0 : vector<1x100x4x5xf32>
}		}

// -----		// -----


func.func @swap_extract_slice_transfer_write(%arg0 : vector<8x4xf32>,
%iv : index, %sz : index) -> tensor<64x64xf32> {		%iv : index, %sz : index) -> tensor<64x64xf32> {
// CHECK: %[[C0:.*]] = arith.constant 0 : index		// CHECK: %[[C0:.*]] = arith.constant 0 : index
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index

// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ITER_ARG]]		// CHECK: %[[T0:.*]] = tensor.extract_slice %[[ITER_ARG]]
// CHECK-SAME: [%[[IV]], 16] [%[[SZ]], 8]		// CHECK-SAME: [%[[IV]], 16] [%[[SZ]], 8]
// CHECK: %[[T1:.*]] = vector.transfer_write %[[VEC]]		// CHECK: %[[T1:.*]] = vector.transfer_write %[[VEC]]
// CHECK-SAME: %[[T0]][%[[C0]], %[[C0]]]		// CHECK-SAME: %[[T0]][%[[C0]], %[[C0]]]
// CHECK-SAME: in_bounds = [true, false]		// CHECK-SAME: in_bounds = [false, true]
// CHECK-SAME: permutation_map = #[[$MAP]]		// CHECK-SAME: permutation_map = #[[$MAP]]
// CHECK: %[[T2:.*]] = tensor.insert_slice %[[T1]] into %[[ITER_ARG]]		// CHECK: %[[T2:.*]] = tensor.insert_slice %[[T1]] into %[[ITER_ARG]]
// CHECK-SAME: [%[[IV]], 16] [%[[SZ]], 8]		// CHECK-SAME: [%[[IV]], 16] [%[[SZ]], 8]
%0 = vector.transfer_write %arg0, %arg1[%c0, %c0] {in_bounds = [true, true], permutation_map = affine_map<(d0, d1) -> (d1, d0)>} : vector<8x4xf32>, tensor<4x8xf32>		%0 = vector.transfer_write %arg0, %arg1[%c0, %c0] {in_bounds = [true, true], permutation_map = affine_map<(d0, d1) -> (d1, d0)>} : vector<8x4xf32>, tensor<4x8xf32>
%1 = tensor.extract_slice %0[0, 0] [%sz, 8] [1, 1] : tensor<4x8xf32> to tensor<?x8xf32>		%1 = tensor.extract_slice %0[0, 0] [%sz, 8] [1, 1] : tensor<4x8xf32> to tensor<?x8xf32>
%2 = tensor.insert_slice %1 into %arg2[%iv, 16] [%sz, 8] [1, 1] : tensor<?x8xf32> into tensor<64x64xf32>		%2 = tensor.insert_slice %1 into %arg2[%iv, 16] [%sz, 8] [1, 1] : tensor<?x8xf32> into tensor<64x64xf32>

// CHECK: return %[[T2]]		// CHECK: return %[[T2]]

// CHECK-LABEL: func.func @transfer_read_from_rank_reducing_extract_slice		// CHECK-LABEL: func.func @transfer_read_from_rank_reducing_extract_slice
// CHECK: tensor.extract_slice		// CHECK: tensor.extract_slice
// CHECK: vector.transfer_read		// CHECK: vector.transfer_read
func.func @transfer_read_from_rank_reducing_extract_slice(%src: tensor<1x8x8x8xf32>, %i1: index, %i2: index, %i3: index, %i4: index) -> vector<4xf32> {		func.func @transfer_read_from_rank_reducing_extract_slice(%src: tensor<1x8x8x8xf32>, %i1: index, %i2: index, %i3: index, %i4: index) -> vector<4xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%f0 = arith.constant 0.000000e+00 : f32		%f0 = arith.constant 0.000000e+00 : f32
%0 = tensor.extract_slice %src[0, %i1, %i2, %i3] [1, 4, 1, 4] [1, 1, 1, 1] : tensor<1x8x8x8xf32> to tensor<1x4x4xf32>		%0 = tensor.extract_slice %src[0, %i1, %i2, %i3] [1, 4, 1, 4] [1, 1, 1, 1] : tensor<1x8x8x8xf32> to tensor<1x4x4xf32>
%1 = vector.transfer_read %0[%c0, %i4, %c0], %f0 {in_bounds = [true]} : tensor<1x4x4xf32>, vector<4xf32>		%1 = vector.transfer_read %0[%c0, %i4, %c0], %f0 {in_bounds = [true, true, true]} : tensor<1x4x4xf32>, vector<4xf32>
return %1 : vector<4xf32>		return %1 : vector<4xf32>
}		}

// -----		// -----

// CHECK-LABEL: func.func @extract_from_broadcast		// CHECK-LABEL: func.func @extract_from_broadcast
func.func @extract_from_broadcast(%src: vector<1x1x1xf32>) -> vector<1xf32> {		func.func @extract_from_broadcast(%src: vector<1x1x1xf32>) -> vector<1xf32> {
%0 = vector.broadcast %src : vector<1x1x1xf32> to vector<1x1x32x1xf32>		%0 = vector.broadcast %src : vector<1x1x1xf32> to vector<1x1x32x1xf32>

mlir/test/Dialect/Vector/invalid.mlir

	}			}

	// -----			// -----

	func.func @test_vector.transfer_read(%arg0: memref<?x?xf32>) {			func.func @test_vector.transfer_read(%arg0: memref<?x?xf32>) {
	%c3 = arith.constant 3 : index			%c3 = arith.constant 3 : index
	%cst = arith.constant 3.0 : f32			%cst = arith.constant 3.0 : f32
	// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}			// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
	%0 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d0 + d1)>} : memref<?x?xf32>, vector<128xf32>			%0 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d0 + d1)>, in_bounds = [true, true]} : memref<?x?xf32>, vector<128xf32>
	}			}

	// -----			// -----

	func.func @test_vector.transfer_read(%arg0: memref<?x?xf32>) {			func.func @test_vector.transfer_read(%arg0: memref<?x?xf32>) {
	%c3 = arith.constant 3 : index			%c3 = arith.constant 3 : index
	%cst = arith.constant 3.0 : f32			%cst = arith.constant 3.0 : f32
	// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}			// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
	%0 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d0 + 1)>} : memref<?x?xf32>, vector<128xf32>			%0 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d0 + 1)>, in_bounds = [true, true]} : memref<?x?xf32>, vector<128xf32>
	}			}

	// -----			// -----

	func.func @test_vector.transfer_read(%arg0: memref<?x?x?xf32>) {			func.func @test_vector.transfer_read(%arg0: memref<?x?x?xf32>) {
	%c3 = arith.constant 3 : index			%c3 = arith.constant 3 : index
	%cst = arith.constant 3.0 : f32			%cst = arith.constant 3.0 : f32
	// expected-error@+1 {{requires a permutation_map that is a permutation (found one dim used more than once)}}			// expected-error@+1 {{requires a permutation_map that is a permutation (found one dim used more than once)}}
	%0 = vector.transfer_read %arg0[%c3, %c3, %c3], %cst {permutation_map = affine_map<(d0, d1, d2)->(d0, d0)>} : memref<?x?x?xf32>, vector<3x7xf32>			%0 = vector.transfer_read %arg0[%c3, %c3, %c3], %cst {permutation_map = affine_map<(d0, d1, d2)->(d0, d0)>, in_bounds = [true, true, true]} : memref<?x?x?xf32>, vector<3x7xf32>
	}			}

	// -----			// -----

	func.func @test_vector.transfer_read(%arg0: memref<?x?x?xf32>) {			func.func @test_vector.transfer_read(%arg0: memref<?x?x?xf32>) {
	%c1 = arith.constant 1 : i1			%c1 = arith.constant 1 : i1
	%c3 = arith.constant 3 : index			%c3 = arith.constant 3 : index
	%cst = arith.constant 3.0 : f32			%cst = arith.constant 3.0 : f32
	}			}

	// -----			// -----

	func.func @test_vector.transfer_read(%arg0: memref<?x?xvector<2x3xf32>>) {			func.func @test_vector.transfer_read(%arg0: memref<?x?xvector<2x3xf32>>) {
	%c3 = arith.constant 3 : index			%c3 = arith.constant 3 : index
	%f0 = arith.constant 0.0 : f32			%f0 = arith.constant 0.0 : f32
	%vf0 = vector.splat %f0 : vector<2x3xf32>			%vf0 = vector.splat %f0 : vector<2x3xf32>
	// expected-error@+1 {{ expects the optional in_bounds attr of same rank as permutation_map results: affine_map<(d0, d1) -> (d0, d1)>}}			// expected-error@+1 {{ expects the optional in_bounds attr of same rank as the source type: 2 vs inBounds of size: 1}}
	%0 = vector.transfer_read %arg0[%c3, %c3], %vf0 {in_bounds = [true], permutation_map = affine_map<(d0, d1)->(d0, d1)>} : memref<?x?xvector<2x3xf32>>, vector<1x1x2x3xf32>			%0 = vector.transfer_read %arg0[%c3, %c3], %vf0 {in_bounds = [true], permutation_map = affine_map<(d0, d1)->(d0, d1)>} : memref<?x?xvector<2x3xf32>>, vector<1x1x2x3xf32>
	}			}

	// -----			// -----

	func.func @test_vector.transfer_read(%arg0: memref<?x?xvector<2x3xf32>>) {			func.func @test_vector.transfer_read(%arg0: memref<?x?xvector<2x3xf32>>) {
	%c3 = arith.constant 3 : index			%c3 = arith.constant 3 : index
	%f0 = arith.constant 0.0 : f32			%f0 = arith.constant 0.0 : f32
	%vf0 = vector.splat %f0 : vector<2x3xf32>			%vf0 = vector.splat %f0 : vector<2x3xf32>
	// expected-error@+1 {{requires broadcast dimensions to be in-bounds}}
	%0 = vector.transfer_read %arg0[%c3, %c3], %vf0 {in_bounds = [false, true], permutation_map = affine_map<(d0, d1)->(0, d1)>} : memref<?x?xvector<2x3xf32>>, vector<1x1x2x3xf32>
	}

	// -----

	func.func @test_vector.transfer_read(%arg0: memref<?x?xvector<2x3xf32>>) {
	%c3 = arith.constant 3 : index
	%f0 = arith.constant 0.0 : f32
	%vf0 = vector.splat %f0 : vector<2x3xf32>
	%mask = vector.splat %c1 : vector<2x3xi1>			%mask = vector.splat %c1 : vector<2x3xi1>
	// expected-error@+1 {{does not support masks with vector element type}}			// expected-error@+1 {{does not support masks with vector element type}}
	%0 = vector.transfer_read %arg0[%c3, %c3], %vf0, %mask {permutation_map = affine_map<(d0, d1)->(d0, d1)>} : memref<?x?xvector<2x3xf32>>, vector<1x1x2x3xf32>			%0 = vector.transfer_read %arg0[%c3, %c3], %vf0, %mask {permutation_map = affine_map<(d0, d1)->(d0, d1)>} : memref<?x?xvector<2x3xf32>>, vector<1x1x2x3xf32>
	}			}

	// -----			// -----

	func.func @test_vector.transfer_write(%arg0: memref<?x?xf32>) {			func.func @test_vector.transfer_write(%arg0: memref<?x?xf32>) {
	}			}

	// -----			// -----

	func.func @test_vector.transfer_write(%arg0: memref<?x?xf32>) {			func.func @test_vector.transfer_write(%arg0: memref<?x?xf32>) {
	%c3 = arith.constant 3 : index			%c3 = arith.constant 3 : index
	%cst = arith.constant dense<3.0> : vector<128 x f32>			%cst = arith.constant dense<3.0> : vector<128 x f32>
	// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}			// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
	vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0 + d1)>} : vector<128xf32>, memref<?x?xf32>			vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0 + d1)>, in_bounds = [true, true]} : vector<128xf32>, memref<?x?xf32>
	}			}

	// -----			// -----

	func.func @test_vector.transfer_write(%arg0: memref<?x?xf32>) {			func.func @test_vector.transfer_write(%arg0: memref<?x?xf32>) {
	%c3 = arith.constant 3 : index			%c3 = arith.constant 3 : index
	%cst = arith.constant dense<3.0> : vector<128 x f32>			%cst = arith.constant dense<3.0> : vector<128 x f32>
	// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}			// expected-error@+1 {{requires a projected permutation_map (at most one dim or the zero constant can appear in each result)}}
	vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0 + 1)>} : vector<128xf32>, memref<?x?xf32>			vector.transfer_write %cst, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0 + 1)>, in_bounds = [true, true]} : vector<128xf32>, memref<?x?xf32>
	}			}

	// -----			// -----

	func.func @test_vector.transfer_write(%arg0: memref<?x?x?xf32>) {			func.func @test_vector.transfer_write(%arg0: memref<?x?x?xf32>) {
	%c3 = arith.constant 3 : index			%c3 = arith.constant 3 : index
	%cst = arith.constant dense<3.0> : vector<3 x 7 x f32>			%cst = arith.constant dense<3.0> : vector<3 x 7 x f32>
	// expected-error@+1 {{requires a permutation_map that is a permutation (found one dim used more than once)}}			// expected-error@+1 {{requires a permutation_map that is a permutation (found one dim used more than once)}}
	vector.transfer_write %cst, %arg0[%c3, %c3, %c3] {permutation_map = affine_map<(d0, d1, d2)->(d0, d0)>} : vector<3x7xf32>, memref<?x?x?xf32>			vector.transfer_write %cst, %arg0[%c3, %c3, %c3] {permutation_map = affine_map<(d0, d1, d2)->(d0, d0)>, in_bounds = [true, true, true]} : vector<3x7xf32>, memref<?x?x?xf32>
	}			}

	// -----			// -----

	func.func @test_vector.transfer_write(%arg0: memref<?xf32>, %arg1: vector<7xf32>) {			func.func @test_vector.transfer_write(%arg0: memref<?xf32>, %arg1: vector<7xf32>) {
	%c3 = arith.constant 3 : index			%c3 = arith.constant 3 : index
	%cst = arith.constant 3.0 : f32			%cst = arith.constant 3.0 : f32
	// expected-error@+1 {{should not have broadcast dimensions}}			// expected-error@+1 {{should not have broadcast dimensions}}
	func.func @integer_vector_contract(%arg0: vector<16x32xsi8>, %arg1: vector<32x16xsi8>, %arg2: vector<16x16xsi32>) -> vector<16x16xsi32> {			func.func @integer_vector_contract(%arg0: vector<16x32xsi8>, %arg1: vector<32x16xsi8>, %arg2: vector<16x16xsi32>) -> vector<16x16xsi32> {
	// expected-error@+1 {{op only supports signless integer types}}			// expected-error@+1 {{op only supports signless integer types}}
	%0 = vector.contract {			%0 = vector.contract {
	indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>],			indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d2, d1)>, affine_map<(d0, d1, d2) -> (d0, d1)>],
	iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>			iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>
	} %arg0, %arg1, %arg2 : vector<16x32xsi8>, vector<32x16xsi8> into vector<16x16xsi32>			} %arg0, %arg1, %arg2 : vector<16x32xsi8>, vector<32x16xsi8> into vector<16x16xsi32>
	return %0: vector<16x16xsi32>			return %0: vector<16x16xsi32>
	}			}

				// -----

				func.func @out_of_bounds_non_transfer_dim(%arg0: tensor<?x?xf32>, %pos: index, %f: f32) -> vector<5xf32> {
				// expected-error @below{{expects that all non-transfer dims are in-bounds}}
				%0 = vector.transfer_read %arg0[%pos, %pos], %f : tensor<?x?xf32>, vector<5xf32>
				return %0 : vector<5xf32>
				}

mlir/test/Dialect/Vector/ops.mlir

-> tensor<f32> {
return %1: tensor<f32>		return %1: tensor<f32>
}		}

// CHECK-LABEL: func @vector_transfer_ops_0d_from_higher_d(		// CHECK-LABEL: func @vector_transfer_ops_0d_from_higher_d(
func.func @vector_transfer_ops_0d_from_higher_d(%arg0: tensor<?xf32>, %arg1: memref<?x?xf32>)		func.func @vector_transfer_ops_0d_from_higher_d(%arg0: tensor<?xf32>, %arg1: memref<?x?xf32>)
-> tensor<?xf32> {		-> tensor<?xf32> {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%f0 = arith.constant 0.0 : f32		%f0 = arith.constant 0.0 : f32
%0 = vector.transfer_read %arg0[%c0], %f0 {permutation_map = affine_map<(d0)->()>} :		%0 = vector.transfer_read %arg0[%c0], %f0 {permutation_map = affine_map<(d0)->()>, in_bounds = [true]} :
tensor<?xf32>, vector<f32>		tensor<?xf32>, vector<f32>
%1 = vector.transfer_write %0, %arg0[%c0] {permutation_map = affine_map<(d0)->()>} :		%1 = vector.transfer_write %0, %arg0[%c0] {permutation_map = affine_map<(d0)->()>, in_bounds = [true]} :
vector<f32>, tensor<?xf32>		vector<f32>, tensor<?xf32>
%2 = vector.transfer_read %arg1[%c0, %c0], %f0 {permutation_map = affine_map<(d0, d1)->()>} :		%2 = vector.transfer_read %arg1[%c0, %c0], %f0 {permutation_map = affine_map<(d0, d1)->()>, in_bounds = [true, true]} :
memref<?x?xf32>, vector<f32>		memref<?x?xf32>, vector<f32>
vector.transfer_write %2, %arg1[%c0, %c0] {permutation_map = affine_map<(d0, d1)->()>} :		vector.transfer_write %2, %arg1[%c0, %c0] {permutation_map = affine_map<(d0, d1)->()>, in_bounds = [true, true]} :
vector<f32>, memref<?x?xf32>		vector<f32>, memref<?x?xf32>
return %1: tensor<?xf32>		return %1: tensor<?xf32>
}		}

// CHECK-LABEL: func @vector_transfer_ops(		// CHECK-LABEL: func @vector_transfer_ops(
func.func @vector_transfer_ops(%arg0: memref<?x?xf32>,		func.func @vector_transfer_ops(%arg0: memref<?x?xf32>,
%arg1 : memref<?x?xvector<4x3xf32>>,		%arg1 : memref<?x?xvector<4x3xf32>>,
%arg2 : memref<?x?xvector<4x3xi32>>,		%arg2 : memref<?x?xvector<4x3xi32>>,

%vf0 = vector.splat %f0 : vector<4x3xf32>		%vf0 = vector.splat %f0 : vector<4x3xf32>
%v0 = vector.splat %c0 : vector<4x3xi32>		%v0 = vector.splat %c0 : vector<4x3xi32>
%vi0 = vector.splat %i0 : vector<4x3xindex>		%vi0 = vector.splat %i0 : vector<4x3xindex>
%m = arith.constant dense<[0, 0, 1, 0, 1]> : vector<5xi1>		%m = arith.constant dense<[0, 0, 1, 0, 1]> : vector<5xi1>
%m2 = vector.splat %i1 : vector<4x5xi1>		%m2 = vector.splat %i1 : vector<4x5xi1>
//		//
// CHECK: vector.transfer_read		// CHECK: vector.transfer_read
%0 = vector.transfer_read %arg0[%c3, %c3], %f0 {permutation_map = affine_map<(d0, d1)->(d0)>} : memref<?x?xf32>, vector<128xf32>		%0 = vector.transfer_read %arg0[%c3, %c3], %f0 {permutation_map = affine_map<(d0, d1)->(d0)>, in_bounds = [false, true]} : memref<?x?xf32>, vector<128xf32>
// CHECK: vector.transfer_read		// CHECK: vector.transfer_read
%1 = vector.transfer_read %arg0[%c3, %c3], %f0 {permutation_map = affine_map<(d0, d1)->(d1, d0)>} : memref<?x?xf32>, vector<3x7xf32>		%1 = vector.transfer_read %arg0[%c3, %c3], %f0 {permutation_map = affine_map<(d0, d1)->(d1, d0)>} : memref<?x?xf32>, vector<3x7xf32>
// CHECK: vector.transfer_read		// CHECK: vector.transfer_read
%2 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d0)>} : memref<?x?xf32>, vector<128xf32>		%2 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d0)>, in_bounds = [false, true]} : memref<?x?xf32>, vector<128xf32>
// CHECK: vector.transfer_read		// CHECK: vector.transfer_read
%3 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d1)>} : memref<?x?xf32>, vector<128xf32>		%3 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d1)>, in_bounds = [true, false]} : memref<?x?xf32>, vector<128xf32>
// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : memref<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>		// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : memref<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>
%4 = vector.transfer_read %arg1[%c3, %c3], %vf0 {permutation_map = affine_map<(d0, d1)->(d0, d1)>} : memref<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>		%4 = vector.transfer_read %arg1[%c3, %c3], %vf0 {permutation_map = affine_map<(d0, d1)->(d0, d1)>} : memref<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>
// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} {in_bounds = [false, true]} : memref<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>		// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} {in_bounds = [false, true]} : memref<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>
%5 = vector.transfer_read %arg1[%c3, %c3], %vf0 {in_bounds = [false, true]} : memref<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>		%5 = vector.transfer_read %arg1[%c3, %c3], %vf0 {in_bounds = [false, true]} : memref<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>
// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : memref<?x?xvector<4x3xi32>>, vector<5x24xi8>		// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : memref<?x?xvector<4x3xi32>>, vector<5x24xi8>
%6 = vector.transfer_read %arg2[%c3, %c3], %v0 : memref<?x?xvector<4x3xi32>>, vector<5x24xi8>		%6 = vector.transfer_read %arg2[%c3, %c3], %v0 {in_bounds = [true, true]} : memref<?x?xvector<4x3xi32>>, vector<5x24xi8>
// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : memref<?x?xvector<4x3xindex>>, vector<5x48xi8>		// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : memref<?x?xvector<4x3xindex>>, vector<5x48xi8>
%7 = vector.transfer_read %arg3[%c3, %c3], %vi0 : memref<?x?xvector<4x3xindex>>, vector<5x48xi8>		%7 = vector.transfer_read %arg3[%c3, %c3], %vi0 {in_bounds = [true, true]} : memref<?x?xvector<4x3xindex>>, vector<5x48xi8>
// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}}, %{{.*}} : memref<?x?xf32>, vector<5xf32>		// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}}, %{{.*}} : memref<?x?xf32>, vector<5xf32>
%8 = vector.transfer_read %arg0[%c3, %c3], %f0, %m : memref<?x?xf32>, vector<5xf32>		%8 = vector.transfer_read %arg0[%c3, %c3], %f0, %m {in_bounds = [true, false]} : memref<?x?xf32>, vector<5xf32>
// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]], %[[C3]]], %{{.*}}, %{{.*}} : memref<?x?x?xf32>, vector<5x4x8xf32>		// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]], %[[C3]]], %{{.*}}, %{{.*}} : memref<?x?x?xf32>, vector<5x4x8xf32>
%9 = vector.transfer_read %arg4[%c3, %c3, %c3], %f0, %m2 {permutation_map = affine_map<(d0, d1, d2)->(d1, d0, 0)>} : memref<?x?x?xf32>, vector<5x4x8xf32>		%9 = vector.transfer_read %arg4[%c3, %c3, %c3], %f0, %m2 {permutation_map = affine_map<(d0, d1, d2)->(d1, d0, 0)>, in_bounds = [false, false, true]} : memref<?x?x?xf32>, vector<5x4x8xf32>

// CHECK: vector.transfer_write		// CHECK: vector.transfer_write
vector.transfer_write %0, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0)>} : vector<128xf32>, memref<?x?xf32>		vector.transfer_write %0, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0)>, in_bounds = [false, true]} : vector<128xf32>, memref<?x?xf32>
// CHECK: vector.transfer_write		// CHECK: vector.transfer_write
vector.transfer_write %1, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d1, d0)>} : vector<3x7xf32>, memref<?x?xf32>		vector.transfer_write %1, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d1, d0)>} : vector<3x7xf32>, memref<?x?xf32>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<1x1x4x3xf32>, memref<?x?xvector<4x3xf32>>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<1x1x4x3xf32>, memref<?x?xvector<4x3xf32>>
vector.transfer_write %4, %arg1[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0, d1)>} : vector<1x1x4x3xf32>, memref<?x?xvector<4x3xf32>>		vector.transfer_write %4, %arg1[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0, d1)>} : vector<1x1x4x3xf32>, memref<?x?xvector<4x3xf32>>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<1x1x4x3xf32>, memref<?x?xvector<4x3xf32>>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<1x1x4x3xf32>, memref<?x?xvector<4x3xf32>>
vector.transfer_write %5, %arg1[%c3, %c3] {in_bounds = [false, false]} : vector<1x1x4x3xf32>, memref<?x?xvector<4x3xf32>>		vector.transfer_write %5, %arg1[%c3, %c3] {in_bounds = [false, false]} : vector<1x1x4x3xf32>, memref<?x?xvector<4x3xf32>>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<5x24xi8>, memref<?x?xvector<4x3xi32>>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] {in_bounds = [true, true]} : vector<5x24xi8>, memref<?x?xvector<4x3xi32>>
vector.transfer_write %6, %arg2[%c3, %c3] : vector<5x24xi8>, memref<?x?xvector<4x3xi32>>		vector.transfer_write %6, %arg2[%c3, %c3] {in_bounds = [true, true]} : vector<5x24xi8>, memref<?x?xvector<4x3xi32>>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<5x48xi8>, memref<?x?xvector<4x3xindex>>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] {in_bounds = [true, true]} : vector<5x48xi8>, memref<?x?xvector<4x3xindex>>
vector.transfer_write %7, %arg3[%c3, %c3] : vector<5x48xi8>, memref<?x?xvector<4x3xindex>>		vector.transfer_write %7, %arg3[%c3, %c3] {in_bounds = [true, true]} : vector<5x48xi8>, memref<?x?xvector<4x3xindex>>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : vector<5xf32>, memref<?x?xf32>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : vector<5xf32>, memref<?x?xf32>
vector.transfer_write %8, %arg0[%c3, %c3], %m : vector<5xf32>, memref<?x?xf32>		vector.transfer_write %8, %arg0[%c3, %c3], %m {in_bounds = [true, false]} : vector<5xf32>, memref<?x?xf32>

return		return
}		}


// CHECK-LABEL: func @vector_transfer_ops_tensor(		// CHECK-LABEL: func @vector_transfer_ops_tensor(
func.func @vector_transfer_ops_tensor(%arg0: tensor<?x?xf32>,		func.func @vector_transfer_ops_tensor(%arg0: tensor<?x?xf32>,
%arg1 : tensor<?x?xvector<4x3xf32>>,		%arg1 : tensor<?x?xvector<4x3xf32>>,
%i0 = arith.constant 0 : index		%i0 = arith.constant 0 : index

%vf0 = vector.splat %f0 : vector<4x3xf32>		%vf0 = vector.splat %f0 : vector<4x3xf32>
%v0 = vector.splat %c0 : vector<4x3xi32>		%v0 = vector.splat %c0 : vector<4x3xi32>
%vi0 = vector.splat %i0 : vector<4x3xindex>		%vi0 = vector.splat %i0 : vector<4x3xindex>

//		//
// CHECK: vector.transfer_read		// CHECK: vector.transfer_read
%0 = vector.transfer_read %arg0[%c3, %c3], %f0 {permutation_map = affine_map<(d0, d1)->(d0)>} : tensor<?x?xf32>, vector<128xf32>		%0 = vector.transfer_read %arg0[%c3, %c3], %f0 {permutation_map = affine_map<(d0, d1)->(d0)>, in_bounds = [false, true]} : tensor<?x?xf32>, vector<128xf32>
// CHECK: vector.transfer_read		// CHECK: vector.transfer_read
%1 = vector.transfer_read %arg0[%c3, %c3], %f0 {permutation_map = affine_map<(d0, d1)->(d1, d0)>} : tensor<?x?xf32>, vector<3x7xf32>		%1 = vector.transfer_read %arg0[%c3, %c3], %f0 {permutation_map = affine_map<(d0, d1)->(d1, d0)>, in_bounds = [true, true]} : tensor<?x?xf32>, vector<3x7xf32>
// CHECK: vector.transfer_read		// CHECK: vector.transfer_read
%2 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d0)>} : tensor<?x?xf32>, vector<128xf32>		%2 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d0)>, in_bounds = [false, true]} : tensor<?x?xf32>, vector<128xf32>
// CHECK: vector.transfer_read		// CHECK: vector.transfer_read
%3 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d1)>} : tensor<?x?xf32>, vector<128xf32>		%3 = vector.transfer_read %arg0[%c3, %c3], %cst {permutation_map = affine_map<(d0, d1)->(d1)>, in_bounds = [true, false]} : tensor<?x?xf32>, vector<128xf32>
// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : tensor<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>		// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : tensor<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>
%4 = vector.transfer_read %arg1[%c3, %c3], %vf0 {permutation_map = affine_map<(d0, d1)->(d0, d1)>} : tensor<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>		%4 = vector.transfer_read %arg1[%c3, %c3], %vf0 {permutation_map = affine_map<(d0, d1)->(d0, d1)>} : tensor<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>
// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} {in_bounds = [false, true]} : tensor<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>		// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} {in_bounds = [false, true]} : tensor<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>
%5 = vector.transfer_read %arg1[%c3, %c3], %vf0 {in_bounds = [false, true]} : tensor<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>		%5 = vector.transfer_read %arg1[%c3, %c3], %vf0 {in_bounds = [false, true]} : tensor<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>
// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : tensor<?x?xvector<4x3xi32>>, vector<5x24xi8>		// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : tensor<?x?xvector<4x3xi32>>, vector<5x24xi8>
%6 = vector.transfer_read %arg2[%c3, %c3], %v0 : tensor<?x?xvector<4x3xi32>>, vector<5x24xi8>		%6 = vector.transfer_read %arg2[%c3, %c3], %v0 {in_bounds = [true, true]} : tensor<?x?xvector<4x3xi32>>, vector<5x24xi8>
// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : tensor<?x?xvector<4x3xindex>>, vector<5x48xi8>		// CHECK: vector.transfer_read %{{.*}}[%[[C3]], %[[C3]]], %{{.*}} : tensor<?x?xvector<4x3xindex>>, vector<5x48xi8>
%7 = vector.transfer_read %arg3[%c3, %c3], %vi0 : tensor<?x?xvector<4x3xindex>>, vector<5x48xi8>		%7 = vector.transfer_read %arg3[%c3, %c3], %vi0 {in_bounds = [true, true]} : tensor<?x?xvector<4x3xindex>>, vector<5x48xi8>


// CHECK: vector.transfer_write		// CHECK: vector.transfer_write
%8 = vector.transfer_write %0, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0)>} : vector<128xf32>, tensor<?x?xf32>		%8 = vector.transfer_write %0, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0)>, in_bounds = [false, true]} : vector<128xf32>, tensor<?x?xf32>
// CHECK: vector.transfer_write		// CHECK: vector.transfer_write
%9 = vector.transfer_write %1, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d1, d0)>} : vector<3x7xf32>, tensor<?x?xf32>		%9 = vector.transfer_write %1, %arg0[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d1, d0)>} : vector<3x7xf32>, tensor<?x?xf32>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<1x1x4x3xf32>, tensor<?x?xvector<4x3xf32>>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<1x1x4x3xf32>, tensor<?x?xvector<4x3xf32>>
%10 = vector.transfer_write %4, %arg1[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0, d1)>} : vector<1x1x4x3xf32>, tensor<?x?xvector<4x3xf32>>		%10 = vector.transfer_write %4, %arg1[%c3, %c3] {permutation_map = affine_map<(d0, d1)->(d0, d1)>} : vector<1x1x4x3xf32>, tensor<?x?xvector<4x3xf32>>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<1x1x4x3xf32>, tensor<?x?xvector<4x3xf32>>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<1x1x4x3xf32>, tensor<?x?xvector<4x3xf32>>
%11 = vector.transfer_write %5, %arg1[%c3, %c3] {in_bounds = [false, false]} : vector<1x1x4x3xf32>, tensor<?x?xvector<4x3xf32>>		%11 = vector.transfer_write %5, %arg1[%c3, %c3] {in_bounds = [false, false]} : vector<1x1x4x3xf32>, tensor<?x?xvector<4x3xf32>>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<5x24xi8>, tensor<?x?xvector<4x3xi32>>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] {in_bounds = [true, true]} : vector<5x24xi8>, tensor<?x?xvector<4x3xi32>>
%12 = vector.transfer_write %6, %arg2[%c3, %c3] : vector<5x24xi8>, tensor<?x?xvector<4x3xi32>>		%12 = vector.transfer_write %6, %arg2[%c3, %c3] {in_bounds = [true, true]} : vector<5x24xi8>, tensor<?x?xvector<4x3xi32>>
// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] : vector<5x48xi8>, tensor<?x?xvector<4x3xindex>>		// CHECK: vector.transfer_write %{{.*}}, %{{.*}}[%[[C3]], %[[C3]]] {in_bounds = [true, true]} : vector<5x48xi8>, tensor<?x?xvector<4x3xindex>>
%13 = vector.transfer_write %7, %arg3[%c3, %c3] : vector<5x48xi8>, tensor<?x?xvector<4x3xindex>>		%13 = vector.transfer_write %7, %arg3[%c3, %c3] {in_bounds = [true, true]} : vector<5x48xi8>, tensor<?x?xvector<4x3xindex>>

return %8, %9, %10, %11, %12, %13 :		return %8, %9, %10, %11, %12, %13 :
tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xvector<4x3xf32>>,		tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xvector<4x3xf32>>,
tensor<?x?xvector<4x3xf32>>, tensor<?x?xvector<4x3xi32>>,		tensor<?x?xvector<4x3xf32>>, tensor<?x?xvector<4x3xi32>>,
tensor<?x?xvector<4x3xindex>>		tensor<?x?xvector<4x3xindex>>
}		}

// CHECK-LABEL: @vector_broadcast		// CHECK-LABEL: @vector_broadcast

mlir/test/Dialect/Vector/scalar-vector-transfer-to-memref.mlir

	// RUN: mlir-opt %s -test-scalar-vector-transfer-lowering -split-input-file | FileCheck %s			// RUN: mlir-opt %s -test-scalar-vector-transfer-lowering -split-input-file | FileCheck %s
	// RUN: mlir-opt %s -test-scalar-vector-transfer-lowering=allow-multiple-uses -split-input-file | FileCheck %s --check-prefix=MULTIUSE			// RUN: mlir-opt %s -test-scalar-vector-transfer-lowering=allow-multiple-uses -split-input-file | FileCheck %s --check-prefix=MULTIUSE

	// CHECK-LABEL: func @transfer_read_0d(			// CHECK-LABEL: func @transfer_read_0d(
	// CHECK-SAME: %[[m:.*]]: memref<?x?x?xf32>, %[[idx:.*]]: index			// CHECK-SAME: %[[m:.*]]: memref<?x?x?xf32>, %[[idx:.*]]: index
	// CHECK: %[[r:.*]] = memref.load %[[m]][%[[idx]], %[[idx]], %[[idx]]]			// CHECK: %[[r:.*]] = memref.load %[[m]][%[[idx]], %[[idx]], %[[idx]]]
	// CHECK: return %[[r]]			// CHECK: return %[[r]]
	func.func @transfer_read_0d(%m: memref<?x?x?xf32>, %idx: index) -> f32 {			func.func @transfer_read_0d(%m: memref<?x?x?xf32>, %idx: index) -> f32 {
	%cst = arith.constant 0.0 : f32			%cst = arith.constant 0.0 : f32
	%0 = vector.transfer_read %m[%idx, %idx, %idx], %cst : memref<?x?x?xf32>, vector<f32>			%0 = vector.transfer_read %m[%idx, %idx, %idx], %cst {in_bounds = [true, true, true]} : memref<?x?x?xf32>, vector<f32>
	%1 = vector.extractelement %0[] : vector<f32>			%1 = vector.extractelement %0[] : vector<f32>
	return %1 : f32			return %1 : f32
	}			}

	// -----			// -----

	// CHECK: #[[$map:.*]] = affine_map<()[s0, s1] -> (s0 + s1)>			// CHECK: #[[$map:.*]] = affine_map<()[s0, s1] -> (s0 + s1)>
	// CHECK-LABEL: func @transfer_read_1d(			// CHECK-LABEL: func @transfer_read_1d(
	// CHECK-SAME: %[[m:.*]]: memref<?x?x?xf32>, %[[idx:.*]]: index, %[[idx2:.*]]: index			// CHECK-SAME: %[[m:.*]]: memref<?x?x?xf32>, %[[idx:.*]]: index, %[[idx2:.*]]: index
	// CHECK: %[[added:.*]] = affine.apply #[[$map]]()[%[[idx]], %[[idx2]]]			// CHECK: %[[added:.*]] = affine.apply #[[$map]]()[%[[idx]], %[[idx2]]]
	// CHECK: %[[r:.*]] = memref.load %[[m]][%[[idx]], %[[idx]], %[[added]]]			// CHECK: %[[r:.*]] = memref.load %[[m]][%[[idx]], %[[idx]], %[[added]]]
	// CHECK: return %[[r]]			// CHECK: return %[[r]]
	func.func @transfer_read_1d(%m: memref<?x?x?xf32>, %idx: index, %idx2: index) -> f32 {			func.func @transfer_read_1d(%m: memref<?x?x?xf32>, %idx: index, %idx2: index) -> f32 {
	%cst = arith.constant 0.0 : f32			%cst = arith.constant 0.0 : f32
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%0 = vector.transfer_read %m[%idx, %idx, %idx], %cst {in_bounds = [true]} : memref<?x?x?xf32>, vector<5xf32>			%0 = vector.transfer_read %m[%idx, %idx, %idx], %cst {in_bounds = [true, true, true]} : memref<?x?x?xf32>, vector<5xf32>
	%1 = vector.extractelement %0[%idx2 : index] : vector<5xf32>			%1 = vector.extractelement %0[%idx2 : index] : vector<5xf32>
	return %1 : f32			return %1 : f32
	}			}

	// -----			// -----

	// CHECK-LABEL: func @tensor_transfer_read_0d(			// CHECK-LABEL: func @tensor_transfer_read_0d(
	// CHECK-SAME: %[[t:.*]]: tensor<?x?x?xf32>, %[[idx:.*]]: index			// CHECK-SAME: %[[t:.*]]: tensor<?x?x?xf32>, %[[idx:.*]]: index
	// CHECK: %[[r:.*]] = tensor.extract %[[t]][%[[idx]], %[[idx]], %[[idx]]]			// CHECK: %[[r:.*]] = tensor.extract %[[t]][%[[idx]], %[[idx]], %[[idx]]]
	// CHECK: return %[[r]]			// CHECK: return %[[r]]
	func.func @tensor_transfer_read_0d(%t: tensor<?x?x?xf32>, %idx: index) -> f32 {			func.func @tensor_transfer_read_0d(%t: tensor<?x?x?xf32>, %idx: index) -> f32 {
	%cst = arith.constant 0.0 : f32			%cst = arith.constant 0.0 : f32
	%0 = vector.transfer_read %t[%idx, %idx, %idx], %cst : tensor<?x?x?xf32>, vector<f32>			%0 = vector.transfer_read %t[%idx, %idx, %idx], %cst {in_bounds = [true, true, true]} : tensor<?x?x?xf32>, vector<f32>
	%1 = vector.extractelement %0[] : vector<f32>			%1 = vector.extractelement %0[] : vector<f32>
	return %1 : f32			return %1 : f32
	}			}

	// -----			// -----

	// CHECK-LABEL: func @transfer_write_0d(			// CHECK-LABEL: func @transfer_write_0d(
	// CHECK-SAME: %[[m:.*]]: memref<?x?x?xf32>, %[[idx:.*]]: index, %[[f:.*]]: f32			// CHECK-SAME: %[[m:.*]]: memref<?x?x?xf32>, %[[idx:.*]]: index, %[[f:.*]]: f32
	// CHECK: %[[bc:.*]] = vector.broadcast %[[f]] : f32 to vector<f32>			// CHECK: %[[bc:.*]] = vector.broadcast %[[f]] : f32 to vector<f32>
	// CHECK: %[[extract:.*]] = vector.extractelement %[[bc]][] : vector<f32>			// CHECK: %[[extract:.*]] = vector.extractelement %[[bc]][] : vector<f32>
	// CHECK: memref.store %[[extract]], %[[m]][%[[idx]], %[[idx]], %[[idx]]]			// CHECK: memref.store %[[extract]], %[[m]][%[[idx]], %[[idx]], %[[idx]]]
	func.func @transfer_write_0d(%m: memref<?x?x?xf32>, %idx: index, %f: f32) {			func.func @transfer_write_0d(%m: memref<?x?x?xf32>, %idx: index, %f: f32) {
	%0 = vector.broadcast %f : f32 to vector<f32>			%0 = vector.broadcast %f : f32 to vector<f32>
	vector.transfer_write %0, %m[%idx, %idx, %idx] : vector<f32>, memref<?x?x?xf32>			vector.transfer_write %0, %m[%idx, %idx, %idx] {in_bounds = [true, true, true]} : vector<f32>, memref<?x?x?xf32>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: func @transfer_write_1d(			// CHECK-LABEL: func @transfer_write_1d(
	// CHECK-SAME: %[[m:.*]]: memref<?x?x?xf32>, %[[idx:.*]]: index, %[[f:.*]]: f32			// CHECK-SAME: %[[m:.*]]: memref<?x?x?xf32>, %[[idx:.*]]: index, %[[f:.*]]: f32
	// CHECK: memref.store %[[f]], %[[m]][%[[idx]], %[[idx]], %[[idx]]]			// CHECK: memref.store %[[f]], %[[m]][%[[idx]], %[[idx]], %[[idx]]]
	func.func @transfer_write_1d(%m: memref<?x?x?xf32>, %idx: index, %f: f32) {			func.func @transfer_write_1d(%m: memref<?x?x?xf32>, %idx: index, %f: f32) {
	%0 = vector.broadcast %f : f32 to vector<1xf32>			%0 = vector.broadcast %f : f32 to vector<1xf32>
	vector.transfer_write %0, %m[%idx, %idx, %idx] : vector<1xf32>, memref<?x?x?xf32>			vector.transfer_write %0, %m[%idx, %idx, %idx] {in_bounds = [true, true, true]} : vector<1xf32>, memref<?x?x?xf32>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: func @tensor_transfer_write_0d(			// CHECK-LABEL: func @tensor_transfer_write_0d(
	// CHECK-SAME: %[[t:.*]]: tensor<?x?x?xf32>, %[[idx:.*]]: index, %[[f:.*]]: f32			// CHECK-SAME: %[[t:.*]]: tensor<?x?x?xf32>, %[[idx:.*]]: index, %[[f:.*]]: f32
	// CHECK: %[[bc:.*]] = vector.broadcast %[[f]] : f32 to vector<f32>			// CHECK: %[[bc:.*]] = vector.broadcast %[[f]] : f32 to vector<f32>
	// CHECK: %[[extract:.*]] = vector.extractelement %[[bc]][] : vector<f32>			// CHECK: %[[extract:.*]] = vector.extractelement %[[bc]][] : vector<f32>
	// CHECK: %[[r:.*]] = tensor.insert %[[extract]] into %[[t]][%[[idx]], %[[idx]], %[[idx]]]			// CHECK: %[[r:.*]] = tensor.insert %[[extract]] into %[[t]][%[[idx]], %[[idx]], %[[idx]]]
	// CHECK: return %[[r]]			// CHECK: return %[[r]]
	func.func @tensor_transfer_write_0d(%t: tensor<?x?x?xf32>, %idx: index, %f: f32) -> tensor<?x?x?xf32> {			func.func @tensor_transfer_write_0d(%t: tensor<?x?x?xf32>, %idx: index, %f: f32) -> tensor<?x?x?xf32> {
	%0 = vector.broadcast %f : f32 to vector<f32>			%0 = vector.broadcast %f : f32 to vector<f32>
	%1 = vector.transfer_write %0, %t[%idx, %idx, %idx] : vector<f32>, tensor<?x?x?xf32>			%1 = vector.transfer_write %0, %t[%idx, %idx, %idx] {in_bounds = [true, true, true]} : vector<f32>, tensor<?x?x?xf32>
	return %1 : tensor<?x?x?xf32>			return %1 : tensor<?x?x?xf32>
	}			}

	// -----			// -----

	// CHECK: #[[$map:.*]] = affine_map<()[s0] -> (s0 + 8)>			// CHECK: #[[$map:.*]] = affine_map<()[s0] -> (s0 + 8)>
	// CHECK: #[[$map1:.*]] = affine_map<()[s0] -> (s0 + 1)>			// CHECK: #[[$map1:.*]] = affine_map<()[s0] -> (s0 + 1)>
	// CHECK-LABEL: func @transfer_read_2d_extract(			// CHECK-LABEL: func @transfer_read_2d_extract(
	// CHECK-SAME: %[[m:.*]]: memref<?x?x?x?xf32>, %[[idx:.*]]: index, %[[idx2:.*]]: index			// CHECK-SAME: %[[m:.*]]: memref<?x?x?x?xf32>, %[[idx:.*]]: index, %[[idx2:.*]]: index
	// CHECK: %[[added:.*]] = affine.apply #[[$map]]()[%[[idx]]]			// CHECK: %[[added:.*]] = affine.apply #[[$map]]()[%[[idx]]]
	// CHECK: %[[added1:.*]] = affine.apply #[[$map1]]()[%[[idx]]]			// CHECK: %[[added1:.*]] = affine.apply #[[$map1]]()[%[[idx]]]
	// CHECK: %[[r:.*]] = memref.load %[[m]][%[[idx]], %[[idx]], %[[added]], %[[added1]]]			// CHECK: %[[r:.*]] = memref.load %[[m]][%[[idx]], %[[idx]], %[[added]], %[[added1]]]
	// CHECK: return %[[r]]			// CHECK: return %[[r]]
	func.func @transfer_read_2d_extract(%m: memref<?x?x?x?xf32>, %idx: index, %idx2: index) -> f32 {			func.func @transfer_read_2d_extract(%m: memref<?x?x?x?xf32>, %idx: index, %idx2: index) -> f32 {
	%cst = arith.constant 0.0 : f32			%cst = arith.constant 0.0 : f32
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%0 = vector.transfer_read %m[%idx, %idx, %idx, %idx], %cst {in_bounds = [true, true]} : memref<?x?x?x?xf32>, vector<10x5xf32>			%0 = vector.transfer_read %m[%idx, %idx, %idx, %idx], %cst {in_bounds = [true, true, true, true]} : memref<?x?x?x?xf32>, vector<10x5xf32>
	%1 = vector.extract %0[8, 1] : vector<10x5xf32>			%1 = vector.extract %0[8, 1] : vector<10x5xf32>
	return %1 : f32			return %1 : f32
	}			}

	// -----			// -----

	// CHECK-LABEL: func @transfer_write_arith_constant(			// CHECK-LABEL: func @transfer_write_arith_constant(
	// CHECK-SAME: %[[m:.*]]: memref<?x?x?xf32>, %[[idx:.*]]: index			// CHECK-SAME: %[[m:.*]]: memref<?x?x?xf32>, %[[idx:.*]]: index
	// CHECK: %[[cst:.*]] = arith.constant dense<5.000000e+00> : vector<1x1xf32>			// CHECK: %[[cst:.*]] = arith.constant dense<5.000000e+00> : vector<1x1xf32>
	// CHECK: %[[extract:.*]] = vector.extract %[[cst]][0, 0] : vector<1x1xf32>			// CHECK: %[[extract:.*]] = vector.extract %[[cst]][0, 0] : vector<1x1xf32>
	// CHECK: memref.store %[[extract]], %[[m]][%[[idx]], %[[idx]], %[[idx]]]			// CHECK: memref.store %[[extract]], %[[m]][%[[idx]], %[[idx]], %[[idx]]]
	func.func @transfer_write_arith_constant(%m: memref<?x?x?xf32>, %idx: index) {			func.func @transfer_write_arith_constant(%m: memref<?x?x?xf32>, %idx: index) {
	%cst = arith.constant dense<5.000000e+00> : vector<1x1xf32>			%cst = arith.constant dense<5.000000e+00> : vector<1x1xf32>
	vector.transfer_write %cst, %m[%idx, %idx, %idx] : vector<1x1xf32>, memref<?x?x?xf32>			vector.transfer_write %cst, %m[%idx, %idx, %idx] {in_bounds = [true, true, true]} : vector<1x1xf32>, memref<?x?x?xf32>
	return			return
	}			}

	// -----			// -----

	// CHECK-LABEL: func @transfer_read_multi_use(			// CHECK-LABEL: func @transfer_read_multi_use(
	// CHECK-SAME: %[[m:.*]]: memref<?xf32>, %[[idx:.*]]: index			// CHECK-SAME: %[[m:.*]]: memref<?xf32>, %[[idx:.*]]: index
	// CHECK-NOT: memref.load			// CHECK-NOT: memref.load

mlir/test/Dialect/Vector/vector-dropleadunitdim-transforms.mlir

	}			}

	// CHECK-LABEL: func @cast_away_transfer_read_leading_one_dims			// CHECK-LABEL: func @cast_away_transfer_read_leading_one_dims
	func.func @cast_away_transfer_read_leading_one_dims(%arg0: memref<1x4x8x16xf16>) -> vector<1x4xf16> {			func.func @cast_away_transfer_read_leading_one_dims(%arg0: memref<1x4x8x16xf16>) -> vector<1x4xf16> {
	// CHECK: %[[C0:.+]] = arith.constant 0 : index			// CHECK: %[[C0:.+]] = arith.constant 0 : index
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	// CHECK: %[[F0:.+]] = arith.constant 0.000000e+00 : f16			// CHECK: %[[F0:.+]] = arith.constant 0.000000e+00 : f16
	%f0 = arith.constant 0. : f16			%f0 = arith.constant 0. : f16
	// CHECK: %[[READ:.+]] = vector.transfer_read %{{.*}}[%[[C0]], %[[C0]], %[[C0]], %[[C0]]], %[[F0]] {in_bounds = [true]} : memref<1x4x8x16xf16>, vector<4xf16>			// CHECK: %[[READ:.+]] = vector.transfer_read %{{.*}}[%[[C0]], %[[C0]], %[[C0]], %[[C0]]], %[[F0]] {in_bounds = [true, true, true, true]} : memref<1x4x8x16xf16>, vector<4xf16>
	// CHECK: %[[CAST:.+]] = vector.broadcast %[[READ]] : vector<4xf16> to vector<1x4xf16>			// CHECK: %[[CAST:.+]] = vector.broadcast %[[READ]] : vector<4xf16> to vector<1x4xf16>
	%0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %f0 {in_bounds = [true, true]} : memref<1x4x8x16xf16>, vector<1x4xf16>			%0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %f0 {in_bounds = [true, true, true, true]} : memref<1x4x8x16xf16>, vector<1x4xf16>
	// CHECK: return %[[CAST]]			// CHECK: return %[[CAST]]
	return %0: vector<1x4xf16>			return %0: vector<1x4xf16>
	}			}

	// CHECK-LABEL: func @cast_away_transfer_read_leading_one_dims_one_element			// CHECK-LABEL: func @cast_away_transfer_read_leading_one_dims_one_element
	func.func @cast_away_transfer_read_leading_one_dims_one_element(%arg0: memref<1x1x1x1xf16>) -> vector<1x1xf16> {			func.func @cast_away_transfer_read_leading_one_dims_one_element(%arg0: memref<1x1x1x1xf16>) -> vector<1x1xf16> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%f0 = arith.constant 0. : f16			%f0 = arith.constant 0. : f16
	// CHECK: vector.broadcast %{{.+}} : vector<1xf16> to vector<1x1xf16>			// CHECK: vector.broadcast %{{.+}} : vector<1xf16> to vector<1x1xf16>
	%0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %f0 {in_bounds = [true, true]} : memref<1x1x1x1xf16>, vector<1x1xf16>			%0 = vector.transfer_read %arg0[%c0, %c0, %c0, %c0], %f0 {in_bounds = [true, true, true, true]} : memref<1x1x1x1xf16>, vector<1x1xf16>
	return %0: vector<1x1xf16>			return %0: vector<1x1xf16>
	}			}

	// CHECK-LABEL: func @cast_away_transfer_write_leading_one_dims			// CHECK-LABEL: func @cast_away_transfer_write_leading_one_dims
	func.func @cast_away_transfer_write_leading_one_dims(%arg0: memref<1x4x8x16xf16>, %arg1: vector<1x4xf16>) {			func.func @cast_away_transfer_write_leading_one_dims(%arg0: memref<1x4x8x16xf16>, %arg1: vector<1x4xf16>) {
	// CHECK: %[[C0:.+]] = arith.constant 0 : index			// CHECK: %[[C0:.+]] = arith.constant 0 : index
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	// CHECK: %[[CAST:.+]] = vector.extract %{{.*}}[0] : vector<1x4xf16>			// CHECK: %[[CAST:.+]] = vector.extract %{{.*}}[0] : vector<1x4xf16>
	// CHECK: vector.transfer_write %[[CAST]], %{{.*}}[%[[C0]], %[[C0]], %[[C0]], %[[C0]]] {in_bounds = [true]} : vector<4xf16>, memref<1x4x8x16xf16>			// CHECK: vector.transfer_write %[[CAST]], %{{.*}}[%[[C0]], %[[C0]], %[[C0]], %[[C0]]] {in_bounds = [true, true, true, true]} : vector<4xf16>, memref<1x4x8x16xf16>

	vector.transfer_write %arg1, %arg0[%c0, %c0, %c0, %c0] {in_bounds = [true, true]} : vector<1x4xf16>, memref<1x4x8x16xf16>			vector.transfer_write %arg1, %arg0[%c0, %c0, %c0, %c0] {in_bounds = [true, true, true, true]} : vector<1x4xf16>, memref<1x4x8x16xf16>
	return			return
	}			}

	// CHECK-LABEL: func @cast_away_transfer_write_leading_one_dims_one_element			// CHECK-LABEL: func @cast_away_transfer_write_leading_one_dims_one_element
	func.func @cast_away_transfer_write_leading_one_dims_one_element(%arg0: memref<1x1x1x1xf16>, %arg1: vector<1x1xf16>) {			func.func @cast_away_transfer_write_leading_one_dims_one_element(%arg0: memref<1x1x1x1xf16>, %arg1: vector<1x1xf16>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	// CHECK: vector.extract %{{.+}}[0] : vector<1x1xf16>			// CHECK: vector.extract %{{.+}}[0] : vector<1x1xf16>
	vector.transfer_write %arg1, %arg0[%c0, %c0, %c0, %c0] {in_bounds = [true, true]} : vector<1x1xf16>, memref<1x1x1x1xf16>			vector.transfer_write %arg1, %arg0[%c0, %c0, %c0, %c0] {in_bounds = [true, true, true, true]} : vector<1x1xf16>, memref<1x1x1x1xf16>
	return			return
	}			}

	// CHECK-LABEL: func @cast_away_elementwise_leading_one_dims			// CHECK-LABEL: func @cast_away_elementwise_leading_one_dims
	func.func @cast_away_elementwise_leading_one_dims(			func.func @cast_away_elementwise_leading_one_dims(
	%arg0: vector<1x1x8xf32>, %arg1: f32, %arg2: vector<1x4xf32>,			%arg0: vector<1x1x8xf32>, %arg1: f32, %arg2: vector<1x4xf32>,
	%arg3: vector<1x4xf32>, %arg4: i1) ->			%arg3: vector<1x4xf32>, %arg4: i1) ->
	(vector<1x1x8xf32>, vector<1x4xi1>, vector<1x4xf32>, vector<1x4xf32>) {			(vector<1x1x8xf32>, vector<1x4xi1>, vector<1x4xf32>, vector<1x4xf32>) {

mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir

	// RUN: mlir-opt %s -test-vector-transfer-collapse-inner-most-dims -split-input-file | FileCheck %s			// RUN: mlir-opt %s -test-vector-transfer-collapse-inner-most-dims -split-input-file | FileCheck %s

	func.func @contiguous_inner_most_view(%in: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x1xf32>{			func.func @contiguous_inner_most_view(%in: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>) -> vector<1x8x1xf32>{
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant 0.0 : f32			%cst = arith.constant 0.0 : f32
	%0 = vector.transfer_read %in[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x1xf32>			%0 = vector.transfer_read %in[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, true, true]} : memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>, vector<1x8x1xf32>
	return %0 : vector<1x8x1xf32>			return %0 : vector<1x8x1xf32>
	}			}
	// CHECK: func @contiguous_inner_most_view(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>			// CHECK: func @contiguous_inner_most_view(%[[SRC:.+]]: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>>
	// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]			// CHECK: %[[SRC_0:.+]] = memref.subview %[[SRC]]
	// CHECK-SAME: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>> to memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>			// CHECK-SAME: memref<1x1x8x1xf32, strided<[3072, 8, 1, 1], offset: ?>> to memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>
	// CHECK: %[[VEC:.+]] = vector.transfer_read %[[SRC_0]]			// CHECK: %[[VEC:.+]] = vector.transfer_read %[[SRC_0]]
	// CHECK-SAME: memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>, vector<1x8xf32>			// CHECK-SAME: memref<1x1x8xf32, strided<[3072, 8, 1], offset: ?>>, vector<1x8xf32>
	// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[VEC]]			// CHECK: %[[RESULT:.+]] = vector.shape_cast %[[VEC]]

mlir/test/Dialect/Vector/vector-transfer-drop-unit-dims-patterns.mlir

	// RUN: mlir-opt %s --test-transform-dialect-interpreter | FileCheck %s			// RUN: mlir-opt %s --test-transform-dialect-interpreter | FileCheck %s

	func.func @transfer_read_rank_reducing(			func.func @transfer_read_rank_reducing(
	%arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>) -> vector<3x2xi8> {			%arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>) -> vector<3x2xi8> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant 0 : i8			%cst = arith.constant 0 : i8
	%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst :			%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, false, false]} :
	memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>, vector<3x2xi8>			memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>, vector<3x2xi8>
	return %v : vector<3x2xi8>			return %v : vector<3x2xi8>
	}			}
	// CHECK-LABEL: func @transfer_read_rank_reducing			// CHECK-LABEL: func @transfer_read_rank_reducing
	// CHECK-SAME: %[[ARG:.+]]: memref<1x1x3x2xi8			// CHECK-SAME: %[[ARG:.+]]: memref<1x1x3x2xi8
	// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 3, 2] [1, 1, 1, 1]			// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 3, 2] [1, 1, 1, 1]
	// CHECK-SAME: memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8, {{.*}}>			// CHECK-SAME: memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8, {{.*}}>
	// CHECK: vector.transfer_read %[[SUBVIEW]]			// CHECK: vector.transfer_read %[[SUBVIEW]]

	func.func @transfer_write_rank_reducing(%arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>, %vec : vector<3x2xi8>) {			func.func @transfer_write_rank_reducing(%arg : memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>, %vec : vector<3x2xi8>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] :			vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0] {in_bounds = [true, true, false, false]} :
	vector<3x2xi8>, memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>			vector<3x2xi8>, memref<1x1x3x2xi8, strided<[6, 6, 2, 1], offset: ?>>
	return			return
	}			}
	// CHECK-LABEL: func @transfer_write_rank_reducing			// CHECK-LABEL: func @transfer_write_rank_reducing
	// CHECK-SAME: %[[ARG:.+]]: memref<1x1x3x2xi8			// CHECK-SAME: %[[ARG:.+]]: memref<1x1x3x2xi8
	// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 3, 2] [1, 1, 1, 1]			// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0] [1, 1, 3, 2] [1, 1, 1, 1]
	// CHECK-SAME: memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8, {{.*}}>			// CHECK-SAME: memref<1x1x3x2xi8, {{.*}}> to memref<3x2xi8, {{.*}}>
	// CHECK: vector.transfer_write %{{.*}}, %[[SUBVIEW]]			// CHECK: vector.transfer_write %{{.*}}, %[[SUBVIEW]]

	func.func @transfer_read_and_vector_rank_reducing(			func.func @transfer_read_and_vector_rank_reducing(
	%arg : memref<1x1x3x2x1xf32>) -> vector<3x2x1xf32> {			%arg : memref<1x1x3x2x1xf32>) -> vector<3x2x1xf32> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant 0.0 : f32			%cst = arith.constant 0.0 : f32
	%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0, %c0], %cst :			%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, false, false, false]} :
	memref<1x1x3x2x1xf32>, vector<3x2x1xf32>			memref<1x1x3x2x1xf32>, vector<3x2x1xf32>
	return %v : vector<3x2x1xf32>			return %v : vector<3x2x1xf32>
	}			}
	// CHECK-LABEL: func @transfer_read_and_vector_rank_reducing			// CHECK-LABEL: func @transfer_read_and_vector_rank_reducing
	// CHECK-SAME: %[[ARG:.+]]: memref<1x1x3x2x1xf32>			// CHECK-SAME: %[[ARG:.+]]: memref<1x1x3x2x1xf32>
	// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0, 0] [1, 1, 3, 2, 1] [1, 1, 1, 1, 1]			// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0, 0] [1, 1, 3, 2, 1] [1, 1, 1, 1, 1]
	// CHECK-SAME: memref<1x1x3x2x1xf32> to memref<3x2xf32>			// CHECK-SAME: memref<1x1x3x2x1xf32> to memref<3x2xf32>
	// CHECK: vector.transfer_read %[[SUBVIEW]]{{.*}} {in_bounds = [true, true]} : memref<3x2xf32>, vector<3x2xf32>			// CHECK: vector.transfer_read %[[SUBVIEW]]{{.*}} {in_bounds = [true, true]} : memref<3x2xf32>, vector<3x2xf32>

	func.func @transfer_write_and_vector_rank_reducing(			func.func @transfer_write_and_vector_rank_reducing(
	%arg : memref<1x1x3x2x1xf32>,			%arg : memref<1x1x3x2x1xf32>,
	%vec : vector<3x2x1xf32>) {			%vec : vector<3x2x1xf32>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0, %c0] :			vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0, %c0] {in_bounds = [true, true, false, false, false]} :
	vector<3x2x1xf32>, memref<1x1x3x2x1xf32>			vector<3x2x1xf32>, memref<1x1x3x2x1xf32>
	return			return
	}			}
	// CHECK-LABEL: func @transfer_write_and_vector_rank_reducing			// CHECK-LABEL: func @transfer_write_and_vector_rank_reducing
	// CHECK-SAME: %[[ARG:.+]]: memref<1x1x3x2x1xf32>			// CHECK-SAME: %[[ARG:.+]]: memref<1x1x3x2x1xf32>
	// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0, 0] [1, 1, 3, 2, 1] [1, 1, 1, 1, 1]			// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[ARG]][0, 0, 0, 0, 0] [1, 1, 3, 2, 1] [1, 1, 1, 1, 1]
	// CHECK-SAME: memref<1x1x3x2x1xf32> to memref<3x2xf32>			// CHECK-SAME: memref<1x1x3x2x1xf32> to memref<3x2xf32>
	// CHECK: vector.transfer_write %{{.*}}, %[[SUBVIEW]]{{.*}} {in_bounds = [true, true]} : vector<3x2xf32>, memref<3x2xf32>			// CHECK: vector.transfer_write %{{.*}}, %[[SUBVIEW]]{{.*}} {in_bounds = [true, true]} : vector<3x2xf32>, memref<3x2xf32>

	func.func @transfer_read_and_vector_rank_reducing_to_0d(			func.func @transfer_read_and_vector_rank_reducing_to_0d(
	%arg : memref<1x1x1x1x1xf32>) -> vector<1x1x1xf32> {			%arg : memref<1x1x1x1x1xf32>) -> vector<1x1x1xf32> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant 0.0 : f32			%cst = arith.constant 0.0 : f32
	%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0, %c0], %cst :			%v = vector.transfer_read %arg[%c0, %c0, %c0, %c0, %c0], %cst {in_bounds = [true, true, false, false, false]} :
	memref<1x1x1x1x1xf32>, vector<1x1x1xf32>			memref<1x1x1x1x1xf32>, vector<1x1x1xf32>
	return %v : vector<1x1x1xf32>			return %v : vector<1x1x1xf32>
	}			}
	// CHECK-LABEL: func @transfer_read_and_vector_rank_reducing_to_0d			// CHECK-LABEL: func @transfer_read_and_vector_rank_reducing_to_0d
	// CHECK-SAME: %[[MEMREF:.+]]: memref<1x1x1x1x1xf32>			// CHECK-SAME: %[[MEMREF:.+]]: memref<1x1x1x1x1xf32>
	// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[MEMREF]][0, 0, 0, 0, 0] [1, 1, 1, 1, 1] [1, 1, 1, 1, 1] : memref<1x1x1x1x1xf32> to memref<f32>			// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[MEMREF]][0, 0, 0, 0, 0] [1, 1, 1, 1, 1] [1, 1, 1, 1, 1] : memref<1x1x1x1x1xf32> to memref<f32>
	// CHECK: %[[READ:.+]] = vector.transfer_read %[[SUBVIEW]]{{.*}} : memref<f32>, vector<f32>			// CHECK: %[[READ:.+]] = vector.transfer_read %[[SUBVIEW]]{{.*}} : memref<f32>, vector<f32>
	// CHECK: vector.shape_cast %[[READ]] : vector<f32> to vector<1x1x1xf32>			// CHECK: vector.shape_cast %[[READ]] : vector<f32> to vector<1x1x1xf32>

	func.func @transfer_write_and_vector_rank_reducing_to_0d(			func.func @transfer_write_and_vector_rank_reducing_to_0d(
	%arg : memref<1x1x1x1x1xf32>,			%arg : memref<1x1x1x1x1xf32>,
	%vec : vector<1x1x1xf32>) {			%vec : vector<1x1x1xf32>) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0, %c0] :			vector.transfer_write %vec, %arg [%c0, %c0, %c0, %c0, %c0] {in_bounds = [true, true, false, false, false]} :
	vector<1x1x1xf32>, memref<1x1x1x1x1xf32>			vector<1x1x1xf32>, memref<1x1x1x1x1xf32>
	return			return
	}			}
	// CHECK-LABEL: func @transfer_write_and_vector_rank_reducing_to_0d			// CHECK-LABEL: func @transfer_write_and_vector_rank_reducing_to_0d
	// CHECK-SAME: %[[MEMREF:.+]]: memref<1x1x1x1x1xf32>, %[[VECTOR:.+]]: vector<1x1x1xf32>			// CHECK-SAME: %[[MEMREF:.+]]: memref<1x1x1x1x1xf32>, %[[VECTOR:.+]]: vector<1x1x1xf32>
	// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[MEMREF]][0, 0, 0, 0, 0] [1, 1, 1, 1, 1] [1, 1, 1, 1, 1] : memref<1x1x1x1x1xf32> to memref<f32>			// CHECK: %[[SUBVIEW:.+]] = memref.subview %[[MEMREF]][0, 0, 0, 0, 0] [1, 1, 1, 1, 1] [1, 1, 1, 1, 1] : memref<1x1x1x1x1xf32> to memref<f32>
	// CHECK: %[[SHCAST:.+]] = vector.shape_cast %[[VECTOR]] : vector<1x1x1xf32> to vector<f32>			// CHECK: %[[SHCAST:.+]] = vector.shape_cast %[[VECTOR]] : vector<1x1x1xf32> to vector<f32>
	// CHECK: vector.transfer_write %[[SHCAST]], %[[SUBVIEW]]{{.*}} : vector<f32>, memref<f32>			// CHECK: vector.transfer_write %[[SHCAST]], %[[SUBVIEW]]{{.*}} : vector<f32>, memref<f32>

	transform.sequence failures(propagate) {			transform.sequence failures(propagate) {
	^bb1(%func_op: !transform.op<"func.func">):			^bb1(%func_op: !transform.op<"func.func">):
	transform.apply_patterns to %func_op {			transform.apply_patterns to %func_op {
	transform.apply_patterns.vector.rank_reducing_subview_patterns			transform.apply_patterns.vector.rank_reducing_subview_patterns
	} : !transform.op<"func.func">			} : !transform.op<"func.func">
	}			}

mlir/test/Dialect/Vector/vector-transfer-flatten.mlir

	... 59 lines not shown ...
	// CHECK: %[[READ:.+]] = vector.transfer_read %[[ARG]][], %[[CST]] : memref<i8>			// CHECK: %[[READ:.+]] = vector.transfer_read %[[ARG]][], %[[CST]] : memref<i8>
	// CHECK: return %[[READ]]			// CHECK: return %[[READ]]

	// -----			// -----

	func.func @transfer_read_flattenable_with_dynamic_dims_and_indices(%arg0 : memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>, %arg1 : index, %arg2 : index) -> vector<8x4xi8> {			func.func @transfer_read_flattenable_with_dynamic_dims_and_indices(%arg0 : memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>, %arg1 : index, %arg2 : index) -> vector<8x4xi8> {
	%c0_i8 = arith.constant 0 : i8			%c0_i8 = arith.constant 0 : i8
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%result = vector.transfer_read %arg0[%arg1, %arg2, %c0, %c0], %c0_i8 {in_bounds = [true, true]} : memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>, vector<8x4xi8>			%result = vector.transfer_read %arg0[%arg1, %arg2, %c0, %c0], %c0_i8 {in_bounds = [true, true, true, true]} : memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>, vector<8x4xi8>
	return %result : vector<8x4xi8>			return %result : vector<8x4xi8>
	}			}

	// CHECK-LABEL: func @transfer_read_flattenable_with_dynamic_dims_and_indices			// CHECK-LABEL: func @transfer_read_flattenable_with_dynamic_dims_and_indices
	// CHECK-SAME: %[[ARG0:.+]]: memref<?x?x8x4xi8, {{.+}}>, %[[ARG1:.+]]: index, %[[ARG2:.+]]: index			// CHECK-SAME: %[[ARG0:.+]]: memref<?x?x8x4xi8, {{.+}}>, %[[ARG1:.+]]: index, %[[ARG2:.+]]: index
	// CHECK: %[[C0_I8:.+]] = arith.constant 0 : i8			// CHECK: %[[C0_I8:.+]] = arith.constant 0 : i8
	// CHECK: %[[C0:.+]] = arith.constant 0 : index			// CHECK: %[[C0:.+]] = arith.constant 0 : index
	// CHECK: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG0]] {{\[}}[0], [1], [2, 3]{{\]}}			// CHECK: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG0]] {{\[}}[0], [1], [2, 3]{{\]}}
	// CHECK-SAME: : memref<?x?x8x4xi8, {{.+}}> into memref<?x?x32xi8, {{.+}}>			// CHECK-SAME: : memref<?x?x8x4xi8, {{.+}}> into memref<?x?x32xi8, {{.+}}>
	// CHECK: %[[VEC1D:.+]] = vector.transfer_read %[[COLLAPSED]]			// CHECK: %[[VEC1D:.+]] = vector.transfer_read %[[COLLAPSED]]
	// CHECK-SAME: [%[[ARG1]], %[[ARG2]], %[[C0]]], %[[C0_I8]]			// CHECK-SAME: [%[[ARG1]], %[[ARG2]], %[[C0]]], %[[C0_I8]]
	// CHECK-SAME: {in_bounds = [true]}			// CHECK-SAME: {in_bounds = [true, true, true]}
	// CHECK-SAME: : memref<?x?x32xi8, {{.+}}>, vector<32xi8>			// CHECK-SAME: : memref<?x?x32xi8, {{.+}}>, vector<32xi8>
	// CHECK: %[[VEC2D:.+]] = vector.shape_cast %[[VEC1D]] : vector<32xi8> to vector<8x4xi8>			// CHECK: %[[VEC2D:.+]] = vector.shape_cast %[[VEC1D]] : vector<32xi8> to vector<8x4xi8>
	// CHECK: return %[[VEC2D]] : vector<8x4xi8>			// CHECK: return %[[VEC2D]] : vector<8x4xi8>

	// -----			// -----

	func.func @transfer_write_flattenable_with_dynamic_dims_and_indices(%vec : vector<8x4xi8>, %dst : memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>, %arg1 : index, %arg2 : index) {			func.func @transfer_write_flattenable_with_dynamic_dims_and_indices(%vec : vector<8x4xi8>, %dst : memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>, %arg1 : index, %arg2 : index) {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	vector.transfer_write %vec, %dst[%arg1, %arg2, %c0, %c0] {in_bounds = [true, true]} : vector<8x4xi8>, memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>			vector.transfer_write %vec, %dst[%arg1, %arg2, %c0, %c0] {in_bounds = [true, true, true, true]} : vector<8x4xi8>, memref<?x?x8x4xi8, strided<[?, 32, 4, 1], offset: ?>>
	return			return
	}			}

	// CHECK-LABEL: func @transfer_write_flattenable_with_dynamic_dims_and_indices			// CHECK-LABEL: func @transfer_write_flattenable_with_dynamic_dims_and_indices
	// CHECK-SAME: %[[ARG0:.+]]: vector<8x4xi8>, %[[ARG1:.+]]: memref<?x?x8x4xi8, {{.+}}>, %[[ARG2:.+]]: index, %[[ARG3:.+]]: index			// CHECK-SAME: %[[ARG0:.+]]: vector<8x4xi8>, %[[ARG1:.+]]: memref<?x?x8x4xi8, {{.+}}>, %[[ARG2:.+]]: index, %[[ARG3:.+]]: index
	// CHECK: %[[C0:.+]] = arith.constant 0 : index			// CHECK: %[[C0:.+]] = arith.constant 0 : index
	// CHECK: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG1]] {{\[}}[0], [1], [2, 3]{{\]}}			// CHECK: %[[COLLAPSED:.+]] = memref.collapse_shape %[[ARG1]] {{\[}}[0], [1], [2, 3]{{\]}}
	// CHECK-SAME: : memref<?x?x8x4xi8, {{.+}}> into memref<?x?x32xi8, {{.+}}>			// CHECK-SAME: : memref<?x?x8x4xi8, {{.+}}> into memref<?x?x32xi8, {{.+}}>
	// CHECK: %[[VEC1D:.+]] = vector.shape_cast %[[ARG0]] : vector<8x4xi8> to vector<32xi8>			// CHECK: %[[VEC1D:.+]] = vector.shape_cast %[[ARG0]] : vector<8x4xi8> to vector<32xi8>
	// CHECK: vector.transfer_write %[[VEC1D]], %[[COLLAPSED]]			// CHECK: vector.transfer_write %[[VEC1D]], %[[COLLAPSED]]
	// CHECK-SAME: [%[[ARG2]], %[[ARG3]], %[[C0]]]			// CHECK-SAME: [%[[ARG2]], %[[ARG3]], %[[C0]]]
	// CHECK-SAME: {in_bounds = [true]}			// CHECK-SAME: {in_bounds = [true, true, true]}
	// CHECK-SAME: : vector<32xi8>, memref<?x?x32xi8, {{.+}}>			// CHECK-SAME: : vector<32xi8>, memref<?x?x32xi8, {{.+}}>
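
The CHECK lines in the two flattenable tests expect the trailing `8x4` dimensions to be collapsed into a single dimension of 32, with `vector.shape_cast` converting between `vector<32xi8>` and `vector<8x4xi8>`. As a rough illustration only (plain Python, not MLIR; the helper names are hypothetical and the sketch assumes the row-major layout implied by the trailing strides `[4, 1]`), the index arithmetic behind that collapse might look like this:

```python
def collapse_index(i, j, inner=4):
    """Linear offset of element (i, j) once two row-major dims are collapsed."""
    return i * inner + j


def shape_cast_1d_to_2d(flat, rows=8, cols=4):
    """Reinterpret a flat list as rows x cols, mirroring vector.shape_cast."""
    assert len(flat) == rows * cols
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


# A flat "vector<32xi8>" and its "vector<8x4xi8>" view.
flat = list(range(32))
tile = shape_cast_1d_to_2d(flat)
```

In this model, `tile[i][j]` and `flat[collapse_index(i, j)]` always name the same element, which is why the flattened transfer can carry the same data as the original 2-D one.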

	// -----			// -----

	func.func @transfer_read_flattenable_negative(			func.func @transfer_read_flattenable_negative(
	%arg : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>) -> vector<2x2x2x2xi8> {			%arg : memref<5x4x3x2xi8, strided<[24, 6, 2, 1], offset: ?>>) -> vector<2x2x2x2xi8> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cst = arith.constant 0 : i8			%cst = arith.constant 0 : i8
	... 21 lines not shown ...

mlir/test/Dialect/Vector/vector-transfer-materialize-masks.mlir

This file was added.

				// RUN: mlir-opt %s -test-transform-dialect-interpreter -split-input-file | FileCheck %s

				transform.sequence failures(propagate) {
				^bb1(%func_op: !transform.op<"func.func">):
				transform.apply_patterns to %func_op {
				transform.apply_patterns.vector.materialize_masks
				} : !transform.op<"func.func">
				}

				// CHECK-LABEL: func @mask_1d_transfer(
				// CHECK: arith.cmpi
				// CHECK: vector.transfer_read %{{.*}}[{{.*}}], %{{.*}}, %{{.*}} {in_bounds = [true, true]}
				func.func @mask_1d_transfer(%t: tensor<?x?xf32>, %i: index, %j: index)
				-> vector<5xf32>
				{
				%cst = arith.constant 5.5 : f32
				%0 = vector.transfer_read %t[%i, %j], %cst {in_bounds = [true, false]}
				: tensor<?x?xf32>, vector<5xf32>
				return %0 : vector<5xf32>
				}

				// CHECK-LABEL: func @mask_2d_transfer(
				// CHECK-NOT: arith.cmpi
				// CHECK: vector.transfer_read %{{.*}}[{{.*}}], %{{.*}} {in_bounds = [true, false]}
				func.func @mask_2d_transfer(%t: tensor<?x?xf32>, %i: index, %j: index)
				-> vector<5x4xf32>
				{
				%cst = arith.constant 5.5 : f32
				// Masks are currently not materialized for transfers that are 2D or higher.
				%0 = vector.transfer_read %t[%i, %j], %cst {in_bounds = [true, false]}
				: tensor<?x?xf32>, vector<5x4xf32>
				return %0 : vector<5x4xf32>
				}
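
The two tests above contrast a 1-D transfer, where the CHECK lines expect a comparison (`arith.cmpi`) and an all-`true` `in_bounds`, with a 2-D transfer that is left unchanged. A minimal Python sketch of what a materialized 1-D mask means semantically, assuming lane `k` is enabled exactly when `j + k` still lies inside the source row (the function name and the list-based model are illustrative assumptions, not the pattern's implementation):

```python
def masked_read_1d(row, j, width, pad):
    """Read `width` lanes starting at row[j]; disabled lanes receive `pad`."""
    # Lane k is enabled only while j + k stays inside the row.
    mask = [j + k < len(row) for k in range(width)]
    values = [row[j + k] if enabled else pad for k, enabled in enumerate(mask)]
    return values, mask


# A 5-lane read starting at column 1 of a 3-element row, padded with 5.5.
values, mask = masked_read_1d([1.0, 2.0, 3.0], j=1, width=5, pad=5.5)
```

Once the mask carries the bounds information explicitly, the transfer itself no longer needs an out-of-bounds dimension, which matches the `in_bounds = [true, true]` expected for `mask_1d_transfer`.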

mlir/test/Dialect/Vector/vector-transfer-permutation-lowering.mlir

	// RUN: mlir-opt %s --test-transform-dialect-interpreter --split-input-file | FileCheck %s			// RUN: mlir-opt %s --test-transform-dialect-interpreter --split-input-file | FileCheck %s

	// CHECK-LABEL: func @lower_permutation_with_mask(			// CHECK-LABEL: func @lower_permutation_with_mask(
	// CHECK: %[[vec:.*]] = arith.constant dense<-2.000000e+00> : vector<7x1xf32>			// CHECK: %[[vec:.*]] = arith.constant dense<-2.000000e+00> : vector<7x1xf32>
	// CHECK: %[[mask:.*]] = arith.constant dense<[true, false, true, false, true, true, true]> : vector<7xi1>			// CHECK: %[[mask:.*]] = arith.constant dense<[true, false, true, false, true, true, true]> : vector<7xi1>
	// CHECK: %[[b:.*]] = vector.broadcast %[[mask]] : vector<7xi1> to vector<1x7xi1>			// CHECK: %[[b:.*]] = vector.broadcast %[[mask]] : vector<7xi1> to vector<1x7xi1>
	// CHECK: %[[tp:.*]] = vector.transpose %[[b]], [1, 0] : vector<1x7xi1> to vector<7x1xi1>			// CHECK: %[[tp:.*]] = vector.transpose %[[b]], [1, 0] : vector<1x7xi1> to vector<7x1xi1>
	// CHECK: vector.transfer_write %[[vec]], %{{.*}}[%{{.*}}, %{{.*}}], %[[tp]] {in_bounds = [false, true]} : vector<7x1xf32>, memref<?x?xf32>			// CHECK: vector.transfer_write %[[vec]], %{{.*}}[%{{.*}}, %{{.*}}], %[[tp]] {in_bounds = [false, true]} : vector<7x1xf32>, memref<?x?xf32>
	func.func @lower_permutation_with_mask(%A : memref<?x?xf32>, %base1 : index,			func.func @lower_permutation_with_mask(%A : memref<?x?xf32>, %base1 : index,
	%base2 : index) {			%base2 : index) {
	%fn1 = arith.constant -2.0 : f32			%fn1 = arith.constant -2.0 : f32
	%vf0 = vector.splat %fn1 : vector<7xf32>			%vf0 = vector.splat %fn1 : vector<7xf32>
	%mask = arith.constant dense<[1, 0, 1, 0, 1, 1, 1]> : vector<7xi1>			%mask = arith.constant dense<[1, 0, 1, 0, 1, 1, 1]> : vector<7xi1>
	vector.transfer_write %vf0, %A[%base1, %base2], %mask			vector.transfer_write %vf0, %A[%base1, %base2], %mask
	{permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds = [false]}			{permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds = [false, true]}
	: vector<7xf32>, memref<?x?xf32>			: vector<7xf32>, memref<?x?xf32>
	return			return
	}			}

	transform.sequence failures(propagate) {			transform.sequence failures(propagate) {
	^bb1(%module_op: !transform.any_op):			^bb1(%module_op: !transform.any_op):
	%f = transform.structured.match ops{["func.func"]} in %module_op			%f = transform.structured.match ops{["func.func"]} in %module_op
	: (!transform.any_op) -> !transform.any_op			: (!transform.any_op) -> !transform.any_op
	transform.apply_patterns to %f {			transform.apply_patterns to %f {
	transform.apply_patterns.vector.transfer_permutation_patterns			transform.apply_patterns.vector.transfer_permutation_patterns
	} : !transform.any_op			} : !transform.any_op
	}			}
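
The CHECK lines for `lower_permutation_with_mask` expect the 1-D mask to be broadcast to `vector<1x7xi1>` and then transposed to `vector<7x1xi1>`, so that it lines up with the rewritten `vector<7x1xf32>` write. A small Python model of those two shape changes, using nested lists as a stand-in for vectors (the helper names are hypothetical):

```python
def broadcast_to_1xn(mask):
    """Add a leading unit dimension: shape (n,) becomes (1, n)."""
    return [list(mask)]


def transpose_2d(matrix):
    """Swap the two dimensions: shape (r, c) becomes (c, r)."""
    return [list(column) for column in zip(*matrix)]


mask = [True, False, True, False, True, True, True]
b = broadcast_to_1xn(mask)   # models vector<1x7xi1>
tp = transpose_2d(b)         # models vector<7x1xi1>
```

Each original mask bit ends up guarding exactly one row of the 7x1 write, so the masked-off elements are the same before and after the rewrite.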

mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir

	... 40 lines not shown ...
	// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4xf32> {			// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4xf32> {
	// CHECK-NEXT: %[[RES:.*]] = vector.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32>, vector<4xf32>			// CHECK-NEXT: %[[RES:.*]] = vector.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32>, vector<4xf32>
	// CHECK-NEXT: vector.store %[[RES:.*]], %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32>, vector<4xf32>			// CHECK-NEXT: vector.store %[[RES:.*]], %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32>, vector<4xf32>
	// CHECK-NEXT: return %[[RES]] : vector<4xf32>			// CHECK-NEXT: return %[[RES]] : vector<4xf32>
	// CHECK-NEXT: }			// CHECK-NEXT: }

	func.func @transfer_to_load(%mem : memref<8x8xf32>, %i : index) -> vector<4xf32> {			func.func @transfer_to_load(%mem : memref<8x8xf32>, %i : index) -> vector<4xf32> {
	%cf0 = arith.constant 0.0 : f32			%cf0 = arith.constant 0.0 : f32
	%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true]} : memref<8x8xf32>, vector<4xf32>			%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true, true]} : memref<8x8xf32>, vector<4xf32>
	vector.transfer_write %res, %mem[%i, %i] {in_bounds = [true]} : vector<4xf32>, memref<8x8xf32>			vector.transfer_write %res, %mem[%i, %i] {in_bounds = [true, true]} : vector<4xf32>, memref<8x8xf32>
	return %res : vector<4xf32>			return %res : vector<4xf32>
	}			}

	// n-D results are also supported.			// n-D results are also supported.
	// CHECK-LABEL: func @transfer_2D(			// CHECK-LABEL: func @transfer_2D(
	// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,			// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,
	// CHECK-SAME: %[[IDX:.*]]: index) -> vector<2x4xf32> {			// CHECK-SAME: %[[IDX:.*]]: index) -> vector<2x4xf32> {
	// CHECK-NEXT: %[[RES:.*]] = vector.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32>, vector<2x4xf32>			// CHECK-NEXT: %[[RES:.*]] = vector.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32>, vector<2x4xf32>
	... 14 lines not shown ...
	// CHECK-SAME: %[[IDX:.*]]: index) -> vector<2x4xf32> {			// CHECK-SAME: %[[IDX:.*]]: index) -> vector<2x4xf32> {
	// CHECK-NEXT: %[[RES:.*]] = vector.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xvector<2x4xf32>>, vector<2x4xf32>			// CHECK-NEXT: %[[RES:.*]] = vector.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xvector<2x4xf32>>, vector<2x4xf32>
	// CHECK-NEXT: vector.store %[[RES:.*]], %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xvector<2x4xf32>>, vector<2x4xf32>			// CHECK-NEXT: vector.store %[[RES:.*]], %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xvector<2x4xf32>>, vector<2x4xf32>
	// CHECK-NEXT: return %[[RES]] : vector<2x4xf32>			// CHECK-NEXT: return %[[RES]] : vector<2x4xf32>
	// CHECK-NEXT: }			// CHECK-NEXT: }

	func.func @transfer_vector_element(%mem : memref<8x8xvector<2x4xf32>>, %i : index) -> vector<2x4xf32> {			func.func @transfer_vector_element(%mem : memref<8x8xvector<2x4xf32>>, %i : index) -> vector<2x4xf32> {
	%cf0 = arith.constant dense<0.0> : vector<2x4xf32>			%cf0 = arith.constant dense<0.0> : vector<2x4xf32>
	%res = vector.transfer_read %mem[%i, %i], %cf0 : memref<8x8xvector<2x4xf32>>, vector<2x4xf32>			%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true, true]}: memref<8x8xvector<2x4xf32>>, vector<2x4xf32>
	vector.transfer_write %res, %mem[%i, %i] : vector<2x4xf32>, memref<8x8xvector<2x4xf32>>			vector.transfer_write %res, %mem[%i, %i] {in_bounds = [true, true]} : vector<2x4xf32>, memref<8x8xvector<2x4xf32>>
	return %res : vector<2x4xf32>			return %res : vector<2x4xf32>
	}			}

	// TODO: Vector element types are not supported yet when the result has a			// TODO: Vector element types are not supported yet when the result has a
	// different type.			// different type.
	// CHECK-LABEL: func @transfer_vector_element_different_types(			// CHECK-LABEL: func @transfer_vector_element_different_types(
	// CHECK-SAME: %[[MEM:.*]]: memref<8x8xvector<2x4xf32>>,			// CHECK-SAME: %[[MEM:.*]]: memref<8x8xvector<2x4xf32>>,
	// CHECK-SAME: %[[IDX:.*]]: index) -> vector<1x2x4xf32> {			// CHECK-SAME: %[[IDX:.*]]: index) -> vector<1x2x4xf32> {
	// CHECK-NEXT: %[[CF0:.*]] = arith.constant dense<0.000000e+00> : vector<2x4xf32>			// CHECK-NEXT: %[[CF0:.*]] = arith.constant dense<0.000000e+00> : vector<2x4xf32>
	// CHECK-NEXT: %[[RES:.*]] = vector.transfer_read %[[MEM]][%[[IDX]], %[[IDX]]], %[[CF0]] {in_bounds = [true]} : memref<8x8xvector<2x4xf32>>, vector<1x2x4xf32>			// CHECK-NEXT: %[[RES:.*]] = vector.transfer_read %[[MEM]][%[[IDX]], %[[IDX]]], %[[CF0]] {in_bounds = [true, true]} : memref<8x8xvector<2x4xf32>>, vector<1x2x4xf32>
	// CHECK-NEXT: vector.transfer_write %[[RES:.*]], %[[MEM]][%[[IDX]], %[[IDX]]] {in_bounds = [true]} : vector<1x2x4xf32>, memref<8x8xvector<2x4xf32>>			// CHECK-NEXT: vector.transfer_write %[[RES:.*]], %[[MEM]][%[[IDX]], %[[IDX]]] {in_bounds = [true, true]} : vector<1x2x4xf32>, memref<8x8xvector<2x4xf32>>
	// CHECK-NEXT: return %[[RES]] : vector<1x2x4xf32>			// CHECK-NEXT: return %[[RES]] : vector<1x2x4xf32>
	// CHECK-NEXT: }			// CHECK-NEXT: }

	func.func @transfer_vector_element_different_types(%mem : memref<8x8xvector<2x4xf32>>, %i : index) -> vector<1x2x4xf32> {			func.func @transfer_vector_element_different_types(%mem : memref<8x8xvector<2x4xf32>>, %i : index) -> vector<1x2x4xf32> {
	%cf0 = arith.constant dense<0.0> : vector<2x4xf32>			%cf0 = arith.constant dense<0.0> : vector<2x4xf32>
	%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true]} : memref<8x8xvector<2x4xf32>>, vector<1x2x4xf32>			%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true, true]} : memref<8x8xvector<2x4xf32>>, vector<1x2x4xf32>
	vector.transfer_write %res, %mem[%i, %i] {in_bounds = [true]} : vector<1x2x4xf32>, memref<8x8xvector<2x4xf32>>			vector.transfer_write %res, %mem[%i, %i] {in_bounds = [true, true]} : vector<1x2x4xf32>, memref<8x8xvector<2x4xf32>>
	return %res : vector<1x2x4xf32>			return %res : vector<1x2x4xf32>
	}			}

	// TODO: transfer_read/write cannot be lowered because there is a dimension			// TODO: transfer_read/write cannot be lowered because there is a dimension
	// that is not guaranteed to be in-bounds.			// that is not guaranteed to be in-bounds.
	// CHECK-LABEL: func @transfer_2D_not_inbounds(			// CHECK-LABEL: func @transfer_2D_not_inbounds(
	// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,			// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,
	// CHECK-SAME: %[[IDX:.*]]: index) -> vector<2x4xf32> {			// CHECK-SAME: %[[IDX:.*]]: index) -> vector<2x4xf32> {
	... 11 lines not shown ...
	}			}

	// TODO: transfer_read/write cannot be lowered because they are not guaranteed			// TODO: transfer_read/write cannot be lowered because they are not guaranteed
	// to be in-bounds.			// to be in-bounds.
	// CHECK-LABEL: func @transfer_not_inbounds(			// CHECK-LABEL: func @transfer_not_inbounds(
	// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,			// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,
	// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4xf32> {			// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4xf32> {
	// CHECK-NEXT: %[[CF0:.*]] = arith.constant 0.000000e+00 : f32			// CHECK-NEXT: %[[CF0:.*]] = arith.constant 0.000000e+00 : f32
	// CHECK-NEXT: %[[RES:.*]] = vector.transfer_read %[[MEM]][%[[IDX]], %[[IDX]]], %[[CF0]] : memref<8x8xf32>, vector<4xf32>			// CHECK-NEXT: %[[RES:.*]] = vector.transfer_read %[[MEM]][%[[IDX]], %[[IDX]]], %[[CF0]] {in_bounds = [true, false]} : memref<8x8xf32>, vector<4xf32>
	// CHECK-NEXT: vector.transfer_write %[[RES]], %[[MEM]][%[[IDX]], %[[IDX]]] : vector<4xf32>, memref<8x8xf32>			// CHECK-NEXT: vector.transfer_write %[[RES]], %[[MEM]][%[[IDX]], %[[IDX]]] {in_bounds = [true, false]} : vector<4xf32>, memref<8x8xf32>
	// CHECK-NEXT: return %[[RES]] : vector<4xf32>			// CHECK-NEXT: return %[[RES]] : vector<4xf32>
	// CHECK-NEXT: }			// CHECK-NEXT: }

	func.func @transfer_not_inbounds(%mem : memref<8x8xf32>, %i : index) -> vector<4xf32> {			func.func @transfer_not_inbounds(%mem : memref<8x8xf32>, %i : index) -> vector<4xf32> {
	%cf0 = arith.constant 0.0 : f32			%cf0 = arith.constant 0.0 : f32
	%res = vector.transfer_read %mem[%i, %i], %cf0 : memref<8x8xf32>, vector<4xf32>			%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true, false]} : memref<8x8xf32>, vector<4xf32>
	vector.transfer_write %res, %mem[%i, %i] : vector<4xf32>, memref<8x8xf32>			vector.transfer_write %res, %mem[%i, %i] {in_bounds = [true, false]} : vector<4xf32>, memref<8x8xf32>
	return %res : vector<4xf32>			return %res : vector<4xf32>
	}			}

	// CHECK-LABEL: func @transfer_nondefault_layout(			// CHECK-LABEL: func @transfer_nondefault_layout(
	// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32, #{{.*}}>,			// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32, #{{.*}}>,
	// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4xf32> {			// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4xf32> {
	// CHECK-NEXT: %[[RES:.*]] = vector.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32, #{{.*}}>, vector<4xf32>			// CHECK-NEXT: %[[RES:.*]] = vector.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32, #{{.*}}>, vector<4xf32>
	// CHECK-NEXT: vector.store %[[RES]], %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32, #{{.*}}>, vector<4xf32>			// CHECK-NEXT: vector.store %[[RES]], %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32, #{{.*}}>, vector<4xf32>
	// CHECK-NEXT: return %[[RES]] : vector<4xf32>			// CHECK-NEXT: return %[[RES]] : vector<4xf32>
	// CHECK-NEXT: }			// CHECK-NEXT: }

	#layout = affine_map<(d0, d1) -> (d0*16 + d1)>			#layout = affine_map<(d0, d1) -> (d0*16 + d1)>
	func.func @transfer_nondefault_layout(%mem : memref<8x8xf32, #layout>, %i : index) -> vector<4xf32> {			func.func @transfer_nondefault_layout(%mem : memref<8x8xf32, #layout>, %i : index) -> vector<4xf32> {
	%cf0 = arith.constant 0.0 : f32			%cf0 = arith.constant 0.0 : f32
	%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true]} : memref<8x8xf32, #layout>, vector<4xf32>			%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true, true]} : memref<8x8xf32, #layout>, vector<4xf32>
	vector.transfer_write %res, %mem[%i, %i] {in_bounds = [true]} : vector<4xf32>, memref<8x8xf32, #layout>			vector.transfer_write %res, %mem[%i, %i] {in_bounds = [true, true]} : vector<4xf32>, memref<8x8xf32, #layout>
	return %res : vector<4xf32>			return %res : vector<4xf32>
	}			}

	// TODO: transfer_read/write cannot be lowered to vector.load/store yet when the			// TODO: transfer_read/write cannot be lowered to vector.load/store yet when the
	// permutation map is not the minor identity map (up to broadcasting).			// permutation map is not the minor identity map (up to broadcasting).
	// CHECK-LABEL: func @transfer_perm_map(			// CHECK-LABEL: func @transfer_perm_map(
	// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,			// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,
	// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4xf32> {			// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4xf32> {
	// CHECK-NEXT: %[[CF0:.*]] = arith.constant 0.000000e+00 : f32			// CHECK-NEXT: %[[CF0:.*]] = arith.constant 0.000000e+00 : f32
	// CHECK-NEXT: %[[RES:.*]] = vector.transfer_read %[[MEM]][%[[IDX]], %[[IDX]]], %[[CF0]] {in_bounds = [true], permutation_map = #{{.*}}} : memref<8x8xf32>, vector<4xf32>			// CHECK-NEXT: %[[RES:.*]] = vector.transfer_read %[[MEM]][%[[IDX]], %[[IDX]]], %[[CF0]] {in_bounds = [true, true], permutation_map = #{{.*}}} : memref<8x8xf32>, vector<4xf32>
	// CHECK-NEXT: return %[[RES]] : vector<4xf32>			// CHECK-NEXT: return %[[RES]] : vector<4xf32>
	// CHECK-NEXT: }			// CHECK-NEXT: }

	func.func @transfer_perm_map(%mem : memref<8x8xf32>, %i : index) -> vector<4xf32> {			func.func @transfer_perm_map(%mem : memref<8x8xf32>, %i : index) -> vector<4xf32> {
	%cf0 = arith.constant 0.0 : f32			%cf0 = arith.constant 0.0 : f32
	%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true], permutation_map = affine_map<(d0, d1) -> (d0)>} : memref<8x8xf32>, vector<4xf32>			%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true, true], permutation_map = affine_map<(d0, d1) -> (d0)>} : memref<8x8xf32>, vector<4xf32>
	return %res : vector<4xf32>			return %res : vector<4xf32>
	}			}

	// Lowering of transfer_read with broadcasting is supported (note that a `load`			// Lowering of transfer_read with broadcasting is supported (note that a `load`
	// is generated instead of a `vector.load`).			// is generated instead of a `vector.load`).
	// CHECK-LABEL: func @transfer_broadcasting(			// CHECK-LABEL: func @transfer_broadcasting(
	// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,			// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,
	// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4xf32> {			// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4xf32> {
	// CHECK-NEXT: %[[LOAD:.*]] = memref.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32>			// CHECK-NEXT: %[[LOAD:.*]] = memref.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32>
	// CHECK-NEXT: %[[RES:.*]] = vector.broadcast %[[LOAD]] : f32 to vector<4xf32>			// CHECK-NEXT: %[[RES:.*]] = vector.broadcast %[[LOAD]] : f32 to vector<4xf32>
	// CHECK-NEXT: return %[[RES]] : vector<4xf32>			// CHECK-NEXT: return %[[RES]] : vector<4xf32>
	// CHECK-NEXT: }			// CHECK-NEXT: }

	#broadcast_1d = affine_map<(d0, d1) -> (0)>			#broadcast_1d = affine_map<(d0, d1) -> (0)>
	func.func @transfer_broadcasting(%mem : memref<8x8xf32>, %i : index) -> vector<4xf32> {			func.func @transfer_broadcasting(%mem : memref<8x8xf32>, %i : index) -> vector<4xf32> {
	%cf0 = arith.constant 0.0 : f32			%cf0 = arith.constant 0.0 : f32
	%res = vector.transfer_read %mem[%i, %i], %cf0			%res = vector.transfer_read %mem[%i, %i], %cf0
	{in_bounds = [true], permutation_map = #broadcast_1d}			{in_bounds = [true, true], permutation_map = #broadcast_1d}
	: memref<8x8xf32>, vector<4xf32>			: memref<8x8xf32>, vector<4xf32>
	return %res : vector<4xf32>			return %res : vector<4xf32>
	}			}

	// CHECK-LABEL: func @transfer_scalar(			// CHECK-LABEL: func @transfer_scalar(
	// CHECK-SAME: %[[MEM:.*]]: memref<?x?xf32>,			// CHECK-SAME: %[[MEM:.*]]: memref<?x?xf32>,
	// CHECK-SAME: %[[IDX:.*]]: index) -> vector<1xf32> {			// CHECK-SAME: %[[IDX:.*]]: index) -> vector<1xf32> {
	// CHECK-NEXT: %[[LOAD:.*]] = memref.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<?x?xf32>			// CHECK-NEXT: %[[LOAD:.*]] = memref.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<?x?xf32>
	// CHECK-NEXT: %[[RES:.*]] = vector.broadcast %[[LOAD]] : f32 to vector<1xf32>			// CHECK-NEXT: %[[RES:.*]] = vector.broadcast %[[LOAD]] : f32 to vector<1xf32>
	// CHECK-NEXT: return %[[RES]] : vector<1xf32>			// CHECK-NEXT: return %[[RES]] : vector<1xf32>
	// CHECK-NEXT: }			// CHECK-NEXT: }
	func.func @transfer_scalar(%mem : memref<?x?xf32>, %i : index) -> vector<1xf32> {			func.func @transfer_scalar(%mem : memref<?x?xf32>, %i : index) -> vector<1xf32> {
	%cf0 = arith.constant 0.0 : f32			%cf0 = arith.constant 0.0 : f32
	%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true]} : memref<?x?xf32>, vector<1xf32>			%res = vector.transfer_read %mem[%i, %i], %cf0 {in_bounds = [true, true]} : memref<?x?xf32>, vector<1xf32>
	return %res : vector<1xf32>			return %res : vector<1xf32>
	}			}

	// An example with two broadcasted dimensions.			// An example with two broadcasted dimensions.
	// CHECK-LABEL: func @transfer_broadcasting_2D(			// CHECK-LABEL: func @transfer_broadcasting_2D(
	// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,			// CHECK-SAME: %[[MEM:.*]]: memref<8x8xf32>,
	// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4x4xf32> {			// CHECK-SAME: %[[IDX:.*]]: index) -> vector<4x4xf32> {
	// CHECK-NEXT: %[[LOAD:.*]] = memref.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32>			// CHECK-NEXT: %[[LOAD:.*]] = memref.load %[[MEM]][%[[IDX]], %[[IDX]]] : memref<8x8xf32>
	... 18 lines not shown ...
	// CHECK-NEXT: %[[RES:.*]] = vector.broadcast %[[LOAD]] : vector<3x1x1x5xf32> to vector<3x2x4x5xf32>			// CHECK-NEXT: %[[RES:.*]] = vector.broadcast %[[LOAD]] : vector<3x1x1x5xf32> to vector<3x2x4x5xf32>
	// CHECK-NEXT: return %[[RES]] : vector<3x2x4x5xf32>			// CHECK-NEXT: return %[[RES]] : vector<3x2x4x5xf32>
	// CHECK-NEXT: }			// CHECK-NEXT: }

	#broadcast_2d_in_4d = affine_map<(d0, d1, d2, d3, d4) -> (d1, 0, 0, d4)>			#broadcast_2d_in_4d = affine_map<(d0, d1, d2, d3, d4) -> (d1, 0, 0, d4)>
	func.func @transfer_broadcasting_complex(%mem : memref<10x20x30x8x8xf32>, %i : index) -> vector<3x2x4x5xf32> {			func.func @transfer_broadcasting_complex(%mem : memref<10x20x30x8x8xf32>, %i : index) -> vector<3x2x4x5xf32> {
	%cf0 = arith.constant 0.0 : f32			%cf0 = arith.constant 0.0 : f32
	%res = vector.transfer_read %mem[%i, %i, %i, %i, %i], %cf0			%res = vector.transfer_read %mem[%i, %i, %i, %i, %i], %cf0
	{in_bounds = [true, true, true, true], permutation_map = #broadcast_2d_in_4d}			{in_bounds = [true, true, true, true, true], permutation_map = #broadcast_2d_in_4d}
	: memref<10x20x30x8x8xf32>, vector<3x2x4x5xf32>			: memref<10x20x30x8x8xf32>, vector<3x2x4x5xf32>
	return %res : vector<3x2x4x5xf32>			return %res : vector<3x2x4x5xf32>
	}			}


	transform.sequence failures(propagate) {			transform.sequence failures(propagate) {
	^bb1(%func_op: !transform.op<"func.func">):			^bb1(%func_op: !transform.op<"func.func">):
	transform.apply_patterns to %func_op {			transform.apply_patterns to %func_op {
	[... 22 lines not shown ...]
	// CHECK-DAG: %[[CF0:.*]] = arith.constant 0.000000e+00 : f32			// CHECK-DAG: %[[CF0:.*]] = arith.constant 0.000000e+00 : f32
	// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
	%cst = arith.constant 0.000000e+00 : f32			%cst = arith.constant 0.000000e+00 : f32
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index

	// CHECK: %[[MASK0:.*]] = vector.splat %{{.*}} : vector<14x7xi1>			// CHECK: %[[MASK0:.*]] = vector.splat %{{.*}} : vector<14x7xi1>
	%mask0 = vector.splat %m : vector<14x7xi1>			%mask0 = vector.splat %m : vector<14x7xi1>
	%0 = vector.transfer_read %arg1[%c0, %c0, %c0, %c0], %cst, %mask0 {in_bounds = [true, false, true, true], permutation_map = #map0} : memref<?x?x?x?xf32>, vector<7x14x8x16xf32>			%0 = vector.transfer_read %arg1[%c0, %c0, %c0, %c0], %cst, %mask0 {in_bounds = [true, false, true, true], permutation_map = #map0} : memref<?x?x?x?xf32>, vector<7x14x8x16xf32>
	// CHECK: vector.transfer_read {{.*}} %[[MASK0]] {in_bounds = [false, true, true, true], permutation_map = #[[$MAP0]]} : memref<?x?x?x?xf32>, vector<14x7x8x16xf32>			// CHECK: vector.transfer_read {{.*}} %[[MASK0]] {in_bounds = [true, false, true, true], permutation_map = #[[$MAP0]]} : memref<?x?x?x?xf32>, vector<14x7x8x16xf32>
	// CHECK: vector.transpose %{{.*}}, [1, 0, 2, 3] : vector<14x7x8x16xf32> to vector<7x14x8x16xf32>			// CHECK: vector.transpose %{{.*}}, [1, 0, 2, 3] : vector<14x7x8x16xf32> to vector<7x14x8x16xf32>

	// CHECK: %[[MASK1:.*]] = vector.splat %{{.*}} : vector<16x14xi1>			// CHECK: %[[MASK1:.*]] = vector.splat %{{.*}} : vector<16x14xi1>
	%mask1 = vector.splat %m : vector<16x14xi1>			%mask1 = vector.splat %m : vector<16x14xi1>
	%1 = vector.transfer_read %arg1[%c0, %c0, %c0, %c0], %cst, %mask1 {permutation_map = #map1} : memref<?x?x?x?xf32>, vector<7x14x8x16xf32>			%1 = vector.transfer_read %arg1[%c0, %c0, %c0, %c0], %cst, %mask1 {permutation_map = #map1, in_bounds = [false, false, true, true]} : memref<?x?x?x?xf32>, vector<7x14x8x16xf32>
	// CHECK: vector.transfer_read {{.*}} %[[MASK1]] {permutation_map = #[[$MAP0]]} : memref<?x?x?x?xf32>, vector<16x14x7x8xf32>			// CHECK: vector.transfer_read {{.*}} %[[MASK1]] {in_bounds = [false, false, true, true], permutation_map = #[[$MAP0]]} : memref<?x?x?x?xf32>, vector<16x14x7x8xf32>
	// CHECK: vector.transpose %{{.*}}, [2, 1, 3, 0] : vector<16x14x7x8xf32> to vector<7x14x8x16xf32>			// CHECK: vector.transpose %{{.*}}, [2, 1, 3, 0] : vector<16x14x7x8xf32> to vector<7x14x8x16xf32>

	// CHECK: %[[MASK3:.*]] = vector.splat %{{.*}} : vector<14x7xi1>			// CHECK: %[[MASK3:.*]] = vector.splat %{{.*}} : vector<14x7xi1>
	%mask2 = vector.splat %m : vector<14x7xi1>			%mask2 = vector.splat %m : vector<14x7xi1>
	%2 = vector.transfer_read %arg1[%c0, %c0, %c0, %c0], %cst, %mask2 {in_bounds = [true, false, true, true], permutation_map = #map2} : memref<?x?x?x?xf32>, vector<7x14x8x16xf32>			%2 = vector.transfer_read %arg1[%c0, %c0, %c0, %c0], %cst, %mask2 {in_bounds = [true, false, true, true], permutation_map = #map2} : memref<?x?x?x?xf32>, vector<7x14x8x16xf32>
	// CHECK: vector.transfer_read {{.*}} %[[MASK3]] {in_bounds = [false, true, true], permutation_map = #[[$MAP1]]} : memref<?x?x?x?xf32>, vector<14x16x7xf32>			// CHECK: vector.transfer_read {{.*}} %[[MASK3]] {in_bounds = [true, false, true, true], permutation_map = #[[$MAP1]]} : memref<?x?x?x?xf32>, vector<14x16x7xf32>
	// CHECK: vector.broadcast %{{.*}} : vector<14x16x7xf32> to vector<8x14x16x7xf32>			// CHECK: vector.broadcast %{{.*}} : vector<14x16x7xf32> to vector<8x14x16x7xf32>
	// CHECK: vector.transpose %{{.*}}, [3, 1, 0, 2] : vector<8x14x16x7xf32> to vector<7x14x8x16xf32>			// CHECK: vector.transpose %{{.*}}, [3, 1, 0, 2] : vector<8x14x16x7xf32> to vector<7x14x8x16xf32>

	%3 = vector.transfer_read %arg0[%c0, %c0], %cst {permutation_map = #map3} : memref<?x?xf32>, vector<7x14x8x16xf32>			%3 = vector.transfer_read %arg0[%c0, %c0], %cst {permutation_map = #map3} : memref<?x?xf32>, vector<7x14x8x16xf32>
	// CHECK: vector.transfer_read %{{.*}}[%[[C0]], %[[C0]]], %[[CF0]] : memref<?x?xf32>, vector<14x7xf32>			// CHECK: vector.transfer_read %{{.*}}[%[[C0]], %[[C0]]], %[[CF0]] : memref<?x?xf32>, vector<14x7xf32>
	// CHECK: vector.broadcast %{{.*}} : vector<14x7xf32> to vector<8x16x14x7xf32>			// CHECK: vector.broadcast %{{.*}} : vector<14x7xf32> to vector<8x16x14x7xf32>
	// CHECK: vector.transpose %{{.*}}, [3, 2, 0, 1] : vector<8x16x14x7xf32> to vector<7x14x8x16xf32>			// CHECK: vector.transpose %{{.*}}, [3, 2, 0, 1] : vector<8x16x14x7xf32> to vector<7x14x8x16xf32>

	%4 = vector.transfer_read %arg0[%c0, %c0], %cst {permutation_map = #map4} : memref<?x?xf32>, vector<7x14x8x16xf32>			%4 = vector.transfer_read %arg0[%c0, %c0], %cst {permutation_map = #map4} : memref<?x?xf32>, vector<7x14x8x16xf32>
	// CHECK: vector.transfer_read %{{.*}}[%[[C0]], %[[C0]]], %[[CF0]] : memref<?x?xf32>, vector<16x14xf32>			// CHECK: vector.transfer_read %{{.*}}[%[[C0]], %[[C0]]], %[[CF0]] : memref<?x?xf32>, vector<16x14xf32>
	// CHECK: vector.broadcast %{{.*}} : vector<16x14xf32> to vector<7x8x16x14xf32>			// CHECK: vector.broadcast %{{.*}} : vector<16x14xf32> to vector<7x8x16x14xf32>
	// CHECK: vector.transpose %{{.*}}, [0, 3, 1, 2] : vector<7x8x16x14xf32> to vector<7x14x8x16xf32>			// CHECK: vector.transpose %{{.*}}, [0, 3, 1, 2] : vector<7x8x16x14xf32> to vector<7x14x8x16xf32>

	%5 = vector.transfer_read %arg1[%c0, %c0, %c0, %c0], %cst {permutation_map = #map5} : memref<?x?x?x?xf32>, vector<7x14x8x16xf32>			%5 = vector.transfer_read %arg1[%c0, %c0, %c0, %c0], %cst {permutation_map = #map5} : memref<?x?x?x?xf32>, vector<7x14x8x16xf32>
	// CHECK: vector.transfer_read %{{.*}}[%[[C0]], %[[C0]], %[[C0]], %[[C0]]], %[[CF0]] : memref<?x?x?x?xf32>, vector<16x14x7x8xf32>			// CHECK: vector.transfer_read %{{.*}}[%[[C0]], %[[C0]], %[[C0]], %[[C0]]], %[[CF0]] : memref<?x?x?x?xf32>, vector<16x14x7x8xf32>
	// CHECK: vector.transpose %{{.*}}, [2, 1, 3, 0] : vector<16x14x7x8xf32> to vector<7x14x8x16xf32>			// CHECK: vector.transpose %{{.*}}, [2, 1, 3, 0] : vector<16x14x7x8xf32> to vector<7x14x8x16xf32>

	%6 = vector.transfer_read %arg0[%c0, %c0], %cst {permutation_map = #map6} : memref<?x?xf32>, vector<8xf32>			%6 = vector.transfer_read %arg0[%c0, %c0], %cst {permutation_map = #map6, in_bounds = [true, true]} : memref<?x?xf32>, vector<8xf32>
	// CHECK: memref.load %{{.*}}[%[[C0]], %[[C0]]] : memref<?x?xf32>			// CHECK: memref.load %{{.*}}[%[[C0]], %[[C0]]] : memref<?x?xf32>
	// CHECK: vector.broadcast %{{.*}} : f32 to vector<8xf32>			// CHECK: vector.broadcast %{{.*}} : f32 to vector<8xf32>

	return %0, %1, %2, %3, %4, %5, %6 : vector<7x14x8x16xf32>, vector<7x14x8x16xf32>,			return %0, %1, %2, %3, %4, %5, %6 : vector<7x14x8x16xf32>, vector<7x14x8x16xf32>,
	vector<7x14x8x16xf32>, vector<7x14x8x16xf32>, vector<7x14x8x16xf32>,			vector<7x14x8x16xf32>, vector<7x14x8x16xf32>, vector<7x14x8x16xf32>,
	vector<7x14x8x16xf32>, vector<8xf32>			vector<7x14x8x16xf32>, vector<8xf32>
	}			}

	// CHECK-LABEL: func @transfer_write_permutations			// CHECK-LABEL: func @transfer_write_permutations
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?x?x?xf32>			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?x?x?xf32>
	// CHECK-SAME: %[[ARG1:.*]]: tensor<?x?x?x?xf32>			// CHECK-SAME: %[[ARG1:.*]]: tensor<?x?x?x?xf32>
	// CHECK-SAME: %[[ARG2:.*]]: vector<7x14x8x16xf32>			// CHECK-SAME: %[[ARG2:.*]]: vector<7x14x8x16xf32>
	// CHECK-SAME: %[[ARG3:.*]]: vector<8x16xf32>			// CHECK-SAME: %[[ARG3:.*]]: vector<8x16xf32>
	// CHECK-SAME: %[[M:.*]]: i1			// CHECK-SAME: %[[M:.*]]: i1
	func.func @transfer_write_permutations(			func.func @transfer_write_permutations(
	%arg0 : memref<?x?x?x?xf32>, %arg1 : tensor<?x?x?x?xf32>,			%arg0 : memref<?x?x?x?xf32>, %arg1 : tensor<?x?x?x?xf32>,
	%v1 : vector<7x14x8x16xf32>, %v2 : vector<8x16xf32>, %m: i1) -> tensor<?x?x?x?xf32> {			%v1 : vector<7x14x8x16xf32>, %v2 : vector<8x16xf32>, %m: i1) -> tensor<?x?x?x?xf32> {
	// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index

	// CHECK: %[[MASK:.*]] = vector.splat %[[M]] : vector<16x14x7x8xi1>			// CHECK: %[[MASK:.*]] = vector.splat %[[M]] : vector<16x14x7x8xi1>
	%mask0 = vector.splat %m : vector<16x14x7x8xi1>			%mask0 = vector.splat %m : vector<16x14x7x8xi1>
	%0 = vector.transfer_write %v1, %arg1[%c0, %c0, %c0, %c0], %mask0 {in_bounds = [true, false, false, true], permutation_map = affine_map<(d0, d1, d2, d3) -> (d2, d1, d3, d0)>} : vector<7x14x8x16xf32>, tensor<?x?x?x?xf32>			%0 = vector.transfer_write %v1, %arg1[%c0, %c0, %c0, %c0], %mask0 {in_bounds = [true, false, false, true], permutation_map = affine_map<(d0, d1, d2, d3) -> (d2, d1, d3, d0)>} : vector<7x14x8x16xf32>, tensor<?x?x?x?xf32>
	// CHECK: %[[NEW_VEC0:.*]] = vector.transpose %{{.*}} [3, 1, 0, 2] : vector<7x14x8x16xf32> to vector<16x14x7x8xf32>			// CHECK: %[[NEW_VEC0:.*]] = vector.transpose %{{.*}} [3, 1, 0, 2] : vector<7x14x8x16xf32> to vector<16x14x7x8xf32>
	// CHECK: %[[NEW_RES0:.*]] = vector.transfer_write %[[NEW_VEC0]], %[[ARG1]][%c0, %c0, %c0, %c0], %[[MASK]] {in_bounds = [true, false, true, false]} : vector<16x14x7x8xf32>, tensor<?x?x?x?xf32>			// CHECK: %[[NEW_RES0:.*]] = vector.transfer_write %[[NEW_VEC0]], %[[ARG1]][%c0, %c0, %c0, %c0], %[[MASK]] {in_bounds = [true, false, false, true]} : vector<16x14x7x8xf32>, tensor<?x?x?x?xf32>

	vector.transfer_write %v2, %arg0[%c0, %c0, %c0, %c0] {permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, d2)>} : vector<8x16xf32>, memref<?x?x?x?xf32>			vector.transfer_write %v2, %arg0[%c0, %c0, %c0, %c0] {in_bounds = [true, true, false, false], permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, d2)>} : vector<8x16xf32>, memref<?x?x?x?xf32>
	// CHECK: %[[NEW_VEC1:.*]] = vector.transpose %{{.*}} [1, 0] : vector<8x16xf32> to vector<16x8xf32>			// CHECK: %[[NEW_VEC1:.*]] = vector.transpose %{{.*}} [1, 0] : vector<8x16xf32> to vector<16x8xf32>
	// CHECK: vector.transfer_write %[[NEW_VEC1]], %[[ARG0]][%c0, %c0, %c0, %c0] : vector<16x8xf32>, memref<?x?x?x?xf32>			// CHECK: vector.transfer_write %[[NEW_VEC1]], %[[ARG0]][%c0, %c0, %c0, %c0] {in_bounds = [true, true, false, false]} : vector<16x8xf32>, memref<?x?x?x?xf32>

	return %0 : tensor<?x?x?x?xf32>			return %0 : tensor<?x?x?x?xf32>
	}			}

	// CHECK-LABEL: func @transfer_write_broadcast_unit_dim			// CHECK-LABEL: func @transfer_write_broadcast_unit_dim
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?x?x?xf32>			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?x?x?xf32>
	// CHECK-SAME: %[[ARG1:.*]]: tensor<?x?x?x?xf32>			// CHECK-SAME: %[[ARG1:.*]]: tensor<?x?x?x?xf32>
	// CHECK-SAME: %[[ARG2:.*]]: vector<14x8x16xf32>			// CHECK-SAME: %[[ARG2:.*]]: vector<14x8x16xf32>
	// CHECK-SAME: %[[ARG3:.*]]: vector<8x16xf32>			// CHECK-SAME: %[[ARG3:.*]]: vector<8x16xf32>
	// CHECK-SAME: %[[M:.*]]: i1			// CHECK-SAME: %[[M:.*]]: i1
	func.func @transfer_write_broadcast_unit_dim(			func.func @transfer_write_broadcast_unit_dim(
	%arg0 : memref<?x?x?x?xf32>, %arg1 : tensor<?x?x?x?xf32>,			%arg0 : memref<?x?x?x?xf32>, %arg1 : tensor<?x?x?x?xf32>,
	%v1 : vector<14x8x16xf32>, %v2 : vector<8x16xf32>, %m: i1) -> tensor<?x?x?x?xf32> {			%v1 : vector<14x8x16xf32>, %v2 : vector<8x16xf32>, %m: i1) -> tensor<?x?x?x?xf32> {
	// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index			// CHECK-DAG: %[[C0:.*]] = arith.constant 0 : index
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index

	%0 = vector.transfer_write %v1, %arg1[%c0, %c0, %c0, %c0] {in_bounds = [false, false, true], permutation_map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>} : vector<14x8x16xf32>, tensor<?x?x?x?xf32>			%0 = vector.transfer_write %v1, %arg1[%c0, %c0, %c0, %c0] {in_bounds = [false, false, true, true], permutation_map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>} : vector<14x8x16xf32>, tensor<?x?x?x?xf32>
	// CHECK: %[[NEW_VEC0:.*]] = vector.broadcast %{{.*}} : vector<14x8x16xf32> to vector<1x14x8x16xf32>			// CHECK: %[[NEW_VEC0:.*]] = vector.broadcast %{{.*}} : vector<14x8x16xf32> to vector<1x14x8x16xf32>
	// CHECK: %[[NEW_VEC1:.*]] = vector.transpose %[[NEW_VEC0]], [1, 2, 0, 3] : vector<1x14x8x16xf32> to vector<14x8x1x16xf32>			// CHECK: %[[NEW_VEC1:.*]] = vector.transpose %[[NEW_VEC0]], [1, 2, 0, 3] : vector<1x14x8x16xf32> to vector<14x8x1x16xf32>
	// CHECK: %[[NEW_RES0:.*]] = vector.transfer_write %[[NEW_VEC1]], %[[ARG1]][%[[C0]], %[[C0]], %[[C0]], %[[C0]]] {in_bounds = [false, false, true, true]} : vector<14x8x1x16xf32>, tensor<?x?x?x?xf32>			// CHECK: %[[NEW_RES0:.*]] = vector.transfer_write %[[NEW_VEC1]], %[[ARG1]][%[[C0]], %[[C0]], %[[C0]], %[[C0]]] {in_bounds = [false, false, true, true]} : vector<14x8x1x16xf32>, tensor<?x?x?x?xf32>

	vector.transfer_write %v2, %arg0[%c0, %c0, %c0, %c0] {permutation_map = affine_map<(d0, d1, d2, d3) -> (d1, d2)>} : vector<8x16xf32>, memref<?x?x?x?xf32>			vector.transfer_write %v2, %arg0[%c0, %c0, %c0, %c0] {permutation_map = affine_map<(d0, d1, d2, d3) -> (d1, d2)>, in_bounds = [true, false, false, true]} : vector<8x16xf32>, memref<?x?x?x?xf32>
	// CHECK: %[[NEW_VEC2:.*]] = vector.broadcast %{{.*}} : vector<8x16xf32> to vector<1x8x16xf32>			// CHECK: %[[NEW_VEC2:.*]] = vector.broadcast %{{.*}} : vector<8x16xf32> to vector<1x8x16xf32>
	// CHECK: %[[NEW_VEC3:.*]] = vector.transpose %[[NEW_VEC2]], [1, 2, 0] : vector<1x8x16xf32> to vector<8x16x1xf32>			// CHECK: %[[NEW_VEC3:.*]] = vector.transpose %[[NEW_VEC2]], [1, 2, 0] : vector<1x8x16xf32> to vector<8x16x1xf32>
	// CHECK: vector.transfer_write %[[NEW_VEC3]], %[[ARG0]][%[[C0]], %[[C0]], %[[C0]], %[[C0]]] {in_bounds = [false, false, true]} : vector<8x16x1xf32>, memref<?x?x?x?xf32>			// CHECK: vector.transfer_write %[[NEW_VEC3]], %[[ARG0]][%[[C0]], %[[C0]], %[[C0]], %[[C0]]] {in_bounds = [true, false, false, true]} : vector<8x16x1xf32>, memref<?x?x?x?xf32>

	return %0 : tensor<?x?x?x?xf32>			return %0 : tensor<?x?x?x?xf32>
	}			}

	transform.sequence failures(propagate) {			transform.sequence failures(propagate) {
	^bb1(%func_op: !transform.op<"func.func">):			^bb1(%func_op: !transform.op<"func.func">):
	transform.apply_patterns to %func_op {			transform.apply_patterns to %func_op {
	transform.apply_patterns.vector.lower_transfer max_transfer_rank = 99			transform.apply_patterns.vector.lower_transfer max_transfer_rank = 99
	transform.apply_patterns.vector.transfer_permutation_patterns			transform.apply_patterns.vector.transfer_permutation_patterns
	} : !transform.op<"func.func">			} : !transform.op<"func.func">
	}			}

mlir/test/Dialect/Vector/vector-transfer-unroll.mlir

	[... 193 lines not shown ...]
	// CHECK-NEXT: %[[VEC4:.*]] = vector.insert_strided_slice %[[VTR4]], %[[VEC3]] {offsets = [4, 0], strides = [1, 1]} : vector<2x2xf32> into vector<6x4xf32>			// CHECK-NEXT: %[[VEC4:.*]] = vector.insert_strided_slice %[[VTR4]], %[[VEC3]] {offsets = [4, 0], strides = [1, 1]} : vector<2x2xf32> into vector<6x4xf32>
	// CHECK-NEXT: %[[VTR5:.*]] = vector.transfer_read {{.*}}[%[[C0]], %[[C2]]], %{{.*}} : memref<6x4xf32>, vector<2x2xf32>			// CHECK-NEXT: %[[VTR5:.*]] = vector.transfer_read {{.*}}[%[[C0]], %[[C2]]], %{{.*}} : memref<6x4xf32>, vector<2x2xf32>
	// CHECK-NEXT: %[[VEC5:.*]] = vector.insert_strided_slice %[[VTR5]], %[[VEC4]] {offsets = [4, 2], strides = [1, 1]} : vector<2x2xf32> into vector<6x4xf32>			// CHECK-NEXT: %[[VEC5:.*]] = vector.insert_strided_slice %[[VTR5]], %[[VEC4]] {offsets = [4, 2], strides = [1, 1]} : vector<2x2xf32> into vector<6x4xf32>
	// CHECK-NEXT: return %[[VEC5]] : vector<6x4xf32>			// CHECK-NEXT: return %[[VEC5]] : vector<6x4xf32>
	#map0 = affine_map<(d0, d1) -> (0, d1)>			#map0 = affine_map<(d0, d1) -> (0, d1)>
	func.func @transfer_read_unroll_broadcast(%arg0 : memref<6x4xf32>) -> vector<6x4xf32> {			func.func @transfer_read_unroll_broadcast(%arg0 : memref<6x4xf32>) -> vector<6x4xf32> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cf0 = arith.constant 0.0 : f32			%cf0 = arith.constant 0.0 : f32
	%0 = vector.transfer_read %arg0[%c0, %c0], %cf0 {permutation_map = #map0} : memref<6x4xf32>, vector<6x4xf32>			%0 = vector.transfer_read %arg0[%c0, %c0], %cf0 {in_bounds = [true, false], permutation_map = #map0} : memref<6x4xf32>, vector<6x4xf32>
	return %0 : vector<6x4xf32>			return %0 : vector<6x4xf32>
	}			}

	// -----			// -----

	// CHECK-LABEL: func @transfer_read_unroll_broadcast_permuation			// CHECK-LABEL: func @transfer_read_unroll_broadcast_permuation
	// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index			// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
	// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index			// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
	[... 10 lines not shown ...]
	// CHECK-NEXT: %[[VEC4:.*]] = vector.insert_strided_slice %[[VTR4]], %[[VEC3]] {offsets = [2, 2], strides = [1, 1]} : vector<2x2xf32> into vector<4x6xf32>			// CHECK-NEXT: %[[VEC4:.*]] = vector.insert_strided_slice %[[VTR4]], %[[VEC3]] {offsets = [2, 2], strides = [1, 1]} : vector<2x2xf32> into vector<4x6xf32>
	// CHECK-NEXT: %[[VTR5:.*]] = vector.transfer_read {{.*}}[%[[C4]], %[[C0]]], %{{.*}} : memref<6x4xf32>, vector<2x2xf32>			// CHECK-NEXT: %[[VTR5:.*]] = vector.transfer_read {{.*}}[%[[C4]], %[[C0]]], %{{.*}} : memref<6x4xf32>, vector<2x2xf32>
	// CHECK-NEXT: %[[VEC5:.*]] = vector.insert_strided_slice %[[VTR5]], %[[VEC4]] {offsets = [2, 4], strides = [1, 1]} : vector<2x2xf32> into vector<4x6xf32>			// CHECK-NEXT: %[[VEC5:.*]] = vector.insert_strided_slice %[[VTR5]], %[[VEC4]] {offsets = [2, 4], strides = [1, 1]} : vector<2x2xf32> into vector<4x6xf32>
	// CHECK-NEXT: return %[[VEC5]] : vector<4x6xf32>			// CHECK-NEXT: return %[[VEC5]] : vector<4x6xf32>
	#map0 = affine_map<(d0, d1) -> (0, d0)>			#map0 = affine_map<(d0, d1) -> (0, d0)>
	func.func @transfer_read_unroll_broadcast_permuation(%arg0 : memref<6x4xf32>) -> vector<4x6xf32> {			func.func @transfer_read_unroll_broadcast_permuation(%arg0 : memref<6x4xf32>) -> vector<4x6xf32> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cf0 = arith.constant 0.0 : f32			%cf0 = arith.constant 0.0 : f32
	%0 = vector.transfer_read %arg0[%c0, %c0], %cf0 {permutation_map = #map0} : memref<6x4xf32>, vector<4x6xf32>			%0 = vector.transfer_read %arg0[%c0, %c0], %cf0 {in_bounds = [false, true], permutation_map = #map0} : memref<6x4xf32>, vector<4x6xf32>
	return %0 : vector<4x6xf32>			return %0 : vector<4x6xf32>
	}			}

	// -----			// -----

	// CHECK-LABEL: func @transfer_read_unroll_different_rank			// CHECK-LABEL: func @transfer_read_unroll_different_rank
	// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index			// CHECK-DAG: %[[C4:.*]] = arith.constant 4 : index
	// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index			// CHECK-DAG: %[[C2:.*]] = arith.constant 2 : index
	[... 29 lines not shown ...]
	// ORDER-NEXT: %[[VTR5:.*]] = vector.transfer_read {{.*}}[%[[C2]], %[[C0]], %[[C4]]], %{{.*}} : memref<?x?x?xf32>, vector<2x2xf32>			// ORDER-NEXT: %[[VTR5:.*]] = vector.transfer_read {{.*}}[%[[C2]], %[[C0]], %[[C4]]], %{{.*}} : memref<?x?x?xf32>, vector<2x2xf32>
	// ORDER-NEXT: %[[VEC5:.*]] = vector.insert_strided_slice %[[VTR5]], %[[VEC4]] {offsets = [4, 2], strides = [1, 1]} : vector<2x2xf32> into vector<6x4xf32>			// ORDER-NEXT: %[[VEC5:.*]] = vector.insert_strided_slice %[[VTR5]], %[[VEC4]] {offsets = [4, 2], strides = [1, 1]} : vector<2x2xf32> into vector<6x4xf32>
	// ORDER-NEXT: return %[[VEC5]] : vector<6x4xf32>			// ORDER-NEXT: return %[[VEC5]] : vector<6x4xf32>

	#map0 = affine_map<(d0, d1, d2) -> (d2, d0)>			#map0 = affine_map<(d0, d1, d2) -> (d2, d0)>
	func.func @transfer_read_unroll_different_rank(%arg0 : memref<?x?x?xf32>) -> vector<6x4xf32> {			func.func @transfer_read_unroll_different_rank(%arg0 : memref<?x?x?xf32>) -> vector<6x4xf32> {
	%c0 = arith.constant 0 : index			%c0 = arith.constant 0 : index
	%cf0 = arith.constant 0.0 : f32			%cf0 = arith.constant 0.0 : f32
	%0 = vector.transfer_read %arg0[%c0, %c0, %c0], %cf0 {permutation_map = #map0} : memref<?x?x?xf32>, vector<6x4xf32>			%0 = vector.transfer_read %arg0[%c0, %c0, %c0], %cf0 {in_bounds = [false, true, false], permutation_map = #map0} : memref<?x?x?xf32>, vector<6x4xf32>
	return %0 : vector<6x4xf32>			return %0 : vector<6x4xf32>
	}			}

	// -----			// -----

	// CHECK-LABEL: func @vector_gather_unroll			// CHECK-LABEL: func @vector_gather_unroll
	// CHECK-SAME: %[[ARG0:.*]]: memref<?x?x?xf32>			// CHECK-SAME: %[[ARG0:.*]]: memref<?x?x?xf32>
	// CHECK-SAME: %[[ARG1:.*]]: vector<6x4xindex>			// CHECK-SAME: %[[ARG1:.*]]: vector<6x4xindex>
	[... 81 lines not shown ...]

mlir/test/Dialect/Vector/vector-warp-distribute.mlir

[... 121 lines not shown ...]
// CHECK-D: }		// CHECK-D: }

func.func @warp_extract(%laneid: index, %arg1: memref<1024x1024xf32>, %gid : index) {		func.func @warp_extract(%laneid: index, %arg1: memref<1024x1024xf32>, %gid : index) {
vector.warp_execute_on_lane_0(%laneid)[32] {		vector.warp_execute_on_lane_0(%laneid)[32] {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%v = "test.dummy_op"() : () -> (vector<1xf32>)		%v = "test.dummy_op"() : () -> (vector<1xf32>)
%v1 = "test.dummy_op"() : () -> (vector<1x1xf32>)		%v1 = "test.dummy_op"() : () -> (vector<1x1xf32>)
vector.transfer_write %v1, %arg1[%c0, %c0] : vector<1x1xf32>, memref<1024x1024xf32>		vector.transfer_write %v1, %arg1[%c0, %c0] : vector<1x1xf32>, memref<1024x1024xf32>
vector.transfer_write %v, %arg1[%c0, %c0] : vector<1xf32>, memref<1024x1024xf32>		vector.transfer_write %v, %arg1[%c0, %c0] {in_bounds = [true, false]}: vector<1xf32>, memref<1024x1024xf32>
}		}
return		return
}		}

// -----		// -----

// CHECK-PROP-LABEL: func @warp_dead_result(		// CHECK-PROP-LABEL: func @warp_dead_result(
func.func @warp_dead_result(%laneid: index) -> (vector<1xf32>) {		func.func @warp_dead_result(%laneid: index) -> (vector<1xf32>) {
[... 391 lines not shown ...]

func.func @vector_reduction(%laneid: index, %m0: memref<4x2x32xf32>, %m1: memref<f32>) {		func.func @vector_reduction(%laneid: index, %m0: memref<4x2x32xf32>, %m1: memref<f32>) {
%c0 = arith.constant 0: index		%c0 = arith.constant 0: index
%f0 = arith.constant 0.0: f32		%f0 = arith.constant 0.0: f32
// CHECK-D: %[[R:.*]] = vector.warp_execute_on_lane_0(%{{.*}})[32] -> (vector<f32>) {		// CHECK-D: %[[R:.*]] = vector.warp_execute_on_lane_0(%{{.*}})[32] -> (vector<f32>) {
// CHECK-D: vector.warp_execute_on_lane_0(%{{.*}})[32] {		// CHECK-D: vector.warp_execute_on_lane_0(%{{.*}})[32] {
// CHECK-D: vector.transfer_write %[[R]], %{{.*}}[] : vector<f32>, memref<f32>		// CHECK-D: vector.transfer_write %[[R]], %{{.*}}[] : vector<f32>, memref<f32>
vector.warp_execute_on_lane_0(%laneid)[32] {		vector.warp_execute_on_lane_0(%laneid)[32] {
%0 = vector.transfer_read %m0[%c0, %c0, %c0], %f0 {in_bounds = [true]} : memref<4x2x32xf32>, vector<32xf32>		%0 = vector.transfer_read %m0[%c0, %c0, %c0], %f0 {in_bounds = [true, true, true]} : memref<4x2x32xf32>, vector<32xf32>
%1 = vector.transfer_read %m1[], %f0 : memref<f32>, vector<f32>		%1 = vector.transfer_read %m1[], %f0 : memref<f32>, vector<f32>
%2 = vector.extractelement %1[] : vector<f32>		%2 = vector.extractelement %1[] : vector<f32>
%3 = vector.reduction <add>, %0 : vector<32xf32> into f32		%3 = vector.reduction <add>, %0 : vector<32xf32> into f32
%4 = arith.addf %3, %2 : f32		%4 = arith.addf %3, %2 : f32
%5 = vector.broadcast %4 : f32 to vector<f32>		%5 = vector.broadcast %4 : f32 to vector<f32>
vector.transfer_write %5, %m1[] : vector<f32>, memref<f32>		vector.transfer_write %5, %m1[] : vector<f32>, memref<f32>
}		}
return		return
[... 595 lines not shown ...]	func.func @transfer_read_no_prop(%in2: vector<1x2xindex>, %ar1 : memref<1x4x2xi32>, %ar2 : memref<1x4x1024xf32>)-> vector<2xf32> {
%cst_6 = arith.constant 0.000000e+00 : f32		%cst_6 = arith.constant 0.000000e+00 : f32

%18 = vector.warp_execute_on_lane_0(%0)[32] args(%in2 : vector<1x2xindex>) -> (vector<2xf32>) {		%18 = vector.warp_execute_on_lane_0(%0)[32] args(%in2 : vector<1x2xindex>) -> (vector<2xf32>) {
^bb0(%arg4: vector<1x64xindex>):		^bb0(%arg4: vector<1x64xindex>):
%28 = vector.gather %ar1[%c0, %c0, %c0] [%arg4], %cst_0, %cst : memref<1x4x2xi32>, vector<1x64xindex>, vector<1x64xi1>, vector<1x64xi32> into vector<1x64xi32>		%28 = vector.gather %ar1[%c0, %c0, %c0] [%arg4], %cst_0, %cst : memref<1x4x2xi32>, vector<1x64xindex>, vector<1x64xi1>, vector<1x64xi32> into vector<1x64xi32>
%29 = vector.extract %28[0] : vector<1x64xi32>		%29 = vector.extract %28[0] : vector<1x64xi32>
%30 = arith.index_cast %29 : vector<64xi32> to vector<64xindex>		%30 = arith.index_cast %29 : vector<64xi32> to vector<64xindex>
%36 = vector.extractelement %30[%c0_i32 : i32] : vector<64xindex>		%36 = vector.extractelement %30[%c0_i32 : i32] : vector<64xindex>
%37 = vector.transfer_read %ar2[%c0, %36, %c0], %cst_6 {in_bounds = [true]} : memref<1x4x1024xf32>, vector<64xf32>		%37 = vector.transfer_read %ar2[%c0, %36, %c0], %cst_6 {in_bounds = [true, true, true]} : memref<1x4x1024xf32>, vector<64xf32>
vector.yield %37 : vector<64xf32>		vector.yield %37 : vector<64xf32>
}		}
return %18 : vector<2xf32>		return %18 : vector<2xf32>
}		}

// -----		// -----

// Check that we don't fold vector.broadcast when each thread doesn't get the		// Check that we don't fold vector.broadcast when each thread doesn't get the
[... 17 lines not shown ...]

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_coo_test.mlir

[... 153 lines not shown ...]	func.func @entry() {
// CHECK-NEXT-COUNT-3: ( 4.3, 5.3, 6.3, 8.3, 8.3, 12.3, 14.3, 16.3 )		// CHECK-NEXT-COUNT-3: ( 4.3, 5.3, 6.3, 8.3, 8.3, 12.3, 14.3, 16.3 )
// CHECK-NEXT-COUNT-3: ( 4.5, 4.5, 6.5, 8.5, 8.5, 12.5, 14.5, 16.5 )		// CHECK-NEXT-COUNT-3: ( 4.5, 4.5, 6.5, 8.5, 8.5, 12.5, 14.5, 16.5 )
// CHECK-NEXT-COUNT-3: ( 9.9, 4.9, 6.9, 8.9, 8.9, 12.9, 15.9, 16.9 )		// CHECK-NEXT-COUNT-3: ( 9.9, 4.9, 6.9, 8.9, 8.9, 12.9, 15.9, 16.9 )
// CHECK-NEXT-COUNT-3: ( 12.1, 6.1, 5.1, 9.1, 9.1, 13.1, 15.1, 17.1 )		// CHECK-NEXT-COUNT-3: ( 12.1, 6.1, 5.1, 9.1, 9.1, 13.1, 15.1, 17.1 )
// CHECK-NEXT-COUNT-3: ( 15.4, 5.4, 7.4, 5.4, 11.4, 10.4, 11.4, 9.4 )		// CHECK-NEXT-COUNT-3: ( 15.4, 5.4, 7.4, 5.4, 11.4, 10.4, 11.4, 9.4 )
//		//
%f0 = arith.constant 0.0 : f32		%f0 = arith.constant 0.0 : f32
scf.for %i = %c0 to %c8 step %c1 {		scf.for %i = %c0 to %c8 step %c1 {
%v1 = vector.transfer_read %C1[%i, %c0], %f0		%v1 = vector.transfer_read %C1[%i, %c0], %f0 {in_bounds = [true, true]}
: tensor<8x8xf32>, vector<8xf32>		: tensor<8x8xf32>, vector<8xf32>
%v2 = vector.transfer_read %C2[%i, %c0], %f0		%v2 = vector.transfer_read %C2[%i, %c0], %f0 {in_bounds = [true, true]}
: tensor<8x8xf32>, vector<8xf32>		: tensor<8x8xf32>, vector<8xf32>
%v3 = vector.transfer_read %C3[%i, %c0], %f0		%v3 = vector.transfer_read %C3[%i, %c0], %f0 {in_bounds = [true, true]}
: tensor<8x8xf32>, vector<8xf32>		: tensor<8x8xf32>, vector<8xf32>
vector.print %v1 : vector<8xf32>		vector.print %v1 : vector<8xf32>
vector.print %v2 : vector<8xf32>		vector.print %v2 : vector<8xf32>
vector.print %v3 : vector<8xf32>		vector.print %v3 : vector<8xf32>
}		}

// Release resources.		// Release resources.
bufferization.dealloc_tensor %C1 : tensor<8x8xf32>		bufferization.dealloc_tensor %C1 : tensor<8x8xf32>
[... 10 lines not shown ...]

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_sampled_matmul.mlir

[... 118 lines not shown ...]	func.func @entry() {
//		//
// CHECK: ( 10, 0, 0, 56, 0 )		// CHECK: ( 10, 0, 0, 56, 0 )
// CHECK: ( 0, 80, 0, 0, 250 )		// CHECK: ( 0, 80, 0, 0, 250 )
// CHECK: ( 0, 0, 270, 0, 0 )		// CHECK: ( 0, 0, 270, 0, 0 )
// CHECK: ( 164, 0, 0, 640, 0 )		// CHECK: ( 164, 0, 0, 640, 0 )
// CHECK: ( 0, 520, 0, 0, 1250 )		// CHECK: ( 0, 520, 0, 0, 1250 )
//		//
scf.for %i = %c0 to %c5 step %c1 {		scf.for %i = %c0 to %c5 step %c1 {
%v = vector.transfer_read %0[%i, %c0], %d0: tensor<?x?xf32>, vector<5xf32>		%v = vector.transfer_read %0[%i, %c0], %d0 {in_bounds = [true, true]}: tensor<?x?xf32>, vector<5xf32>
vector.print %v : vector<5xf32>		vector.print %v : vector<5xf32>
}		}

// Release the resources.		// Release the resources.
bufferization.dealloc_tensor %s : tensor<?x?xf32, #SparseMatrix>		bufferization.dealloc_tensor %s : tensor<?x?xf32, #SparseMatrix>

return		return
}		}
}		}

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_transpose.mlir

[... 118 lines not shown ...]	func.func @entry() {
//		//
// CHECK-NEXT: ( 1.1, 0, 3.1 )		// CHECK-NEXT: ( 1.1, 0, 3.1 )
// CHECK-NEXT: ( 1.2, 0, 0 )		// CHECK-NEXT: ( 1.2, 0, 0 )
// CHECK-NEXT: ( 0, 0, 3.3 )		// CHECK-NEXT: ( 0, 0, 3.3 )
// CHECK-NEXT: ( 1.4, 0, 3.4 )		// CHECK-NEXT: ( 1.4, 0, 3.4 )
//		//
%x = sparse_tensor.convert %0 : tensor<4x3xf64, #DCSR> to tensor<4x3xf64>		%x = sparse_tensor.convert %0 : tensor<4x3xf64, #DCSR> to tensor<4x3xf64>
scf.for %i = %c0 to %c4 step %c1 {		scf.for %i = %c0 to %c4 step %c1 {
%v1 = vector.transfer_read %x[%i, %c0], %du: tensor<4x3xf64>, vector<3xf64>		%v1 = vector.transfer_read %x[%i, %c0], %du {in_bounds = [true, true]} : tensor<4x3xf64>, vector<3xf64>
vector.print %v1 : vector<3xf64>		vector.print %v1 : vector<3xf64>
}		}
%y = sparse_tensor.convert %1 : tensor<4x3xf64, #DCSR> to tensor<4x3xf64>		%y = sparse_tensor.convert %1 : tensor<4x3xf64, #DCSR> to tensor<4x3xf64>
scf.for %i = %c0 to %c4 step %c1 {		scf.for %i = %c0 to %c4 step %c1 {
%v2 = vector.transfer_read %y[%i, %c0], %du: tensor<4x3xf64>, vector<3xf64>		%v2 = vector.transfer_read %y[%i, %c0], %du {in_bounds = [true, true]} : tensor<4x3xf64>, vector<3xf64>
vector.print %v2 : vector<3xf64>		vector.print %v2 : vector<3xf64>
}		}

// Release resources.		// Release resources.
bufferization.dealloc_tensor %a : tensor<3x4xf64, #DCSR>		bufferization.dealloc_tensor %a : tensor<3x4xf64, #DCSR>
bufferization.dealloc_tensor %0 : tensor<4x3xf64, #DCSR>		bufferization.dealloc_tensor %0 : tensor<4x3xf64, #DCSR>
bufferization.dealloc_tensor %1 : tensor<4x3xf64, #DCSR>		bufferization.dealloc_tensor %1 : tensor<4x3xf64, #DCSR>

return		return
}		}
}		}

mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matmul-lib.mlir

[... 59 lines not shown ...]	func.func @dump(%mat: tensor<8x8xf32>) {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index		%c2 = arith.constant 2 : index
%c3 = arith.constant 3 : index		%c3 = arith.constant 3 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%c5 = arith.constant 5 : index		%c5 = arith.constant 5 : index
%c6 = arith.constant 6 : index		%c6 = arith.constant 6 : index
%c7 = arith.constant 7 : index		%c7 = arith.constant 7 : index
%r0 = vector.transfer_read %mat[%c0,%c0], %f0 : tensor<8x8xf32>, vector<8xf32>		%r0 = vector.transfer_read %mat[%c0,%c0], %f0 {in_bounds = [true, false]} : tensor<8x8xf32>, vector<8xf32>
vector.print %r0 : vector<8xf32>		vector.print %r0 : vector<8xf32>
%r1 = vector.transfer_read %mat[%c1,%c0], %f0 : tensor<8x8xf32>, vector<8xf32>		%r1 = vector.transfer_read %mat[%c1,%c0], %f0 {in_bounds = [true, false]} : tensor<8x8xf32>, vector<8xf32>
vector.print %r1 : vector<8xf32>		vector.print %r1 : vector<8xf32>
%r2 = vector.transfer_read %mat[%c2,%c0], %f0 : tensor<8x8xf32>, vector<8xf32>		%r2 = vector.transfer_read %mat[%c2,%c0], %f0 {in_bounds = [true, false]} : tensor<8x8xf32>, vector<8xf32>
vector.print %r2 : vector<8xf32>		vector.print %r2 : vector<8xf32>
%r3 = vector.transfer_read %mat[%c3,%c0], %f0 : tensor<8x8xf32>, vector<8xf32>		%r3 = vector.transfer_read %mat[%c3,%c0], %f0 {in_bounds = [true, false]} : tensor<8x8xf32>, vector<8xf32>
vector.print %r3 : vector<8xf32>		vector.print %r3 : vector<8xf32>
%r4 = vector.transfer_read %mat[%c4,%c0], %f0 : tensor<8x8xf32>, vector<8xf32>		%r4 = vector.transfer_read %mat[%c4,%c0], %f0 {in_bounds = [true, false]} : tensor<8x8xf32>, vector<8xf32>
vector.print %r4 : vector<8xf32>		vector.print %r4 : vector<8xf32>
%r5 = vector.transfer_read %mat[%c5,%c0], %f0 : tensor<8x8xf32>, vector<8xf32>		%r5 = vector.transfer_read %mat[%c5,%c0], %f0 {in_bounds = [true, false]} : tensor<8x8xf32>, vector<8xf32>
vector.print %r5 : vector<8xf32>		vector.print %r5 : vector<8xf32>
%r6 = vector.transfer_read %mat[%c6,%c0], %f0 : tensor<8x8xf32>, vector<8xf32>		%r6 = vector.transfer_read %mat[%c6,%c0], %f0 {in_bounds = [true, false]} : tensor<8x8xf32>, vector<8xf32>
vector.print %r6 : vector<8xf32>		vector.print %r6 : vector<8xf32>
%r7 = vector.transfer_read %mat[%c7,%c0], %f0 : tensor<8x8xf32>, vector<8xf32>		%r7 = vector.transfer_read %mat[%c7,%c0], %f0 {in_bounds = [true, false]} : tensor<8x8xf32>, vector<8xf32>
vector.print %r7 : vector<8xf32>		vector.print %r7 : vector<8xf32>
return		return
}		}

//		//
// Main driver.		// Main driver.
//		//
func.func @main() {		func.func @main() {
...

mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-mma-2-4-f16.mlir

gpu.func @mma_sp_sync_f16_16832(
...
//		//
// The (thread_id)->(row, col) map within each 8x4x(2xf16) quadrant is (t)->(t/4, t%4). We		// The (thread_id)->(row, col) map within each 8x4x(2xf16) quadrant is (t)->(t/4, t%4). We
// can use "affine.delinearize_index" which means the same thing.		// can use "affine.delinearize_index" which means the same thing.

%quad_row, %col_8x4 = affine.delinearize_index %lane_id into (%c8, %c4) : index, index		%quad_row, %col_8x4 = affine.delinearize_index %lane_id into (%c8, %c4) : index, index
%quad_col = affine.apply affine_map<()[s0]->(s0 * 2)>()[%col_8x4] // account for 2xf16/col		%quad_col = affine.apply affine_map<()[s0]->(s0 * 2)>()[%col_8x4] // account for 2xf16/col

// Load quad (0, 0)		// Load quad (0, 0)
%A_quad00 = vector.transfer_read %argA[%quad_row, %quad_col], %f0 {in_bounds = [true]} : memref<16x16xf16>, vector<2xf16>		%A_quad00 = vector.transfer_read %argA[%quad_row, %quad_col], %f0 {in_bounds = [true, true]} : memref<16x16xf16>, vector<2xf16>

// Load quad (1, 0). Just shift row down 8.		// Load quad (1, 0). Just shift row down 8.
%quad_row_plus_8 = affine.apply affine_map<(d0)[]->(d0+8)>(%quad_row)[]		%quad_row_plus_8 = affine.apply affine_map<(d0)[]->(d0+8)>(%quad_row)[]
%A_quad10 = vector.transfer_read %argA[%quad_row_plus_8, %quad_col], %f0 {in_bounds = [true]} : memref<16x16xf16>, vector<2xf16>		%A_quad10 = vector.transfer_read %argA[%quad_row_plus_8, %quad_col], %f0 {in_bounds = [true, true]} : memref<16x16xf16>, vector<2xf16>

// Load quad (0, 1). Just shift col right 8 (4 2xf16 values)		// Load quad (0, 1). Just shift col right 8 (4 2xf16 values)
%quad_col_plus_8 = affine.apply affine_map<(d0)[]->(d0+8)>(%quad_col)[]		%quad_col_plus_8 = affine.apply affine_map<(d0)[]->(d0+8)>(%quad_col)[]
%A_quad01 = vector.transfer_read %argA[%quad_row, %quad_col_plus_8], %f0 {in_bounds = [true]} : memref<16x16xf16>, vector<2xf16>		%A_quad01 = vector.transfer_read %argA[%quad_row, %quad_col_plus_8], %f0 {in_bounds = [true, true]} : memref<16x16xf16>, vector<2xf16>

// Load quad (1, 1)		// Load quad (1, 1)
%A_quad11 = vector.transfer_read %argA[%quad_row_plus_8, %quad_col_plus_8], %f0 {in_bounds = [true]} : memref<16x16xf16>, vector<2xf16>		%A_quad11 = vector.transfer_read %argA[%quad_row_plus_8, %quad_col_plus_8], %f0 {in_bounds = [true, true]} : memref<16x16xf16>, vector<2xf16>

// Assemble the elements into a vector		// Assemble the elements into a vector
%A_init0 = arith.constant dense<0.0> : vector<4x2xf16>		%A_init0 = arith.constant dense<0.0> : vector<4x2xf16>
%A_data0 = vector.insert %A_quad00, %A_init0[0] : vector<2xf16> into vector<4x2xf16>		%A_data0 = vector.insert %A_quad00, %A_init0[0] : vector<2xf16> into vector<4x2xf16>
%A_data1 = vector.insert %A_quad10, %A_data0[1] : vector<2xf16> into vector<4x2xf16>		%A_data1 = vector.insert %A_quad10, %A_data0[1] : vector<2xf16> into vector<4x2xf16>
%A_data2 = vector.insert %A_quad01, %A_data1[2] : vector<2xf16> into vector<4x2xf16>		%A_data2 = vector.insert %A_quad01, %A_data1[2] : vector<2xf16> into vector<4x2xf16>
%A_data = vector.insert %A_quad11, %A_data2[3] : vector<2xf16> into vector<4x2xf16>		%A_data = vector.insert %A_quad11, %A_data2[3] : vector<2xf16> into vector<4x2xf16>

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Load operand B		// Load operand B
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// Load the actual fragments for the dense values. This can be done using ldmatrix,		// Load the actual fragments for the dense values. This can be done using ldmatrix,
// but here we just do naive individual loads, as would be required if we could		// but here we just do naive individual loads, as would be required if we could
// not use ldmatrix.		// not use ldmatrix.
//		//
// The thread map here is different from operandA. This operand is in the form		// The thread map here is different from operandA. This operand is in the form
// memref<8x32xf16> (col major). Each thread loads a 2xf16 vector from a		// memref<8x32xf16> (col major). Each thread loads a 2xf16 vector from a
// 8x8xf16 quadrant.		// 8x8xf16 quadrant.
//		//
// The (thread_id)->(col, row) map within each 8x4x(2xf16) quadrant is		// The (thread_id)->(col, row) map within each 8x4x(2xf16) quadrant is
// (t) -> (t/4, t % 4). So we can re-use some of the calculation from A.		// (t) -> (t/4, t % 4). So we can re-use some of the calculation from A.

// Load quad (0, 0)		// Load quad (0, 0)
%B_quad0 = vector.transfer_read %argB[%quad_row, %quad_col], %f0 {in_bounds = [true]} : memref<8x32xf16>, vector<2xf16>		%B_quad0 = vector.transfer_read %argB[%quad_row, %quad_col], %f0 {in_bounds = [true, true]} : memref<8x32xf16>, vector<2xf16>

// Load quad (0, 1)		// Load quad (0, 1)
%B_quad1 = vector.transfer_read %argB[%quad_row, %quad_col_plus_8], %f0 {in_bounds = [true]} : memref<8x32xf16>, vector<2xf16>		%B_quad1 = vector.transfer_read %argB[%quad_row, %quad_col_plus_8], %f0 {in_bounds = [true, true]} : memref<8x32xf16>, vector<2xf16>

// Load quad (0, 2)		// Load quad (0, 2)
%quad_col_plus_16 = affine.apply affine_map<()[s0]->(s0 + 16)>()[%quad_col]		%quad_col_plus_16 = affine.apply affine_map<()[s0]->(s0 + 16)>()[%quad_col]
%B_quad2 = vector.transfer_read %argB[%quad_row, %quad_col_plus_16], %f0 {in_bounds = [true]} : memref<8x32xf16>, vector<2xf16>		%B_quad2 = vector.transfer_read %argB[%quad_row, %quad_col_plus_16], %f0 {in_bounds = [true, true]} : memref<8x32xf16>, vector<2xf16>

// Load quad (0, 3)		// Load quad (0, 3)
%quad_col_plus_24 = affine.apply affine_map<()[s0]->(s0 + 24)>()[%quad_col]		%quad_col_plus_24 = affine.apply affine_map<()[s0]->(s0 + 24)>()[%quad_col]
%B_quad3 = vector.transfer_read %argB[%quad_row, %quad_col_plus_24], %f0 {in_bounds = [true]} : memref<8x32xf16>, vector<2xf16>		%B_quad3 = vector.transfer_read %argB[%quad_row, %quad_col_plus_24], %f0 {in_bounds = [true, true]} : memref<8x32xf16>, vector<2xf16>

// Assemble into vector		// Assemble into vector
%B_init0 = arith.constant dense<0.0> : vector<4x2xf16>		%B_init0 = arith.constant dense<0.0> : vector<4x2xf16>
%B_data0 = vector.insert %B_quad0, %B_init0[0] : vector<2xf16> into vector<4x2xf16>		%B_data0 = vector.insert %B_quad0, %B_init0[0] : vector<2xf16> into vector<4x2xf16>
%B_data1 = vector.insert %B_quad1, %B_data0[1] : vector<2xf16> into vector<4x2xf16>		%B_data1 = vector.insert %B_quad1, %B_data0[1] : vector<2xf16> into vector<4x2xf16>
%B_data2 = vector.insert %B_quad2, %B_data1[2] : vector<2xf16> into vector<4x2xf16>		%B_data2 = vector.insert %B_quad2, %B_data1[2] : vector<2xf16> into vector<4x2xf16>
%B_data = vector.insert %B_quad3, %B_data2[3] : vector<2xf16> into vector<4x2xf16>		%B_data = vector.insert %B_quad3, %B_data2[3] : vector<2xf16> into vector<4x2xf16>

...

// The mma instruction gave us two 2xf16 vectors per thread. These values		// The mma instruction gave us two 2xf16 vectors per thread. These values
// correspond to different positions in the 16x8xf16 result matrix. Each value belongs		// correspond to different positions in the 16x8xf16 result matrix. Each value belongs
// to one of the 8x4x(2xf16) halves. The halves are indexed as follows (as you might guess):		// to one of the 8x4x(2xf16) halves. The halves are indexed as follows (as you might guess):
// vector0: (tid) -> (tid / 4 , tid %4)		// vector0: (tid) -> (tid / 4 , tid %4)
// vector1: (tid) -> (tid / 4 + 8, tid %4)		// vector1: (tid) -> (tid / 4 + 8, tid %4)
%C_0 = vector.extract %d[0] : vector<2x2xf16>		%C_0 = vector.extract %d[0] : vector<2x2xf16>
%C_1 = vector.extract %d[1] : vector<2x2xf16>		%C_1 = vector.extract %d[1] : vector<2x2xf16>
vector.transfer_write %C_0, %argC[%quad_row, %quad_col] {in_bounds = [true]} : vector<2xf16>, memref<16x8xf16>		vector.transfer_write %C_0, %argC[%quad_row, %quad_col] {in_bounds = [true, true]} : vector<2xf16>, memref<16x8xf16>
vector.transfer_write %C_1, %argC[%quad_row_plus_8, %quad_col] {in_bounds = [true]} : vector<2xf16>, memref<16x8xf16>		vector.transfer_write %C_1, %argC[%quad_row_plus_8, %quad_col] {in_bounds = [true, true]} : vector<2xf16>, memref<16x8xf16>

gpu.return		gpu.return
}		}
}		}

// Code that runs on the host.		// Code that runs on the host.

//		//
func.func @main() {
...
// CHECK-NEXT: ( 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 )		// CHECK-NEXT: ( 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 )
// CHECK-NEXT: ( 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27 )		// CHECK-NEXT: ( 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27 )
// CHECK-NEXT: ( 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 )		// CHECK-NEXT: ( 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28 )
// CHECK-NEXT: ( 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 )		// CHECK-NEXT: ( 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 )
// CHECK-NEXT: ( 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 )		// CHECK-NEXT: ( 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 )
// CHECK-NEXT: ( 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 )		// CHECK-NEXT: ( 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 )
//		//
scf.for %pai = %c0 to %c16 step %c1 {		scf.for %pai = %c0 to %c16 step %c1 {
%pa0 = vector.transfer_read %a[%pai, %c0], %f0 : memref<16x16xf16>, vector<16xf16>		%pa0 = vector.transfer_read %a[%pai, %c0], %f0 {in_bounds = [true, true]} : memref<16x16xf16>, vector<16xf16>
vector.print %pa0 : vector<16xf16>		vector.print %pa0 : vector<16xf16>
}		}

//		//
// Sanity check on input matrix 32x8 B.		// Sanity check on input matrix 32x8 B.
// Note that this is really shown as B^T		// Note that this is really shown as B^T
//		//
// CHECK-NEXT: ( 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -30, -31 )		// CHECK-NEXT: ( 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -30, -31 )
// CHECK-NEXT: ( 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -30 )		// CHECK-NEXT: ( 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29, -30 )
// CHECK-NEXT: ( 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29 )		// CHECK-NEXT: ( 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28, -29 )
// CHECK-NEXT: ( 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28 )		// CHECK-NEXT: ( 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27, -28 )
// CHECK-NEXT: ( 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27 )		// CHECK-NEXT: ( 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26, -27 )
// CHECK-NEXT: ( 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26 )		// CHECK-NEXT: ( 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25, -26 )
// CHECK-NEXT: ( 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25 )		// CHECK-NEXT: ( 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24, -25 )
// CHECK-NEXT: ( 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24 )		// CHECK-NEXT: ( 7, 6, 5, 4, 3, 2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, -11, -12, -13, -14, -15, -16, -17, -18, -19, -20, -21, -22, -23, -24 )
//		//
//		//
scf.for %pbi = %c0 to %c8 step %c1 {		scf.for %pbi = %c0 to %c8 step %c1 {
%pb0 = vector.transfer_read %b[%pbi, %c0], %f0 : memref<8x32xf16>, vector<32xf16>		%pb0 = vector.transfer_read %b[%pbi, %c0], %f0 {in_bounds = [true, true]} : memref<8x32xf16>, vector<32xf16>
vector.print %pb0 : vector<32xf16>		vector.print %pb0 : vector<32xf16>
}		}

// Maps the provided host buffers into the device address space.		// Maps the provided host buffers into the device address space.
// Writes from the host are guaranteed to be visible to device		// Writes from the host are guaranteed to be visible to device
// kernels that are launched afterwards. Writes from the device		// kernels that are launched afterwards. Writes from the device
// are guaranteed to be visible on the host after synchronizing		// are guaranteed to be visible on the host after synchronizing
// with the device kernel completion.		// with the device kernel completion.
...
// CHECK-NEXT: ( -5120, -4824, -4528, -4232, -3936, -3640, -3344, -3048 )		// CHECK-NEXT: ( -5120, -4824, -4528, -4232, -3936, -3640, -3344, -3048 )
// CHECK-NEXT: ( -5360, -5048, -4736, -4424, -4112, -3800, -3488, -3176 )		// CHECK-NEXT: ( -5360, -5048, -4736, -4424, -4112, -3800, -3488, -3176 )
// CHECK-NEXT: ( -5600, -5272, -4944, -4616, -4288, -3960, -3632, -3304 )		// CHECK-NEXT: ( -5600, -5272, -4944, -4616, -4288, -3960, -3632, -3304 )
// CHECK-NEXT: ( -5840, -5496, -5152, -4808, -4464, -4120, -3776, -3432 )		// CHECK-NEXT: ( -5840, -5496, -5152, -4808, -4464, -4120, -3776, -3432 )
// CHECK-NEXT: ( -6080, -5720, -5360, -5000, -4640, -4280, -3920, -3560 )		// CHECK-NEXT: ( -6080, -5720, -5360, -5000, -4640, -4280, -3920, -3560 )
// CHECK-NEXT: ( -6320, -5944, -5568, -5192, -4816, -4440, -4064, -3688 )		// CHECK-NEXT: ( -6320, -5944, -5568, -5192, -4816, -4440, -4064, -3688 )
//		//
scf.for %pci = %c0 to %c16 step %c1 {		scf.for %pci = %c0 to %c16 step %c1 {
%pc0 = vector.transfer_read %c[%pci, %c0], %f0 : memref<16x8xf16>, vector<8xf16>		%pc0 = vector.transfer_read %c[%pci, %c0], %f0 {in_bounds = [true, true]} : memref<16x8xf16>, vector<8xf16>
vector.print %pc0 : vector<8xf16>		vector.print %pc0 : vector<8xf16>
}		}

return		return
}		}
}		}
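
Side note on the thread mapping described in the kernel comments above: the following Python sketch (not part of the patch) models how each lane's load and store coordinates are derived. The function names are hypothetical, and the arithmetic assumes a 32-lane warp; it is meant to mirror the `affine.delinearize_index` and `affine.apply` ops in the test, not to describe the hardware authoritatively.

```python
# Hypothetical model of the lane -> (row, col) arithmetic used in the kernel.

def lane_to_quad_coords(lane_id):
    """(t) -> (t / 4, (t % 4) * 2): row and starting column of a 2xf16 pair."""
    quad_row, col_8x4 = divmod(lane_id, 4)  # delinearize into (8, 4)
    return quad_row, col_8x4 * 2            # account for 2xf16 per column


def operand_a_offsets(lane_id):
    """Offsets into the 16x16 A operand: quads (0,0), (1,0), (0,1), (1,1)."""
    row, col = lane_to_quad_coords(lane_id)
    return [(row, col), (row + 8, col), (row, col + 8), (row + 8, col + 8)]


def operand_b_offsets(lane_id):
    """Offsets into the 8x32 B operand: quads (0,0) through (0,3)."""
    row, col = lane_to_quad_coords(lane_id)
    return [(row, col + 8 * q) for q in range(4)]


def result_c_offsets(lane_id):
    """Offsets into the 16x8 C result: upper and lower 8x4x(2xf16) halves."""
    row, col = lane_to_quad_coords(lane_id)
    return [(row, col), (row + 8, col)]


print(operand_a_offsets(5))
```

Each returned offset is the start of a two-element read or write, so under these assumptions the 32 lanes together touch every element of A, B, and C exactly once.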

mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir

dense<[[0. , 1. , 2. , 3. , 4. , 5. ],
...
[20., 21., 22., 23., 24., 25.],		[20., 21., 22., 23., 24., 25.],
[30., 31., 32., 33., 34., 35.],		[30., 31., 32., 33., 34., 35.],
[40., 41., 42., 43., 44., 45.]]>		[40., 41., 42., 43., 44., 45.]]>

// Non-contiguous, strided load.		// Non-contiguous, strided load.
func.func @transfer_read_1d(%A : memref<?x?xf32>, %base1 : index, %base2 : index) {		func.func @transfer_read_1d(%A : memref<?x?xf32>, %base1 : index, %base2 : index) {
%fm42 = arith.constant -42.0: f32		%fm42 = arith.constant -42.0: f32
%f = vector.transfer_read %A[%base1, %base2], %fm42		%f = vector.transfer_read %A[%base1, %base2], %fm42
{permutation_map = affine_map<(d0, d1) -> (d0)>}		{permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds=[false, true]}
: memref<?x?xf32>, vector<9xf32>		: memref<?x?xf32>, vector<9xf32>
vector.print %f: vector<9xf32>		vector.print %f: vector<9xf32>
return		return
}		}

// Vector load with unit stride only on last dim.		// Vector load with unit stride only on last dim.
func.func @transfer_read_1d_unit_stride(%A : memref<?x?xf32>) {		func.func @transfer_read_1d_unit_stride(%A : memref<?x?xf32>) {
%c0 = arith.constant 0 : index		%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index		%c2 = arith.constant 2 : index
%c3 = arith.constant 3 : index		%c3 = arith.constant 3 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%c5 = arith.constant 5 : index		%c5 = arith.constant 5 : index
%c6 = arith.constant 6 : index		%c6 = arith.constant 6 : index
%fm42 = arith.constant -42.0: f32		%fm42 = arith.constant -42.0: f32
scf.for %arg2 = %c1 to %c5 step %c2 {		scf.for %arg2 = %c1 to %c5 step %c2 {
scf.for %arg3 = %c0 to %c6 step %c3 {		scf.for %arg3 = %c0 to %c6 step %c3 {
%0 = memref.subview %A[%arg2, %arg3] [1, 2] [1, 1]		%0 = memref.subview %A[%arg2, %arg3] [1, 2] [1, 1]
: memref<?x?xf32> to memref<1x2xf32, strided<[?, 1], offset: ?>>		: memref<?x?xf32> to memref<1x2xf32, strided<[?, 1], offset: ?>>
%1 = vector.transfer_read %0[%c0, %c0], %fm42 {in_bounds=[true]}		%1 = vector.transfer_read %0[%c0, %c0], %fm42 {in_bounds=[true, true]}
: memref<1x2xf32, strided<[?, 1], offset: ?>>, vector<2xf32>		: memref<1x2xf32, strided<[?, 1], offset: ?>>, vector<2xf32>
vector.print %1 : vector<2xf32>		vector.print %1 : vector<2xf32>
}		}
}		}
return		return
}		}

// Vector load with unit stride only on last dim. Strides are not static, so		// Vector load with unit stride only on last dim. Strides are not static, so
// codegen must go through VectorToSCF 1D lowering.		// codegen must go through VectorToSCF 1D lowering.
func.func @transfer_read_1d_non_static_unit_stride(%A : memref<?x?xf32>) {		func.func @transfer_read_1d_non_static_unit_stride(%A : memref<?x?xf32>) {
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index		%c2 = arith.constant 2 : index
%c4 = arith.constant 4 : index		%c4 = arith.constant 4 : index
%c6 = arith.constant 6 : index		%c6 = arith.constant 6 : index
%fm42 = arith.constant -42.0: f32		%fm42 = arith.constant -42.0: f32
%1 = memref.reinterpret_cast %A to offset: [%c6], sizes: [%c4, %c6], strides: [%c6, %c1]		%1 = memref.reinterpret_cast %A to offset: [%c6], sizes: [%c4, %c6], strides: [%c6, %c1]
: memref<?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>		: memref<?x?xf32> to memref<?x?xf32, strided<[?, ?], offset: ?>>
%2 = vector.transfer_read %1[%c2, %c1], %fm42 {in_bounds=[true]}		%2 = vector.transfer_read %1[%c2, %c1], %fm42 {in_bounds=[true, true]}
: memref<?x?xf32, strided<[?, ?], offset: ?>>, vector<4xf32>		: memref<?x?xf32, strided<[?, ?], offset: ?>>, vector<4xf32>
vector.print %2 : vector<4xf32>		vector.print %2 : vector<4xf32>
return		return
}		}

// Vector load where last dim has non-unit stride.		// Vector load where last dim has non-unit stride.
func.func @transfer_read_1d_non_unit_stride(%A : memref<?x?xf32>) {		func.func @transfer_read_1d_non_unit_stride(%A : memref<?x?xf32>) {
%B = memref.reinterpret_cast %A to offset: [0], sizes: [4, 3], strides: [6, 2]		%B = memref.reinterpret_cast %A to offset: [0], sizes: [4, 3], strides: [6, 2]
: memref<?x?xf32> to memref<4x3xf32, strided<[6, 2]>>		: memref<?x?xf32> to memref<4x3xf32, strided<[6, 2]>>
%c1 = arith.constant 1 : index		%c1 = arith.constant 1 : index
%c2 = arith.constant 2 : index		%c2 = arith.constant 2 : index
%fm42 = arith.constant -42.0: f32		%fm42 = arith.constant -42.0: f32
%vec = vector.transfer_read %B[%c2, %c1], %fm42 {in_bounds=[false]} : memref<4x3xf32, strided<[6, 2]>>, vector<3xf32>		%vec = vector.transfer_read %B[%c2, %c1], %fm42 {in_bounds=[true, false]} : memref<4x3xf32, strided<[6, 2]>>, vector<3xf32>
vector.print %vec : vector<3xf32>		vector.print %vec : vector<3xf32>
return		return
}		}

// Broadcast.		// Broadcast.
func.func @transfer_read_1d_broadcast(		func.func @transfer_read_1d_broadcast(
%A : memref<?x?xf32>, %base1 : index, %base2 : index) {		%A : memref<?x?xf32>, %base1 : index, %base2 : index) {
%fm42 = arith.constant -42.0: f32		%fm42 = arith.constant -42.0: f32
%f = vector.transfer_read %A[%base1, %base2], %fm42		%f = vector.transfer_read %A[%base1, %base2], %fm42
{permutation_map = affine_map<(d0, d1) -> (0)>}		{permutation_map = affine_map<(d0, d1) -> (0)>, in_bounds=[true, true]}
: memref<?x?xf32>, vector<9xf32>		: memref<?x?xf32>, vector<9xf32>
vector.print %f: vector<9xf32>		vector.print %f: vector<9xf32>
return		return
}		}

// Non-contiguous, strided load.		// Non-contiguous, strided load.
func.func @transfer_read_1d_in_bounds(		func.func @transfer_read_1d_in_bounds(
%A : memref<?x?xf32>, %base1 : index, %base2 : index) {		%A : memref<?x?xf32>, %base1 : index, %base2 : index) {
%fm42 = arith.constant -42.0: f32		%fm42 = arith.constant -42.0: f32
%f = vector.transfer_read %A[%base1, %base2], %fm42		%f = vector.transfer_read %A[%base1, %base2], %fm42
{permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds = [true]}		{permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds = [true, true]}
: memref<?x?xf32>, vector<3xf32>		: memref<?x?xf32>, vector<3xf32>
vector.print %f: vector<3xf32>		vector.print %f: vector<3xf32>
return		return
}		}

// Non-contiguous, strided load.		// Non-contiguous, strided load.
func.func @transfer_read_1d_mask(		func.func @transfer_read_1d_mask(
%A : memref<?x?xf32>, %base1 : index, %base2 : index) {		%A : memref<?x?xf32>, %base1 : index, %base2 : index) {
%fm42 = arith.constant -42.0: f32		%fm42 = arith.constant -42.0: f32
%mask = arith.constant dense<[1, 0, 1, 0, 1, 1, 1, 0, 1]> : vector<9xi1>		%mask = arith.constant dense<[1, 0, 1, 0, 1, 1, 1, 0, 1]> : vector<9xi1>
%f = vector.transfer_read %A[%base1, %base2], %fm42, %mask		%f = vector.transfer_read %A[%base1, %base2], %fm42, %mask
{permutation_map = affine_map<(d0, d1) -> (d0)>}		{permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds=[false, true]}
: memref<?x?xf32>, vector<9xf32>		: memref<?x?xf32>, vector<9xf32>
vector.print %f: vector<9xf32>		vector.print %f: vector<9xf32>
return		return
}		}

// Non-contiguous, strided load.		// Non-contiguous, strided load.
func.func @transfer_read_1d_mask_in_bounds(		func.func @transfer_read_1d_mask_in_bounds(
%A : memref<?x?xf32>, %base1 : index, %base2 : index) {		%A : memref<?x?xf32>, %base1 : index, %base2 : index) {
%fm42 = arith.constant -42.0: f32		%fm42 = arith.constant -42.0: f32
%mask = arith.constant dense<[1, 0, 1]> : vector<3xi1>		%mask = arith.constant dense<[1, 0, 1]> : vector<3xi1>
%f = vector.transfer_read %A[%base1, %base2], %fm42, %mask		%f = vector.transfer_read %A[%base1, %base2], %fm42, %mask
{permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds = [true]}		{permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds = [true, true]}
: memref<?x?xf32>, vector<3xf32>		: memref<?x?xf32>, vector<3xf32>
vector.print %f: vector<3xf32>		vector.print %f: vector<3xf32>
return		return
}		}

// Non-contiguous, strided store.		// Non-contiguous, strided store.
func.func @transfer_write_1d(%A : memref<?x?xf32>, %base1 : index, %base2 : index) {		func.func @transfer_write_1d(%A : memref<?x?xf32>, %base1 : index, %base2 : index) {
%fn1 = arith.constant -1.0 : f32		%fn1 = arith.constant -1.0 : f32
%vf0 = vector.splat %fn1 : vector<7xf32>		%vf0 = vector.splat %fn1 : vector<7xf32>
vector.transfer_write %vf0, %A[%base1, %base2]		vector.transfer_write %vf0, %A[%base1, %base2]
{permutation_map = affine_map<(d0, d1) -> (d0)>}		{permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds=[false, true]}
: vector<7xf32>, memref<?x?xf32>		: vector<7xf32>, memref<?x?xf32>
return		return
}		}

// Non-contiguous, strided store.		// Non-contiguous, strided store.
func.func @transfer_write_1d_mask(%A : memref<?x?xf32>, %base1 : index, %base2 : index) {		func.func @transfer_write_1d_mask(%A : memref<?x?xf32>, %base1 : index, %base2 : index) {
%fn1 = arith.constant -2.0 : f32		%fn1 = arith.constant -2.0 : f32
%vf0 = vector.splat %fn1 : vector<7xf32>		%vf0 = vector.splat %fn1 : vector<7xf32>
%mask = arith.constant dense<[1, 0, 1, 0, 1, 1, 1]> : vector<7xi1>		%mask = arith.constant dense<[1, 0, 1, 0, 1, 1, 1]> : vector<7xi1>
vector.transfer_write %vf0, %A[%base1, %base2], %mask		vector.transfer_write %vf0, %A[%base1, %base2], %mask
{permutation_map = affine_map<(d0, d1) -> (d0)>}		{permutation_map = affine_map<(d0, d1) -> (d0)>, in_bounds=[false, true]}
: vector<7xf32>, memref<?x?xf32>		: vector<7xf32>, memref<?x?xf32>
return		return
}		}

func.func @entry() {		func.func @entry() {
%c0 = arith.constant 0: index		%c0 = arith.constant 0: index
%c1 = arith.constant 1: index		%c1 = arith.constant 1: index
%c2 = arith.constant 2: index		%c2 = arith.constant 2: index
...
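
For readers unfamiliar with the semantics these 1-D tests exercise, here is a rough Python model (not part of the patch) of a `vector.transfer_read` along `d0` with a padding value and an optional mask. The helper names and the sample matrix are hypothetical. The sketch always bounds-checks the row index, so it corresponds to the out-of-bounds-tolerant case (`in_bounds` false on the transferred dimension), and it assumes the base column is valid.

```python
# Hypothetical model of a 1-D transfer_read with
# permutation_map = (d0, d1) -> (d0): walk down the rows at a fixed column.

def transfer_read_1d(matrix, base_row, base_col, length, padding, mask=None):
    """Return `length` elements; out-of-bounds or masked-off lanes get `padding`."""
    num_rows = len(matrix)
    result = []
    for i in range(length):
        row = base_row + i
        in_bounds = 0 <= row < num_rows
        enabled = mask is None or bool(mask[i])
        result.append(matrix[row][base_col] if in_bounds and enabled else padding)
    return result


def transfer_read_1d_broadcast(matrix, base_row, base_col, length):
    """permutation_map = (d0, d1) -> (0): every lane reads the same element."""
    return [matrix[base_row][base_col]] * length


# Sample 5x6 matrix with A[r][c] = 10 * r + c (illustrative data only).
A = [[10.0 * r + c for c in range(6)] for r in range(5)]
print(transfer_read_1d(A, 2, 3, 4, -42.0))
```

Reading four elements starting at row 2 runs one row past the end of the sample matrix, so the last lane takes the padding value.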

mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-2d.mlir

	...
	}			}

	// Vector load with mask + broadcast.			// Vector load with mask + broadcast.
	func.func @transfer_read_2d_mask_broadcast(			func.func @transfer_read_2d_mask_broadcast(
	%A : memref<?x?xf32>, %base1: index, %base2: index) {			%A : memref<?x?xf32>, %base1: index, %base2: index) {
	%fm42 = arith.constant -42.0: f32			%fm42 = arith.constant -42.0: f32
	%mask = arith.constant dense<[1, 0, 1, 0, 1, 1, 1, 0, 1]> : vector<9xi1>			%mask = arith.constant dense<[1, 0, 1, 0, 1, 1, 1, 0, 1]> : vector<9xi1>
	%f = vector.transfer_read %A[%base1, %base2], %fm42, %mask			%f = vector.transfer_read %A[%base1, %base2], %fm42, %mask
	{permutation_map = affine_map<(d0, d1) -> (0, d1)>} :			{permutation_map = affine_map<(d0, d1) -> (0, d1)>,
				in_bounds = [true, false]} :
	memref<?x?xf32>, vector<4x9xf32>			memref<?x?xf32>, vector<4x9xf32>
	vector.print %f: vector<4x9xf32>			vector.print %f: vector<4x9xf32>
	return			return
	}			}

	// Transpose + vector load with mask + broadcast.			// Transpose + vector load with mask + broadcast.
	func.func @transfer_read_2d_mask_transpose_broadcast_last_dim(			func.func @transfer_read_2d_mask_transpose_broadcast_last_dim(
	%A : memref<?x?xf32>, %base1: index, %base2: index) {			%A : memref<?x?xf32>, %base1: index, %base2: index) {
	%fm42 = arith.constant -42.0: f32			%fm42 = arith.constant -42.0: f32
	%mask = arith.constant dense<[1, 0, 1, 1]> : vector<4xi1>			%mask = arith.constant dense<[1, 0, 1, 1]> : vector<4xi1>
	%f = vector.transfer_read %A[%base1, %base2], %fm42, %mask			%f = vector.transfer_read %A[%base1, %base2], %fm42, %mask
	{permutation_map = affine_map<(d0, d1) -> (d1, 0)>} :			{permutation_map = affine_map<(d0, d1) -> (d1, 0)>,
				in_bounds = [true, false]} :
	memref<?x?xf32>, vector<4x9xf32>			memref<?x?xf32>, vector<4x9xf32>
	vector.print %f: vector<4x9xf32>			vector.print %f: vector<4x9xf32>
	return			return
	}			}

	// Load + transpose.			// Load + transpose.
	func.func @transfer_read_2d_transposed(			func.func @transfer_read_2d_transposed(
	%A : memref<?x?xf32>, %base1: index, %base2: index) {			%A : memref<?x?xf32>, %base1: index, %base2: index) {
	%fm42 = arith.constant -42.0: f32			%fm42 = arith.constant -42.0: f32
	%f = vector.transfer_read %A[%base1, %base2], %fm42			%f = vector.transfer_read %A[%base1, %base2], %fm42
	{permutation_map = affine_map<(d0, d1) -> (d1, d0)>} :			{permutation_map = affine_map<(d0, d1) -> (d1, d0)>} :
	memref<?x?xf32>, vector<4x9xf32>			memref<?x?xf32>, vector<4x9xf32>
	vector.print %f: vector<4x9xf32>			vector.print %f: vector<4x9xf32>
	return			return
	}			}

	// Load 1D + broadcast to 2D.			// Load 1D + broadcast to 2D.
	func.func @transfer_read_2d_broadcast(			func.func @transfer_read_2d_broadcast(
	%A : memref<?x?xf32>, %base1: index, %base2: index) {			%A : memref<?x?xf32>, %base1: index, %base2: index) {
	%fm42 = arith.constant -42.0: f32			%fm42 = arith.constant -42.0: f32
	%f = vector.transfer_read %A[%base1, %base2], %fm42			%f = vector.transfer_read %A[%base1, %base2], %fm42
	{permutation_map = affine_map<(d0, d1) -> (d1, 0)>} :			{permutation_map = affine_map<(d0, d1) -> (d1, 0)>,
				in_bounds = [true, false]} :
	memref<?x?xf32>, vector<4x9xf32>			memref<?x?xf32>, vector<4x9xf32>
	vector.print %f: vector<4x9xf32>			vector.print %f: vector<4x9xf32>
	return			return
	}			}

	// Vector store.			// Vector store.
	func.func @transfer_write_2d(%A : memref<?x?xf32>, %base1: index, %base2: index) {			func.func @transfer_write_2d(%A : memref<?x?xf32>, %base1: index, %base2: index) {
	%fn1 = arith.constant -1.0 : f32			%fn1 = arith.constant -1.0 : f32
	...

mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-3d.mlir

	// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(convert-vector-to-scf,lower-affine,convert-scf-to-cf),convert-vector-to-llvm,finalize-memref-to-llvm,convert-func-to-llvm,reconcile-unrealized-casts)" | \			// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(convert-vector-to-scf,lower-affine,convert-scf-to-cf),convert-vector-to-llvm,finalize-memref-to-llvm,convert-func-to-llvm,reconcile-unrealized-casts)" | \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils | \			// RUN: -shared-libs=%mlir_c_runner_utils | \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(convert-vector-to-scf{full-unroll=true},lower-affine,convert-scf-to-cf),convert-vector-to-llvm,finalize-memref-to-llvm,convert-func-to-llvm,reconcile-unrealized-casts)" | \			// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(convert-vector-to-scf{full-unroll=true},lower-affine,convert-scf-to-cf),convert-vector-to-llvm,finalize-memref-to-llvm,convert-func-to-llvm,reconcile-unrealized-casts)" | \
	// RUN: mlir-cpu-runner -e entry -entry-point-result=void \			// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
	// RUN: -shared-libs=%mlir_c_runner_utils | \			// RUN: -shared-libs=%mlir_c_runner_utils | \
	// RUN: FileCheck %s			// RUN: FileCheck %s

	func.func @transfer_read_3d(%A : memref<?x?x?x?xf32>,			func.func @transfer_read_3d(%A : memref<?x?x?x?xf32>,
	%o: index, %a: index, %b: index, %c: index) {			%o: index, %a: index, %b: index, %c: index) {
	%fm42 = arith.constant -42.0: f32			%fm42 = arith.constant -42.0: f32
	%f = vector.transfer_read %A[%o, %a, %b, %c], %fm42			%f = vector.transfer_read %A[%o, %a, %b, %c], %fm42
				{in_bounds = [true, false, false, false]}
	: memref<?x?x?x?xf32>, vector<2x5x3xf32>			: memref<?x?x?x?xf32>, vector<2x5x3xf32>
	vector.print %f: vector<2x5x3xf32>			vector.print %f: vector<2x5x3xf32>
	return			return
	}			}

	func.func @transfer_read_3d_and_extract(%A : memref<?x?x?x?xf32>,			func.func @transfer_read_3d_and_extract(%A : memref<?x?x?x?xf32>, %o: index,
	%o: index, %a: index, %b: index, %c: index) {			%a: index, %b: index, %c: index) {
	%fm42 = arith.constant -42.0: f32			%fm42 = arith.constant -42.0: f32
	%f = vector.transfer_read %A[%o, %a, %b, %c], %fm42			%f = vector.transfer_read %A[%o, %a, %b, %c], %fm42
	{in_bounds = [true, true, true]}			{in_bounds = [true, true, true, true]}
	: memref<?x?x?x?xf32>, vector<2x5x3xf32>			: memref<?x?x?x?xf32>, vector<2x5x3xf32>
	%sub = vector.extract %f[0] : vector<2x5x3xf32>			%sub = vector.extract %f[0] : vector<2x5x3xf32>
	vector.print %sub: vector<5x3xf32>			vector.print %sub: vector<5x3xf32>
	return			return
	}			}

	func.func @transfer_read_3d_broadcast(%A : memref<?x?x?x?xf32>,			func.func @transfer_read_3d_broadcast(%A : memref<?x?x?x?xf32>, %o: index,
	%o: index, %a: index, %b: index, %c: index) {			%a: index, %b: index, %c: index) {
	%fm42 = arith.constant -42.0: f32			%fm42 = arith.constant -42.0: f32
	%f = vector.transfer_read %A[%o, %a, %b, %c], %fm42			%f = vector.transfer_read %A[%o, %a, %b, %c], %fm42
	{permutation_map = affine_map<(d0, d1, d2, d3) -> (d1, 0, d3)>}			{permutation_map = affine_map<(d0, d1, d2, d3) -> (d1, 0, d3)>,
				in_bounds = [true, false, true, false]}
	: memref<?x?x?x?xf32>, vector<2x5x3xf32>			: memref<?x?x?x?xf32>, vector<2x5x3xf32>
	vector.print %f: vector<2x5x3xf32>			vector.print %f: vector<2x5x3xf32>
	return			return
	}			}

	func.func @transfer_read_3d_mask_broadcast(			func.func @transfer_read_3d_mask_broadcast(
	%A : memref<?x?x?x?xf32>, %o: index, %a: index, %b: index, %c: index) {			%A : memref<?x?x?x?xf32>, %o: index, %a: index, %b: index, %c: index) {
	%fm42 = arith.constant -42.0: f32			%fm42 = arith.constant -42.0: f32
	%mask = arith.constant dense<[0, 1]> : vector<2xi1>			%mask = arith.constant dense<[0, 1]> : vector<2xi1>
	%f = vector.transfer_read %A[%o, %a, %b, %c], %fm42, %mask			%f = vector.transfer_read %A[%o, %a, %b, %c], %fm42, %mask
	{permutation_map = affine_map<(d0, d1, d2, d3) -> (d1, 0, 0)>}			{permutation_map = affine_map<(d0, d1, d2, d3) -> (d1, 0, 0)>,
				in_bounds = [true, false, true, true]}
	: memref<?x?x?x?xf32>, vector<2x5x3xf32>			: memref<?x?x?x?xf32>, vector<2x5x3xf32>
	vector.print %f: vector<2x5x3xf32>			vector.print %f: vector<2x5x3xf32>
	return			return
	}			}

	func.func @transfer_read_3d_transposed(%A : memref<?x?x?x?xf32>,			func.func @transfer_read_3d_transposed(%A : memref<?x?x?x?xf32>, %o: index,
	%o: index, %a: index, %b: index, %c: index) {			%a: index, %b: index, %c: index) {
	%fm42 = arith.constant -42.0: f32			%fm42 = arith.constant -42.0: f32
	%f = vector.transfer_read %A[%o, %a, %b, %c], %fm42			%f = vector.transfer_read %A[%o, %a, %b, %c], %fm42
	{permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, d0, d1)>}			{permutation_map = affine_map<(d0, d1, d2, d3) -> (d3, d0, d1)>,
				in_bounds = [false, false, true, false]}
	: memref<?x?x?x?xf32>, vector<3x5x3xf32>			: memref<?x?x?x?xf32>, vector<3x5x3xf32>
	vector.print %f: vector<3x5x3xf32>			vector.print %f: vector<3x5x3xf32>
	return			return
	}			}

	func.func @transfer_write_3d(%A : memref<?x?x?x?xf32>,			func.func @transfer_write_3d(%A : memref<?x?x?x?xf32>, %o: index, %a: index,
	%o: index, %a: index, %b: index, %c: index) {			%b: index, %c: index) {
	%fn1 = arith.constant -1.0 : f32			%fn1 = arith.constant -1.0 : f32
	%vf0 = vector.splat %fn1 : vector<2x9x3xf32>			%vf0 = vector.splat %fn1 : vector<2x9x3xf32>
	vector.transfer_write %vf0, %A[%o, %a, %b, %c]			vector.transfer_write %vf0, %A[%o, %a, %b, %c]
				{in_bounds = [true, false, false, false]}
	: vector<2x9x3xf32>, memref<?x?x?x?xf32>			: vector<2x9x3xf32>, memref<?x?x?x?xf32>
	return			return
	}			}

	func.func @entry() {			func.func @entry() {
	%c0 = arith.constant 0: index			%c0 = arith.constant 0: index
	%c1 = arith.constant 1: index			%c1 = arith.constant 1: index
	%c2 = arith.constant 2: index			%c2 = arith.constant 2: index

mlir/test/Integration/Dialect/Vector/CPU/test-transfer-to-loops.mlir

func.func @main() {
vector.print %4 : vector<5x5xf32>		vector.print %4 : vector<5x5xf32>
// Transposed 5x5 block rooted @{2, 3} in memory.		// Transposed 5x5 block rooted @{2, 3} in memory.
// CHECK-NEXT: ( ( 403, 404, 405, 305, -42 ),		// CHECK-NEXT: ( ( 403, 404, 405, 305, -42 ),
// CHECK-SAME: ( 503, 504, 505, 405, -42 ),		// CHECK-SAME: ( 503, 504, 505, 405, -42 ),
// CHECK-SAME: ( 502, 503, 504, 505, -42 ),		// CHECK-SAME: ( 502, 503, 504, 505, -42 ),
// CHECK-SAME: ( -42, -42, -42, -42, -42 ),		// CHECK-SAME: ( -42, -42, -42, -42, -42 ),
// CHECK-SAME: ( -42, -42, -42, -42, -42 ) )		// CHECK-SAME: ( -42, -42, -42, -42, -42 ) )

%5 = vector.transfer_read %0[%c2, %c3], %cst {permutation_map = #map1} : memref<?x?xf32>, vector<5xf32>		%5 = vector.transfer_read %0[%c2, %c3], %cst {permutation_map = #map1, in_bounds = [true, false]} : memref<?x?xf32>, vector<5xf32>
vector.print %5 : vector<5xf32>		vector.print %5 : vector<5xf32>
// CHECK-NEXT: ( 403, 503, 502, -42, -42 )		// CHECK-NEXT: ( 403, 503, 502, -42, -42 )

memref.dealloc %0 : memref<?x?xf32>		memref.dealloc %0 : memref<?x?xf32>
return		return
}		}


[mlir][vector] Transfer ops: one `in_bounds` bool per memref/tensor dim
Changes Planned • Public
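The test updates in this revision suggest how an old `in_bounds` array (one flag per vector dimension) corresponds to the new one (one flag per memref/tensor dimension). The sketch below is inferred only from those test changes and is not the patch's implementation. The helper name `per_dim_in_bounds` and its encoding of the permutation map are hypothetical: each map result is given as a source dimension index, with `None` standing for a broadcast (`0`) result. It assumes that source dimensions the map never reads are marked `true`, and that an omitted old attribute meant all `false`.

```python
from typing import List, Optional, Sequence


def per_dim_in_bounds(
    source_rank: int,
    perm_results: Sequence[Optional[int]],
    old_in_bounds: Optional[Sequence[bool]] = None,
) -> List[bool]:
    """Map old per-vector-dim in_bounds flags to one flag per source dim.

    perm_results[i] is the source dim read by vector dim i, or None when
    vector dim i is a broadcast (a `0` result in the permutation map).
    """
    if old_in_bounds is None:
        # Assumption: an absent old attribute meant "possibly out of bounds".
        old_in_bounds = [False] * len(perm_results)
    if len(old_in_bounds) != len(perm_results):
        raise ValueError("expected one old in_bounds flag per vector dim")

    # Assumption: source dims that the map does not read are in bounds.
    new_in_bounds = [True] * source_rank
    for vec_dim, src_dim in enumerate(perm_results):
        if src_dim is not None:
            new_in_bounds[src_dim] = old_in_bounds[vec_dim]
    return new_in_bounds


# (d0, d1, d2, d3) -> (d1, 0, d3), as in @transfer_read_3d_broadcast
print(per_dim_in_bounds(4, [1, None, 3]))
# (d0, d1, d2, d3) -> (d3, d0, d1), as in @transfer_read_3d_transposed
print(per_dim_in_bounds(4, [3, 0, 1]))
```

Under these assumptions the helper reproduces the arrays on the right-hand side of the test diffs, for example `[true, false, true, false]` for the broadcast case.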

Revision Contents

Diff 540433

mlir/include/mlir/Dialect/Vector/IR/VectorOps.td

mlir/include/mlir/Dialect/Vector/Transforms/VectorRewritePatterns.h

mlir/include/mlir/Dialect/Vector/Transforms/VectorTransforms.h

mlir/include/mlir/Interfaces/VectorInterfaces.td

mlir/lib/Conversion/VectorToSCF/VectorToSCF.cpp

mlir/lib/Dialect/Affine/Transforms/SuperVectorize.cpp

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

mlir/lib/Dialect/MemRef/Transforms/FoldMemRefAliasOps.cpp

mlir/lib/Dialect/Tensor/Transforms/FoldTensorSubsetOps.cpp

mlir/lib/Dialect/Vector/IR/VectorOps.cpp

mlir/lib/Dialect/Vector/Transforms/LowerVectorTransfer.cpp

mlir/lib/Dialect/Vector/Transforms/VectorDropLeadUnitDim.cpp

mlir/lib/Dialect/Vector/Transforms/VectorTransferOpTransforms.cpp

mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp

mlir/lib/Dialect/Vector/Transforms/VectorTransforms.cpp

mlir/test/Conversion/GPUCommon/transfer_write.mlir

mlir/test/Conversion/VectorToGPU/vector-to-mma-ops-mma-sync.mlir

mlir/test/Conversion/VectorToGPU/vector-to-mma-ops.mlir

mlir/test/Conversion/VectorToLLVM/vector-to-llvm.mlir

mlir/test/Conversion/VectorToSCF/tensor-transfer-ops.mlir

mlir/test/Conversion/VectorToSCF/unrolled-tensor-transfer-ops.mlir

mlir/test/Conversion/VectorToSCF/vector-to-scf-mask-and-permutation-map.mlir

mlir/test/Conversion/VectorToSCF/vector-to-scf.mlir

mlir/test/Dialect/Affine/SuperVectorize/vector_utils.mlir

mlir/test/Dialect/Affine/SuperVectorize/vectorize_1d.mlir

mlir/test/Dialect/Affine/SuperVectorize/vectorize_2d.mlir

mlir/test/Dialect/Affine/SuperVectorize/vectorize_outer_loop_2d.mlir

mlir/test/Dialect/Affine/SuperVectorize/vectorize_outer_loop_transpose_2d.mlir

mlir/test/Dialect/Affine/SuperVectorize/vectorize_transpose_2d.mlir

mlir/test/Dialect/Linalg/hoisting.mlir

mlir/test/Dialect/Linalg/vectorization-masked.mlir

mlir/test/Dialect/Linalg/vectorization.mlir

mlir/test/Dialect/Linalg/vectorize-tensor-extract.mlir

mlir/test/Dialect/MemRef/extract-address-computations.mlir

mlir/test/Dialect/MemRef/fold-memref-alias-ops.mlir

mlir/test/Dialect/Tensor/fold-tensor-subset-ops-into-vector-transfers.mlir

mlir/test/Dialect/Tensor/fold-tensor-subset-ops.mlir

mlir/test/Dialect/Vector/canonicalize.mlir

mlir/test/Dialect/Vector/invalid.mlir

mlir/test/Dialect/Vector/ops.mlir

mlir/test/Dialect/Vector/scalar-vector-transfer-to-memref.mlir

mlir/test/Dialect/Vector/vector-dropleadunitdim-transforms.mlir

mlir/test/Dialect/Vector/vector-transfer-collapse-inner-most-dims.mlir

mlir/test/Dialect/Vector/vector-transfer-drop-unit-dims-patterns.mlir

mlir/test/Dialect/Vector/vector-transfer-flatten.mlir

mlir/test/Dialect/Vector/vector-transfer-materialize-masks.mlir

mlir/test/Dialect/Vector/vector-transfer-permutation-lowering.mlir

mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir

mlir/test/Dialect/Vector/vector-transfer-unroll.mlir

mlir/test/Dialect/Vector/vector-warp-distribute.mlir

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_coo_test.mlir

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_sampled_matmul.mlir

mlir/test/Integration/Dialect/SparseTensor/CPU/sparse_transpose.mlir

mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-matmul-lib.mlir

mlir/test/Integration/Dialect/SparseTensor/GPU/CUDA/sparse-mma-2-4-f16.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-1d.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-2d.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-transfer-read-3d.mlir

mlir/test/Integration/Dialect/Vector/CPU/test-transfer-to-loops.mlir
