This is an archive of the discontinued LLVM Phabricator instance.

[mlir][bufferization] Add bufferization.alloc_tensor op
ClosedPublic

Authored by springerm on May 19 2022, 11:55 AM.

Details

Summary

This change adds a new op alloc_tensor to the bufferization dialect. During bufferization, this op is always lowered to a buffer allocation (unless it is "eliminated" by a pre-processing pass). It is useful to have such an op in tensor land, because it allows users to model tensor SSA use-def chains (which drive bufferization decisions) and because tensor SSA use-def chains can be analyzed by One-Shot Bufferize, while memref values cannot.
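
A minimal sketch of the intended use (assembly syntax abbreviated; see the op definition in BufferizationOps.td for the exact format):

// Reserves tensor-land storage; One-Shot Bufferize lowers this to a buffer
// allocation unless the op is eliminated by a pre-processing pass.
%0 = bufferization.alloc_tensor : tensor<10xf32>
// The SSA use-def chain rooted at %0 is what the bufferization analysis
// reasons about.
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<10xf32>) -> tensor<10xf32>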

This change also replaces all uses of linalg.init_tensor in bufferization-related code with bufferization.alloc_tensor.

linalg.init_tensor and bufferization.alloc_tensor are similar, but the purpose of the former is only to carry a shape. It does not indicate a memory allocation.

linalg.init_tensor is not suitable for modelling SSA use-def chains for bufferization purposes, because it is marked as not having side effects (in contrast to alloc_tensor). As such, it is legal to move linalg.init_tensor ops around, CSE them, etc. This is not desirable for alloc_tensor: it represents an explicit buffer allocation while still in tensor land, and such allocations should not suddenly disappear or get moved around when running the canonicalizer, CSE, etc.
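
To illustrate the difference, a sketch with two allocations that happen to look identical:

// linalg.init_tensor has no side effects, so CSE may fold two identical ops
// into one, merging what were meant to be two distinct allocations.
// bufferization.alloc_tensor is not CSE'd: each op below keeps its own
// identity and becomes a separate buffer allocation after bufferization.
%a = bufferization.alloc_tensor : tensor<4xf32>
%b = bufferization.alloc_tensor : tensor<4xf32>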

Diff Detail

Event Timeline

springerm created this revision.May 19 2022, 11:55 AM
Herald added a project: Restricted Project.May 19 2022, 11:55 AM
springerm requested review of this revision.May 19 2022, 11:55 AM
silvas accepted this revision.May 20 2022, 5:37 AM

This direction makes sense to me. I am a little out of the loop from the current bufferization interfaces, so I will let someone else review the code in detail.

This revision is now accepted and ready to land.May 20 2022, 5:37 AM
mehdi_amini added inline comments.May 20 2022, 8:44 AM
mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td
34

This does not document clearly if this op is breaking the value semantics of tensors or not, can you clarify?

springerm added inline comments.May 20 2022, 8:58 AM
mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td
34

I'd say that reading from the result of a bufferization.alloc_tensor is undefined behavior. Would that clarify it?

Similarly, reading from an uninitialized portion of a tensor is undefined behavior. E.g.:

%0 = alloc_tensor : tensor<10xf32>
%1 = tensor.insert_slice ... into %0[0] [5] [1] : tensor<5xf32> into tensor<10xf32>
%2 = tensor.extract %1[%c6] : tensor<10xf32>   // undefined: element 6 was never written
springerm added inline comments.May 20 2022, 9:15 AM
mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td
34

Just read this again, and I think the last sentence, "The contents of the buffer are unspecified.", may be badly formulated. This op returns a tensor, not a buffer, so talking about buffers in the op description could be confusing. The tensor returned by this op is read-only (just like any other tensor); there is no concept of writing into a tensor, etc. That's probably what you mean by "value semantics".

mehdi_amini added inline comments.May 20 2022, 1:59 PM
mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td
34

The tensor returned by this op is read-only (just as any other tensor), there is no concept of writing into a tensor, etc. That's probably what you mean by "value semantics".

Yes. I am also not sure why there is UB involved here? Couldn't we leave it at "reading from the result of an alloc_tensor op yields an undefined value"?

Also, something like this should be legal IR and bufferization should gracefully manage the need of two allocations for example:

%0 = alloc_tensor : tensor<10xf32>
%1 = linalg.generic outs(%0)
%2 = linalg.generic outs(%0)
springerm added inline comments.May 20 2022, 5:25 PM
mlir/include/mlir/Dialect/Bufferization/IR/BufferizationOps.td
34

Ah yes you're right. I was thinking of tensor.insert_slice, which is essentially a copy. But it's actually just another example of "reading a tensor".

Your example above is legal IR and the bufferization indeed generates two allocations. (Assuming that %1 is read at some point.)
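
For readers following along, a rough sketch of what the bufferized output could look like in that case (names hypothetical; the actual output depends on the analysis):

// Both linalg.generic ops write into the tensor produced by alloc_tensor.
// Since %1 is still read after %2 is computed, the two writes must not
// share a buffer, so bufferization materializes two allocations.
%buf0 = memref.alloc() : memref<10xf32>
linalg.generic ... outs(%buf0 : memref<10xf32>)
%buf1 = memref.alloc() : memref<10xf32>
linalg.generic ... outs(%buf1 : memref<10xf32>)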

This revision was landed with ongoing or failed builds.May 20 2022, 5:57 PM
This revision was automatically updated to reflect the committed changes.
mlir/lib/Dialect/Linalg/Transforms/CMakeLists.txt