This is an archive of the discontinued LLVM Phabricator instance.

[mlir][bufferize] Partly support memrefs with non-standard layout in `finalizing-bufferize`
ClosedPublic

Authored by springerm on Feb 16 2022, 5:28 AM.

Details

Summary

This change adds support in finalizing-bufferize for IR such as:

%0 = bufferization.to_tensor %m : memref<?xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
%1 = bufferization.to_memref %0 : memref<?xf32>

Depending on the exact source/dest type, this folds to a memref.cast or requires a reallocation + copy.
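
For illustration, the two outcomes could look roughly as follows (a sketch, not the exact output of the pass; the cast-compatible case with a static shape is a hypothetical example):

// Cast-compatible types (e.g., static shape -> dynamic shape): folds to a cast.
%1 = memref.cast %m : memref<5xf32> to memref<?xf32>

// Layout that cannot be cast away (as in the example above): reallocate + copy.
%c0 = arith.constant 0 : index
%dim = memref.dim %m, %c0 : memref<?xf32, affine_map<(d0)[s0] -> (d0 + s0)>>
%alloc = memref.alloc(%dim) : memref<?xf32>
memref.copy %m, %alloc : memref<?xf32, affine_map<(d0)[s0] -> (d0 + s0)>> to memref<?xf32>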

Memref types with non-standard layouts are required to support non-copying tensor.extract_slice ops.

Note: Dest memref types with non-standard layouts are not yet supported.

The goal of this commit is to ease the transition towards One-Shot Bufferization. This change allows us to remove the existing partial, copying bufferization pattern for tensor.extract_slice and replace it with the non-copying, BufferizableOpInterface-based variant.

Depends On D119937

Diff Detail

Event Timeline

springerm created this revision. Feb 16 2022, 5:28 AM
springerm requested review of this revision. Feb 16 2022, 5:28 AM
springerm added inline comments.
mlir/lib/Transforms/Utils/DialectConversion.cpp
2591–2595 (On Diff #409217)

@ftynse is preparing a standalone revision for this part.

silvas accepted this revision. Feb 17 2022, 12:01 PM
This revision is now accepted and ready to land. Feb 17 2022, 12:01 PM
mamrami added a subscriber: mamrami. Feb 1 2023, 7:10 AM
mamrami added inline comments.
mlir/lib/Dialect/Bufferization/IR/BufferizationOps.cpp
28

It's a bit delayed, but I am using One-Shot Bufferize now and I have a question regarding this line.
Why do we want the memory spaces to match?
In test/Dialect/Tensor/one-shot-bufferize.mlir:268 we have an example of a memref.copy where src and dst have different memory spaces.
I tried to remove the check in line 31, but the canonicalization test fails because the canonicalizer uses this method too.
Is there a difference between the canonicalizer and One-Shot approaches?

springerm added inline comments. Feb 1 2023, 7:24 AM
mlir/lib/Dialect/Bufferization/IR/BufferizationOps.cpp
28

This function is used for folding to_tensor-to_memref pairs. If the memory space is different, we can't just fold. We could realloc and copy, but that could be expensive and potentially cause memory leaks, because we don't have a good story for buffer deallocation at the moment. (Only buffers that are allocated by One-Shot Bufferize get deallocated.)
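
For example (a hypothetical pair; %m is assumed to live in the default memory space):

%0 = bufferization.to_tensor %m : memref<16xf32>
// Different memory space (1): this pair cannot simply fold to %m.
%1 = bufferization.to_memref %0 : memref<16xf32, 1>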

Note that when you use One-Shot Bufferize to do a full-function bufferization, you should not run into this issue. We model all memory space copies explicitly via bufferization.alloc_tensor, which has a memory space attribute. At the end, all to_tensor/to_memref ops should fold away. If they don't, it would be interesting to see why that is the case.

mamrami added inline comments. Feb 1 2023, 8:27 AM
mlir/lib/Dialect/Bufferization/IR/BufferizationOps.cpp
28

I've been looking at it for quite a while today, so I have some insights.
First, here's a reproducer:

mlir-opt -allow-unregistered-dialect -one-shot-bufferize="bufferize-function-boundaries=1 allow-unknown-ops=1 allow-return-allocs=1 function-boundary-type-conversion=identity-layout-map copy-before-write=1"

Input:

module {
  func.func @foo(%arg0: tensor<16xsi8>) -> tensor<16xsi8> {
    %0 = "my.op"(%arg0) : (tensor<16xsi8>) -> tensor<16xsi8>
    return %0 : tensor<16xsi8>
  }
  func.func @bar(%arg0: memref<16xsi8>) -> memref<16xsi8, 1> {
    %0 = bufferization.to_tensor %arg0 : memref<16xsi8>
    %1 = call @foo(%0) : (tensor<16xsi8>) -> tensor<16xsi8>
    %2 = bufferization.to_memref %1 : memref<16xsi8, 1>
    return %2 : memref<16xsi8, 1>
  }
}

Output:

module {
  func.func @foo(%arg0: memref<16xsi8>) -> memref<16xsi8> {
    %0 = bufferization.to_tensor %arg0 : memref<16xsi8>
    %1 = "my.op"(%0) : (tensor<16xsi8>) -> tensor<16xsi8>
    %2 = bufferization.to_memref %1 : memref<16xsi8>
    return %2 : memref<16xsi8>
  }
  func.func @bar(%arg0: memref<16xsi8>) -> memref<16xsi8, 1> {
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<16xsi8>
    memref.copy %arg0, %alloc : memref<16xsi8> to memref<16xsi8>
    %0 = call @foo(%alloc) : (memref<16xsi8>) -> memref<16xsi8>
    %1 = bufferization.to_tensor %0 : memref<16xsi8>
    %2 = bufferization.to_memref %1 : memref<16xsi8, 1>
    return %2 : memref<16xsi8, 1>
  }
}

You mentioned full-function bufferization, but here @bar has a to_memref, so maybe that is what causes this behaviour.
I noticed that the to_tensor was inserted when bufferizing the CallOp, but it is not inserted into the worklist. So it stays a to_tensor and does not bufferize to alloc_tensor and later to alloc.
The downside of adding the to_tensor to the worklist is that, for the same memory space, the to_tensor-to_memref pair would fold.
Perhaps it can be added to the worklist if it is still there after foldToMemrefToTensorPair.

springerm added inline comments.

Note that to_tensor and to_memref are not supposed to bufferize; they just fold away. Typically we have input IR such as:

func.func @bar(%arg0: memref<16xsi8>) -> memref<16xsi8, 1> {
  %0 = bufferization.to_tensor %arg0 : memref<16xsi8>
  %1 = call @foo(%0) : (tensor<16xsi8>) -> tensor<16xsi8>
  %realloc = bufferization.alloc_tensor() copy(%1) { memory_space = 1 } : tensor<16xsi8>
  %2 = bufferization.to_memref %realloc : memref<16xsi8, 1>
  return %2 : memref<16xsi8, 1>
}

Now it will just fold after bufferization.
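
So after bufferization, the result should be roughly the following (a sketch; the exact allocation attributes may differ):

func.func @bar(%arg0: memref<16xsi8>) -> memref<16xsi8, 1> {
  %0 = call @foo(%arg0) : (memref<16xsi8>) -> memref<16xsi8>
  // The alloc_tensor with memory_space = 1 becomes an alloc in space 1 plus a
  // copy; the surrounding to_tensor/to_memref pairs fold away.
  %alloc = memref.alloc() : memref<16xsi8, 1>
  memref.copy %0, %alloc : memref<16xsi8> to memref<16xsi8, 1>
  return %alloc : memref<16xsi8, 1>
}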

Otherwise, the best we can do is realloc with memory space 1 + copying when folding the to_memref-to_tensor pair.

mamrami added inline comments.

Your example will work for me; I'll just have to add a pass where I insert this alloc_tensor.
Actually, I expected this to happen naturally in the insertTensorCopies stage.

> Otherwise, the best we can do is realloc with memory space 1 + copying when folding the to_memref-to_tensor pair.

In order to do that, I had to delete this check:

if (srcType.getMemorySpaceAsInt() != destType.getMemorySpaceAsInt())
  return failure();

The rest of the logic is already written, and it worked for me. I wanted to upload a patch, but the canonicalizer test failed because it calls the same castOrReallocMemRefValue.
That's why I started this thread :)

Do you think it's OK to allow the realloc when we come from One-Shot Bufferize?
If so, we need to decide whether it is OK to allow it as part of the canonicalizer as well.

Do you think the alloc_tensor should be created at the insertTensorCopies stage?