This is an archive of the discontinued LLVM Phabricator instance.

Differential D91306

[mlir] Bufferize tensor constant ops
ClosedPublic

Authored by silvas on Nov 11 2020, 4:12 PM.

Details

Reviewers

aartbik
nicolasvasilache
jurahul

Commits

rGfaa66b1b2c7a: [mlir] Bufferize tensor constant ops

Summary

We lower them to a std.global_memref (uniqued by constant value) + a
std.get_global_memref to produce the corresponding memref value.
This allows removing Linalg's somewhat hacky lowering of tensor
constants, now that std properly supports this.
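To illustrate what "uniqued by constant value" means, here is a minimal, self-contained C++ sketch. It is not the code from this patch: `GlobalTable` and `getOrCreateGlobal` are hypothetical names, constants are modeled as flat `float` vectors, and the `_N` suffix used for name clashes is an assumption made for the example. The idea is only that two constants with the same value share one global, and each use then refers to that global by symbol.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a module that owns global constant buffers.
// Hypothetical type; it only models the uniquing described in the summary.
struct GlobalTable {
  // Constant value -> symbol name of the global that holds it.
  std::map<std::vector<float>, std::string> symbolByValue;
  // Every symbol name handed out so far (used to avoid clashes).
  std::set<std::string> usedNames;
  // Globals in creation order, standing in for ops inserted into the module.
  std::vector<std::string> symbols;

  // Returns the symbol for `value`; a new global is created only the first
  // time a given value is seen.
  std::string getOrCreateGlobal(const std::vector<float> &value) {
    auto it = symbolByValue.find(value);
    if (it != symbolByValue.end())
      return it->second;

    // Name derived from the shape, e.g. "__constant_2xf32".
    // Assumption: clashes are resolved with a numeric suffix.
    std::string base = "__constant_" + std::to_string(value.size()) + "xf32";
    std::string name = base;
    for (int suffix = 0; usedNames.count(name) != 0; ++suffix)
      name = base + "_" + std::to_string(suffix);

    usedNames.insert(name);
    symbols.push_back(name);
    symbolByValue.emplace(value, name);
    return name;
  }
};
```

Requesting the same value twice yields the same symbol and leaves the number of globals unchanged; a different value of the same shape gets its own global.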

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

silvas created this revision. Nov 11 2020, 4:12 PM

Herald added a reviewer: aartbik. Nov 11 2020, 4:12 PM

Herald added a project: Restricted Project.

Herald added subscribers: rdzhabarov, tatianashp, msifontes and 14 others.

silvas requested review of this revision. Nov 11 2020, 4:12 PM

Herald added a reviewer: nicolasvasilache. Nov 11 2020, 4:12 PM

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

silvas edited the summary of this revision. Nov 11 2020, 4:13 PM

Herald added a subscriber: limo1996. Nov 11 2020, 4:13 PM

silvas added a reviewer: jurahul. Nov 11 2020, 4:14 PM

Harbormaster completed remote builds in B78539: Diff 304675. Nov 11 2020, 4:37 PM

nicolasvasilache added inline comments. Nov 12 2020, 4:25 AM

mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp

313

I think this proposal has far reaching consequences on transformations that are not yet well understood.
In the short-term I'd favor a first class "bufferizable_in_place" attribute on subtensor/subtensor_insert to carry the information.

That attribute may be either:

- obtained as the result of some analysis / canonicalization
- as a transformation invariant (e.g. the transformation that introduces it decides whether it has enough information/guarantees to add the attribute)
- directly written in the IR for testing purposes.

Irrespective of this discussion, I think the root of the problem still lies on the tensor_load + tensor_to_memref canonicalization (see below).

329

I reverted this line locally after patching this revision and here is what I see:

./bin/mlir-opt -linalg-bufferize -std-bufferize -tensor-constant-bufferize -func-bufferize -convert-linalg-to-loops -print-ir-after-all ../mlir/integration_test/Dialect/Linalg/CPU/test-subtensor-insert.mlir

// *** IR Dump After TensorConstantBufferize ***
module {
  global_memref "private" constant @__constant_1xf32 : memref<1xf32> = dense<2.000000e+01>
  global_memref "private" constant @__constant_2xf32 : memref<2xf32> = dense<1.000000e+01>
  func @main() {
    %0 = get_global_memref @__constant_2xf32 : memref<2xf32>
    %1 = tensor_load %0 : memref<2xf32>
    %2 = get_global_memref @__constant_1xf32 : memref<1xf32>
    %3 = tensor_load %2 : memref<1xf32>
    %4 = tensor_to_memref %3 : memref<1xf32>
    %5 = tensor_to_memref %1 : memref<2xf32>
    %6 = subview %5[0] [1] [1]  : memref<2xf32> to memref<1xf32>
    linalg.copy(%4, %6) : memref<1xf32>, memref<1xf32>
    %7 = tensor_load %5 : memref<2xf32>
    %8 = tensor_to_memref %7 : memref<2xf32>
    %9 = memref_cast %8 : memref<2xf32> to memref<*xf32>
    %10 = tensor_load %9 : memref<*xf32>
    call @print_memref_f32(%10) : (tensor<*xf32>) -> ()
    return
  }
  func @print_memref_f32(tensor<*xf32>)
}

// *** IR Dump After FuncBufferize ***
module {
  global_memref "private" constant @__constant_1xf32 : memref<1xf32> = dense<2.000000e+01>
  global_memref "private" constant @__constant_2xf32 : memref<2xf32> = dense<1.000000e+01>
  func @main() {
    %0 = get_global_memref @__constant_2xf32 : memref<2xf32>
    %1 = get_global_memref @__constant_1xf32 : memref<1xf32>
    %2 = subview %0[0] [1] [1]  : memref<2xf32> to memref<1xf32>
    linalg.copy(%1, %2) : memref<1xf32>, memref<1xf32>
    %3 = memref_cast %0 : memref<2xf32> to memref<*xf32>
    call @print_memref_f32(%3) : (memref<*xf32>) -> ()
    return
  }
  func @print_memref_f32(memref<*xf32>)
}

The problem seems to be again coming from the tensor_load + tensor_to_memref canonicalization.

The tensor_to_memref semantics is:

Create a memref from a tensor. This is equivalent to allocating a new
memref of the appropriate (possibly dynamic) shape, and then copying the
elements (as if by a tensor_store op) into the newly allocated memref.

Under these semantics, I would argue the IR is correct after TensorConstantBufferize and incorrect after FuncBufferize.

Relaxing the semantics of tensor_to_memref to not guarantee an alloc + copy would make it reasonable to require the bufferization to insert alloc + copy when it does not know better.

However, I am not sure how to relax tensor_to_memref or even phrase it:

Create a memref from a tensor %t. The resulting memref aliases the memref operand of all tensor_load operations that flows into %t .. ?
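The difference between the documented alloc + copy semantics and the folded form can be modeled outside MLIR. The following C++ sketch is an illustration only, under the assumption that a memref can be approximated by a shared pointer to a vector; `toMemrefWithCopy`, `toMemrefFolded` and `insertAtFront` are hypothetical helpers, not MLIR APIs.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// A buffer handle; two handles may alias the same storage.
using Buffer = std::shared_ptr<std::vector<float>>;

// Models the documented semantics: allocate a new buffer and copy the
// elements into it.
Buffer toMemrefWithCopy(const Buffer &source) {
  return std::make_shared<std::vector<float>>(*source);
}

// Models what remains once a tensor_load + tensor_to_memref pair is folded
// away: the result aliases the original buffer.
Buffer toMemrefFolded(const Buffer &source) { return source; }

// Stand-in for "subview at offset 0 of size 1, then copy one element in".
void insertAtFront(const Buffer &dest, float value) { (*dest)[0] = value; }
```

With `toMemrefWithCopy`, a later `insertAtFront` leaves the original buffer at `[10, 10]`; with `toMemrefFolded`, the same write changes the original buffer to `[20, 10]`, which is the situation in the IR dump after FuncBufferize, where the `linalg.copy` targets a subview of the constant global.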

silvas added inline comments. Nov 12 2020, 10:27 AM

mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp
313

First of all, this is not a proposal. This is a fact, and this fixes a miscompile that was already present in the code, even before this patch. For clarity, I'll split that out. I think that adding a BufferizesToSomethingThatWritesIntoAnOperand trait would help. Then we could have a copy insertion pass that inserts copies at the tensor level (if needed), and the bufferization patterns will assume that those copies have been inserted. "bufferizable in place" is a global property, so transformations cannot possibly know when creating IR whether it is safe to do in place. See e.g. test-tensor-matmul -- that one fails because we attempt to do a subtensor_insert into a std.constant, which happens to bufferize to something read only. A similar problem trivially occurs if the std.constant is used as the init tensor of two different linalg ops (possibly obscured by arbitrary control flow).

329

Agreed that we aren't obeying the "copy" aspect of tensor_to_memref. But that's not the problem here. The current way that we are lowering tensor_to_memref is simply emulating what would happen if you were running dialect conversion in one big pass (in that situation, tensors would be replaced directly by memrefs everywhere, and no tensor_to_memref/tensor_load pairs would be inserted to keep the IR legal). So the problem is independent of tensor_to_memref. It's a fundamental aspect of simply converting a tensor Value to a memref Value without doing anything else -- the correct semantics is that every use of a tensor Value that has been converted to memref needs to see a separate copy, if that use might mutate its operand in place. So we either need to update the dialect conversion framework to respect that, or we need to do a separate copy insertion phase that establishes the invariants.
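The "separate copy per mutating use" rule can be sketched in plain C++. This is a simplified model, not the patch's implementation: `Memref` is just a `float` vector, and `insertWithClone` is a hypothetical helper that mirrors the conservative approach of cloning the destination before writing into it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified 1-D buffer model.
using Memref = std::vector<float>;

// Clone the destination first, then write `source` into the clone at
// `offset`. The original destination is never mutated, so it may be shared
// by other uses or live in read-only memory.
Memref insertWithClone(const Memref &dest, const Memref &source,
                       std::size_t offset) {
  assert(offset + source.size() <= dest.size());
  Memref result = dest; // the conservative copy
  std::copy(source.begin(), source.end(),
            result.begin() + static_cast<std::ptrdiff_t>(offset));
  return result;
}
```

Inserting `[20]` at offset 0 into the constant `[10, 10]` gives `[20, 10]`, and a second insert from the same constant at offset 1 gives `[10, 20]`; the constant itself stays `[10, 10]` in both cases. The cost is one copy per insert, which is what a dedicated copy insertion phase would try to avoid where it can prove the in-place write is safe.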

Split out fix to subtensor bufferization into https://reviews.llvm.org/D91371

Harbormaster completed remote builds in B78655: Diff 304908. Nov 12 2020, 11:05 AM

silvas edited the summary of this revision. Nov 12 2020, 11:06 AM

silvas added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp
313	I split out https://reviews.llvm.org/D91371 which fixes just subtensor_insert in isolation.

LGTM once rebased on the addressed https://reviews.llvm.org/D91371.

Looking forward to the next steps :) !

update

Harbormaster completed remote builds in B78664: Diff 304923. Nov 12 2020, 11:54 AM

This revision was not accepted when it landed; it landed in state Needs Review. Nov 12 2020, 2:57 PM

This revision was landed with ongoing or failed builds.

Closed by commit rGfaa66b1b2c7a: [mlir] Bufferize tensor constant ops (authored by silvas).

This revision was automatically updated to reflect the committed changes.

silvas added a commit: rGfaa66b1b2c7a: [mlir] Bufferize tensor constant ops.

Revision Contents

Path

Size

mlir/

include/

mlir/

Dialect/

Linalg/

Passes.td

2 lines

StandardOps/

Transforms/

Passes.h

3 lines

Passes.td

13 lines

integration_test/

Dialect/

Linalg/

CPU/

test-elementwise.mlir

2 lines

test-subtensor-insert-multiple-uses.mlir

2 lines

test-subtensor-insert.mlir

22 lines

test-tensor-e2e.mlir

2 lines

test-tensor-matmul.mlir

4 lines

lib/

Dialect/

Linalg/

Transforms/

Bufferize.cpp

59 lines

StandardOps/

Transforms/

CMakeLists.txt

1 line

TensorConstantBufferize.cpp

124 lines

test/

Dialect/

Linalg/

bufferize.mlir

18 lines

Standard/

tensor-constant-bufferize.mlir

59 lines

Diff 304977

mlir/include/mlir/Dialect/Linalg/Passes.td

Show First 20 Lines • Show All 58 Lines • ▼ Show 20 Lines	def LinalgLowerToLoops : FunctionPass<"convert-linalg-to-loops"> {
let summary = "Lower the operations from the linalg dialect into loops";		let summary = "Lower the operations from the linalg dialect into loops";
let constructor = "mlir::createConvertLinalgToLoopsPass()";		let constructor = "mlir::createConvertLinalgToLoopsPass()";
let dependentDialects = ["linalg::LinalgDialect", "scf::SCFDialect", "AffineDialect"];		let dependentDialects = ["linalg::LinalgDialect", "scf::SCFDialect", "AffineDialect"];
}		}

def LinalgBufferize : Pass<"linalg-bufferize", "ModuleOp"> {		def LinalgBufferize : Pass<"linalg-bufferize", "ModuleOp"> {
let summary = "Bufferize the linalg dialect";		let summary = "Bufferize the linalg dialect";
let constructor = "mlir::createLinalgBufferizePass()";		let constructor = "mlir::createLinalgBufferizePass()";
let dependentDialects = ["linalg::LinalgDialect", "vector::VectorDialect"];		let dependentDialects = ["linalg::LinalgDialect"];
}		}

def LinalgLowerToParallelLoops		def LinalgLowerToParallelLoops
: FunctionPass<"convert-linalg-to-parallel-loops"> {		: FunctionPass<"convert-linalg-to-parallel-loops"> {
let summary = "Lower the operations from the linalg dialect into parallel "		let summary = "Lower the operations from the linalg dialect into parallel "
"loops";		"loops";
let constructor = "mlir::createConvertLinalgToParallelLoopsPass()";		let constructor = "mlir::createConvertLinalgToParallelLoopsPass()";
let dependentDialects = ["AffineDialect", "linalg::LinalgDialect", "scf::SCFDialect"];		let dependentDialects = ["AffineDialect", "linalg::LinalgDialect", "scf::SCFDialect"];
Show All 40 Lines

mlir/include/mlir/Dialect/StandardOps/Transforms/Passes.h

Show All 29 Lines	void populateStdBufferizePatterns(MLIRContext *context,
OwningRewritePatternList &patterns);		OwningRewritePatternList &patterns);

/// Creates an instance of std bufferization pass.		/// Creates an instance of std bufferization pass.
std::unique_ptr<Pass> createStdBufferizePass();		std::unique_ptr<Pass> createStdBufferizePass();

/// Creates an instance of func bufferization pass.		/// Creates an instance of func bufferization pass.
std::unique_ptr<Pass> createFuncBufferizePass();		std::unique_ptr<Pass> createFuncBufferizePass();

		/// Creates an instance of tensor constant bufferization pass.
		std::unique_ptr<Pass> createTensorConstantBufferizePass();

/// Creates an instance of the StdExpand pass that legalizes Std		/// Creates an instance of the StdExpand pass that legalizes Std
/// dialect ops to be convertible to LLVM. For example,		/// dialect ops to be convertible to LLVM. For example,
/// `std.ceildivi_signed` gets transformed to a number of std operations,		/// `std.ceildivi_signed` gets transformed to a number of std operations,
/// which can be lowered to LLVM; `memref_reshape` gets converted to		/// which can be lowered to LLVM; `memref_reshape` gets converted to
/// `memref_reinterpret_cast`.		/// `memref_reinterpret_cast`.
std::unique_ptr<Pass> createStdExpandOpsPass();		std::unique_ptr<Pass> createStdExpandOpsPass();

/// Collects a set of patterns to rewrite ops within the Std dialect.		/// Collects a set of patterns to rewrite ops within the Std dialect.
Show All 14 Lines

mlir/include/mlir/Dialect/StandardOps/Transforms/Passes.td

Show First 20 Lines • Show All 45 Lines • ▼ Show 20 Lines	let description = [{
- block arguments		- block arguments
- the result of tensor_load		- the result of tensor_load
Other values of tensor type should be eliminated by earlier		Other values of tensor type should be eliminated by earlier
bufferization passes.		bufferization passes.
}];		}];
let constructor = "mlir::createFuncBufferizePass()";		let constructor = "mlir::createFuncBufferizePass()";
}		}

		def TensorConstantBufferize : Pass<"tensor-constant-bufferize", "ModuleOp"> {
		let summary = "Bufferize tensor constants.";
		let description = [{
		This pass bufferizes tensor constants.

		This pass needs to be a module pass because it inserts std.global_memref
		ops into the module, which cannot be done safely from a function pass due to
		multi-threading. Most other bufferization passes can run in parallel at
		function granularity.
		}];
		let constructor = "mlir::createTensorConstantBufferizePass()";
		}

#endif // MLIR_DIALECT_STANDARD_TRANSFORMS_PASSES		#endif // MLIR_DIALECT_STANDARD_TRANSFORMS_PASSES

mlir/integration_test/Dialect/Linalg/CPU/test-elementwise.mlir

	// RUN: mlir-opt %s -convert-elementwise-to-linalg -std-bufferize -linalg-bufferize -func-bufferize -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm | \			// RUN: mlir-opt %s -convert-elementwise-to-linalg -std-bufferize -tensor-constant-bufferize -linalg-bufferize -func-bufferize -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm | \
	// RUN: mlir-cpu-runner -e main -entry-point-result=void \			// RUN: mlir-cpu-runner -e main -entry-point-result=void \
	// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \			// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
	// RUN: | FileCheck %s			// RUN: | FileCheck %s

	func @main() {			func @main() {
	%a = constant dense<[1.0, 2.0, 3.0]> : tensor<3xf32>			%a = constant dense<[1.0, 2.0, 3.0]> : tensor<3xf32>
	%b = constant dense<[10.0, 20.0, 30.0]> : tensor<3xf32>			%b = constant dense<[10.0, 20.0, 30.0]> : tensor<3xf32>

	Show All 10 Lines

mlir/integration_test/Dialect/Linalg/CPU/test-subtensor-insert-multiple-uses.mlir

	// RUN: mlir-opt %s -linalg-bufferize -std-bufferize -func-bufferize \			// RUN: mlir-opt %s -linalg-bufferize -std-bufferize -tensor-constant-bufferize -func-bufferize \
	// RUN: -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm | \			// RUN: -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm | \
	// RUN: mlir-cpu-runner -e main -entry-point-result=void \			// RUN: mlir-cpu-runner -e main -entry-point-result=void \
	// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \			// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
	// RUN: | FileCheck %s			// RUN: | FileCheck %s

	func @main() {			func @main() {
	%const = constant dense<10.0> : tensor<2xf32>			%const = constant dense<10.0> : tensor<2xf32>
	%insert_val = constant dense<20.0> : tensor<1xf32>			%insert_val = constant dense<20.0> : tensor<1xf32>
	Show All 26 Lines

mlir/integration_test/Dialect/Linalg/CPU/test-subtensor-insert.mlir

This file was added.

				// RUN: mlir-opt %s -linalg-bufferize -std-bufferize -tensor-constant-bufferize -func-bufferize \
				// RUN: -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm | \
				// RUN: mlir-cpu-runner -e main -entry-point-result=void \
				// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
				// RUN: | FileCheck %s

				func @main() {
				%const = constant dense<10.0> : tensor<2xf32>
				%insert_val = constant dense<20.0> : tensor<1xf32>
				%inserted = subtensor_insert %insert_val into %const[0][1][1] : tensor<1xf32> into tensor<2xf32>

				%unranked = tensor_cast %inserted : tensor<2xf32> to tensor<*xf32>
				call @print_memref_f32(%unranked) : (tensor<*xf32>) -> ()

				// CHECK: Unranked Memref base@ = {{0x[-9a-f]*}}
				// CHECK-SAME: rank = 1 offset = 0 sizes = [2] strides = [1] data =
				// CHECK-NEXT: [20, 10]

				return
				}

				func @print_memref_f32(%ptr : tensor<*xf32>)

mlir/integration_test/Dialect/Linalg/CPU/test-tensor-e2e.mlir

	// RUN: mlir-opt %s -std-bufferize -linalg-bufferize -func-bufferize -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm | \			// RUN: mlir-opt %s -tensor-constant-bufferize -std-bufferize -linalg-bufferize -func-bufferize -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm | \
	// RUN: mlir-cpu-runner -e main -entry-point-result=void \			// RUN: mlir-cpu-runner -e main -entry-point-result=void \
	// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \			// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
	// RUN: | FileCheck %s			// RUN: | FileCheck %s

	func @foo() -> tensor<4xf32> {			func @foo() -> tensor<4xf32> {
	%0 = constant dense<[1.0, 2.0, 3.0, 4.0]> : tensor<4xf32>			%0 = constant dense<[1.0, 2.0, 3.0, 4.0]> : tensor<4xf32>
	return %0 : tensor<4xf32>			return %0 : tensor<4xf32>
	}			}
	Show All 25 Lines

mlir/integration_test/Dialect/Linalg/CPU/test-tensor-matmul.mlir

	// RUN: mlir-opt %s -linalg-bufferize -std-bufferize -func-bufferize \			// RUN: mlir-opt %s -linalg-bufferize -std-bufferize -tensor-constant-bufferize -func-bufferize \
	// RUN: -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm | \			// RUN: -convert-linalg-to-loops -convert-linalg-to-llvm -convert-std-to-llvm | \
	// RUN: mlir-cpu-runner -e main -entry-point-result=void \			// RUN: mlir-cpu-runner -e main -entry-point-result=void \
	// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \			// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
	// RUN: | FileCheck %s			// RUN: | FileCheck %s

	// RUN: mlir-opt %s -linalg-tile="linalg-tile-sizes=1,2,3" -linalg-bufferize \			// RUN: mlir-opt %s -linalg-tile="linalg-tile-sizes=1,2,3" -linalg-bufferize \
	// RUN: -scf-bufferize -std-bufferize -func-bufferize -convert-linalg-to-loops \			// RUN: -scf-bufferize -std-bufferize -tensor-constant-bufferize -func-bufferize -convert-linalg-to-loops \
	// RUN: -convert-scf-to-std -convert-linalg-to-llvm | \			// RUN: -convert-scf-to-std -convert-linalg-to-llvm | \
	// RUN: mlir-cpu-runner -e main -entry-point-result=void \			// RUN: mlir-cpu-runner -e main -entry-point-result=void \
	// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \			// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
	// RUN: | FileCheck %s			// RUN: | FileCheck %s

	func @main() {			func @main() {
	%A = constant dense<[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]> : tensor<2x3xf32>			%A = constant dense<[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]> : tensor<2x3xf32>
	%B = constant dense<[[1.0, 2.0, 3.0, 4.0],			%B = constant dense<[[1.0, 2.0, 3.0, 4.0],
	Show All 19 Lines

mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp

Show First 20 Lines • Show All 304 Lines • ▼ Show 20 Lines	matchAndRewrite(SubTensorInsertOp op, ArrayRef<Value> operands,
SubTensorInsertOpAdaptor adaptor(operands,		SubTensorInsertOpAdaptor adaptor(operands,
op.getOperation()->getAttrDictionary());		op.getOperation()->getAttrDictionary());
Value sourceMemRef = adaptor.source();		Value sourceMemRef = adaptor.source();
assert(sourceMemRef.getType().isa<MemRefType>());		assert(sourceMemRef.getType().isa<MemRefType>());

// For now, be conservative and copy the converted input memref.		// For now, be conservative and copy the converted input memref.
// In general, the converted input memref here could be aliased or could		// In general, the converted input memref here could be aliased or could
// point into constant memory, so mutating it would lead to miscompilations.		// point into constant memory, so mutating it would lead to miscompilations.
Value destMemRef = cloneMemref(op.getLoc(), adaptor.dest(), rewriter);		Value destMemRef = cloneMemref(op.getLoc(), adaptor.dest(), rewriter);
assert(destMemRef.getType().isa<MemRefType>());		assert(destMemRef.getType().isa<MemRefType>());

// Take a subview to copy the small memref.		// Take a subview to copy the small memref.
Value subview = rewriter.create<SubViewOp>(		Value subview = rewriter.create<SubViewOp>(
op.getLoc(), destMemRef, extractFromI64ArrayAttr(op.static_offsets()),		op.getLoc(), destMemRef, extractFromI64ArrayAttr(op.static_offsets()),
extractFromI64ArrayAttr(op.static_sizes()),		extractFromI64ArrayAttr(op.static_sizes()),
extractFromI64ArrayAttr(op.static_strides()), adaptor.offsets(),		extractFromI64ArrayAttr(op.static_strides()), adaptor.offsets(),
adaptor.sizes(), adaptor.strides());		adaptor.sizes(), adaptor.strides());
// Copy the small memref.		// Copy the small memref.
rewriter.create<linalg::CopyOp>(op.getLoc(), sourceMemRef, subview);		rewriter.create<linalg::CopyOp>(op.getLoc(), sourceMemRef, subview);
rewriter.replaceOp(op, destMemRef);		rewriter.replaceOp(op, destMemRef);
return success();		return success();
}		}
};		};

/// TensorConstantOp conversion inserts a linearized 1-D vector constant that is
/// stored in memory. A linalg.reshape is introduced to convert to the desired
/// n-D buffer form.
class TensorConstantOpConverter : public OpConversionPattern<ConstantOp> {
public:
using OpConversionPattern::OpConversionPattern;

LogicalResult
matchAndRewrite(ConstantOp op, ArrayRef<Value> operands,
ConversionPatternRewriter &rewriter) const final {

RankedTensorType rankedTensorType =
op.getType().dyn_cast<RankedTensorType>();
if (!rankedTensorType)
return failure();
if (llvm::any_of(rankedTensorType.getShape(), [](int64_t s) {
return s == 0 || ShapedType::isDynamic(s);
}))
return failure();

int64_t nElements = 1;
for (int64_t s : rankedTensorType.getShape())
nElements *= s;
Type elementType = rankedTensorType.getElementType();
MemRefType memrefType =
getTypeConverter()->convertType(op.getType()).cast<MemRefType>();
VectorType flatVectorType = VectorType::get({nElements}, elementType);
MemRefType memrefOfFlatVectorType = MemRefType::get({}, flatVectorType);
MemRefType flatMemrefType = MemRefType::get({nElements}, elementType);

Location loc = op.getLoc();
auto attr = op.getValue().cast<DenseElementsAttr>();
Value alloc =
rewriter.create<AllocOp>(loc, memrefOfFlatVectorType, ValueRange{});
Value cstVec = rewriter.create<ConstantOp>(loc, flatVectorType,
attr.reshape(flatVectorType));
rewriter.create<StoreOp>(loc, cstVec, alloc);

Value memref =
rewriter.create<vector::TypeCastOp>(loc, flatMemrefType, alloc);
if (rankedTensorType.getRank() > 1) {
// Introduce a linalg.reshape to flatten the memref.
AffineMap collapseAllDims = AffineMap::getMultiDimIdentityMap(
/*numDims=*/rankedTensorType.getRank(), op.getContext());
memref = rewriter.create<linalg::ReshapeOp>(
loc, memrefType, memref,
rewriter.getAffineMapArrayAttr(collapseAllDims));
}
rewriter.replaceOp(op, memref);

return success();
}
};
} // namespace		} // namespace

namespace {		namespace {
/// Converts Linalg operations that work on tensor-type operands or results to		/// Converts Linalg operations that work on tensor-type operands or results to
/// work on buffers.		/// work on buffers.
struct LinalgBufferizePass : public LinalgBufferizeBase<LinalgBufferizePass> {		struct LinalgBufferizePass : public LinalgBufferizeBase<LinalgBufferizePass> {
void runOnOperation() override {		void runOnOperation() override {
MLIRContext &context = getContext();		MLIRContext &context = getContext();
ConversionTarget target(context);		ConversionTarget target(context);
BufferizeTypeConverter typeConverter;		BufferizeTypeConverter typeConverter;

// Mark all Standard operations legal.		// Mark all Standard operations legal.
target.addLegalDialect<StandardOpsDialect, vector::VectorDialect>();		target.addLegalDialect<StandardOpsDialect>();
target.addIllegalOp<SubTensorOp, SubTensorInsertOp>();		target.addIllegalOp<SubTensorOp, SubTensorInsertOp>();

// Mark all Linalg operations illegal as long as they work on tensors.		// Mark all Linalg operations illegal as long as they work on tensors.
auto isLegalOperation = [&](Operation *op) {		auto isLegalOperation = [&](Operation *op) {
return typeConverter.isLegal(op);		return typeConverter.isLegal(op);
};		};
target.addDynamicallyLegalDialect<linalg::LinalgDialect>(isLegalOperation);		target.addDynamicallyLegalDialect<linalg::LinalgDialect>(isLegalOperation);
target.addDynamicallyLegalOp<ConstantOp>(isLegalOperation);		target.addDynamicallyLegalOp<ConstantOp>(isLegalOperation);
Show All 14 Lines
void mlir::linalg::populateLinalgBufferizePatterns(		void mlir::linalg::populateLinalgBufferizePatterns(
MLIRContext *context, BufferizeTypeConverter &typeConverter,		MLIRContext *context, BufferizeTypeConverter &typeConverter,
OwningRewritePatternList &patterns) {		OwningRewritePatternList &patterns) {
patterns.insert<BufferizeAnyLinalgOp>(typeConverter);		patterns.insert<BufferizeAnyLinalgOp>(typeConverter);
// TODO: Drop this once tensor constants work in standard.		// TODO: Drop this once tensor constants work in standard.
patterns.insert<		patterns.insert<
// clang-format off		// clang-format off
SubTensorOpConverter,		SubTensorOpConverter,
SubTensorInsertOpConverter,		SubTensorInsertOpConverter
TensorConstantOpConverter
// clang-format on		// clang-format on
>(typeConverter, context);		>(typeConverter, context);
}		}

mlir/lib/Dialect/StandardOps/Transforms/CMakeLists.txt

	add_mlir_dialect_library(MLIRStandardOpsTransforms			add_mlir_dialect_library(MLIRStandardOpsTransforms
	Bufferize.cpp			Bufferize.cpp
	ExpandOps.cpp			ExpandOps.cpp
	ExpandTanh.cpp			ExpandTanh.cpp
	FuncBufferize.cpp			FuncBufferize.cpp
	FuncConversions.cpp			FuncConversions.cpp
				TensorConstantBufferize.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/StandardOps/Transforms			${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/StandardOps/Transforms

	DEPENDS			DEPENDS
	MLIRStandardTransformsIncGen			MLIRStandardTransformsIncGen

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
	MLIRIR			MLIRIR
	MLIRPass			MLIRPass
	MLIRSCF			MLIRSCF
	MLIRStandard			MLIRStandard
	MLIRTransforms			MLIRTransforms
	)			)

mlir/lib/Dialect/StandardOps/Transforms/TensorConstantBufferize.cpp

This file was added.

				//===- TensorConstantBufferize.cpp - Bufferization for std ops ------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements bufferization of tensor-valued std.constant ops.
				//
				//===----------------------------------------------------------------------===//

				#include "PassDetail.h"
				#include "mlir/Dialect/StandardOps/IR/Ops.h"
				#include "mlir/Dialect/StandardOps/Transforms/Passes.h"
				#include "mlir/IR/BlockAndValueMapping.h"
				#include "mlir/Transforms/Bufferize.h"
				#include "mlir/Transforms/DialectConversion.h"

				using namespace mlir;

				namespace {
				// This class creates global ops for all tensor-valued constants in the program.
				// It creates them with pretty names and makes sure that duplicate globals
				// aren't created.
				class GlobalCreator {
				public:
				explicit GlobalCreator(ModuleOp module);
				GlobalMemrefOp getGlobalFor(Attribute attr) {
				assert(globals.find(attr) != globals.end() && "unknown constant attr");
				return globals[attr];
				}

				private:
				DenseMap<Attribute, GlobalMemrefOp> globals;
				};

				GlobalCreator::GlobalCreator(ModuleOp module) {
				BufferizeTypeConverter typeConverter;
				// Create a builder without an insertion point. We will insert using the
				// symbol table to guarantee unique names.
				OpBuilder globalBuilder(module.getContext());
				SymbolTable symbolTable(module);
				module.walk([&](ConstantOp op) {
				// We only want tensor constants for now.
				auto type = op.getType().dyn_cast<RankedTensorType>();
				if (!type)
				return;
				// If we already have a global for this constant value, no need to do
				// anything else.
				auto it = globals.find(op.getValue());
				if (it != globals.end())
				return;

				// Create a pretty name.
				SmallString<64> buf;
				llvm::raw_svector_ostream os(buf);
				interleave(type.getShape(), os, "x");
				os << "x" << type.getElementType();

				auto global = globalBuilder.create<GlobalMemrefOp>(
				op.getLoc(), (Twine("__constant_") + os.str()).str(),
				/*sym_visibility=*/globalBuilder.getStringAttr("private"),
				/*type=*/
				TypeAttr::get(typeConverter.convertType(type)), /*initial_value=*/
				op.getValue().cast<ElementsAttr>(), /*constant=*/true);
				symbolTable.insert(global);
				// The symbol table inserts at the end of the module, but globals are a bit
				// nicer if they are at the beginning.
				global.getOperation()->moveBefore(&module.front());
				globals[op.getValue()] = global;
				});
				}
				} // namespace

				namespace {
				class BufferizeTensorConstantOp : public OpConversionPattern<ConstantOp> {
				public:
				BufferizeTensorConstantOp(GlobalCreator &globals,
				TypeConverter &typeConverter, MLIRContext *context)
				: OpConversionPattern<ConstantOp>(typeConverter, context, /*benefit=*/1),
				globals(globals) {}

				LogicalResult
				matchAndRewrite(ConstantOp op, ArrayRef<Value> operands,
				ConversionPatternRewriter &rewriter) const override {
				auto type = op.getType().dyn_cast<RankedTensorType>();
				if (!type)
				return failure();

				auto globalMemref = globals.getGlobalFor(op.value());
				rewriter.replaceOpWithNewOp<GetGlobalMemrefOp>(op, globalMemref.type(),
				globalMemref.getName());
				return success();
				}
				GlobalCreator &globals;
				};
				} // namespace

				namespace {
				struct TensorConstantBufferizePass
				: public TensorConstantBufferizeBase<TensorConstantBufferizePass> {
				void runOnOperation() override {
				auto module = getOperation();
				GlobalCreator globals(module);

				auto *context = &getContext();
				BufferizeTypeConverter typeConverter;
				OwningRewritePatternList patterns;
				ConversionTarget target(*context);

				target.addLegalDialect<StandardOpsDialect>();
				patterns.insert<BufferizeTensorConstantOp>(globals, typeConverter, context);
				target.addDynamicallyLegalOp<ConstantOp>(
				[&](ConstantOp op) { return typeConverter.isLegal(op.getType()); });
				if (failed(applyPartialConversion(module, target, std::move(patterns))))
				signalPassFailure();
				}
				};
				} // namespace

				std::unique_ptr<Pass> mlir::createTensorConstantBufferizePass() {
				return std::make_unique<TensorConstantBufferizePass>();
				}
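
The uniquing and naming scheme used by GlobalCreator can be illustrated outside of MLIR. The following is a minimal standalone sketch, not the code in this revision: `TensorConstant`, `prettyName`, and `GlobalNamer` are hypothetical stand-ins, the constant value is modeled as a plain string key, and the `_N` suffix used on name collisions is an assumption about how a symbol table might keep names unique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tensor-valued constant: its static shape,
// element type, and a printed form of its value used as the uniquing key.
struct TensorConstant {
  std::vector<int64_t> shape;
  std::string elementType;
  std::string value;
};

// Build a readable base name such as "__constant_3x4xf32": dimensions
// joined by "x", then "x" and the element type.
std::string prettyName(const TensorConstant &c) {
  assert(!c.elementType.empty() && "element type must be known");
  std::ostringstream os;
  os << "__constant_";
  for (std::size_t i = 0; i < c.shape.size(); ++i) {
    if (i != 0)
      os << "x";
    os << c.shape[i];
  }
  os << "x" << c.elementType;
  return os.str();
}

// Hands out one symbol name per distinct (type, value) pair.
class GlobalNamer {
public:
  // Returns the symbol name for `c`, creating it the first time it is seen.
  const std::string &getOrCreate(const TensorConstant &c) {
    auto key = std::make_pair(prettyName(c), c.value);
    auto it = globals.find(key);
    if (it != globals.end())
      return it->second; // Same constant seen before: reuse its global.
    // Assumed renaming scheme: append "_N" until the name is unused.
    std::string name = key.first;
    for (unsigned n = 0; !usedNames.insert(name).second; ++n)
      name = key.first + "_" + std::to_string(n);
    return globals.emplace(key, name).first->second;
  }

  std::size_t size() const { return globals.size(); }

private:
  std::map<std::pair<std::string, std::string>, std::string> globals;
  std::set<std::string> usedNames;
};
```

In this sketch, two constants with the same type and value map to a single name, while two constants of the same type with different values get distinct names. This mirrors what the `@duplicate_constants` and `@multiple_constants` test cases check.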

mlir/test/Dialect/Linalg/bufferize.mlir

[... 88 unchanged lines omitted ...]
%0, %1 = linalg.generic {
%tmp1 = exp %gen_arg1 : f32		%tmp1 = exp %gen_arg1 : f32
linalg.yield %tmp1, %tmp1 : f32, f32		linalg.yield %tmp1, %tmp1 : f32, f32
} -> tensor<?x?xf32>, tensor<?x?xf32>		} -> tensor<?x?xf32>, tensor<?x?xf32>
return %0, %1 : tensor<?x?xf32>, tensor<?x?xf32>		return %0, %1 : tensor<?x?xf32>, tensor<?x?xf32>
}		}

// -----		// -----

// Check lowering of tensor-valued std.constant's
// TODO: Move this to std-bufferize.

// CHECK-LABEL: func @constant() -> tensor<2x3xf32> {
// CHECK: %[[VECTOR_MEMREF:.*]] = alloc() : memref<vector<6xf32>>
// CHECK: %[[VECTOR_CONST:.*]] = constant dense<[0.000000e+00, 1.000000e+00, 2.000000e+00, 3.000000e+00, 4.000000e+00, 5.000000e+00]> : vector<6xf32>
// CHECK: store %[[VECTOR_CONST]], %[[VECTOR_MEMREF]][] : memref<vector<6xf32>>
// CHECK: %[[MEMREF:.*]] = vector.type_cast %[[VECTOR_MEMREF]] : memref<vector<6xf32>> to memref<6xf32>
// CHECK: %[[FINAL_SHAPE:.*]] = linalg.reshape %[[MEMREF]] [#map] : memref<6xf32> into memref<2x3xf32>
// CHECK: %[[RESULT:.*]] = tensor_load %[[FINAL_SHAPE]] : memref<2x3xf32>
// CHECK: return %[[RESULT]] : tensor<2x3xf32>
func @constant() -> tensor<2x3xf32> {
%0 = constant dense<[[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]> : tensor<2x3xf32>
return %0: tensor<2x3xf32>
}

// -----

#accesses = [		#accesses = [
affine_map<(i, j, k) -> (j, i, k)>,		affine_map<(i, j, k) -> (j, i, k)>,
affine_map<(i, j, k) -> (i, j)>		affine_map<(i, j, k) -> (i, j)>
]		]

#trait = {		#trait = {
indexing_maps = #accesses,		indexing_maps = #accesses,
iterator_types = ["parallel", "parallel", "reduction"]		iterator_types = ["parallel", "parallel", "reduction"]
[... 111 unchanged lines omitted ...]

mlir/test/Dialect/Standard/tensor-constant-bufferize.mlir

This file was added.

				// RUN: mlir-opt %s -tensor-constant-bufferize -split-input-file | FileCheck %s

				// CHECK-LABEL: module {
				// We check the debug name too since we put some effort into making that readable.
				// The name isn't load-bearing though.
				// CHECK: global_memref "private" constant @__constant_3x4xf32 : memref<3x4xf32> = dense<7.000000e+00>
				// CHECK: @basic
				func @basic() -> tensor<3x4xf32> {
				// CHECK: %[[MEMREF:.*]] = get_global_memref @__constant_3x4xf32 : memref<3x4xf32>
				// CHECK: %[[TENSOR:.*]] = tensor_load %[[MEMREF]]
				%0 = constant dense<7.0> : tensor<3x4xf32>
				// CHECK: return %[[TENSOR]]
				return %0 : tensor<3x4xf32>
				}

				// CHECK: }

				// -----

				// CHECK-LABEL: module {

				// Only one global is created.
				// CHECK: global_memref
				// CHECK-NOT: global_memref
				func @duplicate_constants() -> (tensor<3x4xf32>, tensor<3x4xf32>) {
				%0 = constant dense<7.0> : tensor<3x4xf32>
				%1 = constant dense<7.0> : tensor<3x4xf32>
				return %0, %1 : tensor<3x4xf32>, tensor<3x4xf32>
				}

				// CHECK: }

				// -----

				// CHECK-LABEL: module {

				// Two globals are created.
				// CHECK: global_memref
				// CHECK: global_memref
				// CHECK-NOT: global_memref
				func @multiple_constants() -> (tensor<3x4xf32>, tensor<3x4xf32>) {
				%0 = constant dense<7.0> : tensor<3x4xf32>
				%1 = constant dense<8.0> : tensor<3x4xf32>
				return %0, %1 : tensor<3x4xf32>, tensor<3x4xf32>
				}

				// CHECK: }

				// -----

				// CHECK-LABEL: module {
				// We don't convert non-tensor constants.
				// CHECK-NOT: global_memref
				func @non_tensor() {
				%0 = constant 7 : i32
				return
				}

				// CHECK: }

Revision Contents

Diff 304977