- Fold tensor_load(get_global_memref) of a constant global_memref to the global's initial value (sketched below).
- Also dropped 'NoSideEffects' from global_memref to make sure unused public global_memrefs do not get deleted.
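For reference, a minimal sketch of the IR this fold targets, written with the op spellings in use at the time of this patch (global_memref, get_global_memref, tensor_load); the exact syntax is approximate:

```mlir
// A constant global and a tensor_load of it via get_global_memref.
global_memref "private" constant @cst : memref<2xf32> = dense<[1.0, 2.0]>

func @read_cst() -> tensor<2xf32> {
  %0 = get_global_memref @cst : memref<2xf32>
  %1 = tensor_load %0 : memref<2xf32>
  return %1 : tensor<2xf32>
}

// The proposed fold would replace %1 with the global's initializer:
//   %1 = constant dense<[1.0, 2.0]> : tensor<2xf32>
```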
Details
- Reviewers
mehdi_amini rriddle silvas herhut
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test
---|---
530 ms | windows > LLVM.CodeGen/AMDGPU::ds_read2.ll
Event Timeline
I don't think this is a good idea, because tensor_load(std.get_global_memref) is precisely the pattern we expect to create from bufferizing a std.constant op on tensors. As such, this canonicalization would undo that lowering, making canonicalization not play well with bufferization.
Do you have a use case where you have std.global_memrefs but basic tensor-level constant folding has not already happened? Typically, I expect that tensor_load(std.get_global_memref) will mainly exist as an intermediate state of bufferization -- presumably, if the user is bufferizing their program, they are already happy with the canonicalizations that have happened at the tensor level.
Thoughts?
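To make the interaction concrete, here is a rough sketch of what bufferizing a tensor-level constant is expected to produce (the global name @__constant_4xi32 is illustrative, and the syntax is approximate):

```mlir
// Before bufferization:
func @f() -> tensor<4xi32> {
  %0 = constant dense<[1, 2, 3, 4]> : tensor<4xi32>
  return %0 : tensor<4xi32>
}

// After bufferizing the constant:
global_memref "private" constant @__constant_4xi32 : memref<4xi32> = dense<[1, 2, 3, 4]>

func @f() -> tensor<4xi32> {
  %0 = get_global_memref @__constant_4xi32 : memref<4xi32>
  %1 = tensor_load %0 : memref<4xi32>
  // The proposed fold would turn %1 back into the original tensor constant,
  // undoing the bufferization of the constant.
  return %1 : tensor<4xi32>
}
```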
How this interacts with bufferization is a good point. I am a bit apprehensive, however, about what the story is w.r.t. optimizing the intermixed tensor/memref world. Can you elaborate your thoughts on that? If we are making a deliberate decision not to perform these optimizations, it seems only right that we document our position somewhere so that in the future we have a reference for what our policy is. This type of optimization seems "obvious" to many newcomers (including myself to some extent, as I haven't followed the bufferization work that closely).
Yes, to me this is one of those obvious optimizations that can/should happen. But I see Sean's point about it undoing bufferization. Maybe this could be a separate optimization pass rather than a canonicalization? TBH, this just seemed like something natural to implement and I don't have an immediate use case in mind.
mlir/lib/Dialect/StandardOps/IR/Ops.cpp |
---|---
4031 | I thought folders are not allowed to create new IR, and canonicalizers are. This one is creating new IR (ConstantOp).
mlir/lib/Dialect/StandardOps/IR/Ops.cpp |
---|---
4031 | A fold method may return either an already-existing SSA Value or an Attribute (a constant). This would use the second, i.e. fold also supports constant folding. You wouldn't create the new IR yourself; you would rely on the folder to do that for you.
I don't quite get the point about the canonicalization hindering bufferization; can you craft a small example, Sean?
Type conversion in general involves rewrites like
source_op(x) -> source_materialization(target_op(target_materialization(x)))
(where the infra usually inserts the materializations). If we have patterns like
source_materialization(target_op(target_materialization(x))) -> source_op(x)
then that pattern is liable to undo a type conversion. In the case of this patch, source_op/target_op have zero operands, so we don't see target_materialization, i.e. tensor_to_memref, in the pattern. But it's clearer if we consider other ops, like canonicalizing tensor_load(memref_cast(tensor_to_memref(x))) -> tensor_cast(x). It's just too open-ended and seems unproductive.
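A sketch of that last example (op spellings as of this patch; syntax approximate):

```mlir
func @round_trip(%arg0: tensor<?xf32>) -> tensor<4xf32> {
  // Materializations inserted by bufferization / type conversion:
  %0 = tensor_to_memref %arg0 : memref<?xf32>
  %1 = memref_cast %0 : memref<?xf32> to memref<4xf32>
  %2 = tensor_load %1 : memref<4xf32>
  return %2 : tensor<4xf32>
}

// The hypothetical canonicalization would rewrite this back to:
//   %2 = tensor_cast %arg0 : tensor<?xf32> to tensor<4xf32>
// i.e. it would undo the deliberate lowering to the memref level.
```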
I would say that the general rule is "avoid writing patterns involving materialization ops". One exception that can be useful is something like source_materialization(target_materialization(x)) -> x, which can help reduce intermediate IR volume.
Also, if looked at from the perspective of a type conversion that can plausibly go both directions, such as !perl.string -> !python.string and !python.string -> !perl.string, then it's clear that neither direction is "canonical". Arguably that's the case for one-way conversions too: neither direction is "canonical", one is just a higher level of abstraction that is lowered to a different level of abstraction.
We would expand constant() -> tensor_load(get_global_memref()) as part of bufferization, but if we happen to canonicalize before finalizing the bufferization, then we would fold it back to constant() and have to re-run bufferization for constant.
Generally, I haven't yet found a reason to need to run canonicalizations during the partially-bufferized state, but this just seems like a gotcha. Also, if you look at my previous post in reply to River, there's a slippery slope here. Do we convert memref_cast back to tensor_cast? Do we convert std.load back to extract_element? It seems like every bufferization pattern admits an inverse de-bufferization pattern that, when looked at in isolation, seems like an "optimization", but is just undoing a deliberate lowering.
mlir/lib/Dialect/StandardOps/IR/Ops.cpp |
---|---
4031 | Thanks. Will update (once the discussion settles).
I have started https://reviews.llvm.org/D90768 for the NoSideEffects change and one more change. For this one, given the concerns and the fact that we don't have an immediate use case, we can leave it out for now and revisit when we actually need it. Does that sound ok?
Abandoning for now. This should be simple to implement when the need arises (as a folder or a separate pass).
If this wasn't a symbol lookup, I'd say that this could just be a fold.