This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Add canonicalization to fold tensor_load of a constant global memref.
Abandoned · Public

Authored by jurahul on Nov 3 2020, 12:31 PM.

Details

Summary
  • Fold tensor_load(get_global_memref(constant global_memref)) -> its initial value (see the sketch below)
  • Also dropped 'NoSideEffects' from global_memref to make sure unused public global_memrefs do not get deleted.
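
A rough before/after sketch of the proposed fold, using the std-dialect op names from this patch (the exact textual syntax, the `@cst` global, and the values are illustrative, not taken from the diff):

```mlir
// Before: a constant global, loaded back as a tensor.
global_memref "private" constant @cst : memref<2xf32> = dense<[1.0, 2.0]>

func @f() -> tensor<2xf32> {
  %m = get_global_memref @cst : memref<2xf32>
  %t = tensor_load %m : memref<2xf32>
  return %t : tensor<2xf32>
}

// After the fold: the load is replaced by the global's initial value.
func @f() -> tensor<2xf32> {
  %t = constant dense<[1.0, 2.0]> : tensor<2xf32>
  return %t : tensor<2xf32>
}
```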

Diff Detail

Unit Tests: Failed

Event Timeline

jurahul created this revision. Nov 3 2020, 12:31 PM
jurahul requested review of this revision. Nov 3 2020, 12:31 PM
rriddle accepted this revision. Nov 3 2020, 12:35 PM
rriddle added inline comments.
mlir/lib/Dialect/StandardOps/IR/Ops.cpp
4031

If this wasn't a symbol lookup, I'd say that this could just be a fold.

mlir/test/Transforms/test-symbol-dce.mlir
28

Please remove these; we don't need to recheck every new symbol operation.

This revision is now accepted and ready to land. Nov 3 2020, 12:35 PM

I don't think this is a good idea, because tensor_load(std.get_global_memref) is precisely the pattern we expect to create from bufferizing a std.constant op on tensors. As such, this canonicalization would undo that lowering, making canonicalization not play well with bufferization.

Do you have a use case where you have std.global_memrefs but basic tensor-level constant folding has not already happened? Typically, I expect that tensor_load(std.get_global_memref) will mainly exist as an intermediate state of bufferization -- presumably, if the user is bufferizing their program, they are already happy with the canonicalizations that have happened at the tensor level.

Thoughts?

Dropping NoSideEffects LGTM.

> I don't think this is a good idea, because tensor_load(std.get_global_memref) is precisely the pattern we expect to create from bufferizing a std.constant op on tensors. As such, this canonicalization would undo that lowering, making canonicalization not play well with bufferization.
>
> Do you have a use case where you have std.global_memrefs but basic tensor-level constant folding has not already happened? Typically, I expect that tensor_load(std.get_global_memref) will mainly exist as an intermediate state of bufferization -- presumably, if the user is bufferizing their program, they are already happy with the canonicalizations that have happened at the tensor level.
>
> Thoughts?

How this interacts with bufferization is a good point. I am a bit apprehensive, however, about what the story is w.r.t. optimizing the intermixed tensor/memref world. Can you elaborate your thoughts on that? If we are to make a very concerted decision not to perform these optimizations, it seems only right that we document our position somewhere so that in the future we have a reference for what our policy is. This type of optimization seems "obvious" to many newcomers (including myself to some extent, as I haven't followed the bufferization work that closely).

Yes, to me this is one of those obvious optimizations that can/should happen. But I see Sean's point about it undoing bufferization. Maybe this can be a separate optimization pass rather than a canonicalization? TBH, this just seemed like something natural to implement, and I don't have an immediate use case in mind.

mlir/lib/Dialect/StandardOps/IR/Ops.cpp
4031

I thought folders are not allowed to create new IR, but canonicalizers are. This one is creating new IR (a ConstantOp).

rriddle added inline comments. Nov 3 2020, 1:26 PM
mlir/lib/Dialect/StandardOps/IR/Ops.cpp
4031

A fold method may return either:
*) An existing SSA Value
*) An Attribute representing a constant value

This would use the second, i.e., fold also supports constant folding. You wouldn't create the new IR yourself; you would rely on the folder to do that for you.

I don't quite get the point about the canonicalization hindering bufferization; can you craft a small example, Sean?

silvas added a comment. Nov 3 2020, 1:37 PM

> How this interacts with bufferization is a good point. I am a bit apprehensive, however, about what the story is w.r.t. optimizing the intermixed tensor/memref world. Can you elaborate your thoughts on that? If we are to make a very concerted decision not to perform these optimizations, it seems only right that we document our position somewhere so that in the future we have a reference for what our policy is. This type of optimization seems "obvious" to many newcomers (including myself to some extent, as I haven't followed the bufferization work that closely).

Type conversion in general involves rewrites like

source_op(x) -> source_materialization(target_op(target_materialization(x)))

(where the infra usually inserts the materializations). If we have patterns like

source_materialization(target_op(target_materialization(x))) -> source_op(x)

then that pattern is liable to undo a type conversion. In the case of this patch, source_op/target_op have zero operands, so we don't see target_materialization (i.e., tensor_to_memref) in the pattern. But it's clearer if we consider other ops, like canonicalizing tensor_load(memref_cast(tensor_to_memref(x))) -> tensor_cast(x). It's just too open-ended and seems unproductive.
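
For concreteness, a rough sketch of that last rewrite in 2020-era std-dialect syntax (value names and types are made up for illustration):

```mlir
// Partially bufferized IR: tensor_to_memref / tensor_load are the
// materializations inserted by the type conversion around the memref_cast.
%m = tensor_to_memref %x : memref<4xf32>
%c = memref_cast %m : memref<4xf32> to memref<?xf32>
%t = tensor_load %c : memref<?xf32>

// What the hypothetical "canonicalization" would produce, pushing the cast
// back to the tensor level and undoing the bufferization of the cast.
%t2 = tensor_cast %x : tensor<4xf32> to tensor<?xf32>
```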

I would say that the general rule is "avoid writing patterns involving materialization ops". One exception that can be useful is something like source_materialization(target_materialization(x)) -> x, which can help reduce intermediate IR volume.

Also, if looked at from the perspective of a type conversion that can plausibly go both directions, such as !perl.string -> !python.string and !python.string -> !perl.string, then it's clear that neither direction is "canonical". Arguably that's the case for one-way conversions too: neither direction is "canonical"; one is just a higher level of abstraction that is lowered to a different level of abstraction.

silvas added a comment. Nov 3 2020, 1:42 PM

> I don't quite get the point about the canonicalization hindering bufferization; can you craft a small example, Sean?

We would expand constant() -> tensor_load(get_global_memref()) as part of bufferization, but if we happen to canonicalize before finalizing the bufferization, then we would fold it back to constant() and have to re-run bufferization for constant.
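
A sketch of that intermediate state (op names follow this patch; the exact syntax, values, and the generated global name are illustrative):

```mlir
// Tensor-level IR before bufferization.
%t = constant dense<[1.0, 2.0]> : tensor<2xf32>

// Partially bufferized form: a global plus a tensor_load that later
// bufferization steps (e.g. of %t's users) are expected to consume.
global_memref "private" constant @__constant_2xf32 : memref<2xf32> = dense<[1.0, 2.0]>
%m = get_global_memref @__constant_2xf32 : memref<2xf32>
%t = tensor_load %m : memref<2xf32>
// Running the proposed fold at this point would rewrite %t back into the
// std.constant above, undoing the bufferization of the constant.
```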

Generally, I haven't yet found a reason to need to run canonicalizations during the partially-bufferized state, but this just seems like a gotcha. Also, if you look at my previous post in reply to River, there's a slippery slope here. Do we convert memref_cast back to tensor_cast? Do we convert std.load back to extract_element? It seems like every bufferization pattern admits an inverse de-bufferization pattern that, when looked at in isolation, seems like an "optimization", but is just undoing a deliberate lowering.
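
As a sketch of that slippery slope (value names and types invented for illustration), the analogous "de-bufferization" of a memory access would look like:

```mlir
// Bufferized access produced by lowering.
%v = load %buf[%i] : memref<4xf32>

// The tensor-level form a hypothetical inverse pattern would restore.
%v2 = extract_element %t[%i] : tensor<4xf32>
```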

jurahul added inline comments. Nov 3 2020, 2:30 PM
mlir/lib/Dialect/StandardOps/IR/Ops.cpp
4031

Thanks. Will update (once the discussion settles).

I have started https://reviews.llvm.org/D90768 for the NoSideEffects change and one more change. For this one, given the concerns and the fact that we don't have an immediate use case, we can leave it out for now and revisit when we actually need it. Does that sound ok?

silvas added a comment. Nov 4 2020, 9:34 AM

Thanks for splitting it. I LGTM'ed the other patch.

jurahul abandoned this revision. Nov 4 2020, 2:07 PM

Abandoning for now. This should be simple to implement when the need arises (as a folder or a separate pass).