This is an archive of the discontinued LLVM Phabricator instance.

[mlir][std] Mark tensor_to_memref as no side-effect
AbandonedPublic

Authored by herhut on Nov 23 2020, 7:37 AM.

Details

Summary

While tensor_to_memref does allocate a new memref (and hence in theory has a heap effect), in practice it is a glue operation between tensor and memref islands in the IR. As such, it should be subject to CSE and DCE. This helps avoid residual tensor_to_memref operations that keep tensor computations alive even though they are no longer used.
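
For illustration, a rough sketch of the residual pattern this targets ("some_tensor_producer" is a made-up op):

%t = "some_tensor_producer"() : () -> tensor<f32>
%m = tensor_to_memref %t : memref<f32>   // %m is never used

With NoSideEffect, DCE can erase the unused tensor_to_memref and then the now-dead producer, and CSE can merge identical casts of the same tensor.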

Diff Detail

Event Timeline

herhut created this revision. Nov 23 2020, 7:37 AM
herhut requested review of this revision. Nov 23 2020, 7:37 AM
mehdi_amini requested changes to this revision. Nov 23 2020, 9:21 AM

NoSideEffect does not seem to fit to me: there is an allocation effect. It should also write to its output.

This revision now requires changes to proceed. Nov 23 2020, 9:21 AM
bondhugula added inline comments.
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
3735–3739

This op is actually missing MemoryEffects. I thought something that creates a new memref should have a MemAlloc MemoryEffect. It had one in a previous version, and it looks like it was removed, so there must be a reason for it. @silvas

bondhugula added inline comments. Nov 23 2020, 10:25 AM
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
3736

It would be incorrect to mark it side-effect free. This op would have a side effect on the heap. For example, if this is marked side-effect free, CSE would eliminate one of these:

%m1 = tensor_to_memref %t : memref<...>
%m2 = tensor_to_memref %t : memref<...>

On another note, it's out of line to mention things like "This is a transient op" in this op's description. An op is an op; being transient, or the length of its lifetime, is a statement about its current usage, and the op description shouldn't be capturing that.

NoSideEffect does not seem to fit to me: there is an allocation effect. It should also write to its output.

See my comment in reply to Uday. It need not allocate, and trying to give it those semantics breaks stuff. This op is defined by the fold tensor_to_memref(tensor_load(x)) -> x, which is what makes it a valid materialization according to the dialect conversion framework.
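
To make the fold concrete, a minimal sketch (same informal op syntax as used elsewhere in this review):

%x = ... : memref<f32>
%t = tensor_load %x : memref<f32>
%m = tensor_to_memref %t : memref<f32>
// fold: every use of %m can be replaced by %x, and both casts go away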

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
3735–3739

Yes, I removed it here: https://reviews.llvm.org/rG796880288a756d1866dad0210a818896eda844cc

As described in that commit, the semantics of tensor_to_memref allocating don't match reality. In particular, the situations under which the dialect conversion framework inserts/removes them (sometimes implicitly) cannot be justified under the "allocating" semantics.

To go into more detail. Consider this IR:

%0 = "foo_tensor"() : () -> tensor<f32>
%1 = "bar_tensor"(%0) : tensor<f32> -> tensor<f32>

Now, suppose we run a single dialect conversion pass that converts foo_tensor and bar_tensor. We get:

%0 = "foo_memref"() : () -> memref<f32>
%1 = "bar_memref"(%0) : memref<f32> -> memref<f32>

Now, suppose we run a dialect conversion pass that is composable, that is, it inserts materializations so that we can convert only bar_memref. We get:

%0 = "foo_tensor"() : () -> tensor<f32>
%memref = tensor_to_memref %0 : memref<f32>
%1 = "bar_memref"(%memref) : memref<f32> -> memref<f32>

If tensor_to_memref is marked as allocating, then we have lost information compared to doing a single big-bang conversion pass. We must remove the tensor_to_memref to finish bufferization, and with the allocating semantics we would need to either replace the tensor_to_memref with alloc+copy (which in general is very difficult to remove) or prove that a copy isn't needed (which is in general very difficult to prove).
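
Concretely, under the allocating reading, finishing bufferization at this point would have to expand the cast into something like the following (a sketch using alloc + tensor_store as the alloc+copy expansion):

%0 = "foo_tensor"() : () -> tensor<f32>
%alloc = alloc() : memref<f32>
tensor_store %0, %alloc : memref<f32>   // explicit copy that must later be proven removable
%1 = "bar_memref"(%alloc) : (memref<f32>) -> memref<f32>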

Now, suppose we run another dialect conversion pass that bufferizes foo_tensor:

%0 = "foo_memref"() : () -> memref<f32>
%tensor = tensor_load %0 : memref<f32>
%memref = tensor_to_memref %tensor : memref<f32>
%1 = "bar_memref"(%memref) : memref<f32> -> memref<f32>

Now, when we finalize bufferization, we need to be able to apply the identity tensor_to_memref(tensor_load(x)) -> x to get the same result as a single big-bang dialect conversion pass. There's nothing new here... it's literally what dialect conversion would have done internally if we had run this as one big-bang conversion anyway.
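
That is, after applying that fold to the IR above, we end up with exactly the big-bang result:

%0 = "foo_memref"() : () -> memref<f32>
%1 = "bar_memref"(%0) : (memref<f32>) -> memref<f32>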

As a historical note, tensor_to_memref didn't originally have the alloc+copy semantics. The only reason I initially added the alloc+copy semantics was that it was suggested by a reviewer, and at the time I thought "why not? that helps to make the spec clearer". However, at that time, I didn't think through the ramifications of adding those semantics. Now it's clear that we cannot have the alloc+copy semantics.

I'm fine with removing the word "transient". But the thing that is clear is that this op is defined by the identity tensor_to_memref(tensor_load(x)) -> x. It is not defined by being allocating.

This op is defined by the fold tensor_to_memref(tensor_load(x)) -> x

I am puzzled: I don't quite get how an op can be defined only in the context of a fold.
This is how the op is defined in the doc: https://mlir.llvm.org/docs/Dialects/Standard/#stdtensor_to_memref-tensortomemrefop ; but that seems quite bogus to me.

We must remove the tensor_to_memref to finish bufferization. If it is marked as allocating, then we need to either replace the tensor_to_memref with alloc+copy (which in general is very difficult to remove), or prove that a copy isn't needed (which is in general very difficult to prove).

I don't understand how the semantics are anything other than "alloc+copy" right now: yes, you have to work around it as you mention, but that seems inherent to what it provides.

As a historical note, tensor_to_memref didn't originally have the alloc+copy semantics. The only reason I initially added the alloc+copy semantics was that it was suggested by a reviewer, and at the time I thought "why not? that helps to make the spec clearer". However, at that time, I didn't think through the ramifications of adding those semantics. Now it's clear that we cannot have the alloc+copy semantics.

I don't quite get what semantics it has then, and I'd like to see a proper sound definition first.

What is unsound about "the only thing you can do with this op is to apply this fold to it"?

My issue is that this does not give it a proper semantic definition in the absence of a tensor_load feeding into it.

%t_a = constant 42
%mem_a = tensor_to_memref %t_a
memset(%mem_a, 0)
print(%t_a)

What do we expect to be printed here?

silvas added a comment. Edited Nov 23 2020, 2:48 PM

My issue is that this does not give it a proper semantic definition in the absence of a tensor_load feeding into it.

It seems like you want tensor_to_memref to define whether its result points at constant memory and/or is aliased to any other memref in the program. I don't see why we need to define that property, except via the fold. This is not unprecedented -- the result of std.subview/std.assume_alignment does not guarantee anything about the resulting memref pointing at non-constant memory (or aliasing any other live tensor in the program).

%t_a = constant 42
%mem_a = tensor_to_memref %t_a
memset(%mem_a, 0)
print(%t_a)

What do we expect to be printed here?

It depends on how the constant is lowered, which will create the necessary tensor_load to match the tensor_to_memref (for example, it might be lowered to std.global_memref {constant} which would make the memset illegal). It is similar to asking what will be printed for this basic block:

^bb1(%arg0: memref<f32>)
  memset(%arg0, 0)
  print(%arg0)

The answer is "it depends" on what has happened in predecessors (and perhaps even on the memref-level calling convention that is adopted by the program, which might only happen in later passes; until then, we have to be conservative).

You seem to want an answer to the question "for every memref in the program, can we statically know if it is writable/aliased?". The answer to that question is no. E.g. you could have

def f(constant_memref, mutable_memref):
  if cond():
    memref = constant_memref
  else:
    memref = mutable_memref
  use(memref)

Same goes for aliasing properties.

With a small wording tweak, we can easily define the contents of the resulting memref -- the element read at index [i,j] of the resulting memref is the same as the element of the tensor at index [i,j]. That is, we can define what data will be "read" from the resulting memref. Happy to do that.
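
In other words, something along these lines would be guaranteed (sketch only; the shape is made up):

%m = tensor_to_memref %t : memref<4x4xf32>
%e = load %m[%i, %j] : memref<4x4xf32>
// %e is guaranteed to equal the element of %t at index [%i, %j]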

mehdi_amini added a comment. Edited Nov 23 2020, 3:27 PM

My issue is that this does not give it a proper semantic definition in the absence of a tensor_load feeding into it.

It seems like you want tensor_to_memref to define whether its result points at constant memory and/or is aliased to any other memref in the program. I don't see why we need to define that property, except via the fold.

I don't quite get what you mean: I expect the semantics to be well defined, including for the example I provided.

This is not unprecedented -- the result of std.subview/std.assume_alignment does not guarantee anything about the resulting memref pointing at non-constant memory (or aliasing any other live tensor in the program).

std.subview takes a memref and returns another one: it clearly defines its aliasing, I believe, but again I may just not be understanding your angle right now.

%t_a = constant 42
%mem_a = tensor_to_memref %t_a
memset(%mem_a, 0)
print(%t_a)

What do we expect to be printed here?

It depends on how the constant is lowered, which will create the necessary tensor_load to match the tensor_to_memref (for example, it might be lowered to std.global_memref {constant} which would make the memset illegal). It is similar to asking what will be printed for this basic block:

OK forget the constant, I originally wrote:

%t_a = Foo() : tensor<..>
%mem_a = tensor_to_memref %t_a
memset(%mem_a, 0)
print(%t_a)

The problem is really one of aliasing and mutability here.

^bb1(%arg0: memref<f32>)
  memset(%arg0, 0)
  print(%arg0)

The answer is "it depends" on what has happened in predecessors (and perhaps even on the memref-level calling convention that is adopted by the program, which might only happen in later passes; until then, we have to be conservative).

your example is absolutely unambiguous here: the print will print 0.

You seem to want an answer to the question "for every memref in the program, can we statically know if it is writable/aliased?".

I wasn't going for mutability, only aliasing.

Same goes for aliasing properties.

An important difference is that you're bridging the tensor domain and the memref domain: in the memref domain there is a clear conservative answer; my example is intended to show how it bleeds into the tensor domain.

^bb1(%arg0: memref<f32>)
  memset(%arg0, 0)
  print(%arg0)

The answer is "it depends" on what has happened in predecessors (and perhaps even on the memref-level calling convention that is adopted by the program, which might only happen in later passes; until then, we have to be conservative).

your example is absolutely unambiguous here: the print will print 0.

Actually, the program could crash if %arg0 points at constant memory. tensor_to_memref provides exactly the same guarantee as a random block argument of memref type. What more do you want from tensor_to_memref?

Same goes for aliasing properties.

An important difference is that you're bridging the tensor domain and the memref domain: in the memref domain there is a clear conservative answer; my example is intended to show how it bleeds into the tensor domain.

I don't see the "bleeding". If anything, I have tried to make tensor_to_memref agnostic to the details of how pass pipeline authors decide they want to bufferize tensor ops.

I don't want every tensor op to have a "shadow memref semantics" -- at that point, why should we have a memref type at all? XLA has a "shadow memref semantics" for every tensor op, e.g. DynamicUpdateSlice is guaranteed to operate in place and mutate its operand. Using that information, the XLA compiler inserts tensor-level copies so that the "shadow memref program" will be correct. And then "buffer allocation" is really just building a map<tensor Value, int64_t offset_in_a_slab_of_mem> for each tensor in the program. There is no need for memrefs at all.

mehdi_amini added a comment. Edited Nov 23 2020, 4:44 PM
^bb1(%arg0: memref<f32>)
  memset(%arg0, 0)
  print(%arg0)

The answer is "it depends" on what has happened in predecessors (and perhaps even on the memref-level calling convention that is adopted by the program, which might only happen in later passes; until then, we have to be conservative).

your example is absolutely unambiguous here: the print will print 0.

Actually, the program could crash if %arg0 points at constant memory.

Of course the program could have undefined behavior, but when talking about the semantics of an operation we assume that the program does not have UB; otherwise there is no semantics by definition.

tensor_to_memref provides exactly the same guarantee as a random block argument of memref type. What more do you want from tensor_to_memref?

A block argument has a predecessor which supplies the memref; that isn't the case here.

Same goes for aliasing properties.

An important difference is that you're bridging the tensor domain and the memref domain: in the memref domain there is a clear conservative answer; my example is intended to show how it bleeds into the tensor domain.

I don't see the "bleeding". If anything, I have tried to make tensor_to_memref agnostic to the details of how pass pipeline authors decide they want to bufferize tensor ops.

I don't want every tensor op to have a "shadow memref semantics" -- at that point, why should we have a memref type at all? XLA has a "shadow memref semantics" for every tensor op, e.g. DynamicUpdateSlice is guaranteed to operate in place and mutate its operand. Using that information, the XLA compiler inserts tensor-level copies so that the "shadow memref program" will be correct. And then "buffer allocation" is really just building a map<tensor Value, int64_t offset_in_a_slab_of_mem> for each tensor in the program. There is no need for memrefs at all.

Can we please address my example and define the semantics here:

%t_a = Foo() : tensor<..> # Produce a tensor value
%mem_a = tensor_to_memref %t_a
memset(%mem_a, 0)
print(%t_a)

Is the effect of the memset visible to the print?
It seems that you're saying "no" with your "shadow memref" explanation, but then that implies that tensor_to_memref has alloc+store semantics.

silvas added a comment. Edited Nov 23 2020, 5:53 PM

Can we please address my example and define the semantics here:

%t_a = Foo() : tensor<..> # Produce a tensor value
%mem_a = tensor_to_memref %t_a
memset(%mem_a, 0)
print(%t_a)

Is the effect of the memset visible to the print?
It seems that you're saying "no" with your "shadow memref" explanation, but then that implies that tensor_to_memref has alloc+store semantics.

Yes, the effect of the memset is visible to the print.

I don't think that implies alloc+store semantics though. We don't specify how tensors are represented under the hood, and having them actually be backed by memrefs (with some special copy on write handling; such as a special page fault handler that is invisible to the program) would be totally valid and would make tensor_to_memref just take a "view" into a memref that is backing some tensor.

Why do you think alloc+store semantics are implied?

Can we please address my example and define the semantics here:

%t_a = Foo() : tensor<..> # Produce a tensor value
%mem_a = tensor_to_memref %t_a
memset(%mem_a, 0)
print(%t_a)

Is the effect of the memset visible to the print?
It seems that you're saying "no" with your "shadow memref" explanation, but then that implies that tensor_to_memref has alloc+store semantics.

Yes, the effect of the memset is visible to the print.

I don't think that implies alloc+store semantics though. We don't specify how tensors are represented under the hood, and having them actually be backed by memrefs (with some special copy on write handling; such as a special page fault handler that is invisible to the program) would be totally valid. Why do you think alloc+store semantics are implied?

Those semantics seem extremely easy to break given that we generally don't treat tensors as memory-based. I can see plenty of situations where we would break the invariants that you just described. I don't see how we would prevent side-effect-free code motion over that memset, given that tensors are not treated as side-effecting.

Can we please address my example and define the semantics here:

%t_a = Foo() : tensor<..> # Produce a tensor value
%mem_a = tensor_to_memref %t_a
memset(%mem_a, 0)
print(%t_a)

Is the effect of the memset visible to the print?
It seems that you're saying "no" with your "shadow memref" explanation, but then that implies that tensor_to_memref has alloc+store semantics.

Yes, the effect of the memset is visible to the print.

But I'm puzzled, because that seems to point to the "shadow memref" that you said you didn't want? Again, I'm probably missing something in your explanation.
Did you notice that the print is printing the *tensor* and not the memref?
Just to be sure we're referring to the same thing here: by "visible" we really mean that the print will print the value set by the memset on the memref, right? (So it would guarantee 0 here.)

I don't think that implies alloc+store semantics though. [...] Why do you think alloc+store semantics are implied?

It is the opposite (I wrote above that if you answered "No" then I would have seen the implication).

We don't specify how tensors are represented under the hood, and having them actually be backed by memrefs (with some special copy on write handling; such as a special page fault handler that is invisible to the program) would be totally valid.

I have a problem with your weird aliasing here (copy-on-write would mean that the answer above should be "no: the memset effects aren't visible to the print because it'll copy-on-write").
Let me expand the example above:

%cst1 = constant 1 : tensor<..>
%t_add = add(%cst1, %cst1) : tensor<..>
%mem_add = tensor_to_memref %t_add
memset(%mem_add, 0)
%t_sub = sub(%t_add, %cst1) : tensor<..>
print(%t_sub)

Previously you were saying that the effects of the `memset` should be visible to the print, so they should also be visible to the sub here. That means that we would now want to guarantee printing -1; however, sub(add(1, 1), 1) can be folded away:

%cst1 = constant 1 : tensor<..>
%t_add = add(%cst1, %cst1) : tensor<..>
%mem_add = tensor_to_memref %t_add
memset(%mem_add, 0)
print(%cst1)

We're printing 1 instead, which is consistent with the tensor domain value semantics.

We don't specify how tensors are represented under the hood, and having them actually be backed by memrefs (with some special copy on write handling; such as a special page fault handler that is invisible to the program) would be totally valid.

Actually it isn't even clear to me that memref and copy-on-write are actually compatible, but I'm not looking to derail the discussion here :)

bondhugula added inline comments. Nov 24 2020, 1:15 AM
mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
3735–3739

How could a memref be created from a tensor value without memory being allocated?! The only way I could imagine that happening is the tensor itself somehow storing the memref descriptor - sort of like tensor<memref<f32>>. But that's not what's happening here.

I think this op's design and semantics are unreal! Perhaps there is another way to accomplish what you want without creating such "fake" ops. When this tensor_to_memref thing was proposed on discourse, AFAIR it was a combo of "alloc + tensor_store", which is why it had the "allocation" memory effect and that made sense. But now, this is a complete departure!

I assumed these operations are in the same category as the LLVM::DialectCastOp (also known as llvm.mlir.cast). There it says

llvm.mlir.cast op casts between Standard and LLVM dialects. It only changes
    the dialect, but does not change compile-time or runtime semantics.

and it is a vehicle to enable gradual lowering from standard to llvm. Would it help if we moved these operations into a special bufferization dialect and called them bufferize.cast? These ops have no lowering but merely cast between the two domains. Casting from tensor to memref creates an immutable view of the tensor as a memref. Mutating such a memref has undefined behavior. Casting a memref to tensor creates a view of that memref and has undefined behavior if the memref is mutated after the cast. Bufferization lowering patterns have to take these semantics into account (they have to anyway; I don't think we should ever create bufferization patterns that mutate any of the operation's operands).
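
A rough sketch of what such hypothetical bufferize.cast ops could look like (names and syntax below are illustrative only):

%m  = bufferize.cast %t  : tensor<4xf32> to memref<4xf32>   // immutable view of %t; writing through %m is UB
%t2 = bufferize.cast %m2 : memref<4xf32> to tensor<4xf32>   // view of %m2; UB if %m2 is mutated after the cast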

mlir/include/mlir/Dialect/StandardOps/IR/Ops.td
3735–3739

So if we want to model tensor_to_memref as having a side-effect, then we need to be consistent and also give it the corresponding allocation and memory write effects. The same is true for tensor_load, which currently has no effect on the heap either and hence might be reordered if I am not mistaken.

However, then we can also no longer fold tensor_to_memref(tensor_load(%v)) to %v because that would remove the copy of %v. Instead, we have to canonicalize to copy(%v). I would like to have such a copy operation anyway.
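
Roughly, with those effects modelled, the canonicalization would look something like this (the copy op is the hypothetical one mentioned above; alloc, tensor_load, and tensor_to_memref are the existing std ops):

%t = tensor_load %v : memref<f32>
%m = tensor_to_memref %t : memref<f32>
// would canonicalize to an explicit copy instead of folding %m to %v:
%m2 = alloc() : memref<f32>
"copy"(%v, %m2) : (memref<f32>, memref<f32>) -> ()   // hypothetical copy op; uses of %m become uses of %m2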

3736

That is exactly the behavior I was after. In my understanding of that operation, tensor_to_memref does not promise you a fresh memref copy of %t but rather _a_ copy of the contents of %t. Mutating such a copy can have undefined aliasing effects and is generally unsafe.

But if that is not what we are after, then my proposed change indeed makes no sense.

Can we please address my example and define the semantics here:

%t_a = Foo() : tensor<..> # Produce a tensor value
%mem_a = tensor_to_memref %t_a
memset(%mem_a, 0)
print(%t_a)

Is the effect of the memset visible to the print?
It seems that you're saying "no" with your "shadow memref" explanation, but then that implies that tensor_to_memref has alloc+store semantics.

Yes, the effect of the memset is visible to the print.

But I'm puzzled because that seems to point to the "shadow memref" that you said you didn't want? Again I'm probably missing something in your explanation.
Did you notice that the print is printing the *tensor* and not the memref?

Oh.... sorry! I didn't see that the print is printing the tensor. Indeed, the original tensor value is printed! No operation on a memref can *ever* change the value of a tensor.

I assumed these operations are in the same category as the LLVM::DialectCastOp (also known as llvm.mlir.cast). There it says

llvm.mlir.cast op casts between Standard and LLVM dialects. It only changes
    the dialect, but does not change compile-time or runtime semantics.

and it is a vehicle to enable gradual lowering from standard to llvm. Would it help if we moved these operations into a special bufferization dialect and called them bufferize.cast? These ops have no lowering but merely cast between the two domains. Casting from tensor to memref creates an immutable view of the tensor as a memref. Mutating such a memref has undefined behavior. Casting a memref to tensor creates a view of that memref and has undefined behavior if the memref is mutated after the cast. Bufferization lowering patterns have to take these semantics into account (they have to anyway; I don't think we should ever create bufferization patterns that mutate any of the operation's operands).

Thanks! This formalizes what I was thinking.

silvas added a comment. Edited Nov 24 2020, 11:54 AM

Btw, I should say that I'm mildly ok with annotating tensor_to_memref/tensor_load with alloc+store/load side effects... but that somewhat breaks the composability of bufferization passes because now there is information loss compared to doing one big dialect conversion invocation (need to prove that copies can be elided). I think that something like Stephan's idea of a bufferize.cast op with highly restrictive semantics is much better at maintaining the composability of dialect conversion passes.

The main inconsistency that I see with marking tensor_to_memref/tensor_load as having memory effects is that neither op actually has memory effects when viewed as a cast op. They always fold away as part of conversion. For example, the memref produced by tensor_to_memref *always* refers to some memref that already existed in the program, at the time that it is removed. Until all tensors are converted to memrefs, you might not see the memref that it folds away to (and personally, I'm ok with that, but it sounds like relying on these ops eventually being removed in that way irks some folks).

Oh.... sorry! I didn't see that the print is printing the tensor. Indeed, the original tensor value is printed! No operation on a memref can *ever* change the value of a tensor.

Seems like we're on the same track again then; I'm glad we clarified this aspect! :)

Casting from tensor to memref creates an immutable view of the tensor as a memref. Mutating such a memref has undefined behavior. Casting a memref to tensor creates a view of that memref and has undefined behavior if the memref is mutated after the cast.

Thanks! This is getting to the semantics question I had in mind and wanted to clarify.

Bufferization lowering patterns have to take these semantics into account (they have to anyway; I don't think we should ever create bufferization patterns that mutate any of the operation's operands).

My only concern is how this creeps outside of the bufferization process itself and how it fits the rest of the semantics around tensor/memref. Another example would be:

%mem = ...
write(%mem)
%t = memref_to_tensor(%mem)
print(%t)

If "memref_to_tensor" is "NoSideEffect", then we're allowed to move it before the write to the memref, so the print can move up as well:

%mem = ...
%t = memref_to_tensor(%mem)
print(%t)
write(%mem)

So even with the restriction you mentioned in the definition above, I believe that memref_to_tensor should at least have "read" effects on the memref (symmetrically, "tensor_to_memref" likely deserves a "write" effect?).

So even with the restriction you mentioned in the definition above, I believe that memref_to_tensor should at least have "read" effects on the memref (symmetrically, "tensor_to_memref" likely deserves a "write" effect?).

Actually the symmetry may not be warranted: when getting a memref from a tensor, only the consumers of tensor_to_memref need to have a read effect.
It is interesting that the other way around (casting a memref to a tensor) has to "freeze" the value of the memref into the tensor and must read it, though. Does a cast in this direction even make sense? How does it differ from tensor_load, for example?

So even with the restriction you mentioned in the definition above, I believe that memref_to_tensor should at least have "read" effects on the memref (symmetrically, "tensor_to_memref" likely deserves a "write" effect?).

Actually the symmetry may not be warranted: when getting a memref from a tensor, only the consumers of tensor_to_memref need to have a read effect.
It is interesting that the other way around (casting a memref to a tensor) has to "freeze" the value of the memref into the tensor and must read it, though. Does a cast in this direction even make sense? How does it differ from tensor_load, for example?

I think it is effectively the same as tensor_load. The description of tensor_load says:

Create a tensor from a memref, making an independent copy of the element
data. The result value is a tensor whose shape and element type match the
memref operand.

What it describes as "making an independent copy" is effectively the property that we are looking to formalize with MemRead (it takes a "snapshot" of the memref at a point in time). However, "snapshot" is not as strong as the "freeze" semantics that Stephan proposes, which prohibit any mutation of the memref after the tensor_load op.
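
A small sketch of the difference (illustrative only; %c is some value):

%t = tensor_load %m : memref<f32>
store %c, %m[] : memref<f32>
// "snapshot": %t keeps the value %m held at the tensor_load; the store is allowed.
// "freeze":   any mutation of %m after the tensor_load is undefined behavior.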

Given the highly restrictive use cases where these ops are needed in the dialect conversion framework, the stronger "freeze" semantics seem to be fine.

herhut abandoned this revision. Dec 16 2020, 12:31 AM

So, to conclude here, it seems to me the best way forward is to move these dialect conversion ops to their own dialect and make it clear that they are essentially cast operations between tensor and memref for the purpose of conversion. This is a discussion for discourse, though.