This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Linalg] NFC - Add bufferization options
AbandonedPublic

Authored by nicolasvasilache on Nov 13 2020, 7:41 AM.

Details

Summary

This revision adds bufferization options to Linalg with a hook to allow changing the Alloc behavior. The default behavior is unchanged.

Diff Detail

Event Timeline

nicolasvasilache requested review of this revision. Nov 13 2020, 7:41 AM
silvas requested changes to this revision. Nov 13 2020, 11:55 AM

We are generally moving in the direction of using std.alloc by default, and having a separate pass that promotes std.alloc to std.alloca before buffer-deallocation runs. For example, with the promote-buffers-to-stack pass.

If you just want to convert all allocs to allocas, we might need a minor tweak to the logic in that pass to allow "-1" to mean "any buffer". E.g. promote-buffers-to-stack{max-alloc-size-in-bytes=-1}.

Would that work for you?

This revision now requires changes to proceed. Nov 13 2020, 11:55 AM

We are generally moving in the direction of using std.alloc by default, and having a separate pass that promotes std.alloc to std.alloca before buffer-deallocation runs. For example, with the promote-buffers-to-stack pass.

Hmm... pretty much everything in MLIR is evolving towards Op interfaces (e.g. AllocLike / CopyLike / ViewLike op interfaces) + TypeConverter, which is itself really a Type -> Type interface.
Doesn't the evolution you describe become limited by what std.alloc can express, which seems to go against generalizability and composability?

If you just want to convert all allocs to allocas, we might need a minor tweak to the logic in that pass to allow "-1" to mean "any buffer". E.g. promote-buffers-to-stack{max-alloc-size-in-bytes=-1}.

Alloca is not the only use case, assigning to various memory spaces is another use case.
In the future we will also likely see dialect-specific alloc/copy/dealloc + types (e.g. sparse and quantized).
Such ops and types will need to compose with, at least, the patterns for control-flow.

Would that work for you?

To answer your Q, yes I can easily implement a legalization that converts an Alloc to either Alloca or Alloc + memory space.
I expect it will have the usual "fixup pass" issues: (a) potentially needing to recover information that may have been lost, (b) creating yet another pass, and (c) two passes meaning new opportunities for phase-ordering issues.

It seems to me that the hook proposed in this revision is in the spirit of previous MLIR evolutions that I would also expect bufferization to follow at some point in the future.

Then, there is also the timing aspect: if you're saying that you have a general solution that will be available very soon then great, happy to wait.
If not, then I'd prefer to not be blocked on something quite simple that can easily be evolved.

Maybe the easiest Q I have is whether you have some design doc for what bufferization should look like in a longer-term future?
I've been wondering in particular if bufferization is a generic dialect conversion with some additional AllocLike and CopyLike behavior?

We are generally moving in the direction of using std.alloc by default, and having a separate pass that promotes std.alloc to std.alloca before buffer-deallocation runs. For example, with the promote-buffers-to-stack pass.

Hmm... pretty much everything in MLIR is evolving towards Op interfaces (e.g. AllocLike / CopyLike / ViewLike op interfaces) + TypeConverter, which is itself really a Type -> Type interface.
Doesn't the evolution you describe become limited by what std.alloc can express, which seems to go against generalizability and composability?

Interfaces are just one way to generalize things in MLIR. It's not the right abstraction mechanism for everything.

In particular, I think that interfaces are mostly useful for analysis purposes and some transformation purposes, but not as much for lowering. The reason is that when you are lowering, you really are just creating an op with a specific signature. E.g. if I need to create a "copy" when lowering, I am going to call a callback createCopy(Value from, Value to). So you might as well create std.copy %from, %to and let consumers lower std.copy to whatever they want. There is no extra information created by directly creating my.copy %from, %to vs simply lowering std.copy -> my.copy later.

If you just want to convert all allocs to allocas, we might need a minor tweak to the logic in that pass to allow "-1" to mean "any buffer". E.g. promote-buffers-to-stack{max-alloc-size-in-bytes=-1}.

Alloca is not the only use case, assigning to various memory spaces is another use case.
In the future we will also likely see dialect-specific alloc/copy/dealloc + types (e.g. sparse and quantized).
Such ops and types will need to compose with, at least, the patterns for control-flow.

Would that work for you?

To answer your Q, yes I can easily implement a legalization that converts an Alloc to either Alloca or Alloc + memory space.
I expect it will have the usual "fixup pass" issues: (a) potentially needing to recover information that may have been lost, (b) creating yet another pass, and (c) two passes meaning new opportunities for phase-ordering issues.

Handling Alloc + memory space is highly nontrivial, because it entails context-dependent type conversions. Everything about the current type conversion infrastructure assumes that types are converted context-independently. That is, if you see a tensor type, then you know which memref type it turns into; once you open up the door to the same tensor type converting to either memref<.., 0> or memref<..., 3> you're walking on very thin ice for the dialect conversion infrastructure. It will require a lot of thought to do properly, if it is even feasible to do at all.

Example:

  %0 = "foo" : tensor<2xf32> // hypothetically converts to memref<2xf32, 0>
  %1 = "bar" : tensor<2xf32> // hypothetically converts to memref<2xf32, 3>
  br ^bb1(%0, %1)
^bb1(%bbarg0: tensor<2xf32>, %bbarg1: tensor<2xf32>):
  // use %bbarg0, %bbarg1

The key problem is knowing what type %bbarg0 and %bbarg1 need to be converted to. This is difficult to do in the current dialect conversion framework, but is relatively easy to do in a post-pass.

If you just want to blanket convert all tensors to memref<..., 3> that's easier to handle in the dialect conversion framework, but I don't see what that buys you over a simple post-pass that just adds the address space to all memref types.

It seems to me that the hook proposed in this revision is in the spirit of previous MLIR evolutions that I would also expect bufferization to follow at some point in the future.

Then, there is also the timing aspect: if you're saying that you have a general solution that will be available very soon then great, happy to wait.

I think the alloca case can be easily worked around as I described. The memref + address space is less trivial, and this patch doesn't do anything to fix the fundamental difficulties with that approach (actually, it moves us away from solving them).

If not, then I'd prefer to not be blocked on something quite simple that can easily be evolved.

Maybe the easiest Q I have is whether you have some design doc for what bufferization should look like in a longer-term future?

I think we're still figuring it out. This area is still very early in development.

I've been wondering in particular if bufferization is a generic dialect conversion with some additional AllocLike and CopyLike behavior?

I don't think additional AllocLike or CopyLike behavior is load-bearing. Users can convert std.alloc/std.copy to whatever they like. There's no extra information available at bufferization time that can create ops carrying any information not already carried by std.alloc / std.copy. We can look into having some sort of transformation that precomputes some attribute for different tensor ops, and then propagates that to the std.alloc/copy. It requires thought to make sure that composes right, though.

One option going forward here is to allow the tensor type to carry attributes on it (as Chris mentions here: https://llvm.discourse.group/t/rfc-memref-memory-shape-as-attribute/2229/3), such as a memory space or desired allocation kind, and then have bufferization respect that. That puts the brunt of the effort of updating types (such as basic block args and scf.if types) on the pass that annotates those memory spaces rather than conflating it with the actual dialect conversion itself.

Overall, I think one of the key issues is to what extent we want to expose "what this tensor is going to bufferize to" at the tensor level. Especially when such annotations come into existence and how they are kept valid / up to date.

nicolasvasilache abandoned this revision.Dec 7 2020, 6:02 AM