This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Refactored BufferDeallocation pass to enable support for individual alloc/clone/free operations.
AbandonedPublic

Authored by dfki-mako on Dec 3 2020, 3:01 AM.

Details

Summary

This CL refactors the BufferDeallocation pass to offer the opportunity to use custom alloc/clone/free operations. This overcomes the current limitations of emitting hard coded std::alloc, linalg::copy and std::dealloc operations.

Diff Detail

Event Timeline

dfki-mako created this revision.Dec 3 2020, 3:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 3 2020, 3:01 AM
dfki-mako requested review of this revision.Dec 3 2020, 3:01 AM

Can we please finish the discussion in https://llvm.discourse.group/t/remove-tight-coupling-of-the-bufferdeallocation-pass-to-std-and-linalg-operations/2162/12 before moving forward with this? It's not clear to me that we all agreed that this was the right direction.

herhut added a comment.Dec 4 2020, 3:17 AM

I disagree with @silvas on this. This particular change does not take any opinion on the discussion you reference. It merely makes the hard-wired current behavior configurable by choosing different ops. It does not mandate how the decision which free operation to use is done.

This unblocks users that do not want to use std.free or linalg.copy.

mlir/lib/Transforms/BufferDeallocation.cpp
505

Taking a reference of the contents of a shared pointer is dangerous, as the reference might outlive the lifetime of the shared pointer's content.

532

I just filed a bug for this that you can also reference here. https://bugs.llvm.org/show_bug.cgi?id=48385

silvas added a comment.EditedDec 4 2020, 1:12 PM

I disagree with @silvas on this. This particular change does not take any opinion on the discussion you reference. It merely makes the hard-wired current behavior configurable by choosing different ops. It does not mandate how the decision which free operation to use is done.

To be clear, whether one should be able to customize which free operation is used and how that is done is specifically what is being discussed in that thread.

This change is literally implementing code to solve the problem under discussion in that thread, without taking into consideration any of the design considerations brought up in that thread. It's not that this patch "does not take any opinion" -- this patch does nothing other than take an opinion on that discussion.

This unblocks users that do not want to use std.free or linalg.copy.

The discussion in that thread is specifically about how we want to address this user need. Why would we move forward with this until we decide that this is the way we want to address that user need? For example, I don't see the advantage of this approach vs just creating "free" and "clone" ops and letting downstream users convert those to whatever they choose. This is best covered in that discussion thread.

As a general matter of policy in LLVM, we do not bypass ongoing design discussions to unblock downstream users.

tpopp added a subscriber: tpopp.Dec 4 2020, 5:57 PM

As a general matter of policy in LLVM, we do not bypass ongoing design discussions to unblock downstream users.

Just a note. It's fair to say that they should have pinged the thread one more time, but that thread had a discussion, where dfki and herhut responded to the debate and then received no response for 17 days. That's not really an ongoing discussion at that point.

silvas added a comment.Dec 4 2020, 7:18 PM

As a general matter of policy in LLVM, we do not bypass ongoing design discussions to unblock downstream users.

Just a note. It's fair to say that they should have pinged the thread one more time, but that thread had a discussion, where dfki and herhut responded to the debate and then received no response for 17 days. That's not really an ongoing discussion at that point.

We can call it an "abandoned discussion", but that doesn't make it much better. No response doesn't mean "approved". I would expect an attempt to revive the discussion, rather than bypass it.

Also, some of the later posts were unrelated to my question, which is "why not lower through std.copy, possibly with an attribute for the resources". The last response related to that was from @herhut

If we can agree to have a std.copy, then using allocation resources seems a good way to model this. We could then go ahead and have a reference counted resource.

I interpreted this to mean that we wanted to pursue a direction of std.copy with resource attributes. I support that. I apologize that I did not follow up with a reply to that. I've made a reply in the thread.

@silvas Our intention was not to "bypass anything", sorry for the confusion. As you have mentioned, the discussion went a bit "idle" and we had to more forwards to simplify the integration of the BufferDeallocation pass into other third-party sub projects. Therefore, we proposed this CL that realizes an extremely light-weight version of the Pass Interface being discussed in the forums. However, the proposed solution in the scope of this CL is meant to unblock others on the one hand and being minimally invasive on the other hand. In other words: as soon as we can come up with a better solution based on the Discourse discussion, we can easily "revert" or "update" this pass interface. Please note further that existing code will not break or require further updates after merging this CL.

@silvas Our intention was not to "bypass anything", sorry for the confusion. As you have mentioned, the discussion went a bit "idle" and we had to more forwards to simplify the integration of the BufferDeallocation pass into other third-party sub projects. Therefore, we proposed this CL that realizes an extremely light-weight version of the Pass Interface being discussed in the forums. However, the proposed solution in the scope of this CL is meant to unblock others on the one hand and being minimally invasive on the other hand. In other words: as soon as we can come up with a better solution based on the Discourse discussion, we can easily "revert" or "update" this pass interface. Please note further that existing code will not break or require further updates after merging this CL.

Typically, once downstream projects have something that unblocks them, they lose interest in contributing to a long-term solution. That is why in LLVM we tend to avoid "unblocking" downstream projects and insist on more up-front discussion that optimizes long-term community maintenance costs. Also, typically the party implementing the "better solution" ends up being different from the party that was "unblocked", which is unfair from a community maintenance perspective.

A recent example of this is my push to rationalize the bufferization infrastructure (as presented recently at the ODM). I spent about a month of time on that, and a large amount of it was undoing "minimally invasive" solutions that were added without significant discussion. For example, the "minimally invasive" features in https://reviews.llvm.org/D85133 ended up being a significant amount of that time -- I probably spent more time undoing those features than was involved in adding them in the first place. Also, to make things worse, neither of those features were still being used by downstream projects (yet I was still pushed by various people to generalize/rationalize them...), so they should not have been added upstream in the first place, and I think that a bit more up-front discussion would have revealed that (the best code is no code!).

First, the work @silvas did on improving how bufferization works in MLIR core brought a great cleanup with it and the outcome is much nicer than what we had before. Thank you for investing your time and thought on it. A lot of what got cleaned up had to do with how operations get bufferized and how we can do this partially and incrementally, a functionality that we did not have before and that we are already heavily using now. Awesome!

I have heard the argument that LLVM does not accept contributions until there is a fully vetted design before and I have not worked long enough in llvm to judge this. However, applying such a high bar to a code base as immature as MLIR means that we cannot evolve the system incrementally, which would hinder progress. There are many solutions in MLIR that are stop-gaps to unblock things while the full solution does not exist, yet. The modelling of side-effects went through this, the modelling of resources is still in that state, alias and dependence analysis, dominance on regions. There are many places where we have incremental solutions. Where I agree is that we should have a vision of where we want to get so that we do not build solutions that paint us into a corner.

Lets bring this discussion back to the change at hands here. In the discussion you referenced, the question is how we will handle different copy and free operations depending on how a memref was allocated. The goal in that discussion is to find a way to properly model mixed settings that use different kinds of allocation beyond the current support of alloc vs. alloca that is hard coded. This change will not enable that nor does it change the situation wrt. how this is currently modeled.

What this change does is making the used free and copy operations configurable, so that this can be used independently of linalg.copy and std.free. A general solution to the problem will also solve this small issue but solving this small issue will not make the general problem harder to solve, nor will it make a general solution any less desirable.

Therefore, I think that this change is a valid incremental step that brings us slightly closer to what we need without impacting the choices we have nor paint us in a corner. And that makes it good enough for me to land this approach.

silvas added a comment.EditedDec 9 2020, 1:13 PM

Totally agreed that MLIR is in a youthful stage. I think it is a valuable discussion to be had explicitly with the community about how we approach the tradeoffs in adopting more incremental code vs more vetted designs. Getting clarity there would greatly simplify this discussion!

What this change does is making the used free and copy operations configurable, so that this can be used independently of linalg.copy and std.free. A general solution to the problem will also solve this small issue but solving this small issue will not make the general problem harder to solve, nor will it make a general solution any less desirable.

It isn't obvious to me that solving this small issue will not make the general problem harder to solve. My main concern, as I discussed above, is that it mean that somebody eventually implementing the more general functionality has to first figure out how to "undo" the solution to the "small issue" first. I'l give a concrete example:

Having this customizable with a callback will allow users to potentially start doing all sorts of random stuff inside. What if a user starts depending on that? For example, what if the callback references some side data structure and generates different clone/free ops depending on that, possibly in a less structured way than we would like to support upstream. That suddenly becomes much more difficult to unwind. Of course, implementing that callback is likely to be done in a downstream project, where doing something like this as a shortcut might seem more acceptable. But that shortcut then becomes a cost that the MLIR community needs to pay to unwind!

As evidenced in my recent refactorings, even functionality that ends up being functionally *unused* in downstream projects ends up being a significant maintenance cost that needs to be paid by the MLIR community.

mehdi_amini added a comment.EditedDec 9 2020, 6:34 PM

+1 to @silvas here, we have to be careful with the balance in general.

In this case I still wonder why do we need to have customization of the alloc/free for a given type? I asked in the thread on Discourse but I don't think I really got an answer (or at least it is still unclear to me)

herhut added a comment.EditedDec 10 2020, 1:13 AM

Having this customizable with a callback will allow users to potentially start doing all sorts of random stuff inside. What if a user starts depending on that? For example, what if the callback references some side data structure and generates different clone/free ops depending on that, possibly in a less structured way than we would like to support upstream. That suddenly becomes much more difficult to unwind. Of course, implementing that callback is likely to be done in a downstream project, where doing something like this as a shortcut might seem more acceptable. But that shortcut then becomes a cost that the MLIR community needs to pay to unwind!

We could, for example, explicitly state that this is not allowed and that the call back should always create the same operations. Then we have a contract of what we want to support and a clear mandate to break downstream that does not follow the contract. Would that approach work for you?

In this case I still wonder why do we need to have customization of the alloc/free for a given type? I asked in the thread on Discourse but I don't think I really got an answer (or at least it is still unclear to me)

This does not allow to configure the alloc operation, at all. Allocs are created by bufferization patterns and the authors of those currently have a free choice. Allowing to configure the used free operation is useful when this deallocation pass is used with other alloc operations (which it already supports via interfaces). Configuring copy operations is useful when the user of this deallocation pass has knowledge about how memrefs should be copied other than allocating + linalg.copy. We do not even know what the correct alloc operation is to insert there.

The issue with free can be solved by first inserting frees and then rewriting them, which is what we would do today. The issue with copy is more complex, as it currently does not introduce a single operation. To enable the latter, we have discussed to always insert a copy operation and then users can rewrite them at will. I think there is a forthcoming post from @dfki-mako about that. The idea essentially would be to have a bufferize dialect, that contains copy, cast and maybe free operations that can have semantics that are tailored to the needs of bufferization.

Having this customizable with a callback will allow users to potentially start doing all sorts of random stuff inside. What if a user starts depending on that? For example, what if the callback references some side data structure and generates different clone/free ops depending on that, possibly in a less structured way than we would like to support upstream. That suddenly becomes much more difficult to unwind. Of course, implementing that callback is likely to be done in a downstream project, where doing something like this as a shortcut might seem more acceptable. But that shortcut then becomes a cost that the MLIR community needs to pay to unwind!

We could, for example, explicitly state that this is not allowed and that the call back should always create the same operations. Then we have a contract of what we want to support and a clear mandate to break downstream that does not follow the contract. Would that approach work for you?

That seems to be functionally equivalent to my approach of having memref.clone(%0) {tag = "mytag"} and passing in "mytag" to this pass, but with the disadvantage that users might ignore the advice and misuse the callbacks (Hyrum's law). Also, I'm more worried about the things that I can't foresee. That is just one example. Another would be users calling getDefiningOp on the Value and making decisions based on that. Another would be users looking at the Value and seeing what function they are in to determine which clone/free to use. I'm quite sure there's a long list of possible other issues that we cannot foresee.

In this case I still wonder why do we need to have customization of the alloc/free for a given type? I asked in the thread on Discourse but I don't think I really got an answer (or at least it is still unclear to me)

This does not allow to configure the alloc operation, at all. Allocs are created by bufferization patterns and the authors of those currently have a free choice. Allowing to configure the used free operation is useful when this deallocation pass is used with other alloc operations (which it already supports via interfaces). Configuring copy operations is useful when the user of this deallocation pass has knowledge about how memrefs should be copied other than allocating + linalg.copy. We do not even know what the correct alloc operation is to insert there.

The issue with free can be solved by first inserting frees and then rewriting them, which is what we would do today. The issue with copy is more complex, as it currently does not introduce a single operation. To enable the latter, we have discussed to always insert a copy operation and then users can rewrite them at will. I think there is a forthcoming post from @dfki-mako about that. The idea essentially would be to have a bufferize dialect, that contains copy, cast and maybe free operations that can have semantics that are tailored to the needs of bufferization.

The current approach assumes that all memrefs that this pass will attempt to free can be freed/cloned by the ops produced by these callbacks. That seems incredibly fragile, considering what you said about bufferization patterns being able to freely choose their allocation operation -- I think that's an incredibly strong argument that we need a more general solution, probably something that rationalizes the invariants needed by this pass with what bufferization patterns are allowed to do. It seems that we would need to require all allocating ops to specify what resource/tag they allocate from so we can know how to free it, and then do some dataflow analysis to propagate that information, and fail BufferDeallocation if we cannot figure out which clone/free is legal to insert.

Maybe we can take a step back, talk concretely about the problem that we are actually trying to solve (on discourse), rather than some abstract goal of making this pass more generic. I would especially like to understand how we guarantee the invariant stated above that all allocations can be validly freed by the ops produced by these callbacks.

The issue with free can be solved by first inserting frees and then rewriting them, which is what we would do today. The issue with copy is more complex, as it currently does not introduce a single operation. To enable the latter, we have discussed to always insert a copy operation and then users can rewrite them at will. I think there is a forthcoming post from @dfki-mako about that. The idea essentially would be to have a bufferize dialect, that contains copy, cast and maybe free operations that can have semantics that are tailored to the needs of bufferization.

The current approach assumes that all memrefs that this pass will attempt to free can be freed/cloned by the ops produced by these callbacks. That seems incredibly fragile, considering what you said about bufferization patterns being able to freely choose their allocation operation -- I think that's an incredibly strong argument that we need a more general solution, probably something that rationalizes the invariants needed by this pass with what bufferization patterns are allowed to do. It seems that we would need to require all allocating ops to specify what resource/tag they allocate from so we can know how to free it, and then do some dataflow analysis to propagate that information, and fail BufferDeallocation if we cannot figure out which clone/free is legal to insert.

The current approach always uses free for operations that have an allocation effect on the default resource. It already traces values to their allocation. As there basically is only one allocation resource, conflict resolution is not yet implemented.

Maybe we can take a step back, talk concretely about the problem that we are actually trying to solve (on discourse), rather than some abstract goal of making this pass more generic. I would especially like to understand how we guarantee the invariant stated above that all allocations can be validly freed by the ops produced by these callbacks.

It does not change the status quo, which is that we cannot guarantee that. Currently it inserts the free operation from the standard dialect, as the interfaces are not rich enough to tell. Also see the stale discussion on discourse around that topic for reference.

dfki-mako abandoned this revision.Dec 22 2020, 1:30 AM

@silvas @mehdi_amini @herhut based on the feedback and the discussion in the public Discourse forums, we will close this PR for now until there is a better solution to overcome these limitations in the future.