The current BufferPlacement pass does not support allocation nodes that carry additional dependencies (like in the case of dynamic shaped types). These allocations can often not be moved freely and in turn might remain in divergent control-flow branches. This requires a different strategy with respect to block arguments and aliases. This CL adds additional functionality to support allocation nodes in divergent control flow while avoiding memory leaks.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can you add tests for dynamic allocation cases, as well?
mlir/lib/Transforms/BufferPlacement.cpp | ||
---|---|---|
11 | Incomplete sentence to ensure that all buffers | |
37 | In other words, the analysis uses the post-dominator to free the allocated memory. If no such post-dominator exists, aliases are removed by inserting allocs and copies. Is that the idea? | |
45 | Nit: Remove highly. So alloc works the same way but this time with a common dominator? If no such dominator exists, copies are inserted? | |
182 | Optimizations should be independent of BufferPlacement, so this TODO does not belong here. | |
209 | Would it make sense to compute this once and share it between the different phases? The moving of allocs does not influence the aliasing. | |
225 | This should not be an assert. Instead either ignore the alloc (which AFAIK was the mode before) or fail the transformation and report an error. Generally, the reason this was not supported before was that multiple results creates situations where the alloc cannot be moved. Can the new system not tolerate this with copies? I am OK with this not being supported, as a multi-result alloc seems a very boutique use. | |
237 | Allocation node with know shape? Another way to look at it would be to check whether the allocation has any operands, which structurally is the reason it cannot be moved. | |
247 | Same here, this could just use the operands of the alloc. | |
248–271 | Is this dependency information that you are lacking? | |
249 | It is a clever reuse of findPlacementBlock. Maybe it should no longer talk about aliases. Instead, it is a helper that finds a dominator/postdominator for a range of values. | |
370–371 | This does not account for nested blocks in nested regions. Also, why is this needed? | |
387 | Would it make sense to store BlockArgument to start with? Aliases are always BlockArguments, right? | |
389 | Somewhat sad that MutableOperandRange does not support to get values out... | |
390 | This breaks the aliasing between source and value. If there are further aliases of source that alias via value, they would no longer be an alias. But we would still potentially insert a copy to break the aliasing? | |
464 | Why would we want to insert before the end operation? That would free it before the last use? | |
502 | a an -> an | |
mlir/test/Transforms/buffer-placement.mlir | ||
93 | The numbering is strange of ALLOC02 | |
333 | I cannot follow this without the block numbers in the checks. Why do we get copies here? | |
440 | Same here. |
Implemented a fix-point iteration to reduce the number of required copies.
Fixed minor issues (see review comments).
mlir/lib/Transforms/BufferPlacement.cpp | ||
---|---|---|
37 | Yep, that's the general idea. | |
225 | In principal yes; however, we currently remove all Dealloc nodes in the beginning of the transformation. If we keep them for unsupported Alloc ops, this should not be an issue. | |
370–371 | Obsolete; we have added a new fix-point iteration. | |
389 | Unfortunately, it seems to be the only way at the moment... | |
mlir/test/Transforms/buffer-placement.mlir | ||
93 | The name ALLOC02 should indicate that the allocations ALLOC0 and ALLOC2 will be freed at this location. | |
333 | The updated algorithm does not insert any copies in these cases any more. |
mlir/lib/Transforms/BufferPlacement.cpp | ||
---|---|---|
225 | Can we be *safe* in this case instead of failing entirely? |
mlir/lib/Transforms/BufferPlacement.cpp | ||
---|---|---|
225 | @mehdi_amini If we do not remove Dealloc nodes (or nodes with free semantics) that are related to unsupported alloc nodes, we can be sure that the program will not be "worse" than before with respect to these buffers. |
mlir/lib/Transforms/BufferPlacement.cpp | ||
---|---|---|
306 | Nit: Remove the { }. | |
313 | Aren't there two cases here? If this immediate alias is not dominated, then a copy is inserted, the alias turns effectively into an allocation and also needs to be processed. If we do not insert a copy, then the alias still has to be processed, using the original allocation as the block that needs to dominate. I think the second case is missing here. |
mlir/lib/Transforms/BufferPlacement.cpp | ||
---|---|---|
225 | Right, so we shouldn't need to emit an error here then? We can just skip here? |
Fixed corner cases in which no copies have been created due to aliasing (see reviewer comments).
Added additional test cases to simulate such special control-flow scenarios.
Changed behavior of the BufferPlacement pass to ignore Alloc operations with multiple allocation results.
mlir/lib/Transforms/BufferPlacement.cpp | ||
---|---|---|
188 | Nit: Finds -> Find? | |
287 | With this change, you consider all transitive aliases of a value. However, as soon as you insert a copy (by adding a blockarg to the list of blocks to free) that set would need updating, as you now have fewer transitive aliases. In you example, consider the case where there is another aliasing block (say b7) after b6. Then we would get a copy from b5 to b6, as b2 does not dominate b6. This would also take care of the block b7 after b6, as long as b6 dominates it. However, with the current approach, you would still check whether b2 dominates this additional block b7 and like insert a copy from b6 to b7. I think that, when adding a blockarg to the free list, you have to use that block as the dominating block (as it now becomes the allocating place) when processing aliases of the new freed blockarg. | |
381 | I do not understand what the block argument refers to here. | |
390 | Nit: delloc -> dealloc | |
mlir/test/Transforms/buffer-placement.mlir | ||
152 | Could you add all block numbers here. That makes it easier to follow where the br operations belong. |
mlir/test/Transforms/buffer-placement.mlir | ||
---|---|---|
60 | can this (and all other comments and also IR below) fit 80 chars? |
Updated processing of aliases to avoid the aggressive insertion of copy operations in special cases.
Reformatted test cases.
Thanks. This looks correct to me now.
You mentioned offline that you added a test for the case that was not covered before. I did not find it in this CL though.
mlir/lib/Transforms/BufferPlacement.cpp | ||
---|---|---|
305 | You could also populate toProcess here and then just have the loop below. This works for me too, though. |
Resolved merge conflicts and integrated an additional test case into an existing one in order to capture an additional special case.
mlir/lib/Transforms/BufferPlacement.cpp | ||
---|---|---|
305 | This would also work. However, we would prefer the more explicit version to help others understand the code more easily. |
Incomplete sentence to ensure that all buffers