The current BufferPlacement transformation contains several concepts for hoisting allocations. However, more advanced hoisting techniques should not be integrated into the BufferPlacement transformation. Hence, this CL refactors the current BufferPlacement pass into two separate pieces: BufferDeallocation and BufferHoisting. Moreover, it extends the hoisting functionality by allowing to move allocations out of loops.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Some first comments.
mlir/include/mlir/Transforms/BufferPlacement.h | ||
---|---|---|
296 ↗ | (On Diff #292173) | Could this just be a tuple of Value, Block and Operation without any associated logic? |
354 ↗ | (On Diff #292173) | Why is this called Operation? If it is a transformation, use Transformation. Or maybe Pass? |
366 ↗ | (On Diff #292173) | Could this be a static function? It does not need to be an inherited member. |
382 ↗ | (On Diff #292173) | Do all passes that inherit this really need all the below information? |
mlir/lib/Transforms/BufferDeallocation.cpp | ||
250 | Why do we need to store the placementBlock here? Does this ever need to be updated? As far as I can see, it is always the block where the corresponding alloc was initially. Would allocValue->getDefiningOp()->getBlock() not always do the same? | |
264 | This logic could be moved to the use site or is this reused by anything other than the 'BufferPlacementHoisting' pass? | |
mlir/lib/Transforms/BufferOptimizations.cpp | ||
31 | Nit: Hosting Maybe also drop the Placement and instead call it BufferAllocationHoisting? | |
78 | This can only be done if the alloc you are moving is loop invariant. This is ensured by the calling context but not clear from this code. Also, you could query the LoopLikeOpInterface as to whether the operands are loop independent. This also is not an optimization of the loop is never executed. What happens if the allocated buffer escapes the loop over a backedge? Something like %0 = alloc scf.for ... init(%alloc) { bb0(%arg0...): %1 = alloc <update %1 using %arg0> yield %1 } I think with your rewrite, you would allocate %1 once and from the second iteration, the two buffers would now alias. |
mlir/include/mlir/Transforms/BufferPlacement.h | ||
---|---|---|
277 ↗ | (On Diff #292173) | nit: I would add begin() just for consistency. In that case you would not need to explain why you only have end() like you do now. |
310–317 ↗ | (On Diff #292173) | nit: move constructors to the top of the struct. Also: why do you need them to be explicit? |
mlir/include/mlir/Transforms/Passes.td | ||
126 | +1 for formatting | |
mlir/lib/Transforms/BufferDeallocation.cpp | ||
268 | const LivenessBlockInfo& livenessInfo = *liveness.getLiveness(placementBlock); | |
mlir/lib/Transforms/BufferOptimizations.cpp | ||
2 | nit: remove empty line | |
58 | you don't really need Block* placementBlock line. The code is shorter without this tmp variable. if (operands.size() < 1) { // If not, we have to find the common dominator of all aliases and move // the allocation out of nested loops. auto resultAliases = aliases.resolve(alloc); allocEntry.placementBlock = findCommonDominator(alloc, resultAliases, dominators); moveOutOfLoop(allocEntry.placementBlock); return; } // If this node has dependencies, check all dependent nodes with respect // to a common post dominator in which all values are available. ValueSetT dependencies(++operands.begin(), operands.end()); allocEntry.placementBlock = findCommonDominator(*operands.begin(), dependencies, postDominators); | |
68 | std::next(operands.begin()) |
Addressed reviewer comments and changed the BufferAllocationHoisting pass to pay attention to escaping buffers via backedges.
mlir/lib/Transforms/BufferDeallocation.cpp | ||
---|---|---|
264 | This logic will be used by other passes in the near future - in one of the upcoming CLs :) |
Mostly minor comments on readability.
mlir/include/mlir/Transforms/BufferPlacement.h | ||
---|---|---|
313–314 ↗ | (On Diff #295278) | Could this comment be improved to document the return value better? A single Operation * is returned while "associated deallocs" is mentioned. |
mlir/include/mlir/Transforms/Passes.h | ||
31–41 | Add a few more words to this comment (like the one below) - it's almost stating the obvious. | |
mlir/include/mlir/Transforms/Passes.td | ||
105–107 | This is a welcome change, thanks! | |
mlir/lib/Transforms/BufferOptimizations.cpp | ||
10–12 | Nit: use width | |
35 | Avoid auto. | |
59 | Missing assert for getDefiningOp. | |
62 | !operands.empty() - less expensive in general and more idiomatic. | |
109 | Nit: analysis -> info | |
111 | analysis -> info | |
127 | Nit: Terminate with period. |
Much nicer!
mlir/include/mlir/Transforms/Passes.h | ||
---|---|---|
39 | Does it reduce copies? I though the goal was to avoid re-allocations inside of loops and instead share a single buffer for the whole loop? | |
mlir/lib/Transforms/BufferDeallocation.cpp | ||
379 | nit: This no longer finds the initial allocation block. | |
415–416 | Maybe clarify this a bit. It ensures that all allocs in the program have a corresponding de-allocation. As a side-effect, it might also introduce copies (which lead to allocs again). | |
mlir/lib/Transforms/BufferOptimizations.cpp | ||
45 | It is not clear to me why this is computed first for all allocations here and then there is a second round in hoist() that applies this? Once could equally well compute one placement and then host the alloc, avoiding creating intermediate state. So in essence, move the loop outwards and do foreach alloc : allocs { computePos; hoist; } | |
51 | nit: common | |
59 | all values this allocation depends on are available? | |
90 | Does this address my example? This moves up the blocks until all aliases are covered. So in my example, if there was another aliasing on some outer level, it would still break the semantics, no? | |
109 | This moves within one region level, correct? | |
119 | I would spell this out for readability: state.isLegalPlacementOp(parentOp) | |
124 | recordPotentialPlacement as this does not move? | |
148 | The goal here is to move the alloc high enough to cover all aliases, right? Moving it any higher will not avoid copies but only increase memory pressure. This should be covered here. isa<LoopLikeOpInterface> is a very weak condition. There might be loops that do not implement this. So a better way to phrase this would be that the operation also does implement the RegionBranchOpInterface and has no back-edges between regions. Otherwise it is a loop and hoisting out of loops is illegal. | |
161 | So this will walk all the way up and if it finds a loop anywhere during that walk it will hoist. This also means hoisting out of conditionals (might be bad) and through unknown region based control flow. This should only hoist out of operations that implement LoopLikeOpInterface and only if the allocation does not escape that loop. | |
mlir/test/Transforms/buffer-loop-hoisting.mlir | ||
2 | It is not clear to me why these tests are in this file. Not all of them are concerned with loops. | |
4 | BufferLoopHoisting here and below? | |
87 | This comment does not describe what is tested. In the test, nothing is moved. | |
177 | Why is this legal to do? %3 escapes this loop on the back-edge and result. So if there was another use of iterBuf after the alloc, this would introduce an aliasing of buffers that did not exist before. In my mind, it is only legal to hoist an allocation out of the loop if it does not escape the loop otherwise. | |
205 | Two issues
|
Refactored code and addressed reviewer comments.
mlir/lib/Transforms/BufferOptimizations.cpp | ||
---|---|---|
148 |
The function findPlacementBlock does not move the placement block above the dominatorBlock, which has been determined before. Since the dominatorBlock represents the immediate common dominator of all aliases (while taking potential dependencies into account), it cannot happen that the memory pressure is significantly increased because the allocation will not be moved further.
I agree that we might want to capture these cases, as well. | |
161 | The current implementation behaves differently, however, if we want to ensure that allocation does not escape the loop in all cases, we have to restructure the implementation slightly. | |
mlir/test/Transforms/buffer-loop-hoisting.mlir | ||
2 | It is not required to have all of these tests. However, out intention was to ensure that the allocations are *not* moved in tests without any loops. |
mlir/lib/Transforms/BufferOptimizations.cpp | ||
---|---|---|
207 | nit: the one -> one | |
259 | Please fix. | |
mlir/test/Transforms/buffer-hoisting.mlir | ||
5 | nit: to their | |
mlir/test/Transforms/buffer-loop-hoisting.mlir | ||
13 | It does not move it, right? | |
223 | This is no longer the same. | |
259 | Is is hard to see what the actual structure is that this is checking. | |
311 | Can you add small tests that check simple cases:
That likely makes it easier to write CHECK patterns. |
Does it reduce copies? I though the goal was to avoid re-allocations inside of loops and instead share a single buffer for the whole loop?