This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Refactored BufferPlacement transformation into BufferDeallocation and BufferHoisting.
ClosedPublic

Authored by dfki-mako on Sep 16 2020, 3:51 AM.

Details

Summary

The current BufferPlacement transformation contains several concepts for hoisting allocations. However, more advanced hoisting techniques should not be integrated into the BufferPlacement transformation. Hence, this CL refactors the current BufferPlacement pass into two separate pieces: BufferDeallocation and BufferHoisting. Moreover, it extends the hoisting functionality by allowing to move allocations out of loops.

Diff Detail

Event Timeline

dfki-mako created this revision.Sep 16 2020, 3:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2020, 3:51 AM
dfki-mako requested review of this revision.Sep 16 2020, 3:51 AM

Some first comments.

mlir/include/mlir/Transforms/BufferPlacement.h
296 ↗(On Diff #292173)

Could this just be a tuple of Value, Block and Operation without any associated logic?

354 ↗(On Diff #292173)

Why is this called Operation? If it is a transformation, use Transformation. Or maybe Pass?

366 ↗(On Diff #292173)

Could this be a static function? It does not need to be an inherited member.

382 ↗(On Diff #292173)

Do all passes that inherit this really need all the below information?

mlir/lib/Transforms/BufferDeallocation.cpp
250

Why do we need to store the placementBlock here? Does this ever need to be updated? As far as I can see, it is always the block where the corresponding alloc was initially. Would allocValue->getDefiningOp()->getBlock() not always do the same?

264

This logic could be moved to the use site or is this reused by anything other than the 'BufferPlacementHoisting' pass?

mlir/lib/Transforms/BufferOptimizations.cpp
31

Nit: Hosting

Maybe also drop the Placement and instead call it BufferAllocationHoisting?

78

This can only be done if the alloc you are moving is loop invariant. This is ensured by the calling context but not clear from this code. Also, you could query the LoopLikeOpInterface as to whether the operands are loop independent.

This also is not an optimization of the loop is never executed.

What happens if the allocated buffer escapes the loop over a backedge? Something like

%0 = alloc
scf.for ... init(%alloc) {
bb0(%arg0...):
  %1 = alloc
  <update %1 using %arg0>
 yield %1
}

I think with your rewrite, you would allocate %1 once and from the second iteration, the two buffers would now alias.

pifon2a added inline comments.Sep 21 2020, 1:44 AM
mlir/include/mlir/Transforms/BufferPlacement.h
277 ↗(On Diff #292173)

nit: I would add begin() just for consistency. In that case you would not need to explain why you only have end() like you do now.

310–317 ↗(On Diff #292173)

nit: move constructors to the top of the struct. Also: why do you need them to be explicit?

mlir/include/mlir/Transforms/Passes.td
126

+1 for formatting

mlir/lib/Transforms/BufferDeallocation.cpp
268

const LivenessBlockInfo& livenessInfo = *liveness.getLiveness(placementBlock);

mlir/lib/Transforms/BufferOptimizations.cpp
2

nit: remove empty line

58

you don't really need Block* placementBlock line. The code is shorter without this tmp variable.

if (operands.size() < 1) {
  // If not, we have to find the common dominator of all aliases and move
  // the allocation out of nested loops.
  auto resultAliases = aliases.resolve(alloc);
  allocEntry.placementBlock = findCommonDominator(alloc, resultAliases, dominators);
  moveOutOfLoop(allocEntry.placementBlock);
  return;
} 
// If this node has dependencies, check all dependent nodes with respect
// to a common post dominator in which all values are available.
ValueSetT dependencies(++operands.begin(), operands.end());
allocEntry.placementBlock =
    findCommonDominator(*operands.begin(), dependencies, postDominators);
68

std::next(operands.begin())

pifon2a requested changes to this revision.Sep 21 2020, 1:44 AM
This revision now requires changes to proceed.Sep 21 2020, 1:44 AM
dfki-mako updated this revision to Diff 295278.Sep 30 2020, 7:37 AM
dfki-mako marked 14 inline comments as done.

Addressed reviewer comments and changed the BufferAllocationHoisting pass to pay attention to escaping buffers via backedges.

dfki-mako added inline comments.Sep 30 2020, 7:39 AM
mlir/lib/Transforms/BufferDeallocation.cpp
264

This logic will be used by other passes in the near future - in one of the upcoming CLs :)

bondhugula requested changes to this revision.Oct 13 2020, 12:50 AM
bondhugula added a subscriber: bondhugula.

Mostly minor comments on readability.

mlir/include/mlir/Transforms/BufferPlacement.h
313–314 ↗(On Diff #295278)

Could this comment be improved to document the return value better? A single Operation * is returned while "associated deallocs" is mentioned.

mlir/include/mlir/Transforms/Passes.h
31–41

Add a few more words to this comment (like the one below) - it's almost stating the obvious.

mlir/include/mlir/Transforms/Passes.td
105–107

This is a welcome change, thanks!

mlir/lib/Transforms/BufferOptimizations.cpp
10–12

Nit: use width

35

Avoid auto.

59

Missing assert for getDefiningOp.

62

!operands.empty() - less expensive in general and more idiomatic.

109

Nit: analysis -> info

111

analysis -> info

127

Nit: Terminate with period.

This revision now requires changes to proceed.Oct 13 2020, 12:50 AM
dfki-mako updated this revision to Diff 297821.Oct 13 2020, 4:24 AM
dfki-mako marked 10 inline comments as done.

Refined all parts of this PR and addressed all reviewer comments.

herhut requested changes to this revision.Oct 13 2020, 5:37 AM

Much nicer!

mlir/include/mlir/Transforms/Passes.h
39

Does it reduce copies? I though the goal was to avoid re-allocations inside of loops and instead share a single buffer for the whole loop?

mlir/lib/Transforms/BufferDeallocation.cpp
379

nit: This no longer finds the initial allocation block.

415–416

Maybe clarify this a bit. It ensures that all allocs in the program have a corresponding de-allocation. As a side-effect, it might also introduce copies (which lead to allocs again).

mlir/lib/Transforms/BufferOptimizations.cpp
45

It is not clear to me why this is computed first for all allocations here and then there is a second round in hoist() that applies this? Once could equally well compute one placement and then host the alloc, avoiding creating intermediate state.

So in essence, move the loop outwards and do

foreach alloc : allocs {
  computePos;
  hoist;
}
51

nit: common

59

all values this allocation depends on are available?

90

Does this address my example? This moves up the blocks until all aliases are covered. So in my example, if there was another aliasing on some outer level, it would still break the semantics, no?

109

This moves within one region level, correct?

119

I would spell this out for readability: state.isLegalPlacementOp(parentOp)

124

recordPotentialPlacement as this does not move?

148

The goal here is to move the alloc high enough to cover all aliases, right? Moving it any higher will not avoid copies but only increase memory pressure. This should be covered here.

isa<LoopLikeOpInterface> is a very weak condition. There might be loops that do not implement this. So a better way to phrase this would be that the operation also does implement the RegionBranchOpInterface and has no back-edges between regions. Otherwise it is a loop and hoisting out of loops is illegal.

161

So this will walk all the way up and if it finds a loop anywhere during that walk it will hoist. This also means hoisting out of conditionals (might be bad) and through unknown region based control flow.

This should only hoist out of operations that implement LoopLikeOpInterface and only if the allocation does not escape that loop.

mlir/test/Transforms/buffer-loop-hoisting.mlir
2

It is not clear to me why these tests are in this file. Not all of them are concerned with loops.

4

BufferLoopHoisting here and below?

87

This comment does not describe what is tested. In the test, nothing is moved.

177

Why is this legal to do? %3 escapes this loop on the back-edge and result. So if there was another use of iterBuf after the alloc, this would introduce an aliasing of buffers that did not exist before.

In my mind, it is only legal to hoist an allocation out of the loop if it does not escape the loop otherwise.

205

Two issues

  • the then case might be rare, in which case this is not an improvement
  • the allocation escapes
This revision now requires changes to proceed.Oct 13 2020, 5:37 AM
dfki-mako updated this revision to Diff 298135.Oct 14 2020, 6:06 AM
dfki-mako marked 17 inline comments as done.

Refactored code and addressed reviewer comments.

mlir/lib/Transforms/BufferOptimizations.cpp
148

The goal here is to move the alloc high enough to cover all aliases, right? Moving it any higher will not avoid copies but only increase memory pressure. This should be covered here.

The function findPlacementBlock does not move the placement block above the dominatorBlock, which has been determined before. Since the dominatorBlock represents the immediate common dominator of all aliases (while taking potential dependencies into account), it cannot happen that the memory pressure is significantly increased because the allocation will not be moved further.

isa<LoopLikeOpInterface> is a very weak condition

I agree that we might want to capture these cases, as well.

161

The current implementation behaves differently, however, if we want to ensure that allocation does not escape the loop in all cases, we have to restructure the implementation slightly.

mlir/test/Transforms/buffer-loop-hoisting.mlir
2

It is not required to have all of these tests. However, out intention was to ensure that the allocations are *not* moved in tests without any loops.

herhut added inline comments.Oct 14 2020, 7:16 AM
mlir/lib/Transforms/BufferOptimizations.cpp
207

nit: the one -> one

259

Please fix.

mlir/test/Transforms/buffer-hoisting.mlir
5

nit: to their

mlir/test/Transforms/buffer-loop-hoisting.mlir
13

It does not move it, right?

223

This is no longer the same.

259

Is is hard to see what the actual structure is that this is checking.

311

Can you add small tests that check simple cases:

  • one alloc hoisted out of loop, one hoisted through multiple loops
  • alloc not hoisted out of conditional, alloc hoisted out of loop in conditional
  • alloc with dependencies not hoisted out of loop, alloc with dependencies in loop nest hoisted as far as possible.

That likely makes it easier to write CHECK patterns.

dfki-mako updated this revision to Diff 298590.Oct 16 2020, 3:46 AM
dfki-mako marked 7 inline comments as done.

Addressed reviewer comments.

herhut accepted this revision.Oct 16 2020, 3:58 AM
This revision was not accepted when it landed; it landed in state Needs Review.Oct 19 2020, 3:54 AM
This revision was automatically updated to reflect the committed changes.