
[mlir][linalg] Improve the padding packing loop computation.
ClosedPublic

Authored by gysit on Oct 28 2021, 6:11 AM.

Details

Summary

The revision updates the packing-loop search in hoist padding. Instead of considering all loops in the backward slice, we now compute a separate backward slice containing only the index computations. This modification ensures we do not add packing loops that, due to spurious dependencies, are not actually used to index the packed buffer. One instance where such spurious dependencies can appear is the extract slice operation introduced between the tile loops of a double tiling.

Depends On D112412
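The effect of restricting the slice to the index computations can be illustrated with a toy model (plain Python, not the actual MLIR implementation; all op names and the use-def graph below are hypothetical):

```python
# Toy sketch: ops are nodes in a use-def graph; a backward slice
# follows only the edges accepted by a predicate, mirroring the idea
# of restricting the packing-loop search to the index computations
# of the padded operand.

def backward_slice(op, operands, follow):
    """Collect all transitive definers of `op`, following only
    definers accepted by the `follow` predicate."""
    slice_ops, worklist = set(), [op]
    while worklist:
        current = worklist.pop()
        for definer in operands.get(current, []):
            if definer not in slice_ops and follow(definer):
                slice_ops.add(definer)
                worklist.append(definer)
    return slice_ops

# Hypothetical use-def edges: the pad op indexes with %i and %j; an
# extract_slice between the tile loops adds a spurious dependency on
# loop %k that is unrelated to the indexing.
operands = {
    "pad": ["index_i", "index_j", "extract_slice"],
    "index_i": ["loop_i"],
    "index_j": ["loop_j"],
    "extract_slice": ["loop_k"],  # spurious: not an index computation
}
index_ops = {"index_i", "index_j", "loop_i", "loop_j", "loop_k"}

# The unrestricted slice pulls in loop_k via the extract_slice; the
# index-only slice does not.
full = backward_slice("pad", operands, lambda op: True)
index_only = backward_slice("pad", operands, lambda op: op in index_ops)
```

With the unrestricted filter, the spurious extract_slice edge pulls loop_k into the slice, which would wrongly turn it into a packing loop; the index-only slice leaves it out.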

Diff Detail

Event Timeline

gysit created this revision.Oct 28 2021, 6:11 AM
gysit requested review of this revision.Oct 28 2021, 6:11 AM
gysit added inline comments.Oct 28 2021, 6:16 AM
mlir/lib/Dialect/Linalg/Transforms/HoistPadding.cpp
171

The assertion triggers for the double tiling test case since an affine min computation that belongs to the body of the outermost loop is actually placed in front of the loop. This is OK since the two operations are independent, meaning both orders are valid topological sorts.

Additionally, I think it may be possible that the outermost loop is not in the backward slice (no data dependency) and we may still want to hoist out of it.
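The topological-sort point above can be made concrete with a minimal sketch (hypothetical op names, not tied to the MLIR code):

```python
# Two operations with no dependence between them admit more than one
# valid topological order, so an affine.min placed before the loop and
# one placed inside its body can both be consistent with the same
# use-def graph.
from itertools import permutations

# Dependence edges: op -> set of ops that must come first.
deps = {
    "affine_min": set(),
    "outer_loop": set(),
    "use": {"affine_min", "outer_loop"},  # depends on both
}

def is_topological(order):
    seen = set()
    for op in order:
        if not deps[op] <= seen:
            return False
        seen.add(op)
    return True

valid = [p for p in permutations(deps) if is_topological(p)]
# Both ("affine_min", "outer_loop", "use") and
# ("outer_loop", "affine_min", "use") are valid orders.
```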

gysit updated this revision to Diff 383443.Oct 29 2021, 11:47 AM

Rename IndexEdge to IndexEdges.

gysit updated this revision to Diff 384693.Nov 4 2021, 3:09 AM

Improve tests and comments.

nicolasvasilache accepted this revision.Nov 4 2021, 6:58 AM

Let's proceed with this, modulo refactoring and at least trying to implement a side-effect check that would work correctly in the scf::ForOp slice-computation case.

mlir/lib/Dialect/Linalg/Transforms/HoistPadding.cpp
226–227

seems this has reached a natural limit and would benefit from its own function?

227

A loop

231

sounds even more like an interface for slicing

233

Could you rephrase "since" and "with the exception of"?

234

can we have some comment on what this represents?
If this is to work around the fact that the backward slice is too coarse, and we want finer control over which values used in scf::For should propagate or not, please explain so in the broader context of proper documentation about the behavior of the slice algorithm that you are using.

241

A more contained and precise explanation plus an example in pseudo-IR would be useful here.

The backward slice computation algorithm takes a filter function that ... 
This has the effect of ..
When applied to ... turns all enclosing loops into packing loops regardless of whether they are actually used

etc

Example:
for i
  for j
    for k
      ..

247

are these actual assertions that can never happen or failure conditions where we should abort the transformation?

260

The way you are using this seems to hint at an interface for performing slice analysis that would be more precise.
The trickiness seems related to ops that contain other ops with side effects.

Is the restriction you are making here valid in the presence of, say, load/store on memref within the scf::ForOp?
Maybe that is part of what needs to be computed here: is there a nested op with side effects (which in practice we know doesn't happen in the cases we are interested in, but we need to be conservative)?
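The conservative check suggested here could look roughly like the following sketch (hypothetical helper and op names, not the actual MLIR API):

```python
# Before hoisting across a loop, walk the ops nested under it and
# conservatively abort if any of them may read or write memory, since
# hoisting could then reorder those effects.

def is_safe_to_hoist_across(nested_ops, has_side_effects):
    """Allow hoisting only if no nested op touches memory."""
    return not any(has_side_effects(op) for op in nested_ops)

# Illustrative set of side-effecting op names (hypothetical model).
side_effecting = {"memref.load", "memref.store"}

print(is_safe_to_hoist_across(
    ["arith.addi", "affine.min"],
    lambda op: op in side_effecting))  # True
print(is_safe_to_hoist_across(
    ["arith.addi", "memref.store"],
    lambda op: op in side_effecting))  # False
```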

This revision is now accepted and ready to land.Nov 4 2021, 6:58 AM
gysit added inline comments.Nov 4 2021, 10:50 AM
mlir/lib/Dialect/Linalg/Transforms/HoistPadding.cpp
260

@nicolasvasilache
After reading this I think we should consider switching to an explicit model where we encode the ops that can appear in the backward slice and how we handle them. For example, if the induction variable of a for op is part of the backward slice, we only add the bounds and the step to the backward slice but not the iteration argument, which is usually completely unrelated.

For the ForOp, the fact that it has a body and side effects is not the only problem. It is more the fact that it has complex semantics and we are not interested in the iteration-argument part. For example, a hypothetical cast_tensor_and_affine_apply op would also be problematic and would require special treatment depending on which of its results is found in the backward slice. I.e., we would continue the backward slice computation with either the cast-specific or the apply-specific operands.

Loads and stores on memrefs are a different beast. They would usually not appear in the backward slice but can, at a later point in time, indeed make hoisting invalid. We can probably handle this by adding a check on which operations may be part of the hoisted / cloned loop nest?
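The explicit model proposed above could be sketched as a per-operation rule that selects which operands the slice traversal follows (hypothetical names, not the actual implementation):

```python
# Instead of a generic filter, a per-operation rule decides which
# operands the backward slice continues with. For a hypothetical `for`
# op we follow the lower bound, upper bound, and step, but not the
# iteration arguments, which are usually unrelated to the indexing.

def operands_to_follow(op_kind, operands):
    """Return the operands the slice traversal should continue with."""
    rules = {
        # for(lb, ub, step, *iter_args): skip the iteration arguments.
        "for": lambda ops: ops[:3],
        # No entry: follow every operand (the default below).
    }
    return rules.get(op_kind, lambda ops: ops)(operands)

print(operands_to_follow("for", ["lb", "ub", "step", "iter_arg0"]))
# ['lb', 'ub', 'step']
print(operands_to_follow("addi", ["a", "b"]))
# ['a', 'b']
```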

gysit updated this revision to Diff 385062.Nov 5 2021, 7:18 AM

Address comments and use a more fail-safe strategy.

gysit updated this revision to Diff 385408.Nov 8 2021, 12:10 AM
gysit marked an inline comment as done.

Rebase and minor fixes.

nicolasvasilache accepted this revision.Nov 8 2021, 1:12 AM
This revision was automatically updated to reflect the committed changes.