This is an archive of the discontinued LLVM Phabricator instance.

[mlir][TilingInterface] Fix `iter_args` handling in tile (and fuse).
ClosedPublic

Authored by mravishankar on Sep 21 2022, 11:11 PM.

Details

Summary

The current approach for handling iter_args was to replace all uses
of the value that is used as init value with the corresponding
region block argument within the scf.for. This is not always
correct. Instead a more deliberate approach needs to be taken to
handle these. If the slice being fused represents a slice of the
destination operand of the untiled op, then

  • Make the destination of the fused producer the init value of the loop nest
  • For the tiled and fused producer op created, replace the slice of the destination operand with a slice of the corresponding region iter arg of the innermost loop of the generated loop nest

Event Timeline

mravishankar created this revision.Sep 21 2022, 11:11 PM
mravishankar requested review of this revision.Sep 21 2022, 11:11 PM
mravishankar added inline comments.
mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
191

@nicolasvasilache responding to some of the comments from https://reviews.llvm.org/D134144 here.

  1. This is just moved here from below.
  2. I forgot that all the results of the tiled operation need to be yielded the same time for things to work correctly. The reason being that to get the tile offset and size of the result computed you need to use the offset and size of the iteration space. If you just yield one result of the tiled operation at a time, after the first yield, the offset of the iteration space (which depends on the induction variable of the loop) becomes invalid, but is still referenced in the offset computation. So you need to first compute the resultTileOffsets and resultTileSizes` of all the results before you yield them. Note that this is only an issue for returning the tiled results for the consumer (or the first tiled op). After that, if you fuse a producer and need to yield the value of the tiled producer, you have the resultTileOffset and resultTileSizes from the slice op that is used. Since after the yield the IR is in a consistent state, processing other slices doesnt have an issue. So this "batch processing of yields" only needs to happen for the tiled operation.. In any case, this method is just "move" of code from below. This is what was always done (I just had forgotten this).
  3. You left a comment about front-loading the use of SubsetExtractOpInterface. I think that is a larger change that has to happen as a generalization of the whole TilingInterface. That probably needs to be done later. Also note that for just the tiling, you dont have slices. So we need a different container for the offsets and sizes that does not need the op itself. So there are things to be handled here w.r.t to SubsetExtractOpInterface` that are sort of unrelated to this change. This is really fixing a bug.

Much better commit message with an IR example that points the error and the solution.
This is all too abstract and in your head right now, this is not a "simple fix" by any stretch of the definition.

Please address the comments first, name things better, add significant doc and remove all unnecessary complexity / code motion that you still can.

Once done I'll spend another hour trying to review this.

Feel free to add another reviewer that may have a better ability to grok where this is going.

mlir/include/mlir/Dialect/SCF/Utils/Utils.h
69

incomplete sentence.

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
191
  1. this move creates significant reviewcomplexity, can it be undone?

Is the objective to just add /*replaceIterOperandsUsesInLoop =*/false ?
If so, it will be much easier to just do without moving the code and reducing the review complexity.

Then you can refactor as part of the NFC.

I am unable to tell whether

for (auto tiledResult : llvm::enumerate(tilingResult.tiledOp->getResults())) {
  if (failed(op.getResultTilePosition(
          rewriter, tiledResult.index(), offsets, sizes,
          resultTileOffsetsList[tiledResult.index()],
          resultTileSizesList[tiledResult.index()]))) {
    return rewriter.notifyMatchFailure(
        op, "unable to get insertion position of tiled result");
  }
}

+
changes in the function is NFC compared to:

for (auto resultNum : llvm::seq<unsigned>(0, op->getNumResults())) {
      SmallVector<OpFoldResult> resultTileOffsets, resultTileSizes;
      if (failed(op.getResultTilePosition(b, resultNum, offsets, sizes,
                                          resultTileOffsets,
                                          resultTileSizes))) {
        op.emitOpError("unable to get position of result ")
            << resultNum << " of the tiled implementation";
        return {};
      }

If the answer is yes, please make it easier to accept your PR by avoiding complex refactorings mixed with non-trivial functional changes: leave the lambda where it was and just add /*replaceIterOperandsUsesInLoop =*/false.

If the answer is no, we need significantly more comments here to understand why not.

390

Treating all the code above this line as unnecessary change and assuming that adding`/*replaceIterOperandsUsesInLoop =*/false.` is the only useful change.

415

this functions needs a significantly better name that is descriptive of what it does.
walkBackXAndReturnY or something

478

you can now auto [x, y] = ...

498–499
At this point, if the slice is for a destination operand, the IR is ill-formed because XXX (with a clear example with multiple loops and clear pointing out where in the IR the problem is and why).

Problematic IR Example

To correct this, we need to update Y and Z as follows:
  - iter_arg ...
  - destination of the slice ...

Corrected IR example.
mlir/lib/Dialect/SCF/Utils/Utils.cpp
131

Comments and examples needed here, I have no idea why this post-hoc patchup is happening here and why.
Can this be achieved in a more idiomatic fashion, without post-hoc patchup, by e.g. splitting replaceLoopWithNewYields in multiple parts?

Remove code movements that are NFC to reduce diff.

mravishankar marked 3 inline comments as done.

Address comments.

mravishankar added inline comments.Sep 22 2022, 3:08 PM
mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
191

Inlined the function. The comment doesnt point to where the code is now, but it is just that change.

415

Went with getUntiledProducerFromSliceSource.

478

TIL....

mlir/lib/Dialect/SCF/Utils/Utils.cpp
131

I think this can be simplified. Instead of going bottom up from inner to outer, we can try making this go top down, from outer to inner... I can do this after the fact. Maybe not add to the patch right now?

nicolasvasilache accepted this revision.Sep 23 2022, 4:09 AM

Thanks Mahesh, much nicer an easier to follow!
Approving conditioned on the top-down/bottom-up/middle-out cleanup arriving very soon.

mlir/include/mlir/Dialect/SCF/Utils/Utils.h
48

nit: double are replaced

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
170

nit: is in

191

Great, thanks for removing that cognitive barrier!

198

Plz add some TODO about SubsetExtractOpInterface

415

Thanks!

520

Could you please highlight

%arg0 = %1 /*incorrect value*/)

and

tensor.extract_slice %0 /* incorrect value */

?

520

Could you please highlight

%arg0 = %0 /*now this is correct*/)
and

tensor.extract_slice %arg1 /* now this is correct */
?

547

can we use (or add) some helper function to avoid magic indexings in clients of this op (there should already be something available).

mlir/lib/Dialect/SCF/Utils/Utils.cpp
131

ok, but let's please do it as part of this list of patches.
Before or after the NFC is fine, but please don't let this linger, approving conditioned on that cleanup arriving very soon.

This revision is now accepted and ready to land.Sep 23 2022, 4:09 AM
mravishankar marked 7 inline comments as done.

Rebase and address comments.

Rebase and address comments.

This revision was landed with ongoing or failed builds.Sep 26 2022, 12:15 PM
This revision was automatically updated to reflect the committed changes.