This is an archive of the discontinued LLVM Phabricator instance.

[mlir][linalg] linalg.tiled_loop peeling
ClosedPublic

Authored by springerm on Aug 17 2021, 10:47 PM.

Details

Summary

Split linalg.tiled_loop ops into two ops: One loop where the step size divides the iteration space evenly and another loop for the remaining iteration. This pattern is similar to the scf.for loop peeling pattern and reuses that pattern's affine.min op canonicalization functionality.

Depends On D108009

Diff Detail

Event Timeline

springerm created this revision.Aug 17 2021, 10:47 PM
springerm requested review of this revision.Aug 17 2021, 10:47 PM
springerm retitled this revision from [mlir][linalg] linalg.tiled_loop peeling Split linalg.tiled_loop ops into two ops: One loop where the step size divides the iteration space evenly and another loop for the remaining iteration. This pattern is similar to the scf.for loop peeling... to [mlir][linalg] linalg.tiled_loop peeling.Aug 17 2021, 10:48 PM
springerm edited the summary of this revision. (Show Details)
mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
652

side note: we tend to try and move away from unsigned as it creates a bunch of unnecessary issues in a bunch of places.
@mehdi_amini -> int64_t ?

665

1l ?

672

How about renaming lb, ub, step to lbVal, ubVal, stepVal and writing the AffineExpr in a more intuitive way?

AffineExpr lb, ub, step;
bindDims(b.getContext(), lb, ub);
bindSymbols(b.getContext(), step;
auto modMap = AffineMap::get(3, 0, {ub - ((ub - lb) % step)});

Note that your AffineExpr is illegal as dim divides dim; which makes me wonder how your tests pass?

675

Please check against the degenerate case as I would hope that the mod could be simplified to 0 in certain cases.

springerm updated this revision to Diff 367678.Aug 19 2021, 6:53 PM
springerm marked an inline comment as done.

address comments

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
665

I vagely remember some awkwardness with integer literals. 1l is long or sth., but depending on the architecture that may be 64-bit or sth else. So I thought the static cast may be safer?

672

Btw parts of this is copied from the SCF loop peeling pattern. (But they are too different to share the same impl.)

675

The above checks should already handle this:

// No specialization necessary if step already divides upper bound evenly.
if (lbInt && ubInt && stepInt && (*ubInt - *lbInt) % *stepInt == 0)
  return failure();
// No specialization necessary if step size is 1.
if (stepInt == static_cast<int64_t>(1))
  return failure();
springerm marked an inline comment as done.Aug 19 2021, 6:53 PM
herhut added inline comments.Aug 26 2021, 2:24 AM
mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
654

nit: should this be moved down to where the rewriting starts?

mlir/lib/Dialect/SCF/Transforms/LoopSpecialization.cpp
339

nit: this comment seems of. scf.for is also a loop.

mlir/test/Dialect/Linalg/tiled-loop-peeling.mlir
40

I don't understand where the nesting of loops comes from.

mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp
617

Could we also have a mode where the peeled off loop is not rewritten at all? Currently, the peeled off loop gets further peeled along other dimensions.

The idea being that, if it is a boundary anyway, we might avoid specializing further in the interest of code size.

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
675

I'd expect b.createOrFold<AffineApplyOp> can (or will be able to) also simplify in symbolic cases.
The check you have currently only works for the static case.

I'd think you could drop the 2 early-exit cases above and just check createOrFold against the 0 constant.
It would potentially introduce an extra constant 0 : index but simplify the code.
I think I would take that tradeoff.

springerm marked 5 inline comments as done.Sep 1 2021, 9:01 PM
springerm added inline comments.
mlir/test/Dialect/Linalg/tiled-loop-peeling.mlir
40

Good catch. This test case was wrong. And the scary thing is that it passed because it matched ops from the following test case. (Now fixed by using RUN: ...-LABEL:.

mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp
617

I'm adding this in a separate commit if that's OK.

springerm updated this revision to Diff 370154.Sep 1 2021, 9:01 PM
springerm marked 2 inline comments as done.

address comments

tpopp added a subscriber: tpopp.Sep 2 2021, 1:07 AM
tpopp added a comment.Sep 3 2021, 8:44 AM

Drive by comment

mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
676–677

This needs to be updated some for memrefs I believe. When the loop op is bufferized, the loop op will have "outputs" but no "results" such as

linalg.tiled_loop (%i) = (%c0) to (%c24) step (%c4)
        ins(%lhs, %rhs : memref<24x64xi8>, memref<24x64xi8>)
        outs(%out : memref<24x64xi8>)
        iterators("parallel")
        distribution("block_x") {
mlir/test/Dialect/Linalg/tiled-loop-peeling.mlir
138–147

This example gets unlucky with the current TiledLoop semantics. If there were an "outs" specified, I believe this would fail as the final peel after peeling would have "outs" be empty. (see my other comment)

springerm updated this revision to Diff 370782.Sep 4 2021, 10:08 PM
springerm marked an inline comment as done.

address comments

springerm marked an inline comment as done.Sep 4 2021, 10:15 PM
springerm added inline comments.
mlir/lib/Dialect/Linalg/Transforms/Loops.cpp
676–677

Nice catch

herhut accepted this revision.Sep 6 2021, 4:48 AM

Thanks!

This revision is now accepted and ready to land.Sep 6 2021, 4:48 AM
This revision was landed with ongoing or failed builds.Sep 6 2021, 5:50 PM
This revision was automatically updated to reflect the committed changes.