This is an archive of the discontinued LLVM Phabricator instance.

[mlir][linalg] Add tile_size option to `structured.tile_to_foreach_thread_op`
ClosedPublic

Authored by christopherbate on Jul 19 2022, 7:43 PM.

Details

Summary

This change modifies structured.tile_to_foreach_thread_op so that
it accepts either tile_sizes or num_threads parameters. If
tile_sizes are specified, then the number of threads required is
derived the tile sizes rather than the other way around. In both cases,
more aggressive folding of loop parameters is enabled during the
transformation, allowing for the potential elimination of affine.min
and affine.max operations in the static shape case when calculating
the final adjusted tile size.

Diff Detail

Event Timeline

christopherbate requested review of this revision.Jul 19 2022, 7:43 PM
mlir/lib/Dialect/Affine/IR/AffineOps.cpp
725

Without this change an error will be produced because we give an OpFoldResult vector that comes directly from I64ArrayAttr that belongs to an op attribute. Explicitly convert to index type.

mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
252–253

You can't use RewriterBase in the body-creation lambda, so I moved to the non-lambda creation form and manually move the insertion point below.

nicolasvasilache accepted this revision.Jul 20 2022, 1:13 AM

Thanks much!

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
642

This should say 'num_threads', also 0 is not a valid num_threads

663–664

Ah, very cool, I didn't realize the custom assembly format allows ternary expressions now.

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
725

can you add this as a comment in the code to explain why the cast?

mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp
195

nit: typo/grammo.

252–253

Yes, this is annoying and I had the same issue recently.
Another possibility is to explicitly cast the OpBuilder as a RewriterBase inside the lambda when you control the call site, you can do that here if you prefer (fine to leave as is).

In any case, can you please add a comment explaining this?

307

Making this discovery more powerful is going to be painful with SSA values.
OTOH we know by construction in the "tile_size case" that we don't need the max.
I would just add a bool passed at the call site, true for the "tile_size case" and false for the "num_threads case".
Then this discovery can refine the "num_threads case".

mlir/test/Dialect/Linalg/tile-to-foreach-thread.mlir
47

Nice test case!

This revision is now accepted and ready to land.Jul 20 2022, 1:13 AM
christopherbate marked 4 inline comments as done.

Address comments.

christopherbate marked 2 inline comments as done.Jul 20 2022, 12:36 PM
christopherbate added inline comments.
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
642

also 0 is not a valid num_threads

But in the doc that you wrote above, you state

Zero tile sizes indicate that the dimension is not tiled, and can be thought of as tiling by the full size of data.

I had assumed you meant that 0 is a sentinel value indicating to skip that dimension, regardless of whether it is specified in num_threads or tile_size. If you specify num_threads then the derived tile size can not be zero.

Otherwise you can't handle ops that have a reduction dimension that appears before a parallel dimension that you would like to tile.

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
642

You're right, I confused myself over nothing, please ignore.