This change modifies structured.tile_to_foreach_thread_op so that
it accepts either tile_sizes or num_threads parameters. If
tile_sizes are specified, then the number of threads required is
derived from the tile sizes rather than the other way around. In both cases,
more aggressive folding of loop parameters is enabled during the
transformation, allowing for the potential elimination of affine.min
and affine.max operations in the static shape case when calculating
the final adjusted tile size.
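
The relationship between the two parameters can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the C++ implementation in Tiling.cpp: the helper names are hypothetical, the ceiling-division derivation is an assumption about how thread counts and tile sizes are matched, and treating 0 as "skip this dimension" follows the review discussion rather than a confirmed specification.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive operands."""
    return -(-a // b)


def threads_from_tile_sizes(shape, tile_sizes):
    """Derive a per-dimension thread count from static tile sizes.

    A tile size of 0 is treated as "do not tile this dimension"
    (assumed sentinel), so the derived thread count is also 0.
    """
    return [0 if t == 0 else ceil_div(d, t) for d, t in zip(shape, tile_sizes)]


def tile_sizes_from_threads(shape, num_threads):
    """Derive a per-dimension tile size from a requested thread count."""
    return [0 if n == 0 else ceil_div(d, n) for d, n in zip(shape, num_threads)]


def boundary_clamp_is_trivial(dim: int, tile: int) -> bool:
    """True when every tile along `dim` is full-sized.

    In that case a clamp such as min(tile, dim - offset) always evaluates
    to `tile`, which is the situation where an affine.min could fold away
    for static shapes.
    """
    return tile != 0 and dim % tile == 0


shape = [100, 300]
print(threads_from_tile_sizes(shape, [10, 21]))
print(tile_sizes_from_threads(shape, [10, 7]))
print(boundary_clamp_is_trivial(100, 10), boundary_clamp_is_trivial(300, 21))
```

When the tile size divides the dimension evenly, the last tile needs no clamping, so the adjusted tile size is a constant. Otherwise the final tile is smaller and the clamp has to remain.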
Repository: rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/Affine/IR/AffineOps.cpp | ||
---|---|---|
725 | Without this change, an error is produced because we pass an OpFoldResult vector that comes directly from an I64ArrayAttr belonging to an op attribute. Explicitly convert to index type. | |
mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp | ||
237 | You can't use RewriterBase in the body-creation lambda, so I switched to the non-lambda creation form and manually move the insertion point below. |
Thanks much!
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td | ||
---|---|---|
593 | This should say 'num_threads'; also, 0 is not a valid num_threads value. | |
612 | Ah, very cool, I didn't realize the custom assembly format allows ternary expressions now. | |
mlir/lib/Dialect/Affine/IR/AffineOps.cpp | ||
725 | Can you add this as a comment in the code to explain the cast? | |
mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp | ||
195 | nit: typo/grammo. | |
237 | Yes, this is annoying and I had the same issue recently. In any case, can you please add a comment explaining this? | |
292 | Making this discovery more powerful is going to be painful with SSA values. | |
mlir/test/Dialect/Linalg/tile-to-foreach-thread.mlir | ||
53 | Nice test case! |
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td | ||
---|---|---|
593 |
But in the doc that you wrote above, you state
I had assumed you meant that 0 is a sentinel value indicating to skip that dimension, regardless of whether it is specified in num_threads or tile_sizes. If you specify num_threads, then the derived tile size cannot be zero. Otherwise you can't handle ops that have a reduction dimension that appears before a parallel dimension that you would like to tile. |
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td | ||
---|---|---|
593 | You're right, I confused myself over nothing, please ignore. |