This adds a transformation to tile reduction operations to partial
reduction using scf.foreachthread. This uses
PartialReductionOpInterface to create a merge operation of the partial
tiles.
Details
Diff Detail
Event Timeline
Generally looks good, but have we tried refactoring enough?
| mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td | ||
|---|---|---|
| 726 | number number ? | |
| 727 | nit: period + newline to better align the sentences. | |
| mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h | ||
| 456 | Can we make this singular? There should only be one such (multi-)loop. | |
| mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp | ||
| 1189 | Very important, on silenceable failure and success paths please always set all results or it will be hard to debug in Suppress error mode. E.g. a bit higher in this file is an example: // If nothing to fuse, propagate success.
if (producerOps.empty()) {
results.set(getFusedOp().cast<OpResult>(),
SmallVector<mlir::Operation *>{});
return DiagnosedSilenceableFailure::success();
} | |
| 1190 | Please see https://reviews.llvm.org/D136072 (and save it somewhere), we want to use emitSilenceableFailure + twine as much as possible everywhere from now on. | |
| mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp | ||
| 414 | This seems very long given that we already have mlir::scf::tileReductionUsingScf (in a separate file) and tileToForeachThreadOpImpl (which this impl does not reuse). Have you tried reusing enough ? | |
| 435 | don't reimplement linalgOp. getReductionDims (ish) | |
| 475 | can this be lifted and become a notifyMatchFailure rather than a hard error that leaves the IR in a modified (an invalid?) state? | |
| mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp | ||
|---|---|---|
| 1189 | Ok thanks for pointing that out, I believe it should match the right flow now. | |
| mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp | ||
| 414 | Yes I couldn't find a way to isolate more helpers besides calculateTileOffsetsAndSizes as the rest of the processing is slightly different than what is in tileToForeachThreadOpImpl | |
Skimmed through. +1 for using getTiledImplementation! I can review in more detail if needed....
number number ?