This revision adds a new op map_nested_foreach_thread_to_gpu_blocks to transform dialect.
If generate_gpu_launch argument is given, the op first generates gpu_launch. Otherwise, target must be gpu_launch. The op searches top level scf.foreach_threads inside the gpu_launch and distributes them with gpu.block_id attribute.
Loop mapping is explicit and given by the map_nested_foreach_thread_to_gpu_blocks op. Mapping is done one-to-one, therefore the loops disappear.
It also adds gpu dialect as dependent since the new op can create gpu::LaunchOp for given scf::ForeachThreadOp.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
It is debatable whether this transform should 1) create a gpu.launch from a scf.foreach_thread or 2) expect the scf.foreach_thread to be already nested within one.
The current implementation is 2) and is coming from IREE which has special constraints about what IR comes into existence when and how async is handled.
- seems better suited to upstream MLIR. Instead of
// Step 6. Erase old op. rewriter.eraseOp(foreachThreadOp);
you want to create a gpu::LaunchOp with the outlined body.
I.e. targetBlock = foreachThreadOp->getBlock(); should be the block of the gpu::LaunchOp.
Things will be synchronous for now and we will look into async launches later.
Most of the implementation should be factored out in a helper that we can reuse on the IREE side, as before.
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp | ||
---|---|---|
1218 | nit double comment |
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h | ||
---|---|---|
136 ↗ | (On Diff #461846) | This will break in various ways at link time when used from outside, please use a proper type / typedef. LoopNest mlir::scf::buildLoopNest( OpBuilder &builder, Location loc, ValueRange lbs, ValueRange ubs, ValueRange steps, ValueRange iterArgs, function_ref<ValueVector(OpBuilder &, Location, ValueRange, ValueRange)> bodyBuilder) { |
mlir/test/Dialect/Linalg/transform-gpu.mlir | ||
135 | nice! |
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp | ||
---|---|---|
1465 | I'm not familiar with transform ops, is this how this is done (rather than rewriter.notifyMatchFailure())? | |
1482–1483 | Would it be cleaner to clone the region with a block arg mapping so that it can be rolled back? | |
1489 | This looks like a left-over. | |
1594 | Should there be some check somewhere that the current gridDims are 1 before rewriting them? |
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp | ||
---|---|---|
1465 | Transform ops are more directive, the equivalent of the pattern driver is the tranform IR itself where, if things fail, to apply we want a louder signal (at least for now). | |
1482–1483 | Rollback is a thing only for DialectConversion today, we do not use that. | |
1594 | Yes, it would make sense to add an error at the top. |
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp | ||
---|---|---|
21 | please drop spurious include (that I imagine was insert automatically by e.g. vscode ?) | |
24 | please drop spurious include (that I imagine was insert automatically by e.g. vscode ?) | |
1404 | almost never use unsigned unless doing bit manipulations, just use int64_t everywhere plz. | |
1404 | Definitely use an OpBuilder::InsertionGuard here, never let a function leak a modification of insertion point | |
1405 | Please don't use the magic constant 1 to detect whether or not to modify, pass an explicit Optional<int64_t> | |
1411 | optional for less magic constant-y tests | |
1429 | Note: it is good practice OpBuilder::InsertionGuard each time you modify insertion points in a scoped piece of code. | |
1435 | 0, 0, 0, feels fishy here, what is the intent ? | |
1527 | we use int64_t everywhere we can, please don't deviate from that | |
1541 | Definitely use an OpBuilder::InsertionGuard here, never let a function leak a modification of insertion point | |
1564 | Note: it is good practice OpBuilder::InsertionGuard each time you modify insertion points in a scoped piece of code. | |
1579 | Note: it is good practice OpBuilder::InsertionGuard each time you modify insertion points in a scoped piece of code. I would still use one here given the fact that this lambda may have a chance of turning into a standalone function in the future. |
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td | ||
---|---|---|
876–877 | Please do not use === that indicates the page title. Use the same-level header as the operation operations in this file. | |
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h | ||
140 ↗ | (On Diff #462457) | What does "top-level" mean here? Is this going considering ascendants or descendants of the given operation? |
136 ↗ | (On Diff #461846) | Overall, when you need an immutable cheap view of a sequential construct such as vector, take an ArrayRef, when you need a mutable vector, take a SmallVectorImpl so it is compatible with any stack-size and not only the hardcoded/inferred one. |
Please do not use === that indicates the page title. Use the same-level header as the operation operations in this file.