This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add map_nested_foreach_thread_to_gpu_blocks op to transform dialect
ClosedPublic

Authored by guraypp on Sep 19 2022, 9:05 AM.

Details

Summary

This revision adds a new op map_nested_foreach_thread_to_gpu_blocks to transform dialect.
If generate_gpu_launch argument is given, the op first generates gpu_launch. Otherwise, target must be gpu_launch. The op searches top level scf.foreach_threads inside the gpu_launch and distributes them with gpu.block_id attribute.
Loop mapping is explicit and given by the map_nested_foreach_thread_to_gpu_blocks op. Mapping is done one-to-one, therefore the loops disappear.
It also adds gpu dialect as dependent since the new op can create gpu::LaunchOp for given scf::ForeachThreadOp.

Diff Detail

Event Timeline

guraypp created this revision.Sep 19 2022, 9:05 AM
guraypp requested review of this revision.Sep 19 2022, 9:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2022, 9:05 AM

It is debatable whether this transform should 1) create a gpu.launch from a scf.foreach_thread or 2) expect the scf.foreach_thread to be already nested within one.
The current implementation is 2) and is coming from IREE which has special constraints about what IR comes into existence when and how async is handled.

  1. seems better suited to upstream MLIR. Instead of
// Step 6. Erase old op.
rewriter.eraseOp(foreachThreadOp);

you want to create a gpu::LaunchOp with the outlined body.
I.e. targetBlock = foreachThreadOp->getBlock(); should be the block of the gpu::LaunchOp.

Things will be synchronous for now and we will look into async launches later.

Most of the implementation should be factored out in a helper that we can reuse on the IREE side, as before.

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1218

nit double comment

Also, please be proactive in good doc comments everywhere.

guraypp edited the summary of this revision. (Show Details)Sep 21 2022, 4:16 AM
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
136

This will break in various ways at link time when used from outside, please use a proper type / typedef.
E.g. something similar coming from scf:

LoopNest mlir::scf::buildLoopNest(
    OpBuilder &builder, Location loc, ValueRange lbs, ValueRange ubs,
    ValueRange steps, ValueRange iterArgs,
    function_ref<ValueVector(OpBuilder &, Location, ValueRange, ValueRange)>
        bodyBuilder) {
mlir/test/Dialect/Linalg/transform-gpu.mlir
134

nice!

guraypp updated this revision to Diff 461872.Sep 21 2022, 6:07 AM

Remove template type for lambda and add a proper type

csigg added inline comments.Sep 21 2022, 6:16 AM
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1366

I'm not familiar with transform ops, is this how this is done (rather than rewriter.notifyMatchFailure())?

1383–1384

Would it be cleaner to clone the region with a block arg mapping so that it can be rolled back?

1390

This looks like a left-over.

1495

Should there be some check somewhere that the current gridDims are 1 before rewriting them?

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1366

Transform ops are more directive, the equivalent of the pattern driver is the tranform IR itself where, if things fail, to apply we want a louder signal (at least for now).

1383–1384

Rollback is a thing only for DialectConversion today, we do not use that.
Transforms maintain operation handles, moving ops allows to keep the handles valid which is a behavior we want here.

1495

Yes, it would make sense to add an error at the top.

guraypp updated this revision to Diff 462116.Sep 22 2022, 2:00 AM

Address csigg's comments

guraypp updated this revision to Diff 462122.Sep 22 2022, 3:02 AM
guraypp marked an inline comment as done.

rebase

guraypp updated this revision to Diff 462401.Sep 23 2022, 12:13 AM

fix mistakes in alterGpuLaunch

guraypp retitled this revision from [mlir][wip] Add map_nested_foreach_thread_to_gpu_blocks op to transform dialect to [mlir] Add map_nested_foreach_thread_to_gpu_blocks op to transform dialect.Sep 23 2022, 12:47 AM
guraypp edited the summary of this revision. (Show Details)
guraypp edited the summary of this revision. (Show Details)
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
21

please drop spurious include (that I imagine was insert automatically by e.g. vscode ?)

24

please drop spurious include (that I imagine was insert automatically by e.g. vscode ?)

1295

Definitely use an OpBuilder::InsertionGuard here, never let a function leak a modification of insertion point

1297

almost never use unsigned unless doing bit manipulations, just use int64_t everywhere plz.

1298

Please don't use the magic constant 1 to detect whether or not to modify, pass an explicit Optional<int64_t>

1300–1311

optional for less magic constant-y tests

1329

Note: it is good practice OpBuilder::InsertionGuard each time you modify insertion points in a scoped piece of code.
Here is does not matter much because it won't leak above, but it does matter a lot at function boundaries.

1335–1336

0, 0, 0, feels fishy here, what is the intent ?
Use None if you want it to be optional.

1428

we use int64_t everywhere we can, please don't deviate from that

1442

Definitely use an OpBuilder::InsertionGuard here, never let a function leak a modification of insertion point

1465

Note: it is good practice OpBuilder::InsertionGuard each time you modify insertion points in a scoped piece of code.
Here is does not matter much because it won't leak above, but it does matter a lot at function boundaries.

1480

Note: it is good practice OpBuilder::InsertionGuard each time you modify insertion points in a scoped piece of code.
Here is does not matter much because it won't leak above, but it does matter a lot at function boundaries.

I would still use one here given the fact that this lambda may have a chance of turning into a standalone function in the future.

guraypp updated this revision to Diff 462457.Sep 23 2022, 5:43 AM

address nicolasvasilache's comments

ftynse added a subscriber: ftynse.Sep 23 2022, 6:22 AM
ftynse added inline comments.
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
876–877

Please do not use === that indicates the page title. Use the same-level header as the operation operations in this file.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
136

Overall, when you need an immutable cheap view of a sequential construct such as vector, take an ArrayRef, when you need a mutable vector, take a SmallVectorImpl so it is compatible with any stack-size and not only the hardcoded/inferred one.

140

What does "top-level" mean here? Is this going considering ascendants or descendants of the given operation?

nicolasvasilache accepted this revision.Sep 23 2022, 6:46 AM

Please address @ftynse 's comments and let's land this

This revision is now accepted and ready to land.Sep 23 2022, 6:46 AM
guraypp updated this revision to Diff 462473.Sep 23 2022, 7:19 AM
guraypp marked 21 inline comments as done.

remove ==== in tablegen file