This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Add map_nested_foreach_thread_to_gpu_threads op to transform dialect
ClosedPublic

Authored by guraypp on Sep 15 2022, 8:50 AM.

Details

Summary

This revision adds a new op map_nested_foreach_thread_to_gpu_threads to transform dialect. The op searches scf.foreach_threads inside the gpu_launch and distributes them with gpu.thread_id attribute.

Loop mapping is explicit and given by the map_nested_foreach_thread_to_gpu_threads op. Mapping is done one-to-one, therefore the loops dissappear.

The dynamic trip count or trip count that are larger than thread size are not supported for the time being. However, we can indeed support them by generating a loop inside with cyclic scheduling.

For the time being, trip counts that are dynamic or bigger than thread sizes are not supported. However, in the future the compiler can indeed generate a loop with static cyclic scheduling to support these cases.

Current mechanism allows scf.foreach_threads to be siblings or nested. There cannot be interleaving code between the loops when they are nested.

Diff Detail

Event Timeline

guraypp created this revision.Sep 15 2022, 8:50 AM

Some high level fly by comments, thanks for sharing an early draft, will review when not in draft mode anymore.
Generally looks good and it's great to see things connecting quite easily!

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1034

/// prefixed doc comments on all upstreamed functions please, with doc examples that will surface the limitations of the current impl.

1065

This should become a helper function on the scf::ForeachThreadOp directly.
I would call it getPermutedThreadIndices to avoid the conflation with getThreadIndices which only return the bbargs in order.

Also the implementation will need to be improved to be more robust with accepting 0 values and partial thread_dim_mapping attr specifications (this can be done in a followup once we have the basic bits connected).

1083

IANM workgroup is vulkan terminology that is used in IREE.
We should call this Grid size (here and everywhere).

1156

RAUW is not good here, I have something in flight separately.
It should be something like:

for (auto it : llvm::zip(threadIndices, threadOps)) {
  Value val = std::get<0>(it);
  if (!val) continue;
  for (Operation *user : llvm::make_early_inc_range(val.getUsers())) {
    rewriter.updateRootInPlace(user, [&](){
      user->replaceUsesOfWith(val, std::get<1>(it));
    });
  }
 }
mlir/test/Dialect/Linalg/transform-gpu.mlir
59

nit:nl

guraypp updated this revision to Diff 460776.Sep 16 2022, 8:31 AM

Addressed comments

guraypp edited the summary of this revision. (Show Details)Sep 16 2022, 8:33 AM
mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
759

there is is no translation_info attribute in MLIR

805

This example looks suspicious .. the result depends on what the blockDim attribute passed to the transform op contains.
Also, the threads directive should be updated with the proper blockDim values.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
127

The builder/rewriter is often the first operand passed in such APIs, any reason to deviate from that ?

mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
504

docs please

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1034

docs please

1035

plz rename to rewriteOneForeachThreadToGpuThreads

1037

Rewriter should be the first argument.

1061

please drop the , 3 here and below, these are not necessary anymore.

1130

plz rename to rewriteNestedForeachThreadsTo GpuThreads

1132

Rewriter should be the first argument.

1161

This needs to happen before the PR can land, otherwise the code will be incorrect.

mlir/lib/Dialect/SCF/IR/SCF.cpp
1266

drop TODO

1281

doc

1287

doc

mlir/test/Dialect/Linalg/transform-gpu.mlir
46

Let's be more explicit about the function name: map_nested_foreach_thread_to_gpu_threads.

Also, this currently has to anchor on a func but this is not really necessary, we may want to anchor on gpu.launch instead.

guraypp updated this revision to Diff 461169.Sep 19 2022, 3:03 AM
guraypp edited the summary of this revision. (Show Details)

addressed comments

guraypp published this revision for review.Sep 19 2022, 3:04 AM
guraypp retitled this revision from [WIP][mlir] Add foreach_thread_to_gpu_and_translation_info op to transform dialect to [mlir] Add map_nested_foreach_thread_to_gpu_threads op to transform dialect.
guraypp edited the summary of this revision. (Show Details)
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2022, 3:04 AM
guraypp updated this revision to Diff 461173.Sep 19 2022, 3:28 AM
guraypp marked 19 inline comments as done.
guraypp edited the summary of this revision. (Show Details)

rebase

mlir/lib/Dialect/SCF/IR/SCF.cpp
1284

Better doc comments please.

/// Return the thread indices in the order specified by the thread_dim_mapping attribute.
/// Return failure is thread_dim_mapping is not a valid permutation.
guraypp updated this revision to Diff 461185.Sep 19 2022, 4:46 AM

improve the comment

guraypp updated this revision to Diff 461187.Sep 19 2022, 5:27 AM

fix clang-format

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
771

Please reflect the part that are not yet captured from longer comment I rewrote in this doc.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
124

Better comments please, fixing them for you here as there are some misconceptions:

/// Searches `scf.foreach_thread` ops nested under `target` and maps each such op to GPU threads.
/// Mapping is one-to-one and the induction variables of `scf.foreach_thread` are rewritten to gpu.thread_id according to the thread_dim_apping attribute. 
/// Sibling `scf.foreach_thread` are supported in which case, the union of the number of threads is and may result in predication.
/// Multiple `scf.foreach_thread` are currently not supported.
/// Dynamic, `scf.foreach_thread` trip counts are currently not supported.
/// Dynamic block dim sizes are currently not supported.

Please lift these comments into the op documentation in Tablegen.

mlir/include/mlir/Dialect/SCF/IR/SCFOps.td
504

Better doc comments everywhere please

mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1039

please update comment to what I pasted above.

guraypp updated this revision to Diff 461189.Sep 19 2022, 5:38 AM

update the doc of MapNestedForeachThreadToGpuThreads

guraypp updated this revision to Diff 461194.Sep 19 2022, 6:10 AM

Improve doc

nicolasvasilache accepted this revision.Sep 19 2022, 7:11 AM

Looks good, thanks Guray!

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
767

is computed

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
128

I forgot a word: "is computed"

This revision is now accepted and ready to land.Sep 19 2022, 7:11 AM
guraypp updated this revision to Diff 461200.Sep 19 2022, 7:21 AM

fix minor comment

This revision was landed with ongoing or failed builds.Sep 19 2022, 7:27 AM
This revision was automatically updated to reflect the committed changes.