This is an archive of the discontinued LLVM Phabricator instance.

[mlir][gpu] Add `subgroup_reduce` operation
ClosedPublic

Authored by Hardcode84 on Oct 5 2022, 3:02 PM.

Details

Summary

Introduce a subgroup_reduce operation, similar to all_reduce but operating at subgroup scope instead of workgroup scope.
It is intended as a low-level building block for higher-level abstractions (e.g., workgroup-wide all_reduce ops).
For simplicity's sake, only the version that takes a reduce-operation enum is introduced.
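
To illustrate the "building block" idea in the summary, here is a minimal Python sketch that models the semantics only. It is not MLIR and not the patch's implementation. The helper names (`subgroup_reduce`, `workgroup_all_reduce`) and the subgroup size of 4 are assumptions chosen for illustration, and it assumes every lane of a subgroup receives the same reduced value.

```python
from functools import reduce
import operator

# Hypothetical helpers: they model the semantics on plain lists, not GPU code.
def subgroup_reduce(values, subgroup_size, op):
    """Return a list in which each lane holds the reduction of its own subgroup."""
    out = []
    for start in range(0, len(values), subgroup_size):
        lanes = values[start:start + subgroup_size]
        total = reduce(op, lanes)
        out.extend([total] * len(lanes))
    return out

def workgroup_all_reduce(values, subgroup_size, op):
    """Build a workgroup-wide reduction on top of per-subgroup reductions."""
    per_lane = subgroup_reduce(values, subgroup_size, op)
    # Take one partial result per subgroup (from each subgroup's first lane).
    partials = per_lane[::subgroup_size]
    total = reduce(op, partials)
    return [total] * len(values)

workgroup = list(range(8))  # 8 work items holding the values 0..7
print(subgroup_reduce(workgroup, 4, operator.add))
print(workgroup_all_reduce(workgroup, 4, operator.add))
```

With two subgroups of four lanes, the first call yields 6 for lanes 0..3 and 22 for lanes 4..7, and the second yields 28 for every lane. On real hardware the second stage would need some way to exchange the partial results between subgroups, which this model skips.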

Event Timeline

Hardcode84 created this revision. Oct 5 2022, 3:02 PM
Hardcode84 requested review of this revision. Oct 5 2022, 3:02 PM

Are you planning to add a lowering for it? This can be useful to at least "enforce the semantics". For instance, it should be easy to lower this to shuffle ops.

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
727–728

Can you clarify the behavior when some lanes in the subgroup are inactive?

Are you planning to add a lowering for it? This can be useful to at least "enforce the semantics". For instance, it should be easy to lower this to shuffle ops.

We are planning to lower them directly to GroupNonUniformXYZ SPIR-V ops (those ops allow both Workgroup and Subgroup scope, but the Intel Level Zero/Intel Graphics Compiler stack we are targeting only supports the Subgroup version at the moment).
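
For reference, the shuffle-based lowering raised in the question can be sketched as a butterfly reduction. The Python below only simulates the data movement on a list of lane values. `shuffle_xor` and `butterfly_reduce` are hypothetical names, the power-of-two subgroup size is an assumption, and all lanes are taken to be active. It is not the lowering from this revision, which targets SPIR-V group ops instead.

```python
import operator

def shuffle_xor(lanes, mask):
    """Each lane i reads the value held by lane i ^ mask (models an XOR shuffle)."""
    return [lanes[i ^ mask] for i in range(len(lanes))]

def butterfly_reduce(lanes, op):
    """Reduce across a power-of-two subgroup; every lane ends with the full result."""
    width = len(lanes)
    assert width > 0 and width & (width - 1) == 0, "subgroup size must be a power of two"
    offset = width // 2
    while offset > 0:
        # Exchange with the partner lane, then combine locally.
        partner = shuffle_xor(lanes, offset)
        lanes = [op(a, b) for a, b in zip(lanes, partner)]
        offset //= 2
    return lanes

print(butterfly_reduce([1, 2, 3, 4, 5, 6, 7, 8], operator.add))
```

For eight lanes this takes three exchange steps (offsets 4, 2, 1) and leaves 36 in every lane. The scheme relies on the reduction being commutative and associative, and on every partner lane participating, which is one reason a uniform-execution requirement simplifies such a lowering.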

mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
727–728

The SPIR-V ops we are targeting, such as GroupNonUniformFAddOp, allow non-uniform execution, but we don't need it at the moment. I will add a uniform requirement similar to all_reduce. We can always add something like a non-uniform flag to these ops later if needed.

Hardcode84 updated this revision to Diff 465682. Oct 6 2022, 1:59 AM

add uniform requirement

This revision is now accepted and ready to land. Oct 10 2022, 2:16 PM
This revision was automatically updated to reflect the committed changes.