This is an archive of the discontinued LLVM Phabricator instance.

[mlir][vector] Add pattern to distribute vector reduction to GPU shuffles
ClosedPublic

Authored by ThomasRaoux on Jun 6 2022, 7:40 PM.

Details

Summary

Add a pattern to do ad hoc lowering of vector.reduction to a sequence of warp shuffles. This allow distributing reduction on a warp for GPU targets.
Also add an execution test for warp reduction.

co-authored with @springerm

Diff Detail

Event Timeline

ThomasRaoux created this revision.Jun 6 2022, 7:40 PM
Herald added a project: Restricted Project. · View Herald Transcript
ThomasRaoux requested review of this revision.Jun 6 2022, 7:40 PM
Herald added a project: Restricted Project. · View Herald Transcript
csigg added inline comments.Jun 6 2022, 10:35 PM
mlir/lib/Dialect/Vector/Transforms/VectorDistribute.cpp
747

These %e look wrong.

770

Could you use rewriter.notifyMatchFailure() to make debugging easier? Here and below.

785–786

Would it be cleaner to initialize those vectors during construction instead of push_back?

786

.getType()?

791

Would getResults().back() be easier to parse?

792

I would have expected this to be 'laneId' of some sort. Does this happen latter in the lowering?

808

Could we use 'bfly' reduction instead of broadcasting separately?

ThomasRaoux marked an inline comment as done.

Address review comments

ThomasRaoux marked an inline comment as done.Jun 7 2022, 8:08 AM
ThomasRaoux added inline comments.
mlir/lib/Dialect/Vector/Transforms/VectorDistribute.cpp
747

good catch, fixed it.

785–786

indeed.

791

The new op has extra results so this is not the last element of the newWarpOp results.

792

The op will return Value vector<1xf32> this extract just converts it back to a scalar. The laneId indexing is implicitly done due to the distribution of the vector onto simt lanes.

808

That’s a good point. One potential problem I see with the

808

sorry outdated reply, changed it to butterfly reduction.

springerm accepted this revision.Jun 9 2022, 2:55 AM
springerm added inline comments.
mlir/lib/Dialect/Vector/Transforms/VectorDistribute.cpp
807

Better to wrap in updateRootInplace

This revision is now accepted and ready to land.Jun 9 2022, 2:55 AM
mehdi_amini added inline comments.Jun 14 2022, 3:32 AM
mlir/lib/Dialect/Vector/Transforms/CMakeLists.txt
28

Seems like a layering violation?

springerm added inline comments.Jun 14 2022, 3:37 AM
mlir/lib/Dialect/Vector/Transforms/CMakeLists.txt
28

Note this is VectorTransforms not VectorDialect. But this code could also live in GPUTransforms (adding a dependency on VectorDialect to GPUTransforms instead).

ThomasRaoux added inline comments.Jun 15 2022, 8:20 AM
mlir/lib/Dialect/Vector/Transforms/CMakeLists.txt
28

address review comment