This is an archive of the discontinued LLVM Phabricator instance.

[mlir][ROCM] Add Wave/Warp shuffle lowering and op for ROCM.
Closed · Public

Authored by raikonenfnu on Aug 23 2023, 4:57 PM.

Details

Summary

Reductions are heavily used in many DL workloads, especially in
softmax/attention layers. Wave/warp shuffle-based reduction is known to be
a fast and efficient way to perform them.

In this patch we introduce AMD shuffle intrinsic ops to ROCDL, along with the corresponding lowering from gpu.shuffle. This should speed up many DL workloads on the ROCM backend. Currently, we support the xor and idx modes, which are the more common ones. In the future, we plan to add support for down and up, and to use ds_swizzle to further improve performance when width and offsets are constant.
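
As a rough illustration of why xor-mode shuffles suit reductions, the sketch below simulates the lanes of a wave as a Python list. It is not the lowering from this patch: the helper names `shuffle_xor`, `shuffle_idx`, and `butterfly_reduce_sum` are hypothetical, and it assumes the list length equals the shuffle width and is a power of two.

```python
def shuffle_xor(values, offset):
    """Simulated xor-mode shuffle: lane i reads the value held by lane i ^ offset.

    Assumption: len(values) is the shuffle width and a power of two, and
    0 <= offset < width, so every partner lane is in range.
    """
    return [values[lane ^ offset] for lane in range(len(values))]


def shuffle_idx(values, src_lane):
    """Simulated idx-mode shuffle: every lane reads the value held by src_lane."""
    return [values[src_lane] for _ in values]


def butterfly_reduce_sum(values):
    """Sum across all lanes using log2(width) xor shuffles (butterfly pattern)."""
    width = len(values)
    # Power-of-two check: exactly one bit set.
    assert width > 0 and width & (width - 1) == 0
    lanes = list(values)
    offset = width // 2
    while offset > 0:
        # Each lane adds the value of its partner at distance `offset`.
        partner = shuffle_xor(lanes, offset)
        lanes = [a + b for a, b in zip(lanes, partner)]
        offset //= 2
    # After the last step every lane holds the full sum.
    return lanes
```

With 64 lanes this takes 6 shuffle steps, and every lane ends up holding the total, which is why the xor mode is a common building block for softmax-style reductions.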

Event Timeline

raikonenfnu created this revision. Aug 23 2023, 4:57 PM
Herald added a reviewer: dcaballe.
Herald added a project: Restricted Project.

Moving from draft

raikonenfnu published this revision for review. Aug 23 2023, 5:22 PM

update from draft

antiagainst accepted this revision. Aug 24 2023, 8:36 AM
antiagainst added inline comments.
mlir/lib/Conversion/GPUToROCDL/LowerGpuOpsToROCDLOps.cpp
116

s/NVVM/ROCDL/

This revision is now accepted and ready to land. Aug 24 2023, 8:36 AM

Updated terminology (i.e., Warpsize -> Width) and improved comments/explanations.

raikonenfnu marked an inline comment as done. Aug 24 2023, 9:58 AM

Clang formatting

This revision was automatically updated to reflect the committed changes.
mlir/test/Conversion/GPUToROCDL/gpu-to-rocdl.mlir