This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][NVGPU] Adding `nvgpu.wargroup.mma` Op for Hopper GPUs
AbandonedPublic

Authored by guraypp on Aug 21 2023, 9:16 AM.

Details

Summary

[MLIR][NVGPU] Adding nvgpu.wargroup.mma Op for Hopper GPUs

This work introduces a new operation called wargroup.mma to the NVGPU dialect of MLIR. The purpose of this operation is to facilitate warpgroup-level matrix multiply and accumulate (WGMMA) operations on Hopper GPUs with sm_90a architecture.

Previously, the nvvm.wgmma.mma_async operation was introduced to support wargroup-level matrix operations in NVVM dialect. This op is used multiple instances of nvvm.wgmma.mma_async to achieve the desired shape. The new nvgpu.wargroup.mma operation abstracts this complexity and provides a higher-level interface for performing wargroup-level matrix operations.

The nvgpu.wargroup.mma does followings:

  1. Corresponds multiple wgmma instructions.
  2. Iterates input matrix descriptors to achieve the desired computation shape.
  3. Groups and runs wgmma instructions asynchronously, and eventually waits them. This are done by wgmma.fence.aligned, wgmma.commit.group.sync.aligned, and wgmma.wait.group.sync.aligned
  4. Results fragmented matrices

Here's an example usage of the nvgpu.wargroup.mma operation:

%wgmmaResult, %wgmmaResult2 = nvgpu.wargroup.mma %descA, %descB, %acc, group = 1 {transposeB}:
!nvgpu.wgmma.descriptor<tensor = memref<128x64xf16, 3>>,
!nvgpu.wgmma.descriptor<tensor = memref<64x128xf16, 3>>,
vector<128x128xf32>
-> !nvgpu.warpgroup.result<tensor = !llvm.struct<...>,
!nvgpu.warpgroup.result<tensor = !llvm.struct<...>>

Diff Detail

Event Timeline

guraypp created this revision.Aug 21 2023, 9:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 21 2023, 9:16 AM
guraypp requested review of this revision.Aug 21 2023, 9:16 AM
guraypp planned changes to this revision.Aug 21 2023, 9:16 AM
guraypp added reviewers: qcolombet, manishucsd.

This work is still draft. I plan to add more tests

guraypp updated this revision to Diff 555285.Sep 1 2023, 12:32 AM

add type, improves descriptor iterators, improve the operation

guraypp edited the summary of this revision. (Show Details)Sep 1 2023, 12:32 AM