This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][NVGPU] Introduce `nvgpu.wargroup.mma.store` Op for Hopper GPUs
AbandonedPublic

Authored by guraypp on Sep 1 2023, 1:33 AM.

Details

Summary

This work introduces a new operation called wargroup.mma.store to the NVGPU dialect of MLIR. The purpose of this operation is to facilitate storing fragmanted results of WGMMA to the given memref.

An example of fragmentation is given here :
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#wgmma-64n16-d

The wargroup.mma.store does followings:

  1. Takes one or more fragmented results matrix.
  2. Calculates indexes per thread in warp group and stores the data into give memref.

Here's an example usage of the nvgpu.wargroup.mma operation:

%res, %res2 = nvgpu.wargroup.mma ...
nvgpu.wargroup.mma.store [%res1, %res2], %matrixD : !nvgpu.warpgroup.result<tensor = !llvm.struct<...>>, !nvgpu.warpgroup.result<tensor = !llvm.struct<...>> to memref<128x128xf32,3>

Depens on D158434

Diff Detail