This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Transform] Introduce nvgpu transform extensions
ClosedPublic

Authored by nicolasvasilache on Jun 21 2023, 6:31 AM.

Details

Summary

Mapping to NVGPU operations such as mma.sync with mixed precision, or ldmatrix with transposes and
various data types, involves complex matching of low-level IR.
This is akin to raising complex patterns after having unnecessarily lost structural information.
To avoid such unnecessary complexity, introduce a direct mapping step from a matmul on memrefs
to distributed NVGPU vector abstractions.
In this context, mapping to specific mma.sync operations is trivial and consists of simply
translating the documentation into indexing expressions.

Correctness is demonstrated with an end-to-end integration test.
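
As an illustration, the new mapping is driven from a transform script along the following lines (a sketch based on the accompanying lit test; the payload shapes and exact invocation syntax are illustrative rather than authoritative):

  func.func @matmul_16x8x4xf32(%A: memref<16x4xf32>, %B: memref<4x8xf32>,
                               %C: memref<16x8xf32>) {
    // The matmul on memrefs that gets mapped directly to distributed
    // NVGPU vector abstractions.
    linalg.matmul ins(%A, %B : memref<16x4xf32>, memref<4x8xf32>)
                  outs(%C : memref<16x8xf32>)
    return
  }

  transform.sequence failures(propagate) {
  ^bb1(%module: !transform.any_op):
    // Match the matmul and rewrite it in terms of nvgpu.mma.sync.
    %matmul = transform.structured.match ops{["linalg.matmul"]} in %module
      : (!transform.any_op) -> !transform.any_op
    transform.nvgpu.rewrite_matmul_as_mma_sync %matmul
      : (!transform.any_op) -> ()
  }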

Diff Detail

Event Timeline

Herald added a project: Restricted Project.
nicolasvasilache requested review of this revision. Jun 21 2023, 6:31 AM
Herald added a project: Restricted Project.
springerm accepted this revision. Jun 21 2023, 7:49 AM
springerm added inline comments.
mlir/lib/Dialect/NVGPU/TransformOps/NVGPUTransformOps.cpp
350

The API has changed; this comes with a TransformRewriter now. No need to create a TrackingListener and Rewriter.

This revision is now accepted and ready to land. Jun 21 2023, 7:49 AM
qcolombet added inline comments. Jun 21 2023, 8:10 AM
mlir/include/mlir/Dialect/NVGPU/CMakeLists.txt
2

I'm guessing we're missing some bazel changes as well.

mlir/lib/Dialect/NVGPU/TransformOps/NVGPUTransformOps.cpp
106

Since we're implementing these in the same file, could we inline the body here?

I find it easier to read when the comment and implementation are next to each other, since in this case we only apply the spec.

219

indexingsRef is dead here

221

We should be able to switch that to a range loop on indexings.

224

Do we assert/bail out somewhere that the memref has two dimensions?

mlir/test/Dialect/NVGPU/transform-matmul-to-nvvm.mlir
8

Maybe generate the lines with generate-test-checks.py.

I find the count checks not particularly useful for figuring out what is going on.
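
For reference, generate-test-checks.py produces value-capturing lines of roughly this shape (a hypothetical excerpt, not taken from the actual test; the ops and types depend on what the rewrite produces):

  // CHECK-LABEL: func.func @matmul_16x8x4xf32(
  //  CHECK-SAME:     %[[A:.*]]: memref<16x4xf32>,
  //  CHECK-SAME:     %[[B:.*]]: memref<4x8xf32>,
  //       CHECK:     %[[RES:.*]] = nvgpu.mma.sync(%{{.*}}, %{{.*}}, %{{.*}})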

mlir/test/Integration/GPU/CUDA/TensorCore/transform-mma-sync-matmul-f32.mlir
25

Would it make sense to also check that an mma.sync operation is generated?

I'm asking because the transform will skip unsupported instructions, so in the future we may actually not be testing the mma.sync codegen.
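
One way to do that (a sketch only; the shape attribute below assumes the f32 m16n8k4 path and the transform interpreter pass available at the time, both of which may differ) would be to run FileCheck on the transformed IR before it is executed:

  // RUN: mlir-opt %s -test-transform-dialect-interpreter | FileCheck %s
  //
  // Fail the test if the rewrite silently skipped the matmul and no
  // nvgpu.mma.sync was materialized.
  // CHECK: nvgpu.mma.sync({{.*}}) {mmaShape = [16, 8, 4]}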

Matt added a subscriber: Matt. Jun 21 2023, 12:43 PM
nicolasvasilache marked 7 inline comments as done.

Address comments

mlir/lib/Dialect/NVGPU/TransformOps/NVGPUTransformOps.cpp
221

I actually need this in a follow-up, but OK for now.

224

done in caller