This is an archive of the discontinued LLVM Phabricator instance.

[mlir][NVGPU] nvgpu.mmasync on F32 through TF32
ClosedPublic

Authored by manishucsd on Jul 21 2022, 11:25 AM.

Details

Summary

Adds optional attribute to support tensor cores on F32 datatype by lowering to mma.sync with TF32 operands. Since, TF32 is not a native datatype in LLVM we are adding tf32Enabled as an attribute to allow the IR to be aware of MmaSyncOp datatype. Additionally, this patch adds placeholders for nvgpu-to-nvgpu transformation targeting higher precision tf32x3.

For mma.sync on f32 input using tensor cores there are two possibilites:
(a) tf32 (1 mma.sync per warp-level matrix-multiply-accumulate)
(b) tf32x3 (3 mma.sync per warp-level matrix-multiply-accumulate)

Typically, tf32 tensor core acceleration comes at a cost of accuracy from missing precision bits. While f32 has 23 precision bits, tf32 has only 10 precision bits. tf32x3 aims to recover the precision bits by splitting each operand into two tf32 values and issue three mma.sync tensor core operations.

Diff Detail

Event Timeline

manishucsd created this revision.Jul 21 2022, 11:25 AM
manishucsd requested review of this revision.Jul 21 2022, 11:25 AM

Looks like a good path forward.

Since, TF32 is not a native datatype in LLVM we are adding tf32Enabled as an attribute to allow the IR to be aware of MmaSyncOp datatype.

Can you update the description to be a bit more explicit in terms of your plan here, for example:

  1. Are you going to change the NVGPU to NVVM conversion so that it refuses to convert nvgpu.mma.sync ops with f32 types if the tf32Enabled is not present? It seems like that should be part of this diff.
  2. Maybe explicitly note that you are setting the attribute to true by default now in VectorToGPU. Since your rewrite returns failure if the attribute is set, I assume that behavior will change on the next diff?
mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
117

I think just putting`UnitAttr` without the OptionalAttr should be sufficient.

mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
692
It is your intent to enable by default currently?
mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
132
176

Braces can be removed

mlir/lib/Dialect/NVGPU/Transforms/MmaSyncTF32Transform.cpp
27
mlir/test/Dialect/NVGPU/mma-sync-f32-to-tf32.mlir
7

This test might read more naturally if you change to

//           CHECK: nvgpu.mma.sync
// CHECK-SAME: tf32Enabled
mlir/test/lib/Dialect/NVGPU/TestNVGPUTransforms.cpp
62

Maybe just make the option a string then -precision=tf32 or -precision=3xtf32

Looks like a good path forward.

Since, TF32 is not a native datatype in LLVM we are adding tf32Enabled as an attribute to allow the IR to be aware of MmaSyncOp datatype.

Can you update the description to be a bit more explicit in terms of your plan here, for example:

  1. Are you going to change the NVGPU to NVVM conversion so that it refuses to convert nvgpu.mma.sync ops with f32 types if the tf32Enabled is not present? It seems like that should be part of this diff.

+1 I think we should make NVGPU to NVVM fail if the attribute is not there for f32.

  1. Maybe explicitly note that you are setting the attribute to true by default now in VectorToGPU. Since your rewrite returns failure if the attribute is set, I assume that behavior will change on the next diff?

I think we shouldn't make VectorToGPU make that choice.

mlir/include/mlir/Dialect/NVGPU/Transforms/Transforms.h
59

I think the struct is a bit of an overkill, passing directly the enum should be good enough.

mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp
692

It would be nice to add a builder function where we don't need to pass the extra attribute and that would just set it to false.

UnitAttr() should not enable it I believe? I don't think we want VectorToGPU to make this decision.

mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
132

nit: could be made a member function.

manishucsd marked 7 inline comments as done.Jul 22 2022, 4:07 PM
manishucsd marked an inline comment as done.Jul 22 2022, 4:15 PM
manishucsd added inline comments.
mlir/include/mlir/Dialect/NVGPU/IR/NVGPU.td
117

tf32Enabled has no meaning for i8, f16, bf16, f64. Thus, we decided to put an Optional attribute. It's presence has meaning and acceptable for f32 datatype.

mlir/lib/Dialect/NVGPU/IR/NVGPUDialect.cpp
132

Thanks. Updated.

176

Thanks for catching this. clang doesn't remove these braces... hmm

manishucsd marked an inline comment as done.Jul 26 2022, 9:28 AM
manishucsd added inline comments.
mlir/include/mlir/Dialect/NVGPU/Transforms/Transforms.h
59

I am in favor or keeping the struct for MmaSyncTransformOptions and enums for specific paths for F32 lowering. We could have more MmaSyncTranfrom enums in the future.

ThomasRaoux added inline comments.Jul 26 2022, 9:51 AM
mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
302

I don't think we want to return an error for the case where tf32 is not enabled. We should just fail the pattern.

manishucsd marked an inline comment as done.
manishucsd marked an inline comment as done.
manishucsd added inline comments.Jul 26 2022, 2:09 PM
mlir/include/mlir/Dialect/NVGPU/Transforms/Transforms.h
59

After an offline discussion, we have decide to go with just the enum for now.

This revision is now accepted and ready to land.Jul 26 2022, 2:59 PM
This revision was automatically updated to reflect the committed changes.