This is an archive of the discontinued LLVM Phabricator instance.

[mlir][GPUtoNVVM] Relax restriction on wmma op lowering
ClosedPublic

Authored by ThomasRaoux on Oct 25 2021, 12:32 PM.

Diff Detail

Event Timeline

ThomasRaoux created this revision.Oct 25 2021, 12:32 PM
ThomasRaoux requested review of this revision.Oct 25 2021, 12:32 PM
Herald added a project: Restricted Project. Oct 25 2021, 12:32 PM

Could you give me a pointer as to what this is for?

> Could you give me a pointer as to what this is for?

It's to support lowering to NVVM with 64-bit indexes.
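For readers of the archive, here is a minimal sketch of the kind of IR affected (shapes and names are illustrative, not taken from the patch). The `%i`/`%j` operands are `index`-typed, and with this change the GPU-to-NVVM conversion can lower them to `i64` instead of requiring a 32-bit index bitwidth:

```mlir
gpu.module @kernels {
  // Hypothetical example: load the "A" operand tile of a wmma matmul.
  gpu.func @load_a(%src: memref<32x32xf16, 3>, %i: index, %j: index) kernel {
    // The indexes are `index`-typed; leadDimension carries the stride.
    %0 = gpu.subgroup_mma_load_matrix %src[%i, %j] {leadDimension = 32 : index}
        : memref<32x32xf16, 3> -> !gpu.mma_matrix<16x16xf16, "AOp">
    gpu.return
  }
}
```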

This looks good to me - thanks.

mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp
187–189

I assume this explanation wasn't a valid one. Could you clarify?

mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
13–22

The default has changed to 64-bit here: could you please capture this in the commit summary?

mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f32.mlir
3

This can be run with a 32-bit index bitwidth. Do we need this change?

bondhugula added inline comments.Oct 26 2021, 11:06 PM
mlir/test/Conversion/GPUToNVVM/wmma-ops-to-nvvm.mlir
13–22

I now realize that the default now just works instead of failing: the index bitwidth is derived from the data layout and happens to be 64-bit by default.

bondhugula accepted this revision.Oct 26 2021, 11:07 PM

LGTM!

mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp
87

auto -> IntegerAttr here.

This revision is now accepted and ready to land.Oct 26 2021, 11:07 PM

Thanks for the review!

mlir/lib/Conversion/GPUToNVVM/WmmaOpsToNvvm.cpp
187–189

Yes, even though the intrinsic only accepts a 32-bit stride, those ops can be used with 64-bit indexes.
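Concretely, a sketch of the lowered form (illustrative only; op spellings are abbreviated and the real code lives in WmmaOpsToNvvm.cpp): address arithmetic proceeds in the chosen index bitwidth, while the stride handed to the intrinsic is materialized as a separate `i32` constant.

```mlir
// Illustrative sketch, not the exact pass output.
%off    = llvm.mul %i, %c32 : i64                 // 64-bit address arithmetic
%ptr    = llvm.getelementptr %base[%off]
            : (!llvm.ptr<f16, 3>, i64) -> !llvm.ptr<f16, 3>
%stride = llvm.mlir.constant(32 : i32) : i32      // intrinsic only accepts an i32 stride
// ... the nvvm wmma load intrinsic then consumes %ptr and %stride ...
```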

mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f32.mlir
3

Correct, it can run with a 32-bit index, but most tests use the default 64-bit mode. Overall I don't believe the 32-bit mode is as relevant for CUDA. Let me know if you want me to keep an integration test in 32-bit mode; note that I'm still testing both modes in the lit tests.
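For reference, a sketch of how both index bitwidths can be exercised in a lit test (the RUN-line spelling is an assumption based on the pass's registered option, not copied from the file):

```mlir
// RUN: mlir-opt %s --convert-gpu-to-nvvm | FileCheck %s --check-prefix=CHECK64
// RUN: mlir-opt %s --convert-gpu-to-nvvm='index-bitwidth=32' | FileCheck %s --check-prefix=CHECK32
```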

bondhugula added inline comments.Oct 27 2021, 12:33 AM
mlir/test/Integration/GPU/CUDA/TensorCore/wmma-matmul-f32.mlir
3

Regarding the 32-bit mode's relevance for CUDA, my understanding is exactly the opposite. 32-bit indexing is quite important and common: it's often sufficient and is expected to perform at least as well. I recall @herhut or others from his team reporting on the MLIR forum that they saw better performance in many cases when using 32-bit indexing. So I think it's completely fine to use 32-bit indexing for cases where it's guaranteed to be sufficient.

Address review comment

ThomasRaoux marked an inline comment as done.Oct 27 2021, 9:31 PM
This revision was landed with ongoing or failed builds.Oct 27 2021, 9:32 PM
This revision was automatically updated to reflect the committed changes.