This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Run the TMA test for sm_90
ClosedPublic

Authored by guraypp on Sep 1 2023, 3:20 AM.

Details

Summary

TMA support was introduced to MLIR; however, it needs the ptxas compiler. The recent work in D154117 introduced that!

This work runs the existing integration test.

Diff Detail

Event Timeline

guraypp created this revision.Sep 1 2023, 3:20 AM
Herald added a project: Restricted Project.Sep 1 2023, 3:20 AM
guraypp requested review of this revision.Sep 1 2023, 3:20 AM

@fmorac I use the gpu-module-to-binary pass you recently introduced for mlir->llvm->ptx->cubin, and eventually link the host's LLVM IR (which has the embedded cubin) with clang to generate the executable. Is this the right way to use your pass?

I used to run GPU MLIR integration tests with mlir-cpu-runner, but I guess gpu-module-to-binary is not compatible with it.
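A rough sketch of that flow (the test file name, intermediate file names, and library paths here are placeholders, not the actual setup):

mlir-opt tma_test.mlir -gpu-kernel-outlining -nvvm-attach-target="chip=sm_90 features=+ptx80" \
    | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm))' \
    | mlir-opt -gpu-to-llvm -gpu-module-to-binary > host_with_cubin.mlir
# Translate the host module (which now embeds the cubin) to LLVM IR.
mlir-translate --mlir-to-llvmir host_with_cubin.mlir > host.ll
# Link the host LLVM IR against the runtime wrapper libraries with clang.
clang host.ll -L${LLVM_LIB} -lmlir_cuda_runtime -lmlir_runner_utils -o tma_test && ./tma_test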

fmorac added a comment.Sep 4 2023, 4:13 AM

@fmorac I use the gpu-module-to-binary pass you recently introduced for mlir->llvm->ptx->cubin, and eventually link the host's LLVM IR (which has the embedded cubin) with clang to generate the executable. Is this the right way to use your pass?

I used to run GPU MLIR integration tests with mlir-cpu-runner, but I guess gpu-module-to-binary is not compatible with it.

A couple of things: mlir-cpu-runner should work. For example, the following should work (if you have an sm_70 GPU) with the all-reduce-and.mlir test:

mlir-opt all-reduce-and.mlir -gpu-kernel-outlining -nvvm-attach-target=chip=sm_70 \
    | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm))' \
    | mlir-opt -gpu-to-llvm -gpu-module-to-binary \
    |  mlir-cpu-runner --shared-libs=${LLVM_LIB}/libmlir_cuda_runtime.so --shared-libs=${LLVM_LIB}/libmlir_runner_utils.so --entry-point-result=void

Adding module=main_kernel in --nvvm-attach-target= is not necessary; that option is just there to filter which modules the target gets added to.
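For illustration, the filter form looks roughly like this (module name as mentioned above, chip as in the recipe):

# Attach the NVVM target only to gpu.module ops whose name matches the filter.
--nvvm-attach-target="module=main_kernel chip=sm_70"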

There might be issues if the chip doesn't match the GPU the code is running on, i.e. chip=sm_80 but the GPU is sm_90.

The clang target is not yet supported upstream as you have it.

If the above workflow with mlir-cpu-runner is not working, could you send me the error?

guraypp updated this revision to Diff 555710.Sep 4 2023, 5:05 AM

use mlir-cpu-runner

@fmorac I use the gpu-module-to-binary pass you recently introduced for mlir->llvm->ptx->cubin, and eventually link the host's LLVM IR (which has the embedded cubin) with clang to generate the executable. Is this the right way to use your pass?

I used to run GPU MLIR integration tests with mlir-cpu-runner, but I guess gpu-module-to-binary is not compatible with it.

A couple of things: mlir-cpu-runner should work. For example, the following should work (if you have an sm_70 GPU) with the all-reduce-and.mlir test:

mlir-opt all-reduce-and.mlir -gpu-kernel-outlining -nvvm-attach-target=chip=sm_70 \
    | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm))' \
    | mlir-opt -gpu-to-llvm -gpu-module-to-binary \
    |  mlir-cpu-runner --shared-libs=${LLVM_LIB}/libmlir_cuda_runtime.so --shared-libs=${LLVM_LIB}/libmlir_runner_utils.so --entry-point-result=void

Thanks for the recipe. My test works now with mlir-cpu-runner. I updated the test code.

Adding module=main_kernel in --nvvm-attach-target= is not necessary; that option is just there to filter which modules the target gets added to.

I actually need it to set the PTX version. The default version for sm_90 is 7.8, which does not support the PTX instructions for TMA, so I set it to ptx80+.
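Concretely, the attach-target option then looks roughly like this (illustrative only, not the exact invocation in the test):

# module= filters which gpu.module gets the target; features=+ptx80 raises the PTX ISA for TMA.
--nvvm-attach-target="module=main_kernel chip=sm_90 features=+ptx80"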

There might be issues if the chip doesn't match the GPU the code is running on, i.e. chip=sm_80 but the GPU is sm_90.

The clang target is not yet supported upstream as you have it.

If the above workflow with mlir-cpu-runner is not working, could you send me the error?

I used to get an "interface is not implemented" error, but I cannot recall the details. I cannot reproduce it now; I guess I was using it incorrectly.

fmorac added a comment.Sep 4 2023, 6:33 AM

Adding module=main_kernel in --nvvm-attach-target= is not necessary; that option is just there to filter which modules the target gets added to.

I actually need it to set the PTX version. The default version for sm_90 is 7.8, which does not support the PTX instructions for TMA, so I set it to ptx80+.

What I was saying is that this is enough:

--nvvm-attach-target="features=+ptx80 chip=sm_90 O=3"

I used to get an "interface is not implemented" error, but I cannot recall the details. I cannot reproduce it now; I guess I was using it incorrectly.

Ok, I see. That was a missing registration call, but you shouldn't get it. If it ever pops up again, please let me know.

fmorac accepted this revision.Sep 4 2023, 9:09 AM

LGTM!

This revision is now accepted and ready to land.Sep 4 2023, 9:09 AM
This revision was automatically updated to reflect the committed changes.