TMA support was introduced to MLIR, but it requires the ptxas compiler. Recent work in D154117 introduced that.
This work runs the existing integration test.
Differential D159347
[MLIR] Run the TMA test for sm_90
Authored by guraypp on Sep 1 2023, 3:20 AM.
@fmorac I use the gpu-module-to-binary pass that you recently introduced for mlir->llvm->ptx->cubin, and eventually link the host LLVM (which has the embedded cubin) with clang to generate the executable. Is this the right way to use your pass? I used to run GPU MLIR integration tests with mlir-cpu-runner, but I guess gpu-module-to-binary is not compatible with it.

A couple of things: mlir-cpu-runner should work. For example, if you have an sm_70 GPU, the following should work with the all-reduce-and.mlir test:

    mlir-opt all-reduce-and.mlir -gpu-kernel-outlining -nvvm-attach-target=chip=sm_70 \
      | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm))' \
      | mlir-opt -gpu-to-llvm -gpu-module-to-binary \
      | mlir-cpu-runner --shared-libs=${LLVM_LIB}/libmlir_cuda_runtime.so --shared-libs=${LLVM_LIB}/libmlir_runner_utils.so --entry-point-result=void

Adding module=main_kernel is not necessary in --nvvm-attach-target=; it only filters which modules the target is added to. There might be issues if the chip doesn't match the GPU the code is running on, e.g. chip=sm_80 but the GPU is sm_90. The clang target is not supported upstream as you have it yet. If the above workflow with mlir-cpu-runner is not working, could you send me the error?

Thanks for the recipe. My test works now with mlir-cpu-runner. I updated the test code.
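The chip-mismatch caveat above can be checked before running the pipeline. The following is a minimal sketch, assuming a POSIX shell; the helper names `cap_to_chip` and `chip_matches` are hypothetical, and the commented `nvidia-smi --query-gpu=compute_cap` query is an assumption about the installed driver (older drivers may not offer that field).

```shell
#!/bin/sh
# Hypothetical helper: turn a compute capability such as "9.0" into "sm_90".
# Dots and spaces are stripped so that nvidia-smi style output can be passed in.
cap_to_chip() {
  printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '. ')"
}

# Hypothetical helper: succeed only when the requested chip ($1) matches
# the chip derived from the GPU's compute capability ($2).
chip_matches() {
  [ "$(cap_to_chip "$2")" = "$1" ]
}

# Example use (assumes nvidia-smi supports the compute_cap query field):
#   cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
#   chip_matches sm_90 "$cap" || echo "chip=sm_90 does not match GPU ($cap)" >&2
```

The check is plain string comparison, so it only catches an exact mismatch between the chip= value and the GPU; it makes no claim about which mismatches would actually fail at run time.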
I actually need it to set the PTX version. The default version for sm_90 is 7.8, which does not support the PTX instructions for TMA, so I set it to ptx80+.
I used to get an "interface is not implemented" error, but I cannot recall the details. I cannot reproduce it now; I guess I was using it incorrectly.

What I was saying is that this is enough: --nvvm-attach-target="features=+ptx80 chip=sm_90 O=3"
OK, I see. That was a missing registration call, but you shouldn't get that error. If it ever pops up again, please let me know.
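Putting the thread together, the mlir-cpu-runner recipe with the sm_90 target options might look like the sketch below. It only prints the command line, so it can be inspected without a GPU. The function name `tma_pipeline`, the input file name `tma_test.mlir`, and the fallback library path are hypothetical; combining the sm_70 recipe with the sm_90 options is an assumption, and the actual test in this revision may need additional lowering passes that are not named in this thread.

```shell
#!/bin/sh
# Hypothetical sketch: print the recipe from this thread with the sm_90
# target options. Nothing is executed; the output is only the command text.
tma_pipeline() {
  input=$1                               # test file to compile (hypothetical name)
  libdir=${LLVM_LIB:-/path/to/llvm/lib}  # assumed location of the runtime libraries
  # Unquoted heredoc: $input and $libdir expand, and "\\" prints a single
  # backslash so the output keeps its line continuations.
  cat <<EOF
mlir-opt $input -gpu-kernel-outlining --nvvm-attach-target="features=+ptx80 chip=sm_90 O=3" \\
  | mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm))' \\
  | mlir-opt -gpu-to-llvm -gpu-module-to-binary \\
  | mlir-cpu-runner --shared-libs=$libdir/libmlir_cuda_runtime.so --shared-libs=$libdir/libmlir_runner_utils.so --entry-point-result=void
EOF
}

tma_pipeline tma_test.mlir
```

Printing instead of executing keeps the sketch usable on machines without an sm_90 GPU; the printed text can be pasted into a shell once the pass list has been checked against the real test.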