This is an archive of the discontinued LLVM Phabricator instance.

[mlir][NVGPU] WIP - Apply layering changes for e2e NVVM
Needs Review · Public

Authored by nicolasvasilache on Jul 28 2023, 7:12 AM.

Details

Summary

Better index -> i32/i64 handling is needed to support TMA operations.
This commit is a draft that makes some connections possible.

Event Timeline

Herald added a reviewer: dcaballe.
Herald added a project: Restricted Project.
nicolasvasilache requested review of this revision. Jul 28 2023, 7:12 AM
nicolasvasilache planned changes to this revision. Jul 28 2023, 7:22 AM
nicolasvasilache added subscribers: kerrmudgeon, guraypp.

@guraypp @mehdi_amini @kerrmudgeon FYI, with this WIP I am able to generate PTX for TMA descriptors + load, based on @guraypp's ongoing work.

mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
657

What is the proper way to replace the magic constant in the GPU -> LLVM conversion?

mlir/lib/Conversion/GPUToNVVM/LowerGpuOpsToNVVMOps.cpp
206

What is the proper way to replace the magic constant in the GPU -> NVVM conversion?

guraypp added inline comments. Jul 28 2023, 7:30 AM
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
657

I put isSharedMemoryAddressSpace in the nvgpu dialect; I'm not sure you can use it here.

Update after debugging with --mlir-print-ir-after-all and ensuring all intermediate IRs are valid.

qcolombet added inline comments. Aug 7 2023, 5:10 AM
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
659

The size of the address space pointer is technically carried by the datalayout in LLVM.

For NVPTX in particular, this is controlled by nvptx-short-ptr (https://github.com/llvm/llvm-project/blob/0b17e9d2859acfec2cf757472f3822f6b5aad020/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp#L60).

So we may need to query the datalayout here (no idea if that's even feasible) instead of hardcoding this.
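To illustrate what "querying the datalayout" means here, the following is a minimal, standalone sketch that reads the pointer width for an address space out of an LLVM-style datalayout string. It is not code from this patch: `findPointerSpec` and `pointerSizeInBits` are hypothetical helpers, and the layout string in the usage note is only an assumed example of what NVPTX might produce with short pointers enabled. In-tree code would more likely use `llvm::DataLayout::getPointerSizeInBits(AS)` than parse the string by hand.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: scan the '-'-separated specs of a datalayout string
// for a pointer spec "p[n]:<size>:..." that matches `addrSpace`.
static std::optional<unsigned> findPointerSpec(const std::string &layout,
                                               unsigned addrSpace) {
  std::istringstream stream(layout);
  std::string spec;
  while (std::getline(stream, spec, '-')) {
    if (spec.empty() || spec[0] != 'p')
      continue; // Not a pointer spec.
    std::size_t colon = spec.find(':');
    if (colon == std::string::npos)
      continue; // Malformed for our purposes; skip it.
    // "p" with no number means address space 0.
    std::string asText = spec.substr(1, colon - 1);
    unsigned specAS =
        asText.empty() ? 0u : static_cast<unsigned>(std::stoul(asText));
    if (specAS != addrSpace)
      continue;
    // The size is the field between the first and second ':'.
    std::size_t end = spec.find(':', colon + 1);
    std::string sizeText =
        spec.substr(colon + 1, end == std::string::npos
                                   ? std::string::npos
                                   : end - colon - 1);
    return static_cast<unsigned>(std::stoul(sizeText));
  }
  return std::nullopt;
}

// Hypothetical helper: pointer width in bits for `addrSpace`. Falls back to
// the address-space-0 spec, then to 64 bits, which is LLVM's default.
unsigned pointerSizeInBits(const std::string &layout, unsigned addrSpace) {
  if (auto bits = findPointerSpec(layout, addrSpace))
    return *bits;
  if (auto bits = findPointerSpec(layout, 0))
    return *bits;
  return 64;
}
```

For an assumed layout such as `"e-p3:32:32-p4:32:32-p5:32:32-i64:64"`, `pointerSizeInBits(layout, 3)` gives a 32-bit shared-memory pointer while `pointerSizeInBits(layout, 1)` falls back to 64, so the width comes from the layout rather than from a constant in the conversion.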

guraypp added inline comments. Aug 7 2023, 6:14 AM
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
659

I previously highlighted this flag in another PR. Let's make sure it gets set from MLIR; otherwise LLVM promotes 32-bit registers to 64-bit no matter what we do in MLIR.

Nevertheless, keeping 32-bit widths in MLIR, as this patch does, has advantages. For instance, it is crucial when generating assembly or PTX directly.