Better handling of the index -> i32/i64 lowering is needed to support TMA operations.
This commit is a draft that makes some of the necessary connections possible.
Repository: rG LLVM Github Monorepo

Event Timeline
@guraypp @mehdi_amini @kerrmudgeon FYI: with this WIP I am able to generate PTX for TMA descriptors + loads, based on @guraypp's ongoing work.
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp:657
I put isSharedMemoryAddressSpace in the nvgpu dialect; I am not sure you can use it here.
Update after debugging with --mlir-print-ir-after-all and ensuring all intermediate IRs are valid.
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp:659
The size of an address-space pointer is technically carried by the data layout in LLVM. For NVPTX in particular, this is controlled by nvptx-short-ptr (https://github.com/llvm/llvm-project/blob/0b17e9d2859acfec2cf757472f3822f6b5aad020/llvm/lib/Target/NVPTX/NVPTXTargetMachine.cpp#L60). So we may need to query the data layout here (no idea if that is even feasible) instead of hardcoding this.
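For illustration, one way to avoid the hardcoded width would be to go through MLIR's data-layout machinery. The sketch below uses `mlir::DataLayout::closest` and `getTypeSizeInBits` from DataLayoutInterfaces; the helper name and the shared-memory address-space constant `3` are assumptions for illustration, not part of this patch, and the result is only meaningful if the module's data layout actually reflects the nvptx-short-ptr setting.

```cpp
// Sketch (not a drop-in patch): query the data layout for the pointer
// width in a given address space instead of hardcoding 32 or 64.
#include "mlir/Dialect/LLVMIR/LLVMTypes.h"
#include "mlir/Interfaces/DataLayoutInterfaces.h"

// Hypothetical helper, e.g. called from a ConversionPattern.
static unsigned getPointerSizeInBits(mlir::Operation *op,
                                     unsigned addressSpace) {
  // Walk up to the closest op carrying a data layout spec (typically
  // the enclosing module).
  mlir::DataLayout layout = mlir::DataLayout::closest(op);
  auto ptrTy =
      mlir::LLVM::LLVMPointerType::get(op->getContext(), addressSpace);
  // Would yield 32 for shared memory if short pointers are reflected
  // in the module's data layout, 64 otherwise.
  return layout.getTypeSizeInBits(ptrTy);
}
```

Whether the GPU-to-LLVM conversion can see a data layout that encodes this backend flag is exactly the open question above.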
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp:659
I previously highlighted this flag in another PR. Let's make sure it is set from MLIR; otherwise LLVM promotes 32-bit registers to 64-bit no matter what we do in MLIR. Nevertheless, keeping values 32-bit in MLIR, as in this work, has advantages: when generating assembly or PTX directly, staying at 32-bit is crucial.
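For concreteness, a sketch of how the flag could be exercised when lowering LLVM IR to PTX by hand (the exact `llc` invocation and `sm_90` target are assumptions for illustration, not taken from this review; `nvptx-short-ptr` is the cl::opt defined in NVPTXTargetMachine.cpp):

```shell
# Sketch: lower LLVM IR to PTX with 32-bit shared/const/local pointers.
llc -march=nvptx64 -mcpu=sm_90 -nvptx-short-ptr kernel.ll -o kernel.ptx
```

Without the flag, the backend uses 64-bit pointers for these address spaces regardless of what the MLIR lowering produced.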
What is the proper way to replace this magic constant in the GPU -> LLVM conversion?