Move the async copy operations to NVGPU, as they only exist on NVIDIA targets and are designed to match PTX semantics. This also allows us to add finer-grained caching hint attributes to the op.
Add a hint to bypass L1 and hook it up to the NVVM op.
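For readers unfamiliar with the PTX side, the following is a minimal sketch of how a bypass-L1 hint could select between the two `cp.async` cache operators. It is not the code from this patch: `cpAsyncMnemonic` is a hypothetical helper, and rejecting the hint for copies that are not 16 bytes is an assumption of this sketch (a real lowering might instead ignore the hint). It relies on two PTX facts: `cp.async` accepts copy sizes of 4, 8, or 16 bytes, and the `.cg` operator, which caches in L2 only, is defined only for 16-byte copies.

```cpp
#include <optional>
#include <string>

// Hypothetical helper, not part of MLIR or the NVVM dialect: returns the PTX
// cp.async mnemonic for a global-to-shared copy of `bytes` bytes, or nullopt
// if the combination is not expressible.
std::optional<std::string> cpAsyncMnemonic(unsigned bytes, bool bypassL1) {
  // PTX cp.async only accepts copy sizes of 4, 8, or 16 bytes.
  if (bytes != 4 && bytes != 8 && bytes != 16)
    return std::nullopt;
  // .cg (cache in L2 only, bypassing L1) is only defined for 16-byte copies.
  // Assumption of this sketch: reject the hint rather than silently drop it.
  if (bypassL1 && bytes != 16)
    return std::nullopt;
  // .ca caches at all levels; .cg bypasses L1.
  return std::string("cp.async.") + (bypassL1 ? "cg" : "ca") + ".shared.global";
}
```

With these assumptions, a 16-byte copy with the hint set maps to `cp.async.cg.shared.global`, and the same copy without the hint maps to `cp.async.ca.shared.global`.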
A couple of minor things, otherwise LGTM.
mlir/include/mlir/Dialect/NVGPU/NVGPU.td
Line | Comment
---|---
172 | This would change the diff from being pure code movement, but can't this have NoSideEffects?

mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
Line | Comment
---|---
160 | Is there a source of truth for this in NVVM dialect? This (renamed) would be useful to have in the dialect header.
mlir/include/mlir/Dialect/NVGPU/NVGPU.td
Line | Comment
---|---
172 | The problem is that, at this point, we don't want these operations reordered relative to unrelated commit ops. We don't have code to reorder them correctly when we lower those ops, so we have to rely on operation order. This is something we should improve, but I don't have a good solution yet.

mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp
Line | Comment
---|---
160 | Good point, moved it there.