cp_async_zfill is currently not present in the nvvm backend, this patch adds cp_async_zfill support by adding inline asm when lowering from nvgpu to nvvm.
Details
Diff Detail
Event Timeline
mlir/lib/Conversion/NVGPUToNVVM/NVGPUToNVVM.cpp | ||
---|---|---|
388 | nit: remove empty line | |
417 | I believe this naming is cutlass specific and I don't really see it anywhere in ptx or nvvm spec. I think it would be better to explain the difference between those two different version of cp.async and not name it cp_async_zfill |
Great thanks for adding this!
Just a small question: What is the reason for using inline ASM? If you haven't already tried, adding the NVPTX backend support for intrinsics is pretty trivial. It's probably the same amount of code. If you search for past diffs that add NVPTX intrinsics, you can look at the files they modify and follow the same process.
I had suggested it that way to avoid having to add an intrinsic at this point as it would require adding a bunch of new intrinsics (there are 5 for cp.aync that would double it). I wonder if we want to duplicate the intrinsics or let the codegen match cp.async with cp_size == src_size. Anyway I was trying to avoid this discussion and effort at the moment. Maybe I overestimated the effort needed.
In any case I agree that we should add support to it in NVPTX long term.
For supporting async copy with zero fills, inline asm seems to be the fastest route for now. Yeah, we would want to eventually support it via NVPTX backend.
nit: remove empty line