Details
Diff Detail
Event Timeline
mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td | ||
---|---|---|
200 | Maybe add the wait_all variant as well when you are at it. |
Nice that it plugs so easily.
mlir/test/Dialect/LLVMIR/nvvm.mlir | ||
---|---|---|
104 | 4 ? I find the example in the NVVM doc scary https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy: // Example of .wait_all: cp.async.ca.shared.global [shrd1], [gbl1], 4; cp.async.cg.shared.global [shrd2], [gbl2], 16; cp.async.wait_all; // waits for all prior cp.async to complete // Example of .wait_group : cp.async.ca.shared.global [shrd3], [gbl3], 8; cp.async.commit_group; // End of group 1 cp.async.cg.shared.global [shrd4], [gbl4], 16; cp.async.commit_group; // End of group 2 cp.async.cg.shared.global [shrd5], [gbl5], 16; cp.async.commit_group; // End of group 3 cp.async.wait_group 1; // waits for group 1 and group 2 to complete In my mind, there must be an off-by-one error either in the code or in the comments. or Or am I fundamentally misunderstanding something..? |
mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td | ||
---|---|---|
200 | wait_all is strictly equivalent to wait 0 so I'm not sure in what case we would want to use it. I'll add it only if needed to avoid useless code. | |
mlir/test/Dialect/LLVMIR/nvvm.mlir | ||
104 | 4 was just a random value. Here if we want to wait on the previous cp we would need 0. I'll change it to 0. In the example from the spec there is a cp.async.wait_all after the group 0 then group 1, 2 and 3 are committed so cp.async.wait_group 1 waits until at most 1 group is pending which means 1 and 2 are complete. (meaning all groups are complete but 1) |
Prefer single line when possible.