This is an archive of the discontinued LLVM Phabricator instance.

[mlir][nvvm] Add async copy ops to nvvm dialect
ClosedPublic

Authored by ThomasRaoux on Dec 7 2021, 7:32 PM.

Diff Detail

Event Timeline

ThomasRaoux created this revision.Dec 7 2021, 7:32 PM
ThomasRaoux requested review of this revision.Dec 7 2021, 7:32 PM
rriddle added inline comments.Dec 7 2021, 7:38 PM
mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
190–192

Prefer single line when possible.

212–214

Address review feedback

ThomasRaoux marked 2 inline comments as done.Dec 7 2021, 7:53 PM
mravishankar accepted this revision.Dec 7 2021, 11:41 PM
mravishankar added inline comments.
mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
200

Maybe add the wait_all variant as well when you are at it.

This revision is now accepted and ready to land.Dec 7 2021, 11:41 PM
nicolasvasilache accepted this revision.Dec 8 2021, 4:54 AM

Nice that it plugs so easily.

mlir/test/Dialect/LLVMIR/nvvm.mlir
104

4 ?

I find the example in the NVVM doc scary https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-asynchronous-copy:

// Example of .wait_all:
cp.async.ca.shared.global [shrd1], [gbl1], 4;
cp.async.cg.shared.global [shrd2], [gbl2], 16;
cp.async.wait_all;  // waits for all prior cp.async to complete

// Example of .wait_group :
cp.async.ca.shared.global [shrd3], [gbl3], 8;
cp.async.commit_group;  // End of group 1

cp.async.cg.shared.global [shrd4], [gbl4], 16;
cp.async.commit_group;  // End of group 2

cp.async.cg.shared.global [shrd5], [gbl5], 16;
cp.async.commit_group;  // End of group 3

cp.async.wait_group 1;  // waits for group 1 and group 2 to complete

In my mind, there must be an off-by-one error either in the code or in the comments.
Should it be:
cp.async.wait_group **2**; // waits for group 1 and group 2 to complete (natural)

or
cp.async.wait_group 1; // waits for group **0** and group **1** to complete (weird but less broken)

Or am I fundamentally misunderstanding something..?

ThomasRaoux added inline comments.Dec 8 2021, 9:14 AM
mlir/include/mlir/Dialect/LLVMIR/NVVMOps.td
200

wait_all is strictly equivalent to wait 0 so I'm not sure in what case we would want to use it. I'll add it only if needed to avoid useless code.

mlir/test/Dialect/LLVMIR/nvvm.mlir
104

4 was just a random value. Here if we want to wait on the previous cp we would need 0. I'll change it to 0.

In the example from the spec there is a cp.async.wait_all after the group 0 then group 1, 2 and 3 are committed so cp.async.wait_group 1 waits until at most 1 group is pending which means 1 and 2 are complete. (meaning all groups are complete but 1)

This revision was landed with ongoing or failed builds.Dec 8 2021, 9:42 AM
This revision was automatically updated to reflect the committed changes.