This is an archive of the discontinued LLVM Phabricator instance.

[mlir][NVGPU]: Fix op description of nvgpu.device_async_wait.
ClosedPublic

Authored by yaoyuannnn on Jun 29 2023, 12:11 AM.

Details

Summary

According to the NVIDIA documentation on cp.async.wait_group
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-wait-group-cp-async-wait-all),
the numGroups attribute of nvgpu.device_async_wait is an upper bound
(not a lower bound) on the number of async groups that may still be
pending when the executing thread is unblocked.

Event Timeline

yaoyuannnn created this revision. Jun 29 2023, 12:11 AM
Herald added a project: Restricted Project. Jun 29 2023, 12:11 AM
This comment was removed by yaoyuannnn.
yaoyuannnn published this revision for review. Jun 29 2023, 12:25 AM
christopherbate accepted this revision. Jun 29 2023, 9:43 AM

In case anyone needs an example to verify: if you push 16 groups and set the wait to numGroups = 12, the thread unblocks once 12 or fewer groups are still in flight (that is, at least 4 groups have completed).

The description might be a bit clearer with a better example.
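
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the upper-bound reading of numGroups. It is a toy model only: the function names are hypothetical and are not part of MLIR, the NVGPU dialect, or CUDA, and it assumes groups complete in the order they were committed.

```python
def wait_is_satisfied(pending: int, num_groups: int) -> bool:
    """Model of the unblock condition: at most num_groups groups still pending."""
    if pending < 0 or num_groups < 0:
        raise ValueError("pending and num_groups must be non-negative")
    return pending <= num_groups


def min_completed_before_unblock(pushed: int, num_groups: int) -> int:
    """Fewest groups that must have completed before the wait can return."""
    if pushed < 0 or num_groups < 0:
        raise ValueError("pushed and num_groups must be non-negative")
    # If fewer groups were pushed than the bound allows, nothing has to finish.
    return max(0, pushed - num_groups)
```

With 16 groups pushed and numGroups = 12, min_completed_before_unblock(16, 12) evaluates to 4, and wait_is_satisfied(13, 12) is False while wait_is_satisfied(12, 12) is True. Under the old lower-bound wording the comparison would have pointed the other way.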

This revision is now accepted and ready to land. Jun 29 2023, 9:43 AM

Updated with an example of using numGroups.

Fixed a typo.

Hi Christopher, can you help push the patch?