According to the NVIDIA documentation on cp.async.wait_group
(https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-wait-group-cp-async-wait-all),
the numGroups attribute of nvgpu.device_async_wait should specify an upper
bound (rather than a lower bound) on the number of async groups that may
still be pending when the executing thread is unblocked.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This comment was removed by yaoyuannnn.
In case anyone needs an example to verify: if you push 16 groups and set the wait to numGroups = 12, the thread unblocks once 12 or fewer groups are still in flight (that is, at least 4 groups have completed).
It might be a bit clearer if a better example is provided.
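For anyone who wants to check the arithmetic, here is a minimal sketch of the upper-bound reading in plain Python. It is only a model of the counting, not NVGPU or PTX code. The helper names `can_unblock` and `min_completed_at_unblock` are hypothetical, and the model assumes groups complete in the order they were committed.

```python
def can_unblock(pending: int, num_groups: int) -> bool:
    """Upper-bound reading: the wait returns once at most
    `num_groups` async groups are still pending."""
    if pending < 0 or num_groups < 0:
        raise ValueError("counts must be non-negative")
    return pending <= num_groups


def min_completed_at_unblock(committed: int, num_groups: int) -> int:
    """Smallest number of groups that must have completed before a wait
    with `num_groups` can return, given `committed` groups in total."""
    if committed < 0 or num_groups < 0:
        raise ValueError("counts must be non-negative")
    return max(committed - num_groups, 0)


# The example from this thread: 16 groups pushed, numGroups = 12.
assert not can_unblock(pending=16, num_groups=12)  # still too many in flight
assert can_unblock(pending=12, num_groups=12)      # 4 groups have completed
assert min_completed_at_unblock(committed=16, num_groups=12) == 4
```

With numGroups = 0 the same model requires every committed group to have completed, which corresponds to waiting for all outstanding copies.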