The RPC interface relies on waiting on atomic signals to coordinate
which side of the protocol is in control of the shared buffer. The GPU client
supports briefly suspending the executing thread group. This is used by the
thread scheduler to identify which thread groups can be switched out so that
others may execute. This gives other threads a chance to make forward
progress while these threads wait on the atomic signal.
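The hand-off described above can be sketched roughly as follows. This is a minimal host-side illustration, not the RPC implementation: the names `Owner`, `sleep_briefly`, `wait_for_ownership`, and `release_ownership` are hypothetical. On a host, `sleep_briefly` here simply yields the OS thread, whereas a GPU build would issue a target-specific suspend instruction (assumption, not shown).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical identifiers for the two sides of the protocol.
enum class Owner : std::uint32_t { Client = 0, Server = 1 };

// Hypothetical stand-in for the GPU's "briefly suspend this thread group"
// primitive. On a host we just yield the OS thread to the scheduler.
inline void sleep_briefly() { std::this_thread::yield(); }

// Block until `self` controls the shared buffer. Between polls the caller
// suspends briefly so the scheduler can run other threads, possibly including
// the one that will hand the buffer over.
inline void wait_for_ownership(const std::atomic<Owner> &signal, Owner self) {
  while (signal.load(std::memory_order_acquire) != self)
    sleep_briefly();
}

// Hand the buffer to the other side. The release store makes every write to
// the buffer made before the hand-off visible to the acquiring side.
inline void release_ownership(std::atomic<Owner> &signal, Owner other) {
  signal.store(other, std::memory_order_release);
}
```

The acquire load paired with the release store is what lets the waiting side safely read the buffer once it observes the signal; the brief suspend only affects scheduling, not correctness.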
This is currently only relevant on the client side. The server could use an
alternative implementation that calls the standard nanosleep on hosts that
support it.
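A server-side alternative along those lines might look like the sketch below, assuming a POSIX host. The function name `sleep_briefly_host` and its 1000 ns default are hypothetical illustrative choices, not values taken from the RPC code.

```cpp
#include <cassert>
#include <time.h>

// Hypothetical host-side variant of sleep_briefly() for the server, built on
// POSIX nanosleep(). The 1000 ns default is an arbitrary illustrative value.
inline int sleep_briefly_host(long nanoseconds = 1000) {
  // nanosleep() rejects tv_nsec values outside [0, 999999999] with EINVAL.
  assert(nanoseconds >= 0 && nanoseconds < 1000000000L);
  timespec request{};
  request.tv_sec = 0;
  request.tv_nsec = nanoseconds;
  // Returns 0 on success, or -1 with errno set (EINTR if a signal cut the
  // sleep short). For a brief back-off an early wake-up is harmless, so no
  // retry with the remaining time is attempted.
  return nanosleep(&request, nullptr);
}
```

Unlike a GPU suspend, this hands the whole OS thread back to the kernel scheduler, so the actual delay is typically longer than requested and depends on timer resolution.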
This loop probably gets rotated by the compiler, so it might be better
written as a do-while loop.
It is unclear whether to sleep before or after the load; the preceding fence
probably already takes some time.
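The two orderings raised in these notes can be compared with a small sketch. All names here (`sleep_calls`, `sleep_briefly`, `wait_sleep_after_load`, `wait_sleep_before_load`) are hypothetical, and the counter exists only to make the number of suspends observable; it does not settle which ordering is preferable on real hardware.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical back-off primitive; the counter records how many suspends each
// loop shape performs.
static unsigned sleep_calls = 0;
inline void sleep_briefly() {
  ++sleep_calls;
  std::this_thread::yield();
}

// Variant A, "sleep after load": poll immediately and back off only on a miss.
// The guarded do-while is the shape a compiler produces when it rotates
// `while (load() == expected) sleep_briefly();`.
inline std::uint32_t wait_sleep_after_load(
    const std::atomic<std::uint32_t> &signal, std::uint32_t expected) {
  std::uint32_t value = signal.load(std::memory_order_acquire);
  if (value == expected) {
    do {
      sleep_briefly();
      value = signal.load(std::memory_order_acquire);
    } while (value == expected);
  }
  return value;
}

// Variant B, "sleep before load": a plain do-while that suspends once before
// every poll, including the first.
inline std::uint32_t wait_sleep_before_load(
    const std::atomic<std::uint32_t> &signal, std::uint32_t expected) {
  std::uint32_t value;
  do {
    sleep_briefly();
    value = signal.load(std::memory_order_acquire);
  } while (value == expected);
  return value;
}
```

When the signal is already set, variant A returns without suspending, while variant B always pays one suspend. Variant B only wins if the first poll is expected to miss anyway, for example because the other side needs more time than the preceding fence takes.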