This patch adds initial support for an RPC client / server architecture.
The GPU is unable to perform several system utilities on its own, so in
order to implement features like printing or memory allocation we need
to be able to communicate with the executing process. This is done via a
buffer of "sharable" memory. That is, a buffer with a unified pointer
that both the client and server can use to communicate.
The implementation here is based off of Jon Chesterfields minimal RPC
example in his work. We use an inbox and outbox to communicate
between if there is an RPC request and to signify when work is done.
We use a fixed-size buffer for the communication channel. This is fixed
size so that we can ensure that there is enough space for all
compute-units on the GPU to issue work to any of the ports. Right now
the implementation is single threaded so there is only a single buffer
that is not shared.
This implementation still has several features missing to be complete.
Such as multi-threaded support and asynchrnonous calls.
Depends on D145912
The number of bytes copied at each round are either buffer_len (i.e. 56B) or the total length of the string, it should be the remainder.