This patch adds extra intrinsics for the GPU. Some of these are unused
for now but will be used later. The rest are used now to update the
RPC handling. Currently, every thread can update the RPC client, which
isn't correct. This patch adds the code necessary to allow a single
thread to perform the write while the others wait.
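To illustrate the single-writer idea, here is a rough host-side sketch. It is not the code in this patch: the names `first_active_lane`, `is_first_lane`, and `count_writers` are hypothetical, the 64-bit mask width is an assumption, and the lanes are simulated with a sequential loop rather than real GPU threads.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not the intrinsics in this patch.

// Index of the lowest set bit in the active-lane mask, or -1 if no lane is
// active.
inline int first_active_lane(std::uint64_t mask) {
  for (int lane = 0; lane < 64; ++lane)
    if (mask & (std::uint64_t{1} << lane))
      return lane;
  return -1;
}

// True only for the one lane elected to perform the write.
inline bool is_first_lane(std::uint64_t mask, int lane_id) {
  assert(lane_id >= 0 && lane_id < 64 && "lane id out of range");
  return lane_id == first_active_lane(mask);
}

// Sequential stand-in for a group of lanes: counts how many lanes would write
// to the shared client state for a given active mask.
inline int count_writers(std::uint64_t mask) {
  int writes = 0;
  for (int lane = 0; lane < 64; ++lane) {
    bool active = (mask & (std::uint64_t{1} << lane)) != 0;
    // On a real GPU the elected lane would do the write here, and the other
    // lanes would wait at a synchronization point afterwards.
    if (active && is_first_lane(mask, lane))
      ++writes;
  }
  return writes;
}
```

With any non-empty mask, exactly one simulated lane takes the write path, which is the property the RPC client update needs. On real hardware the election would rely on the target's own lane-mask and synchronization intrinsics instead of a loop.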
Feedback is welcome for the naming of these functions. I'm copying the
OpenMP nomenclature where we call an AMD wavefront or NVIDIA warp a
lane.
Wrong dimension label, here and below.