NVPTX and AMD targets currently handle printf differently. AMD performs
host calls, which we don't really want in OpenMP; NVPTX translates
printf calls to vprintf calls, since the NVIDIA runtime provides a
device implementation of vprintf. This patch unifies the AMD and NVPTX
handling and emits, for both targets, calls to the vprintf wrapper
__llvm_omp_vprintf, which we define in our new device runtime. The main
benefit of this wrapper is that we can more easily control (and profile)
the emission of printf calls in device code.
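
To make the lowering concrete, here is a minimal host-side sketch of the idea, not the code in this patch. It assumes the arguments are packed into a struct-like buffer with natural alignment and handed to a wrapper together with the format string. `mock_omp_vprintf`, `PrintfCallCount`, `PrintfArgs`, and `demo` are hypothetical names for illustration. The real signature of __llvm_omp_vprintf is not shown in this description, so the two-argument form below is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical profiling counter: a single wrapper gives one place to
// observe (or disable) every printf emitted from device code.
static int PrintfCallCount = 0;

// Round Offset up to the next multiple of Align (Align is a power of two).
static std::size_t alignUp(std::size_t Offset, std::size_t Align) {
  return (Offset + Align - 1) & ~(Align - 1);
}

// Read one naturally aligned value of type T from the packed buffer.
template <typename T>
static T readArg(const char *Buf, std::size_t &Offset) {
  Offset = alignUp(Offset, alignof(T));
  T Value;
  std::memcpy(&Value, Buf + Offset, sizeof(T));
  Offset += sizeof(T);
  return Value;
}

// Host-side mock of a vprintf-style wrapper (name and signature assumed).
// Only %d, %f and %% are understood; the formatted text is returned so
// the result can be inspected.
static std::string mock_omp_vprintf(const char *Format, const void *Args) {
  ++PrintfCallCount;
  const char *Buf = static_cast<const char *>(Args);
  std::size_t Offset = 0;
  std::string Out;
  char Tmp[64];
  for (const char *P = Format; *P != '\0'; ++P) {
    if (*P != '%') {
      Out += *P;
      continue;
    }
    ++P; // Look at the conversion character after '%'.
    if (*P == '\0')
      break; // Dangling '%' at the end of the format string.
    if (*P == 'd') {
      std::snprintf(Tmp, sizeof(Tmp), "%d", readArg<int32_t>(Buf, Offset));
      Out += Tmp;
    } else if (*P == 'f') {
      std::snprintf(Tmp, sizeof(Tmp), "%f", readArg<double>(Buf, Offset));
      Out += Tmp;
    } else {
      Out += *P; // Covers "%%" and passes unknown conversions through.
    }
  }
  return Out;
}

// What a call site such as printf("i=%d d=%f\n", 42, 1.5) could turn into:
// the arguments are stored in a buffer and only a pointer is passed on.
struct PrintfArgs {
  int32_t I;
  double D;
};

static std::string demo() {
  PrintfArgs Args{42, 1.5};
  return mock_omp_vprintf("i=%d d=%f\n", &Args);
}
```

Calling `demo()` formats the two packed arguments and bumps `PrintfCallCount` once, which is the kind of central hook the wrapper is meant to provide.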
Note: Tests are coming.
We actually do know. Above we allocate and fill the buffer. For the OpenMP wrapper you could easily add a third argument later to facilitate an OpenMP runtime printf implementation. I would even like it to be target agnostic (e.g., replace the default CUDA route on request). That said, we should tackle that separately, wdyt?