We can now use the AMDGPU back-end to generate valid AMD kernel code, which
gets linked by LLD to valid AMD executables. The OpenCL runtime can execute
said kernels. This requires the ROCm driver stack to be installed. Any AMD GPU
from the Fiji family onward is supported. The specific GPU family can be chosen
at compile time; the default is Fiji.
The LLD pass is currently a bit of a hack that goes through a temporary file.
Sadly, this is the only option at the moment, since LLD cannot be invoked
directly from code as a pass. The temp file handling will change so that the
file name is random and differs between compilations.
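
For illustration only, the planned temp file handling could look roughly like
the sketch below. It is not the code in this change: the function names, the
"amdgpu_kernel_" prefix, the output file name and the exact linker command line
("ld.lld -shared") are assumptions made for the example.

```python
import os
import subprocess
import tempfile


def write_temp_object(obj_bytes, suffix=".o"):
    """Write object code to a uniquely named temp file and return its path."""
    # mkstemp picks a random, unused name and creates the file exclusively,
    # so two compilations never share the same temp file.
    fd, path = tempfile.mkstemp(prefix="amdgpu_kernel_", suffix=suffix)
    with os.fdopen(fd, "wb") as f:
        f.write(obj_bytes)
    return path


def lld_command(obj_path, out_path, linker="ld.lld"):
    """Build (but do not run) an assumed LLD command line."""
    return [linker, "-shared", obj_path, "-o", out_path]


def link_kernel(obj_bytes, out_path, run=subprocess.check_call):
    """Link object code via a temp file and remove the temp file afterwards."""
    obj_path = write_temp_object(obj_bytes)
    try:
        run(lld_command(obj_path, out_path))
    finally:
        # Clean up even if the linker fails.
        os.remove(obj_path)
```

Passing the runner in as a parameter keeps the sketch usable without LLD
installed, e.g. link_kernel(code, "kernel.out", run=print) just prints the
command that would be executed.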