In OpenCL, a kernel is allowed to call other kernels as if
they were regular functions. To support this, clang emits the
amdgpu_kernel calling convention for both caller and callee.
A backend pass in our downstream compiler alters such calls
by introducing regular function bodies which are clones of
the callee kernels. This implementation currently limits us
in certain ways. For instance, callee kernels cannot use the
byref attribute.
To avoid such limitations, this patch brings in those
cloned functions early on and prevents clang from generating
amdgpu_kernel call sites. A new function body will be added
for each kernel in the compilation unit, with the expectation
that unused clones will be removed at link time.
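
As a rough source-level illustration of the intended shape (the
patch itself works on clang codegen, not on source), the sketch
below shows a kernel, its cloned function body, and a caller
kernel whose call targets the clone. It is written as plain C
so it compiles on the host: KERNEL and GLOBAL are placeholder
macros standing in for the OpenCL qualifiers, and the names
scale, scale_clone, and scale_twice are hypothetical.

```c
/* Placeholder macros standing in for OpenCL's __kernel and __global
   so this sketch compiles as plain C (assumption for illustration). */
#define KERNEL
#define GLOBAL

/* The kernel entry point keeps its own body. */
KERNEL void scale(GLOBAL int *data, int factor, int n) {
    for (int i = 0; i < n; ++i)
        data[i] *= factor;
}

/* Hypothetical clone: a regular function with the same body as the
   kernel above. One such clone would exist per kernel. */
static void scale_clone(GLOBAL int *data, int factor, int n) {
    for (int i = 0; i < n; ++i)
        data[i] *= factor;
}

/* A caller kernel: where the source calls scale(), the call is
   directed at the clone, so no kernel-to-kernel call site remains. */
KERNEL void scale_twice(GLOBAL int *data, int factor, int n) {
    scale_clone(data, factor, n);
    scale_clone(data, factor, n);
}
```

If no kernel calls scale, scale_clone is unused and is expected
to be dropped at link time.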
I don't think we can really start with the function IR. The TargetABIInfo could differ between the kernel and function forms (and will, because of byval/byref, etc.).