This patch sets target specific calling convention for CUDA kernels in IR.
Patch by Greg Rodgers.
Modified and lit test added by Yaxun Liu.
Differential D44747
Set calling convention for CUDA kernel yaxunl on Mar 21 2018, 11:21 AM. Authored by
Details This patch sets target specific calling convention for CUDA kernels in IR. Patch by Greg Rodgers.
Diff Detail Event TimelineComment Actions Is there a reason for this to be done as a special case in IRGen instead of just implicitly applying the calling convention in Sema? Comment Actions The calling convention is not used in Sema, therefore it seems simpler to do it in codegen. I could try doing this in Sema too. Is there any advantage of doing this in Sema? Thanks. Comment Actions In IRGen, it's a special case for your specific language mode on your specific target. In Sema, it can be done as part of the special checking for kernel functions. Also, it looks like CUDA allows you to take the address of a __global__ function, and indirect calls to such functions presumably still follow the normal CUDA restrictions, so there must be *some* reflection of this in Sema. Comment Actions Revised by John's comments. Introduce CC_CUDAKernel calling convention in AST, which is translated to target calling convention in IR.
Comment Actions LGTM. If __global__ is supported in C++ structures, you might also need to make sure that member function constants (&A::kernel_function) drop the CC. And it might be a good idea to make sure that decltype(kernel_function) doesn't have a problem with it, either, since that does do some special-case work.
Comment Actions Matt, are you OK with the change from amdgcn backend point of view? Thanks.
Comment Actions I will try fixing that. The CUDA kernel calling convention should be dropped in all DRE's since it is invisible to the user. Sam Comment Actions Apparently it's not always the case. __global__ void EmptyKernel(void) { } template <typename T> __global__ void EmptyKernelT(void) { } struct Dummy { /// Type definition of the EmptyKernel kernel entry point typedef void (*EmptyKernelPtr)(); EmptyKernelPtr Empty() { return EmptyKernel; } // this one works EmptyKernelPtr EmptyT() { return EmptyKernelT<void>; } // this one errors out. }; Do you think this is something you can fix quickly or do you want to unroll the change while you figure out what's going on? Comment Actions Let's revert it for now. I will create a review after fixing it and commit it again with the fix. Thanks. Sam |