This is an archive of the discontinued LLVM Phabricator instance.

Set calling convention for CUDA kernel
ClosedPublic

Authored by yaxunl on Mar 21 2018, 11:21 AM.

Details

Summary

This patch sets target specific calling convention for CUDA kernels in IR.
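For illustration only (a hedged sketch, not copied from the patch): on the AMDGPU target, a device-side kernel such as __global__ void kernel() was previously emitted with the default calling convention, and with this change it is emitted with the target's kernel calling convention.

```llvm
; before this patch (device compilation for amdgcn)
define void @_Z6kernelv() { ret void }

; after this patch: target-specific kernel calling convention
define amdgpu_kernel void @_Z6kernelv() { ret void }
```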

Patch by Greg Rodgers.
Modified and lit test added by Yaxun Liu.

Diff Detail

Repository
rC Clang

Event Timeline

yaxunl created this revision.Mar 21 2018, 11:21 AM

Is there a reason for this to be done as a special case in IRGen instead of just implicitly applying the calling convention in Sema?

yaxunl updated this revision to Diff 139359.Mar 21 2018, 1:09 PM

Upload diff with full context.

Is there a reason for this to be done as a special case in IRGen instead of just implicitly applying the calling convention in Sema?

The calling convention is not used in Sema, therefore it seems simpler to do it in codegen. I could try doing this in Sema too. Is there any advantage of doing this in Sema?

Thanks.

rjmccall added a comment.EditedMar 21 2018, 2:41 PM

Is there a reason for this to be done as a special case in IRGen instead of just implicitly applying the calling convention in Sema?

The calling convention is not used in Sema, therefore it seems simpler to do it in codegen. I could try doing this in Sema too. Is there any advantage of doing this in Sema?

In IRGen, it's a special case for your specific language mode on your specific target. In Sema, it can be done as part of the special checking for kernel functions.

Also, it looks like CUDA allows you to take the address of a __global__ function, and indirect calls to such functions presumably still follow the normal CUDA restrictions, so there must be *some* reflection of this in Sema.

yaxunl updated this revision to Diff 139625.Mar 23 2018, 11:47 AM
yaxunl retitled this revision from [AMDGPU] Set calling convention for CUDA kernel to Set calling convention for CUDA kernel.
yaxunl edited the summary of this revision. (Show Details)

Revised per John's comments. Introduce a CC_CUDAKernel calling convention in the AST, which is translated to the target calling convention in IR.

rjmccall added inline comments.Mar 23 2018, 8:04 PM
lib/AST/Type.cpp
2762

For consistency with the rest of this switch, please put the return on the same line as its case.

lib/AST/TypePrinter.cpp
781

I think the spelling for this is __global__. You might need to adjust printing because this isn't the right place to print it, of course.

lib/CodeGen/CGCall.cpp
68

For consistency with the rest of this switch, please put the return on the same line as its case.

lib/Sema/SemaOverload.cpp
1492

It's cheaper not to check the CUDA language mode here; pulling the CC out of the FPT is easy.

Why is this necessary, anyway? From the spec, it doesn't look to me like kernel function pointers can be converted to ordinary function pointers. A kernel function pointer is supposed to be declared with something like __global__ void (*fn)(void). You'll need to change your patch to SemaType to apply the CC even when compiling for the host, of course.

I was going to say that you should use this CC in your validation that calls with execution configurations go to kernel functions, but... I can't actually find where you do that validation.

Do you need these function pointers to be a different size from the host function pointer?

tools/libclang/CXType.cpp
630

Formatting.

yaxunl marked 3 inline comments as done.Mar 27 2018, 1:00 PM
yaxunl added inline comments.
lib/Sema/SemaOverload.cpp
1492

In CUDA, __global__ can only be used on a function declaration or definition. Using it in a function pointer declaration results in a warning: 'global' attribute only applies to functions.

Also, there is this lit test in SemaCUDA:

__global__ void kernel() {}

typedef void (*fn_ptr_t)();

__host__ fn_ptr_t get_ptr_h() {
  return kernel;
}

It allows implicit conversion of __global__ void() to void(*)(); therefore I need the above change to drop the CUDA kernel calling convention in such implicit conversions.

rjmccall added inline comments.Mar 27 2018, 1:19 PM
lib/Sema/SemaOverload.cpp
1492

I see. I must have mis-read the specification, but I see that the code samples I can find online agree with that test case. So __global__ function pointers are just treated as function pointers, and it's simply undefined behavior if you try to call a function pointer that happens to be a kernel without an execution configuration, or contrariwise if you use an execution configuration to call a function pointer that isn't a kernel.

In that case, I think the best solution is to just immediately strip __global__ from the type of a DRE to a kernel function, since __global__ isn't supposed to be part of the user-facing type system.

yaxunl updated this revision to Diff 140076.Mar 28 2018, 8:11 AM
yaxunl marked 4 inline comments as done.

Revised per John's comments. Drop the CUDA kernel calling convention in DREs.

tra added a subscriber: tra.Mar 28 2018, 10:37 AM

LGTM.

If __global__ is supported in C++ structures, you might also need to make sure that member function constants (&A::kernel_function) drop the CC. And it might be a good idea to make sure that decltype(kernel_function) doesn't have a problem with it, either, since that does do some special-case work.

lib/Sema/SemaExpr.cpp
1669

You should use getAs<FunctionType> here (no const necessary). It's possible to declare a function with a typedef of function type, you just can't define it that way.

Matt, are you OK with the change from amdgcn backend point of view? Thanks.

lib/Sema/SemaExpr.cpp
1669

will do.

yaxunl updated this revision to Diff 140131.Mar 28 2018, 1:50 PM
yaxunl marked 2 inline comments as done.

Use getAs instead of dyn_cast as John suggested.

rjmccall accepted this revision.Mar 28 2018, 2:40 PM

LGTM.

This revision is now accepted and ready to land.Mar 28 2018, 2:40 PM
This revision was automatically updated to reflect the committed changes.
tra added inline comments.Apr 3 2018, 10:40 AM
lib/Sema/SemaType.cpp
3319–3330

This apparently breaks compilation of some CUDA code in our internal tests. I'm working on minimizing a reproduction case. Should this code be enabled for AMD GPUs only?

tra added inline comments.Apr 3 2018, 10:51 AM
lib/Sema/SemaType.cpp
3319–3330

Here's a small snippet of code that previously used to compile and work:

template <typename T>
__global__ void EmptyKernel(void) { }

struct Dummy {
  /// Type definition of the EmptyKernel kernel entry point
  typedef void (*EmptyKernelPtr)();
  EmptyKernelPtr Empty() { return EmptyKernel<void>; }
};

AFAICT, it's currently impossible to apply __global__ to pointers, so there's no way to make the code above work with this patch applied.

I will try fixing that.

The CUDA kernel calling convention should be dropped in all DREs since it is invisible to the user.

Sam

tra added a comment.Apr 3 2018, 11:08 AM

I will try fixing that.

The CUDA kernel calling convention should be dropped in all DREs since it is invisible to the user.

Sam

Apparently that's not always the case.
Here's a more elaborate example demonstrating an inconsistency in this behavior: the calling convention is ignored for regular functions, but not for function templates.

__global__ void EmptyKernel(void) { }

template <typename T>
__global__ void EmptyKernelT(void) { }

struct Dummy {
  /// Type definition of the EmptyKernel kernel entry point
  typedef void (*EmptyKernelPtr)();
  EmptyKernelPtr Empty() { return EmptyKernel; } // this one works
  EmptyKernelPtr EmptyT() { return EmptyKernelT<void>; } // this one errors out.
};

Do you think this is something you can fix quickly or do you want to unroll the change while you figure out what's going on?

Let's revert it for now. I will create a review after fixing it and commit it again with the fix.

Thanks.

Sam

tra added a comment.Apr 3 2018, 11:33 AM

Let's revert it for now. I will create a review after fixing it and commit it again with the fix.

Thanks.

Sam

reverted in r329099.