Details

Reviewers

rjmccall
arsenm

Commits

rGb2f2bb26e443: Set calling convention for CUDA kernel
rC328795: Set calling convention for CUDA kernel
rL328795: Set calling convention for CUDA kernel

Summary

This patch sets the target-specific calling convention for CUDA kernels in IR.

Patch by Greg Rodgers.
Modified and lit test added by Yaxun Liu.

Diff Detail

Event Timeline

yaxunl created this revision. Mar 21 2018, 11:21 AM

Herald added subscribers: t-tye, tpr, dstuttard and 2 others. Mar 21 2018, 11:21 AM

Is there a reason for this to be done as a special case in IRGen instead of just implicitly applying the calling convention in Sema?

Upload diff with full context.

In D44747#1044893, @rjmccall wrote:

Is there a reason for this to be done as a special case in IRGen instead of just implicitly applying the calling convention in Sema?

The calling convention is not used in Sema, therefore it seems simpler to do it in codegen. I could try doing this in Sema too. Is there any advantage of doing this in Sema?

Thanks.

In D44747#1044916, @yaxunl wrote:

In D44747#1044893, @rjmccall wrote:

Is there a reason for this to be done as a special case in IRGen instead of just implicitly applying the calling convention in Sema?

The calling convention is not used in Sema, therefore it seems simpler to do it in codegen. I could try doing this in Sema too. Is there any advantage of doing this in Sema?

In IRGen, it's a special case for your specific language mode on your specific target. In Sema, it can be done as part of the special checking for kernel functions.

Also, it looks like CUDA allows you to take the address of a __global__ function, and indirect calls to such functions presumably still follow the normal CUDA restrictions, so there must be *some* reflection of this in Sema.

Revised per John's comments. Introduce the CC_CUDAKernel calling convention in the AST, which is translated to the target calling convention in IR.
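The two-level scheme described here (a target-neutral `CC_CUDAKernel` in the AST, mapped to a concrete convention per target during IR generation) can be sketched roughly as follows. This is a simplified model, not the Clang code: `LLVMCallConv`, `toLLVMCallConv`, and the trimmed-down enums are hypothetical stand-ins, and only the hook name `getCUDAKernelCallingConv` is taken from the patch.

```cpp
#include <cassert>

// Trimmed-down stand-in for clang's AST-level CallingConv enum.
enum CallingConv { CC_C, CC_CUDAKernel };

// Hypothetical stand-in for llvm::CallingConv; values are illustrative only.
enum class LLVMCallConv { C, AMDGPU_KERNEL };

// Base target hook: by default a CUDA kernel uses the plain C convention.
struct TargetCodeGenInfo {
  virtual ~TargetCodeGenInfo() = default;
  virtual LLVMCallConv getCUDAKernelCallingConv() const {
    return LLVMCallConv::C;
  }
};

// A target that needs a dedicated kernel convention overrides the hook.
struct AMDGPUTargetCodeGenInfo : TargetCodeGenInfo {
  LLVMCallConv getCUDAKernelCallingConv() const override {
    return LLVMCallConv::AMDGPU_KERNEL;
  }
};

// AST-level convention -> IR-level convention, deferring to the target
// for the kernel case.
LLVMCallConv toLLVMCallConv(CallingConv CC, const TargetCodeGenInfo &TCGI) {
  switch (CC) {
  case CC_C:          return LLVMCallConv::C;
  case CC_CUDAKernel: return TCGI.getCUDAKernelCallingConv();
  }
  assert(false && "bad calling convention");
  return LLVMCallConv::C;
}
```

With this split, the AST carries a single kernel marker and only the target-specific hook decides what it means in IR.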

rjmccall added inline comments. Mar 23 2018, 8:04 PM

lib/AST/Type.cpp
2762	For consistency with the rest of this switch, please put the return on the same line as its case.
lib/AST/TypePrinter.cpp
781	I think the spelling for this is `__global__`. You might need to adjust printing because this isn't the right place to print it, of course.
lib/CodeGen/CGCall.cpp
68	For consistency with the rest of this switch, please put the return on the same line as its case.
lib/Sema/SemaOverload.cpp
1492	It's cheaper not to check the CUDA language mode here; pulling the CC out of the FPT is easy. Why is this necessary, anyway? From the spec, it doesn't look to me like kernel function pointers can be converted to ordinary function pointers. A kernel function pointer is supposed to be declared with something like `__global__ void (*fn)(void)`. You'll need to change your patch to SemaType to apply the CC even when compiling for the host, of course. I was going to say that you should use this CC in your validation that calls with execution configurations go to kernel functions, but... I can't actually find where you do that validation. Do you need these function pointers to be a different size from the host function pointer?
tools/libclang/CXType.cpp
630	Formatting.

yaxunl marked 3 inline comments as done. Mar 27 2018, 1:00 PM

yaxunl added inline comments.

lib/Sema/SemaOverload.cpp
1492	In CUDA, `__global__` can only be used with function declaration or definition. Using it in function pointer declaration will result in a warning: 'global' attribute only applies to functions.

Also, there is this lit test in SemaCUDA:

__global__ void kernel() {}
typedef void (*fn_ptr_t)();
__host__ fn_ptr_t get_ptr_h() { return kernel; }

It allows implicit conversion of `__global__ void()` to `void (*)()`, therefore I need the above change to drop the CUDA kernel calling convention in such implicit conversion.

rjmccall added inline comments. Mar 27 2018, 1:19 PM

lib/Sema/SemaOverload.cpp
1492	I see. I must have mis-read the specification, but I see that the code samples I can find online agree with that test case. So `__global__` function pointers are just treated as function pointers, and it's simply undefined behavior if you try to call a function pointer that happens to be a kernel without an execution configuration, or contrariwise if you use an execution configuration to call a function pointer that isn't a kernel. In that case, I think the best solution is to just immediately strip `__global__` from the type of a DRE to a kernel function, since `__global__` isn't supposed to be part of the user-facing type system.

Revised per John's comments. Drop the CUDA kernel calling convention in DRE.
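As a rough illustration of the idea (not the actual Sema change), stripping the kernel convention when a reference to the function is formed means the resulting type matches an ordinary function type, so later pointer conversions need no special case. `FunctionTypeModel` and `typeOfDeclRef` are hypothetical names used only for this sketch.

```cpp
enum CallingConv { CC_C, CC_CUDAKernel };

// Hypothetical, minimal model of a function type: only the calling
// convention matters for this illustration.
struct FunctionTypeModel {
  CallingConv CC;
  bool operator==(const FunctionTypeModel &Other) const {
    return CC == Other.CC;
  }
};

// Type given to an expression that names a function. The kernel convention
// stays on the declaration but is replaced by the default on the reference.
FunctionTypeModel typeOfDeclRef(FunctionTypeModel Declared,
                                CallingConv DefaultCC = CC_C) {
  if (Declared.CC == CC_CUDAKernel)
    Declared.CC = DefaultCC;
  return Declared;
}
```

Under this model the declaration still records that it is a kernel, while every use of its name has the same type as a plain function.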

tra added a subscriber: tra. Mar 28 2018, 10:37 AM

LGTM.

If __global__ is supported in C++ structures, you might also need to make sure that member function constants (&A::kernel_function) drop the CC. And it might be a good idea to make sure that decltype(kernel_function) doesn't have a problem with it, either, since that does do some special-case work.

lib/Sema/SemaExpr.cpp
1669 ↗	(On Diff #140076)	You should use `getAs<FunctionType>` here (no `const` necessary). It's possible to declare a function with a typedef of function type, you just can't define it that way.
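The language rule behind this comment can be shown in plain C++. In the sketch below, `fn_t` and `twice` are made-up names. A function declared through a typedef has a declared type that is written as a typedef name, which is why a query that looks through type sugar (as `getAs<FunctionType>` does) is the safer choice than a direct cast on the written type.

```cpp
#include <type_traits>

typedef int fn_t(int);  // a typedef that names a function type

fn_t twice;             // OK: declares `int twice(int)` through the typedef

// fn_t twice { return 0; }  // ill-formed: a definition needs a function declarator

int twice(int x) { return 2 * x; }  // the definition spells the declarator out

// Both declarations name the same function of type int(int).
static_assert(std::is_same<decltype(twice), int(int)>::value,
              "declared type is the underlying function type");
```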

Matt, are you OK with the change from amdgcn backend point of view? Thanks.

lib/Sema/SemaExpr.cpp
1669 ↗	(On Diff #140076)	will do.

Use getAs instead of dyn_cast as John suggested.

LGTM.

This revision is now accepted and ready to land. Mar 28 2018, 2:40 PM

Closed by commit rL328795: Set calling convention for CUDA kernel (authored by yaxunl). Mar 29 2018, 8:07 AM

Closed by commit rC328795: Set calling convention for CUDA kernel (authored by yaxunl).

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Mar 29 2018, 8:07 AM

tra added inline comments. Apr 3 2018, 10:40 AM

lib/Sema/SemaType.cpp
3319–3330	This apparently breaks compilation of some CUDA code in our internal tests. I'm working on minimizing a reproduction case. Should this code be enabled for AMD GPUs only?

tra added inline comments. Apr 3 2018, 10:51 AM

lib/Sema/SemaType.cpp
3319–3330	Here's a small snippet of code that previously used to compile and work:

template <typename T>
__global__ void EmptyKernel(void) { }

struct Dummy {
  /// Type definition of the EmptyKernel kernel entry point
  typedef void (*EmptyKernelPtr)();
  EmptyKernelPtr Empty() { return EmptyKernel<void>; }
};

AFAICT, it's currently impossible to apply `__global__` to pointers, so there's no way to make the code above work with this patch applied.

I will try fixing that.

The CUDA kernel calling convention should be dropped in all DRE's since it is invisible to the user.

Sam

In D44747#1055877, @yaxunl wrote:

I will try fixing that.

The CUDA kernel calling convention should be dropped in all DRE's since it is invisible to the user.

Sam

Apparently it's not always the case.
Here's a bit more elaborate example demonstrating inconsistency in this behavior. Calling convention is ignored for regular functions, but not for function templates.

__global__ void EmptyKernel(void) { }

template <typename T>
__global__ void EmptyKernelT(void) { }

struct Dummy {
  /// Type definition of the EmptyKernel kernel entry point
  typedef void (*EmptyKernelPtr)();
  EmptyKernelPtr Empty() { return EmptyKernel; } // this one works
  EmptyKernelPtr EmptyT() { return EmptyKernelT<void>; } // this one errors out.
};
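For comparison, here is a host-only C++ analog of the example with `__global__` removed (the names `EmptyFn`, `EmptyFnT`, and `EmptyFnPtr` are made up for this sketch). In standard C++ both the plain function and the template specialization convert to the same function pointer type, which is the behavior the report expects kernels to keep.

```cpp
void EmptyFn() {}

template <typename T>
void EmptyFnT() {}

struct Dummy {
  typedef void (*EmptyFnPtr)();
  EmptyFnPtr Empty() { return EmptyFn; }          // function -> pointer
  EmptyFnPtr EmptyT() { return EmptyFnT<void>; }  // specialization -> pointer
};
```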

Do you think this is something you can fix quickly or do you want to unroll the change while you figure out what's going on?

Let's revert it for now. I will create a review after fixing it and commit it again with the fix.

Thanks.

Sam

tra mentioned this in rC329099: Revert "Set calling convention for CUDA kernel". Apr 3 2018, 11:32 AM

tra mentioned this in rL329099: Revert "Set calling convention for CUDA kernel".

In D44747#1055894, @yaxunl wrote:

Let's revert it for now. I will create a review after fixing it and commit it again with the fix.

Thanks.

Sam

reverted in r329099.

tra mentioned this in D45223: [CUDA] Set LLVM calling convention for CUDA kernel. Apr 18 2018, 2:48 PM

Diff 139625

include/clang/Basic/Specifiers.h

@@ #include "clang/Basic/OpenCLImageTypes.def"
 enum InClassInitStyle {
   ICIS_NoInit, ///< No in-class initializer.
   ICIS_CopyInit, ///< Copy initialization.
   ICIS_ListInit ///< Direct list-initialization.
 };

 /// \brief CallingConv - Specifies the calling convention that a function uses.
 enum CallingConv {
   CC_C, // __attribute__((cdecl))
   CC_X86StdCall, // __attribute__((stdcall))
   CC_X86FastCall, // __attribute__((fastcall))
   CC_X86ThisCall, // __attribute__((thiscall))
   CC_X86VectorCall, // __attribute__((vectorcall))
   CC_X86Pascal, // __attribute__((pascal))
   CC_Win64, // __attribute__((ms_abi))
   CC_X86_64SysV, // __attribute__((sysv_abi))
   CC_X86RegCall, // __attribute__((regcall))
   CC_AAPCS, // __attribute__((pcs("aapcs")))
   CC_AAPCS_VFP, // __attribute__((pcs("aapcs-vfp")))
   CC_IntelOclBicc, // __attribute__((intel_ocl_bicc))
   CC_SpirFunction, // default for OpenCL functions on SPIR target
   CC_OpenCLKernel, // inferred for OpenCL kernels
   CC_Swift, // __attribute__((swiftcall))
   CC_PreserveMost, // __attribute__((preserve_most))
   CC_PreserveAll, // __attribute__((preserve_all))
+  CC_CUDAKernel, // inferred for CUDA kernels
 };

 /// \brief Checks whether the given calling convention supports variadic
 /// calls. Unprototyped calls also use the variadic call rules.
 inline bool supportsVariadicCall(CallingConv CC) {
   switch (CC) {
   case CC_X86StdCall:
   case CC_X86FastCall:

lib/AST/ItaniumMangle.cpp

   case CC_X86RegCall:
   case CC_AAPCS:
   case CC_AAPCS_VFP:
   case CC_IntelOclBicc:
   case CC_SpirFunction:
   case CC_OpenCLKernel:
   case CC_PreserveMost:
   case CC_PreserveAll:
+  case CC_CUDAKernel:
     // FIXME: we should be mangling all of the above.
     return "";

   case CC_Swift:
     return "swiftcall";
   }
   llvm_unreachable("bad calling convention");
 }

lib/AST/Type.cpp

   case CC_AAPCS: return "aapcs";
   case CC_AAPCS_VFP: return "aapcs-vfp";
   case CC_IntelOclBicc: return "intel_ocl_bicc";
   case CC_SpirFunction: return "spir_function";
   case CC_OpenCLKernel: return "opencl_kernel";
   case CC_Swift: return "swiftcall";
   case CC_PreserveMost: return "preserve_most";
   case CC_PreserveAll: return "preserve_all";
+  case CC_CUDAKernel:
+    return "cuda_kernel";
   }

   llvm_unreachable("Invalid calling convention.");
 }

 FunctionProtoType::FunctionProtoType(QualType result, ArrayRef<QualType> params,
                                      QualType canonical,
                                      const ExtProtoInfo &epi)

lib/AST/TypePrinter.cpp

@@ if (!InsideCCAttribute) {
     case CC_X86_64SysV:
       OS << " __attribute__((sysv_abi))";
       break;
     case CC_X86RegCall:
       OS << " __attribute__((regcall))";
       break;
     case CC_SpirFunction:
     case CC_OpenCLKernel:
+    case CC_CUDAKernel:
       // Do nothing. These CCs are not available as attributes.
       break;
     case CC_Swift:
       OS << " __attribute__((swiftcall))";
       break;
     case CC_PreserveMost:
       OS << " __attribute__((preserve_most))";
       break;

lib/CodeGen/CGCall.cpp

@@ unsigned CodeGenTypes::ClangCallConvToLLVMCallConv(CallingConv CC) {
   case CC_X86Pascal: return llvm::CallingConv::C;
   // TODO: Add support for __vectorcall to LLVM.
   case CC_X86VectorCall: return llvm::CallingConv::X86_VectorCall;
   case CC_SpirFunction: return llvm::CallingConv::SPIR_FUNC;
   case CC_OpenCLKernel: return CGM.getTargetCodeGenInfo().getOpenCLKernelCallingConv();
   case CC_PreserveMost: return llvm::CallingConv::PreserveMost;
   case CC_PreserveAll: return llvm::CallingConv::PreserveAll;
   case CC_Swift: return llvm::CallingConv::Swift;
+  case CC_CUDAKernel:
+    return CGM.getTargetCodeGenInfo().getCUDAKernelCallingConv();
   }
 }

 /// Derives the 'this' type for codegen purposes, i.e. ignoring method
 /// qualification.
 /// FIXME: address space qualification?
 static CanQualType GetThisType(ASTContext &Context, const CXXRecordDecl *RD) {
   QualType RecTy = Context.getTagDeclType(RD)->getCanonicalTypeInternal();

lib/CodeGen/CGDebugInfo.cpp

   case CC_Swift:
     return llvm::dwarf::DW_CC_LLVM_Swift;
   case CC_PreserveMost:
     return llvm::dwarf::DW_CC_LLVM_PreserveMost;
   case CC_PreserveAll:
     return llvm::dwarf::DW_CC_LLVM_PreserveAll;
   case CC_X86RegCall:
     return llvm::dwarf::DW_CC_LLVM_X86RegCall;
+  case CC_CUDAKernel:
+    // ToDo: Add llvm::dwarf::DW_CC_LLVM_CUDAKernel;
+    return 0;
   }
   return 0;
 }

 llvm::DIType *CGDebugInfo::CreateType(const FunctionType *Ty,
                                       llvm::DIFile *Unit) {
   SmallVector<llvm::Metadata *, 16> EltTys;

lib/CodeGen/TargetInfo.h

@@ public:
   /// this platform.
   virtual void getDetectMismatchOption(llvm::StringRef Name,
                                        llvm::StringRef Value,
                                        llvm::SmallString<32> &Opt) const {}

   /// Get LLVM calling convention for OpenCL kernel.
   virtual unsigned getOpenCLKernelCallingConv() const;

+  /// Get LLVM calling convention for CUDA kernel.
+  virtual unsigned getCUDAKernelCallingConv() const;
+
   /// Get target specific null pointer.
   /// \param T is the LLVM type of the null pointer.
   /// \param QT is the clang QualType of the null pointer.
   /// \return ConstantPointerNull with the given type \p T.
   /// Each target can override it to return its own desired constant value.
   virtual llvm::Constant *getNullPointer(const CodeGen::CodeGenModule &CGM,
       llvm::PointerType *T, QualType QT) const;

lib/CodeGen/TargetInfo.cpp

   // list to enable feasible implementation of clSetKernelArg() with
   // aggregates etc. In case we would use the default C calling conv here,
   // clSetKernelArg() might break depending on the target-specific
   // conventions; different targets might split structs passed as values
   // to multiple function arguments etc.
   return llvm::CallingConv::SPIR_KERNEL;
 }

+unsigned TargetCodeGenInfo::getCUDAKernelCallingConv() const {
+  return llvm::CallingConv::C;
+}
+
 llvm::Constant *TargetCodeGenInfo::getNullPointer(const CodeGen::CodeGenModule &CGM,
     llvm::PointerType *T, QualType QT) const {
   return llvm::ConstantPointerNull::get(T);
 }

 LangAS TargetCodeGenInfo::getGlobalVarAddressSpace(CodeGenModule &CGM,
                                                    const VarDecl *D) const {
   assert(!CGM.getLangOpts().OpenCL &&
@@

 class AMDGPUTargetCodeGenInfo : public TargetCodeGenInfo {
 public:
   AMDGPUTargetCodeGenInfo(CodeGenTypes &CGT)
       : TargetCodeGenInfo(new AMDGPUABIInfo(CGT)) {}
   void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
                            CodeGen::CodeGenModule &M) const override;
   unsigned getOpenCLKernelCallingConv() const override;
+  unsigned getCUDAKernelCallingConv() const override;

   llvm::Constant *getNullPointer(const CodeGen::CodeGenModule &CGM,
       llvm::PointerType *T, QualType QT) const override;

   LangAS getASTAllocaAddressSpace() const override {
     return getLangASFromTargetAS(
         getABIInfo().getDataLayout().getAllocaAddrSpace());
   }
@@
       F->addFnAttr("amdgpu-num-vgpr", llvm::utostr(NumVGPR));
   }
 }

 unsigned AMDGPUTargetCodeGenInfo::getOpenCLKernelCallingConv() const {
   return llvm::CallingConv::AMDGPU_KERNEL;
 }

+unsigned AMDGPUTargetCodeGenInfo::getCUDAKernelCallingConv() const {
+  return llvm::CallingConv::AMDGPU_KERNEL;
+}
+
 // Currently LLVM assumes null pointers always have value 0,
 // which results in incorrectly transformed IR. Therefore, instead of
 // emitting null pointers in private and local address spaces, a null
 // pointer in generic address space is emitted which is casted to a
 // pointer in local or private address space.
 llvm::Constant *AMDGPUTargetCodeGenInfo::getNullPointer(
     const CodeGen::CodeGenModule &CGM, llvm::PointerType *PT,
     QualType QT) const {

lib/Sema/SemaOverload.cpp

     if (FromFPT->isNothrow(Context) && !ToFPT->isNothrow(Context)) {
       FromFn = cast<FunctionType>(
           Context.getFunctionTypeWithExceptionSpec(QualType(FromFPT, 0),
                                                    EST_None)
               .getTypePtr());
       Changed = true;
     }

+    // Drop cuda_kernel calling convention since function pointer can only
+    // be used in host code.
+    if (getLangOpts().CUDA && FromFPT->getCallConv() == CC_CUDAKernel &&
+        FromFPT->getCallConv() != ToFPT->getCallConv()) {
+      FromFn = Context.adjustFunctionType(
+          FromFn, FromEInfo.withCallingConv(ToFPT->getCallConv()));
+      Changed = true;
+    }
+
     // Convert FromFPT's ExtParameterInfo if necessary. The conversion is valid
     // only if the ExtParameterInfo lists of the two function prototypes can be
     // merged and the merged list is identical to ToFPT's ExtParameterInfo list.
     SmallVector<FunctionProtoType::ExtParameterInfo, 4> NewParamInfos;
     bool CanUseToFPT, CanUseFromFPT;
     if (Context.mergeExtParameterInfo(ToFPT, FromFPT, CanUseToFPT,
                                       CanUseFromFPT, NewParamInfos) &&
         CanUseToFPT && !CanUseFromFPT) {

lib/Sema/SemaType.cpp

           D.getDeclSpec().getStorageClassSpec() != DeclSpec::SCS_typedef &&
           !D.isStaticMember();
     }
   }

   CallingConv CC = S.Context.getDefaultCallingConvention(FTI.isVariadic,
                                                          IsCXXInstanceMethod);

+  // Attribute AT_CUDAGlobal affects the calling convention for AMDGPU targets.
+  // This is the simplest place to infer calling convention for CUDA kernels.
+  if (S.getLangOpts().CUDA && S.getLangOpts().CUDAIsDevice) {
+    for (const AttributeList *Attr = D.getDeclSpec().getAttributes().getList();
+         Attr; Attr = Attr->getNext()) {
+      if (Attr->getKind() == AttributeList::AT_CUDAGlobal) {
+        CC = CC_CUDAKernel;
+        break;
+      }
+    }
+  }
+
   // Attribute AT_OpenCLKernel affects the calling convention for SPIR
   // and AMDGPU targets, hence it cannot be treated as a calling
   // convention attribute. This is the simplest place to infer
   // calling convention for OpenCL kernels.
   if (S.getLangOpts().OpenCL) {
     for (const AttributeList *Attr = D.getDeclSpec().getAttributes().getList();
          Attr; Attr = Attr->getNext()) {
       if (Attr->getKind() == AttributeList::AT_OpenCLKernel) {

tools/libclang/CXType.cpp

@@ switch (FD->getCallConv()) {
       TCALLINGCONV(AAPCS);
       TCALLINGCONV(AAPCS_VFP);
       TCALLINGCONV(IntelOclBicc);
       TCALLINGCONV(Swift);
       TCALLINGCONV(PreserveMost);
       TCALLINGCONV(PreserveAll);
     case CC_SpirFunction: return CXCallingConv_Unexposed;
     case CC_OpenCLKernel: return CXCallingConv_Unexposed;
+    case CC_CUDAKernel:
+      return CXCallingConv_Unexposed;
       break;
     }
 #undef TCALLINGCONV
   }

   return CXCallingConv_Invalid;
 }

This is an archive of the discontinued LLVM Phabricator instance.

Set calling convention for CUDA kernel
Closed, Public
