This is an archive of the discontinued LLVM Phabricator instance.

Differential D57488

[CUDA] add support for the new kernel launch API in CUDA-9.2+.
Closed, Public

Authored by tra on Jan 30 2019, 4:36 PM.

Details

Reviewers

jlebar

Commits

rGc62214da3de0: [CUDA] add support for the new kernel launch API in CUDA-9.2+.
rC352799: [CUDA] add support for the new kernel launch API in CUDA-9.2+.
rL352799: [CUDA] add support for the new kernel launch API in CUDA-9.2+.

Summary

Instead of calling the CUDA runtime to arrange function arguments,
the new API constructs arguments in a local array and the kernels
are launched with cudaLaunchKernel().

The old API has been deprecated and is expected to go away
in the next CUDA release.
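
As a rough illustration of the summary, the host-only C++ sketch below mimics how a new-style stub could gather pointers to its arguments in a local array and hand that array to a launch function. `mock_kernel_entry`, `mock_launch_kernel`, `kernelfunc_stub` and `g_last_sum` are hypothetical names invented for this example; nothing here calls the real CUDA runtime, and the real launch call also takes grid/block dimensions, shared memory size and a stream.

```cpp
#include <cassert>

// Records what the mock "kernel" computed so a caller can inspect it.
int g_last_sum = 0;

// Hypothetical stand-in for device code. Like a kernel launched through the
// new API, it receives one pointer per argument rather than the values.
void mock_kernel_entry(void **args) {
  int i = *static_cast<int *>(args[0]);
  int j = *static_cast<int *>(args[1]);
  int k = *static_cast<int *>(args[2]);
  g_last_sum = i + j + k;
}

using KernelEntry = void (*)(void **);

// Hypothetical stand-in for the runtime's launch entry point. This mock
// simply calls the function on the host with the packed argument array.
int mock_launch_kernel(KernelEntry func, void **args) {
  assert(func != nullptr && args != nullptr);
  func(args);
  return 0; // 0 plays the role of a success code in this mock
}

// Shape of the host-side stub: store the address of each parameter in a
// local array, then pass the array to the launch function in one call.
int kernelfunc_stub(int i, int j, int k) {
  void *kernel_args[] = {&i, &j, &k};
  return mock_launch_kernel(mock_kernel_entry, kernel_args);
}
```

Calling `kernelfunc_stub(1, 2, 3)` runs the mock kernel through the packed array; the legacy sequence would instead have issued one runtime call per argument before launching.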

Diff Detail

Repository: rC Clang

Event Timeline

tra created this revision. (Jan 30 2019, 4:36 PM)

Herald added subscribers: bixia, sanjoy. (Jan 30 2019, 4:36 PM)

Harbormaster completed remote builds in B27516: Diff 184405. (Jan 30 2019, 4:36 PM)

tra added a parent revision: D57487: [CUDA] Propagate detected version of CUDA to cc1. (Jan 30 2019, 4:37 PM)

tra mentioned this in D57487: [CUDA] Propagate detected version of CUDA to cc1.

LGTM, mostly nits.

clang/include/clang/Sema/Sema.h
10316 ↗	(On Diff #184405)	Could we be a little less vague, what exactly is the launch-configuration function? (Could be as simple as adding `e.g. cudaFooBar()`.)
clang/lib/CodeGen/CGCUDANV.cpp
201 ↗	(On Diff #184405)	nit `in a local array`
212 ↗	(On Diff #184405)	Nit, s/`1UL`/`uint64{1}`/ or size_t, whatever this function takes. As-is we're baking in the assumption that unsigned long is the same as the type returned by Args.size(), which isn't necessarily true. As an alternative, you could do `std::max<size_t>(1, Args.size())` or whatever the appropriate type is.
239 ↗	(On Diff #184405)	Unfixed FIXME?
260 ↗	(On Diff #184405)	I see lots of references to `__cudaPushCallConfiguration`, but this is the only reference I see to `__cudaPopCallConfiguration`. Is this a typo? Also are we supposed to emit matching push and pop function calls? Kind of weird to do one without the other...
266 ↗	(On Diff #184405)	Whitespace nit, maybe move this whitespace line before the comment?
clang/lib/Headers/__clang_cuda_runtime_wrapper.h
429 ↗	(On Diff #184405)	s/undocumented function/this undocumented function/?
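
The `1UL` nit on line 212 can be illustrated with a small standalone C++ sketch. `kernel_arg_slots` is a hypothetical helper written for this example, not a function from the patch. `std::max(1UL, args.size())` only compiles where `unsigned long` and `size_t` are the same type; on LLP64 targets such as 64-bit Windows they differ and template deduction fails. Naming the template argument sidesteps that assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: number of pointer slots to reserve for a kernel's
// arguments. At least one slot is reserved so the array pointer stays valid
// even when the kernel takes no arguments.
std::size_t kernel_arg_slots(const std::vector<int> &args) {
  // The explicit template argument converts the literal 1 to std::size_t,
  // so both operands have the same type on every platform.
  std::size_t slots = std::max<std::size_t>(1, args.size());
  assert(slots >= 1);
  return slots;
}
```

With an empty argument list the helper returns 1; with three arguments it returns 3.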

This revision is now accepted and ready to land. (Jan 30 2019, 5:05 PM)

Addressed Justin's comments.

Harbormaster completed remote builds in B27549: Diff 184543. (Jan 31 2019, 10:30 AM)

tra added inline comments. (Jan 31 2019, 10:37 AM)

clang/lib/CodeGen/CGCUDANV.cpp
239 ↗	(On Diff #184405)	Fixed the comment. :-) There's not much we can do if we have no declaration for cudaLaunchKernel, so throwing the error here is the best we can do.
260 ↗	(On Diff #184405)	The `pop` part is indeed used only here. `Push` is something that takes user-specified parameters, so we get Sema to check them. `Pop` is much simpler and does not have any direct user exposure, so we can just create and use it here. As for matching, it is balanced. `Push` is called at the kernel launch site with the parameters of `<<<>>>`. `Pop` is done in the host-side kernel stub, where we retrieve those parameters and pass them to the CUDA runtime. Essentially, push/pop are poor names for these functions, as the nesting is never more than one level deep. We could've just stashed the arguments in a fixed buffer somewhere.
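
The push/pop pairing described in this reply can be sketched in host-only C++. Everything below is a mock written for illustration: `Dim3`, `LaunchConfig`, `mock_push_call_configuration`, `mock_pop_call_configuration`, `kernel_stub` and `launch_site` are hypothetical names, and the single-slot storage plus the error return on an unmatched pop are assumptions of this sketch, not documented behavior of the CUDA runtime.

```cpp
#include <cassert>
#include <cstddef>

struct Dim3 { unsigned x, y, z; }; // hypothetical stand-in for dim3

struct LaunchConfig {
  Dim3 grid;
  Dim3 block;
  std::size_t shared_mem;
  void *stream;
};

// One fixed slot is enough here because nesting is never deeper than one.
static LaunchConfig g_pending;
static bool g_has_pending = false;

// Called at the launch site with the values written inside <<<...>>>.
unsigned mock_push_call_configuration(Dim3 grid, Dim3 block,
                                      std::size_t shared_mem = 0,
                                      void *stream = nullptr) {
  g_pending = LaunchConfig{grid, block, shared_mem, stream};
  g_has_pending = true;
  return 0;
}

// Called inside the host-side stub to retrieve the stashed values.
// Returning nonzero on an unmatched pop is an assumption of this mock.
int mock_pop_call_configuration(Dim3 *grid, Dim3 *block,
                                std::size_t *shared_mem, void **stream) {
  if (!g_has_pending)
    return 1;
  *grid = g_pending.grid;
  *block = g_pending.block;
  *shared_mem = g_pending.shared_mem;
  *stream = g_pending.stream;
  g_has_pending = false;
  return 0;
}

// Host-side stub: pops the configuration, then would launch the kernel.
// This mock returns the total thread count along x, or -1 on failure.
int kernel_stub() {
  Dim3 grid{}, block{};
  std::size_t shared_mem = 0;
  void *stream = nullptr;
  if (mock_pop_call_configuration(&grid, &block, &shared_mem, &stream) != 0)
    return -1;
  return static_cast<int>(grid.x * block.x);
}

// Launch site: roughly what kernel<<<2, 32>>>() expands to in this model.
int launch_site() {
  mock_push_call_configuration(Dim3{2, 1, 1}, Dim3{32, 1, 1});
  return kernel_stub();
}
```

Each call to `launch_site()` performs exactly one push followed by one pop, which is the balance the reply refers to; calling `kernel_stub()` on its own finds nothing to pop.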

Updated ASTMatchers unit test.

Harbormaster completed remote builds in B27564: Diff 184592. (Jan 31 2019, 1:24 PM)

Closed by commit rC352799: [CUDA] add support for the new kernel launch API in CUDA-9.2+. (authored by tra). (Jan 31 2019, 1:34 PM)

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/clang/Basic/DiagnosticSemaKinds.td	2 lines
include/clang/Sema/Sema.h	5 lines
lib/CodeGen/CGCUDANV.cpp	110 lines
lib/Headers/__clang_cuda_runtime_wrapper.h	10 lines
lib/Sema/SemaCUDA.cpp	19 lines
lib/Sema/SemaDecl.cpp	7 lines
test/CodeGenCUDA/Inputs/cuda.h	13 lines
test/CodeGenCUDA/device-stub.cu	65 lines
test/CodeGenCUDA/kernel-args-alignment.cu	16 lines
test/CodeGenCUDA/kernel-call.cu	17 lines
test/Driver/cuda-simple.cu	6 lines
test/SemaCUDA/Inputs/cuda.h	12 lines
test/SemaCUDA/config-type.cu	8 lines
unittests/ASTMatchers/ASTMatchersTest.h	4 lines

Diff 184598

include/clang/Basic/DiagnosticSemaKinds.td

[7,137 lines hidden; context: def err_deleted_inherited_ctor_use : Error<]
"constructor inherited by %0 from base class %1 is implicitly deleted">;		"constructor inherited by %0 from base class %1 is implicitly deleted">;

def note_called_by : Note<"called by %0">;		def note_called_by : Note<"called by %0">;
def err_kern_type_not_void_return : Error<		def err_kern_type_not_void_return : Error<
"kernel function type %0 must have void return type">;		"kernel function type %0 must have void return type">;
def err_kern_is_nonstatic_method : Error<		def err_kern_is_nonstatic_method : Error<
"kernel function %0 must be a free function or static member function">;		"kernel function %0 must be a free function or static member function">;
def err_config_scalar_return : Error<		def err_config_scalar_return : Error<
"CUDA special function 'cudaConfigureCall' must have scalar return type">;		"CUDA special function '%0' must have scalar return type">;
def err_kern_call_not_global_function : Error<		def err_kern_call_not_global_function : Error<
"kernel call to non-global function %0">;		"kernel call to non-global function %0">;
def err_global_call_not_config : Error<		def err_global_call_not_config : Error<
"call to global function %0 not configured">;		"call to global function %0 not configured">;
def err_ref_bad_target : Error<		def err_ref_bad_target : Error<
"reference to %select{__device__|__global__|__host__|__host__ __device__}0 "		"reference to %select{__device__|__global__|__host__|__host__ __device__}0 "
"function %1 in %select{__device__|__global__|__host__|__host__ __device__}2 function">;		"function %1 in %select{__device__|__global__|__host__|__host__ __device__}2 function">;
def err_ref_bad_target_global_initializer : Error<		def err_ref_bad_target_global_initializer : Error<
[2,360 lines hidden]

include/clang/Sema/Sema.h

[10,342 lines hidden; context: public:]

/// Check whether NewFD is a valid overload for CUDA. Emits		/// Check whether NewFD is a valid overload for CUDA. Emits
/// diagnostics and invalidates NewFD if not.		/// diagnostics and invalidates NewFD if not.
void checkCUDATargetOverload(FunctionDecl *NewFD,		void checkCUDATargetOverload(FunctionDecl *NewFD,
const LookupResult &Previous);		const LookupResult &Previous);
/// Copies target attributes from the template TD to the function FD.		/// Copies target attributes from the template TD to the function FD.
void inheritCUDATargetAttrs(FunctionDecl *FD, const FunctionTemplateDecl &TD);		void inheritCUDATargetAttrs(FunctionDecl *FD, const FunctionTemplateDecl &TD);

		/// Returns the name of the launch configuration function. This is the name
		/// of the function that will be called to configure kernel call, with the
		/// parameters specified via <<<>>>.
		std::string getCudaConfigureFuncName() const;

/// \name Code completion		/// \name Code completion
//@{		//@{
/// Describes the context in which code completion occurs.		/// Describes the context in which code completion occurs.
enum ParserCompletionContext {		enum ParserCompletionContext {
/// Code completion occurs at top-level or namespace context.		/// Code completion occurs at top-level or namespace context.
PCC_Namespace,		PCC_Namespace,
/// Code completion occurs within a class, struct, or union.		/// Code completion occurs within a class, struct, or union.
PCC_Class,		PCC_Class,
[648 lines hidden]

lib/CodeGen/CGCUDANV.cpp

[9 lines hidden]
// runtime library.		// runtime library.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "CGCUDARuntime.h"		#include "CGCUDARuntime.h"
#include "CodeGenFunction.h"		#include "CodeGenFunction.h"
#include "CodeGenModule.h"		#include "CodeGenModule.h"
#include "clang/AST/Decl.h"		#include "clang/AST/Decl.h"
		#include "clang/Basic/Cuda.h"
		#include "clang/CodeGen/CodeGenABITypes.h"
#include "clang/CodeGen/ConstantInitBuilder.h"		#include "clang/CodeGen/ConstantInitBuilder.h"
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"		#include "llvm/IR/DerivedTypes.h"
#include "llvm/Support/Format.h"		#include "llvm/Support/Format.h"

using namespace clang;		using namespace clang;
using namespace CodeGen;		using namespace CodeGen;
[71 lines hidden; context: llvm::BasicBlock *DummyBlock =]
llvm::BasicBlock::Create(Context, "", DummyFunc);		llvm::BasicBlock::Create(Context, "", DummyFunc);
CGBuilderTy FuncBuilder(CGM, Context);		CGBuilderTy FuncBuilder(CGM, Context);
FuncBuilder.SetInsertPoint(DummyBlock);		FuncBuilder.SetInsertPoint(DummyBlock);
FuncBuilder.CreateRetVoid();		FuncBuilder.CreateRetVoid();

return DummyFunc;		return DummyFunc;
}		}

void emitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList &Args);		void emitDeviceStubBodyLegacy(CodeGenFunction &CGF, FunctionArgList &Args);
		void emitDeviceStubBodyNew(CodeGenFunction &CGF, FunctionArgList &Args);

public:		public:
CGNVCUDARuntime(CodeGenModule &CGM);		CGNVCUDARuntime(CodeGenModule &CGM);

void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) override;		void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) override;
void registerDeviceVar(llvm::GlobalVariable &Var, unsigned Flags) override {		void registerDeviceVar(llvm::GlobalVariable &Var, unsigned Flags) override {
DeviceVars.push_back(std::make_pair(&Var, Flags));		DeviceVars.push_back(std::make_pair(&Var, Flags));
}		}
[68 lines hidden; context: llvm::FunctionType *CGNVCUDARuntime::getRegisterLinkedBinaryFnTy() const {]
llvm::Type *Params[] = {RegisterGlobalsFnTy->getPointerTo(), VoidPtrTy,		llvm::Type *Params[] = {RegisterGlobalsFnTy->getPointerTo(), VoidPtrTy,
VoidPtrTy, CallbackFnTy->getPointerTo()};		VoidPtrTy, CallbackFnTy->getPointerTo()};
return llvm::FunctionType::get(VoidTy, Params, false);		return llvm::FunctionType::get(VoidTy, Params, false);
}		}

void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,		void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,
FunctionArgList &Args) {		FunctionArgList &Args) {
EmittedKernels.push_back(CGF.CurFn);		EmittedKernels.push_back(CGF.CurFn);
emitDeviceStubBody(CGF, Args);		if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),
		CudaFeature::CUDA_USES_NEW_LAUNCH))
		emitDeviceStubBodyNew(CGF, Args);
		else
		emitDeviceStubBodyLegacy(CGF, Args);
		}

		// CUDA 9.0+ uses new way to launch kernels. Parameters are packed in a local
		// array and kernels are launched using cudaLaunchKernel().
		void CGNVCUDARuntime::emitDeviceStubBodyNew(CodeGenFunction &CGF,
		FunctionArgList &Args) {
		// Build the shadow stack entry at the very start of the function.

		// Calculate amount of space we will need for all arguments. If we have no
		// args, allocate a single pointer so we still have a valid pointer to the
		// argument array that we can pass to runtime, even if it will be unused.
		Address KernelArgs = CGF.CreateTempAlloca(
		VoidPtrTy, CharUnits::fromQuantity(16), "kernel_args",
		llvm::ConstantInt::get(SizeTy, std::max<size_t>(1, Args.size())));
		// Store pointers to the arguments in a locally allocated launch_args.
		for (unsigned i = 0; i < Args.size(); ++i) {
		llvm::Value* VarPtr = CGF.GetAddrOfLocalVar(Args[i]).getPointer();
		llvm::Value *VoidVarPtr = CGF.Builder.CreatePointerCast(VarPtr, VoidPtrTy);
		CGF.Builder.CreateDefaultAlignedStore(
		VoidVarPtr, CGF.Builder.CreateConstGEP1_32(KernelArgs.getPointer(), i));
		}

		llvm::BasicBlock *EndBlock = CGF.createBasicBlock("setup.end");

		// Lookup cudaLaunchKernel function.
		// cudaError_t cudaLaunchKernel(const void *func, dim3 gridDim, dim3 blockDim,
		// void **args, size_t sharedMem,
		// cudaStream_t stream);
		TranslationUnitDecl *TUDecl = CGM.getContext().getTranslationUnitDecl();
		DeclContext *DC = TranslationUnitDecl::castToDeclContext(TUDecl);
		IdentifierInfo &cudaLaunchKernelII =
		CGM.getContext().Idents.get("cudaLaunchKernel");
		FunctionDecl *cudaLaunchKernelFD = nullptr;
		for (const auto &Result : DC->lookup(&cudaLaunchKernelII)) {
		if (FunctionDecl *FD = dyn_cast<FunctionDecl>(Result))
		cudaLaunchKernelFD = FD;
		}

		if (cudaLaunchKernelFD == nullptr) {
		CGM.Error(CGF.CurFuncDecl->getLocation(),
		"Can't find declaration for cudaLaunchKernel()");
		return;
		}
		// Create temporary dim3 grid_dim, block_dim.
		ParmVarDecl *GridDimParam = cudaLaunchKernelFD->getParamDecl(1);
		QualType Dim3Ty = GridDimParam->getType();
		Address GridDim =
		CGF.CreateMemTemp(Dim3Ty, CharUnits::fromQuantity(8), "grid_dim");
		Address BlockDim =
		CGF.CreateMemTemp(Dim3Ty, CharUnits::fromQuantity(8), "block_dim");
		Address ShmemSize =
		CGF.CreateTempAlloca(SizeTy, CGM.getSizeAlign(), "shmem_size");
		Address Stream =
		CGF.CreateTempAlloca(VoidPtrTy, CGM.getPointerAlign(), "stream");
		llvm::Constant *cudaPopConfigFn = CGM.CreateRuntimeFunction(
		llvm::FunctionType::get(IntTy,
		{/*gridDim=*/GridDim.getType(),
		/*blockDim=*/BlockDim.getType(),
		/*ShmemSize=*/ShmemSize.getType(),
		/*Stream=*/Stream.getType()},
		/*isVarArg=*/false),
		"__cudaPopCallConfiguration");

		CGF.EmitRuntimeCallOrInvoke(cudaPopConfigFn,
		{GridDim.getPointer(), BlockDim.getPointer(),
		ShmemSize.getPointer(), Stream.getPointer()});

		// Emit the call to cudaLaunch
		llvm::Value *Kernel = CGF.Builder.CreatePointerCast(CGF.CurFn, VoidPtrTy);
		CallArgList LaunchKernelArgs;
		LaunchKernelArgs.add(RValue::get(Kernel),
		cudaLaunchKernelFD->getParamDecl(0)->getType());
		LaunchKernelArgs.add(RValue::getAggregate(GridDim), Dim3Ty);
		LaunchKernelArgs.add(RValue::getAggregate(BlockDim), Dim3Ty);
		LaunchKernelArgs.add(RValue::get(KernelArgs.getPointer()),
		cudaLaunchKernelFD->getParamDecl(3)->getType());
		LaunchKernelArgs.add(RValue::get(CGF.Builder.CreateLoad(ShmemSize)),
		cudaLaunchKernelFD->getParamDecl(4)->getType());
		LaunchKernelArgs.add(RValue::get(CGF.Builder.CreateLoad(Stream)),
		cudaLaunchKernelFD->getParamDecl(5)->getType());

		QualType QT = cudaLaunchKernelFD->getType();
		QualType CQT = QT.getCanonicalType();
		llvm::Type *Ty = CGM.getTypes().ConvertFunctionType(CQT, cudaLaunchKernelFD);
		llvm::FunctionType *FTy = dyn_cast<llvm::FunctionType>(Ty);

		const CGFunctionInfo &FI =
		CGM.getTypes().arrangeFunctionDeclaration(cudaLaunchKernelFD);
		llvm::Constant *cudaLaunchKernelFn =
		CGM.CreateRuntimeFunction(FTy, "cudaLaunchKernel");
		CGF.EmitCall(FI, CGCallee::forDirect(cudaLaunchKernelFn), ReturnValueSlot(),
		LaunchKernelArgs);
		CGF.EmitBranch(EndBlock);

		CGF.EmitBlock(EndBlock);
}		}

void CGNVCUDARuntime::emitDeviceStubBody(CodeGenFunction &CGF,		void CGNVCUDARuntime::emitDeviceStubBodyLegacy(CodeGenFunction &CGF,
FunctionArgList &Args) {		FunctionArgList &Args) {
// Emit a call to cudaSetupArgument for each arg in Args.		// Emit a call to cudaSetupArgument for each arg in Args.
llvm::Constant *cudaSetupArgFn = getSetupArgumentFn();		llvm::Constant *cudaSetupArgFn = getSetupArgumentFn();
llvm::BasicBlock *EndBlock = CGF.createBasicBlock("setup.end");		llvm::BasicBlock *EndBlock = CGF.createBasicBlock("setup.end");
CharUnits Offset = CharUnits::Zero();		CharUnits Offset = CharUnits::Zero();
for (const VarDecl *A : Args) {		for (const VarDecl *A : Args) {
CharUnits TyWidth, TyAlign;		CharUnits TyWidth, TyAlign;
std::tie(TyWidth, TyAlign) =		std::tie(TyWidth, TyAlign) =
CGM.getContext().getTypeInfoInChars(A->getType());		CGM.getContext().getTypeInfoInChars(A->getType());
[428 lines hidden]

lib/Headers/__clang_cuda_runtime_wrapper.h

	[420 lines hidden]
	#define dim3 __cuda_builtin_blockDim_t			#define dim3 __cuda_builtin_blockDim_t
	#define uint3 __cuda_builtin_threadIdx_t			#define uint3 __cuda_builtin_threadIdx_t
	#include "curand_mtgp32_kernel.h"			#include "curand_mtgp32_kernel.h"
	#pragma pop_macro("dim3")			#pragma pop_macro("dim3")
	#pragma pop_macro("uint3")			#pragma pop_macro("uint3")
	#pragma pop_macro("__USE_FAST_MATH__")			#pragma pop_macro("__USE_FAST_MATH__")
	#pragma pop_macro("__CUDA_INCLUDE_COMPILER_INTERNAL_HEADERS__")			#pragma pop_macro("__CUDA_INCLUDE_COMPILER_INTERNAL_HEADERS__")

				// CUDA runtime uses this undocumented function to access kernel launch
				// configuration. The declaration is in crt/device_functions.h but that file
				// includes a lot of other stuff we don't want. Instead, we'll provide our own
				// declaration for it here.
				#if CUDA_VERSION >= 9020
				extern "C" unsigned __cudaPushCallConfiguration(dim3 gridDim, dim3 blockDim,
				size_t sharedMem = 0,
				void *stream = 0);
				#endif

	#endif // __CUDA__			#endif // __CUDA__
	#endif // __CLANG_CUDA_RUNTIME_WRAPPER_H__			#endif // __CLANG_CUDA_RUNTIME_WRAPPER_H__

lib/Sema/SemaCUDA.cpp

//===--- SemaCUDA.cpp - Semantic Analysis for CUDA constructs -------------===//		//===--- SemaCUDA.cpp - Semantic Analysis for CUDA constructs -------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
/// \file		/// \file
/// This file implements semantic analysis for CUDA constructs.		/// This file implements semantic analysis for CUDA constructs.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang/AST/ASTContext.h"		#include "clang/AST/ASTContext.h"
#include "clang/AST/Decl.h"		#include "clang/AST/Decl.h"
#include "clang/AST/ExprCXX.h"		#include "clang/AST/ExprCXX.h"
		#include "clang/Basic/Cuda.h"
#include "clang/Lex/Preprocessor.h"		#include "clang/Lex/Preprocessor.h"
#include "clang/Sema/Lookup.h"		#include "clang/Sema/Lookup.h"
#include "clang/Sema/Sema.h"		#include "clang/Sema/Sema.h"
#include "clang/Sema/SemaDiagnostic.h"		#include "clang/Sema/SemaDiagnostic.h"
#include "clang/Sema/SemaInternal.h"		#include "clang/Sema/SemaInternal.h"
#include "clang/Sema/Template.h"		#include "clang/Sema/Template.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
[12 lines hidden; context: bool Sema::PopForceCUDAHostDevice() {]
return true;		return true;
}		}

ExprResult Sema::ActOnCUDAExecConfigExpr(Scope *S, SourceLocation LLLLoc,		ExprResult Sema::ActOnCUDAExecConfigExpr(Scope *S, SourceLocation LLLLoc,
MultiExprArg ExecConfig,		MultiExprArg ExecConfig,
SourceLocation GGGLoc) {		SourceLocation GGGLoc) {
FunctionDecl *ConfigDecl = Context.getcudaConfigureCallDecl();		FunctionDecl *ConfigDecl = Context.getcudaConfigureCallDecl();
if (!ConfigDecl)		if (!ConfigDecl)
return ExprError(		return ExprError(Diag(LLLLoc, diag::err_undeclared_var_use)
Diag(LLLLoc, diag::err_undeclared_var_use)		<< getCudaConfigureFuncName());
<< (getLangOpts().HIP ? "hipConfigureCall" : "cudaConfigureCall"));
QualType ConfigQTy = ConfigDecl->getType();		QualType ConfigQTy = ConfigDecl->getType();

DeclRefExpr *ConfigDR = new (Context)		DeclRefExpr *ConfigDR = new (Context)
DeclRefExpr(Context, ConfigDecl, false, ConfigQTy, VK_LValue, LLLLoc);		DeclRefExpr(Context, ConfigDecl, false, ConfigQTy, VK_LValue, LLLLoc);
MarkFunctionReferenced(LLLLoc, ConfigDecl);		MarkFunctionReferenced(LLLLoc, ConfigDecl);

return ActOnCallExpr(S, ConfigDR, LLLLoc, ExecConfig, GGGLoc, nullptr,		return ActOnCallExpr(S, ConfigDR, LLLLoc, ExecConfig, GGGLoc, nullptr,
/*IsExecConfig=*/true);		/*IsExecConfig=*/true);
[897 lines hidden]

void Sema::inheritCUDATargetAttrs(FunctionDecl *FD,		void Sema::inheritCUDATargetAttrs(FunctionDecl *FD,
const FunctionTemplateDecl &TD) {		const FunctionTemplateDecl &TD) {
const FunctionDecl &TemplateFD = *TD.getTemplatedDecl();		const FunctionDecl &TemplateFD = *TD.getTemplatedDecl();
copyAttrIfPresent<CUDAGlobalAttr>(*this, FD, TemplateFD);		copyAttrIfPresent<CUDAGlobalAttr>(*this, FD, TemplateFD);
copyAttrIfPresent<CUDAHostAttr>(*this, FD, TemplateFD);		copyAttrIfPresent<CUDAHostAttr>(*this, FD, TemplateFD);
copyAttrIfPresent<CUDADeviceAttr>(*this, FD, TemplateFD);		copyAttrIfPresent<CUDADeviceAttr>(*this, FD, TemplateFD);
}		}

		std::string Sema::getCudaConfigureFuncName() const {
		if (getLangOpts().HIP)
		return "hipConfigureCall";

		// New CUDA kernel launch sequence.
		if (CudaFeatureEnabled(Context.getTargetInfo().getSDKVersion(),
		CudaFeature::CUDA_USES_NEW_LAUNCH))
		return "__cudaPushCallConfiguration";

		// Legacy CUDA kernel configuration call
		return "cudaConfigureCall";
		}

lib/Sema/SemaDecl.cpp

[9,140 lines hidden; context: if (D.isRedeclaration() && !Previous.empty()) {]
checkDLLAttributeRedeclaration(*this, Prev, NewFD,		checkDLLAttributeRedeclaration(*this, Prev, NewFD,
isMemberSpecialization ||		isMemberSpecialization ||
isFunctionTemplateSpecialization,		isFunctionTemplateSpecialization,
D.isFunctionDefinition());		D.isFunctionDefinition());
}		}

if (getLangOpts().CUDA) {		if (getLangOpts().CUDA) {
IdentifierInfo *II = NewFD->getIdentifier();		IdentifierInfo *II = NewFD->getIdentifier();
if (II &&		if (II && II->isStr(getCudaConfigureFuncName()) &&
II->isStr(getLangOpts().HIP ? "hipConfigureCall"
: "cudaConfigureCall") &&
!NewFD->isInvalidDecl() &&		!NewFD->isInvalidDecl() &&
NewFD->getDeclContext()->getRedeclContext()->isTranslationUnit()) {		NewFD->getDeclContext()->getRedeclContext()->isTranslationUnit()) {
if (!R->getAs<FunctionType>()->getReturnType()->isScalarType())		if (!R->getAs<FunctionType>()->getReturnType()->isScalarType())
Diag(NewFD->getLocation(), diag::err_config_scalar_return);		Diag(NewFD->getLocation(), diag::err_config_scalar_return)
		<< getCudaConfigureFuncName();
Context.setcudaConfigureCallDecl(NewFD);		Context.setcudaConfigureCallDecl(NewFD);
}		}

// Variadic functions, other than a declaration of printf, are not allowed		// Variadic functions, other than a declaration of printf, are not allowed
// in device-side CUDA code, unless someone passed		// in device-side CUDA code, unless someone passed
// -fcuda-allow-variadic-functions.		// -fcuda-allow-variadic-functions.
if (!getLangOpts().CUDAAllowVariadicFunctions && NewFD->isVariadic() &&		if (!getLangOpts().CUDAAllowVariadicFunctions && NewFD->isVariadic() &&
(NewFD->hasAttr<CUDADeviceAttr>() ||		(NewFD->hasAttr<CUDADeviceAttr>() ||
[8,210 lines hidden]

test/CodeGenCUDA/Inputs/cuda.h

	[9 lines hidden]
	#define __launch_bounds__(...) __attribute__((launch_bounds(__VA_ARGS__)))			#define __launch_bounds__(...) __attribute__((launch_bounds(__VA_ARGS__)))

	struct dim3 {			struct dim3 {
	unsigned x, y, z;			unsigned x, y, z;
	__host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}			__host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}
	};			};

	typedef struct cudaStream *cudaStream_t;			typedef struct cudaStream *cudaStream_t;
				typedef enum cudaError {} cudaError_t;
	#ifdef __HIP__			#ifdef __HIP__
	int hipConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,			int hipConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,
	cudaStream_t stream = 0);			cudaStream_t stream = 0);
	#else			#else
	int cudaConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,			extern "C" int cudaConfigureCall(dim3 gridSize, dim3 blockSize,
				size_t sharedSize = 0,
				cudaStream_t stream = 0);
				extern "C" int __cudaPushCallConfiguration(dim3 gridSize, dim3 blockSize,
				size_t sharedSize = 0,
	cudaStream_t stream = 0);			cudaStream_t stream = 0);
				extern "C" cudaError_t cudaLaunchKernel(const void *func, dim3 gridDim,
				dim3 blockDim, void **args,
				size_t sharedMem, cudaStream_t stream);
	#endif			#endif

	extern "C" __device__ int printf(const char*, ...);			extern "C" __device__ int printf(const char*, ...);

test/CodeGenCUDA/device-stub.cu

	// RUN: echo "GPU binary would be here" > %t			// RUN: echo "GPU binary would be here" > %t
	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
	// RUN: -fcuda-include-gpubinary %t -o - \			// RUN: -target-sdk-version=8.0 -fcuda-include-gpubinary %t -o - \
	// RUN: | FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=ALL,NORDC,CUDA,CUDANORDC			// RUN: | FileCheck -allow-deprecated-dag-overlap %s \
				// RUN: --check-prefixes=ALL,NORDC,CUDA,CUDANORDC,CUDA-OLD
				// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
				// RUN: -target-sdk-version=8.0 -fcuda-include-gpubinary %t \
				// RUN: -o - -DNOGLOBALS \
				// RUN: | FileCheck -allow-deprecated-dag-overlap %s \
				// RUN: -check-prefixes=NOGLOBALS,CUDANOGLOBALS
				// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
				// RUN: -target-sdk-version=8.0 -fgpu-rdc -fcuda-include-gpubinary %t \
				// RUN: -o - \
				// RUN: | FileCheck -allow-deprecated-dag-overlap %s \
				// RUN: --check-prefixes=ALL,RDC,CUDA,CUDARDC,CUDA-OLD
	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
	// RUN: -fcuda-include-gpubinary %t -o - -DNOGLOBALS \			// RUN: -target-sdk-version=8.0 -o - \
	// RUN: | FileCheck -allow-deprecated-dag-overlap %s -check-prefixes=NOGLOBALS,CUDANOGLOBALS			// RUN: | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=NOGPUBIN

	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
	// RUN: -fgpu-rdc -fcuda-include-gpubinary %t -o - \			// RUN: -target-sdk-version=9.2 -fcuda-include-gpubinary %t -o - \
	// RUN: | FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=ALL,RDC,CUDA,CUDARDC			// RUN: | FileCheck %s -allow-deprecated-dag-overlap \
	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -o - \			// RUN: --check-prefixes=ALL,NORDC,CUDA,CUDANORDC,CUDA-NEW
				// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
				// RUN: -target-sdk-version=9.2 -fcuda-include-gpubinary %t -o - -DNOGLOBALS \
				// RUN: | FileCheck -allow-deprecated-dag-overlap %s \
				// RUN: --check-prefixes=NOGLOBALS,CUDANOGLOBALS
				// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
				// RUN: -target-sdk-version=9.2 -fgpu-rdc -fcuda-include-gpubinary %t -o - \
				// RUN: | FileCheck %s -allow-deprecated-dag-overlap \
				// RUN: --check-prefixes=ALL,RDC,CUDA,CUDARDC,CUDA_NEW
				// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
				// RUN: -target-sdk-version=9.2 -o - \
	// RUN: | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=NOGPUBIN			// RUN: | FileCheck -allow-deprecated-dag-overlap %s -check-prefix=NOGPUBIN

	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
	// RUN: -fcuda-include-gpubinary %t -o - -x hip\			// RUN: -fcuda-include-gpubinary %t -o - -x hip\
	// RUN: | FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=ALL,NORDC,HIP,HIPEF			// RUN: | FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=ALL,NORDC,HIP,HIPEF
	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
	// RUN: -fcuda-include-gpubinary %t -o - -DNOGLOBALS -x hip \			// RUN: -fcuda-include-gpubinary %t -o - -DNOGLOBALS -x hip \
	// RUN: | FileCheck -allow-deprecated-dag-overlap %s -check-prefixes=NOGLOBALS,HIPNOGLOBALS			// RUN: | FileCheck -allow-deprecated-dag-overlap %s -check-prefixes=NOGLOBALS,HIPNOGLOBALS
	[78 lines hidden]
	// * Alias to global symbol containing the NVModuleID.			// * Alias to global symbol containing the NVModuleID.
	// RDC: @__fatbinwrap[[MODULE_ID]] = alias { i32, i32, i8, i8 }			// RDC: @__fatbinwrap[[MODULE_ID]] = alias { i32, i32, i8, i8 }
	// RDC-SAME: { i32, i32, i8, i8 }* @__[[PREFIX]]_fatbin_wrapper			// RDC-SAME: { i32, i32, i8, i8 }* @__[[PREFIX]]_fatbin_wrapper

	// Test that we build the correct number of calls to cudaSetupArgument followed			// Test that we build the correct number of calls to cudaSetupArgument followed
	// by a call to cudaLaunch.			// by a call to cudaLaunch.

	// ALL: define{{.*}}kernelfunc			// ALL: define{{.*}}kernelfunc
	// ALL: call{{.*}}[[PREFIX]]SetupArgument
	// ALL: call{{.*}}[[PREFIX]]SetupArgument			// New launch sequence stores arguments into local buffer and passes array of
	// ALL: call{{.*}}[[PREFIX]]SetupArgument			// pointers to them directly to cudaLaunchKernel
	// ALL: call{{.*}}[[PREFIX]]Launch			// CUDA-NEW: alloca
				// CUDA-NEW: store
				// CUDA-NEW: store
				// CUDA-NEW: store
				// CUDA-NEW: call{{.*}}__cudaPopCallConfiguration
				// CUDA-NEW: call{{.*}}cudaLaunchKernel

				// Legacy style launch sequence sets up arguments by passing them to
				// [cuda|hip]SetupArgument.
				// CUDA-OLD: call{{.*}}[[PREFIX]]SetupArgument
				// CUDA-OLD: call{{.*}}[[PREFIX]]SetupArgument
				// CUDA-OLD: call{{.*}}[[PREFIX]]SetupArgument
				// CUDA-OLD: call{{.*}}[[PREFIX]]Launch

				// HIP: call{{.*}}[[PREFIX]]SetupArgument
				// HIP: call{{.*}}[[PREFIX]]SetupArgument
				// HIP: call{{.*}}[[PREFIX]]SetupArgument
				// HIP: call{{.*}}[[PREFIX]]Launch
	__global__ void kernelfunc(int i, int j, int k) {}			__global__ void kernelfunc(int i, int j, int k) {}

	// Test that we've built correct kernel launch sequence.			// Test that we've built correct kernel launch sequence.
	// ALL: define{{.*}}hostfunc			// ALL: define{{.*}}hostfunc
	// ALL: call{{.*}}[[PREFIX]]ConfigureCall			// CUDA-OLD: call{{.*}}[[PREFIX]]ConfigureCall
				// CUDA-NEW: call{{.*}}__cudaPushCallConfiguration
				// HIP: call{{.*}}[[PREFIX]]ConfigureCall
	// ALL: call{{.*}}kernelfunc			// ALL: call{{.*}}kernelfunc
	void hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); }			void hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); }
	#endif			#endif

	// Test that we've built a function to register kernels and global vars.			// Test that we've built a function to register kernels and global vars.
	// ALL: define internal void @__[[PREFIX]]_register_globals			// ALL: define internal void @__[[PREFIX]]_register_globals
	// ALL: call{{.*}}[[PREFIX]]RegisterFunction(i8** %0, {{.*}}kernelfunc			// ALL: call{{.*}}[[PREFIX]]RegisterFunction(i8** %0, {{.*}}kernelfunc
	// ALL-DAG: call{{.*}}[[PREFIX]]RegisterVar(i8** %0, {{.*}}device_var{{.*}}i32 0, i32 4, i32 0, i32 0			// ALL-DAG: call{{.*}}[[PREFIX]]RegisterVar(i8** %0, {{.*}}device_var{{.*}}i32 0, i32 4, i32 0, i32 0
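The CUDA-NEW checks above (alloca, three stores, then cudaLaunchKernel) describe a stub that keeps the kernel arguments in local storage and hands the runtime an array of pointers to them. A minimal host-only sketch of that shape follows. `mockLaunchKernel`, `kernelHandle`, and `kernelfuncStub` are hypothetical names made up for illustration; the mock is not the real cudaLaunchKernel and only reads the arguments back so the sketch can run without a GPU.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the runtime entry point. It only walks the
// pointer array to show how the arguments are reached; no kernel is launched.
static int mockLaunchKernel(const void *func, void **args, std::size_t numArgs) {
  assert(func != nullptr && args != nullptr);
  int sum = 0;
  for (std::size_t n = 0; n < numArgs; ++n)
    sum += *static_cast<int *>(args[n]);  // each slot points at one argument
  return sum;
}

// Hypothetical handle that identifies the kernel to the mock runtime.
static const char kernelHandle = 0;

// Sketch of a host-side stub for kernelfunc(int i, int j, int k).
static int kernelfuncStub(int i, int j, int k) {
  // The by-value parameters already live in local storage; the stub only
  // collects pointers to them, with no per-argument runtime call.
  void *args[] = {&i, &j, &k};
  return mockLaunchKernel(&kernelHandle, args, 3);
}
```

Under these assumptions the per-argument size/offset bookkeeping of the SetupArgument sequence disappears, because the runtime receives each argument by address.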

test/CodeGenCUDA/kernel-args-alignment.cu

	// RUN: %clang_cc1 --std=c++11 -triple x86_64-unknown-linux-gnu -emit-llvm -o - %s | \			// New CUDA kernel launch sequence does not require explicit specification of
	// RUN: FileCheck -check-prefix HOST -check-prefix CHECK %s			// size/offset for each argument, so only the old way is tested.
				//
				// RUN: %clang_cc1 --std=c++11 -triple x86_64-unknown-linux-gnu -emit-llvm \
				// RUN: -target-sdk-version=8.0 -o - %s \
				// RUN: | FileCheck -check-prefixes=HOST-OLD,CHECK %s

	// RUN: %clang_cc1 --std=c++11 -fcuda-is-device -triple nvptx64-nvidia-cuda \			// RUN: %clang_cc1 --std=c++11 -fcuda-is-device -triple nvptx64-nvidia-cuda \
	// RUN: -emit-llvm -o - %s | FileCheck -check-prefix DEVICE -check-prefix CHECK %s			// RUN: -emit-llvm -o - %s | FileCheck -check-prefixes=DEVICE,CHECK %s

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	struct U {			struct U {
	short x;			short x;
	} __attribute__((packed));			} __attribute__((packed));

	struct S {			struct S {
	int *ptr;			int *ptr;
	char a;			char a;
	U u;			U u;
	};			};

	// Clang should generate a packed LLVM struct for S (denoted by the <>s),			// Clang should generate a packed LLVM struct for S (denoted by the <>s),
	// otherwise this test isn't interesting.			// otherwise this test isn't interesting.
	// CHECK: %struct.S = type <{ i32*, i8, %struct.U, [5 x i8] }>			// CHECK: %struct.S = type <{ i32*, i8, %struct.U, [5 x i8] }>

	static_assert(alignof(S) == 8, "Unexpected alignment.");			static_assert(alignof(S) == 8, "Unexpected alignment.");

	// HOST-LABEL: @_Z6kernelc1SPi			// HOST-LABEL: @_Z6kernelc1SPi
	// Marshalled kernel args should be:			// Marshalled kernel args should be:
	// 1. offset 0, width 1			// 1. offset 0, width 1
	// 2. offset 8 (because alignof(S) == 8), width 16			// 2. offset 8 (because alignof(S) == 8), width 16
	// 3. offset 24, width 8			// 3. offset 24, width 8
	// HOST: call i32 @cudaSetupArgument({{[^,]*}}, i64 1, i64 0)			// HOST-OLD: call i32 @cudaSetupArgument({{[^,]*}}, i64 1, i64 0)
	// HOST: call i32 @cudaSetupArgument({{[^,]*}}, i64 16, i64 8)			// HOST-OLD: call i32 @cudaSetupArgument({{[^,]*}}, i64 16, i64 8)
	// HOST: call i32 @cudaSetupArgument({{[^,]*}}, i64 8, i64 24)			// HOST-OLD: call i32 @cudaSetupArgument({{[^,]*}}, i64 8, i64 24)

	// DEVICE-LABEL: @_Z6kernelc1SPi			// DEVICE-LABEL: @_Z6kernelc1SPi
	// DEVICE-SAME: i8{{[^,]*}}, %struct.S* byval align 8{{[^,]*}}, i32*			// DEVICE-SAME: i8{{[^,]*}}, %struct.S* byval align 8{{[^,]*}}, i32*
	__global__ void kernel(char a, S s, int *b) {}			__global__ void kernel(char a, S s, int *b) {}
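The expected offsets in the comment above (0, 8, 24) follow from rounding each argument's offset up to its alignment before placing it. A small sketch of that arithmetic is below. `ArgType`, `ArgSlot`, `alignUp`, and `layoutArgs` are hypothetical helpers, not Clang code, and the sizes fed to them assume the x86_64 host layout used by the test (char: size 1, align 1; S: size 16, align 8; int*: size 8, align 8).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one kernel argument type.
struct ArgType { std::size_t size; std::size_t align; };

// Hypothetical result: where the argument lands in the marshalled buffer.
struct ArgSlot { std::size_t offset; std::size_t width; };

// Round `value` up to the next multiple of `align` (a power of two).
constexpr std::size_t alignUp(std::size_t value, std::size_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Place arguments one after another, aligning each one first.
std::vector<ArgSlot> layoutArgs(const std::vector<ArgType> &types) {
  std::vector<ArgSlot> slots;
  std::size_t offset = 0;
  for (const ArgType &t : types) {
    offset = alignUp(offset, t.align);
    slots.push_back({offset, t.size});
    offset += t.size;
  }
  return slots;
}
```

Calling `layoutArgs({{1, 1}, {16, 8}, {8, 8}})` yields the (offset, width) pairs (0, 1), (8, 16), (24, 8), which are the values the cudaSetupArgument checks expect.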

test/CodeGenCUDA/kernel-call.cu

	// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s --check-prefixes=CUDA,CHECK			// RUN: %clang_cc1 -target-sdk-version=8.0 -emit-llvm %s -o - \
	// RUN: %clang_cc1 -x hip -emit-llvm %s -o - | FileCheck %s --check-prefixes=HIP,CHECK			// RUN: | FileCheck %s --check-prefixes=CUDA-OLD,CHECK
				// RUN: %clang_cc1 -target-sdk-version=9.2 -emit-llvm %s -o - \
				// RUN: | FileCheck %s --check-prefixes=CUDA-NEW,CHECK
				// RUN: %clang_cc1 -x hip -emit-llvm %s -o - \
				// RUN: | FileCheck %s --check-prefixes=HIP,CHECK


	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	// CHECK-LABEL: define{{.*}}g1			// CHECK-LABEL: define{{.*}}g1
	// HIP: call{{.*}}hipSetupArgument			// HIP: call{{.*}}hipSetupArgument
	// HIP: call{{.*}}hipLaunchByPtr			// HIP: call{{.*}}hipLaunchByPtr
	// CUDA: call{{.*}}cudaSetupArgument			// CUDA-OLD: call{{.*}}cudaSetupArgument
	// CUDA: call{{.*}}cudaLaunch			// CUDA-OLD: call{{.*}}cudaLaunch
				// CUDA-NEW: call{{.*}}__cudaPopCallConfiguration
				// CUDA-NEW: call{{.*}}cudaLaunchKernel
	__global__ void g1(int x) {}			__global__ void g1(int x) {}

	// CHECK-LABEL: define{{.*}}main			// CHECK-LABEL: define{{.*}}main
	int main(void) {			int main(void) {
	// HIP: call{{.*}}hipConfigureCall			// HIP: call{{.*}}hipConfigureCall
	// CUDA: call{{.*}}cudaConfigureCall			// CUDA-OLD: call{{.*}}cudaConfigureCall
				// CUDA-NEW: call{{.*}}__cudaPushCallConfiguration
	// CHECK: icmp			// CHECK: icmp
	// CHECK: br			// CHECK: br
	// CHECK: call{{.*}}g1			// CHECK: call{{.*}}g1
	g1<<<1, 1>>>(42);			g1<<<1, 1>>>(42);
	}			}
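In the CUDA-NEW checks above, the call site pushes the launch configuration with __cudaPushCallConfiguration and the kernel stub later retrieves it with __cudaPopCallConfiguration before calling cudaLaunchKernel. The sketch below mimics that hand-off with a plain stack so it can run on the host. Every name in it (`LaunchConfig`, `mockPushCallConfiguration`, `mockPopCallConfiguration`, `g1Stub`, `hostSide`) is a hypothetical stand-in, and the simplified two-field configuration is an assumption; the real functions take dim3 grid/block sizes, a shared-memory size, and a stream.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified launch configuration.
struct LaunchConfig { unsigned grid; unsigned block; };

// Hypothetical storage that stands in for the runtime's internal state.
static std::vector<LaunchConfig> configStack;

// Stand-in for the push done at the <<<...>>> call site.
// Returns 0 on success (an assumption of this sketch).
static unsigned mockPushCallConfiguration(unsigned grid, unsigned block) {
  configStack.push_back({grid, block});
  return 0;
}

// Stand-in for the pop done inside the kernel stub.
static LaunchConfig mockPopCallConfiguration() {
  assert(!configStack.empty());
  LaunchConfig c = configStack.back();
  configStack.pop_back();
  return c;
}

// Sketch of the stub for g1(int): it recovers the configuration that the
// caller pushed; a real stub would then pass it to cudaLaunchKernel.
static LaunchConfig g1Stub(int x) {
  (void)x;
  return mockPopCallConfiguration();
}

// Sketch of the host side of g1<<<1, 1>>>(42): push, test the result,
// branch, then call the stub (compare the icmp / br / call checks).
static LaunchConfig hostSide() {
  if (mockPushCallConfiguration(1, 1) == 0)
    return g1Stub(42);
  return {0, 0};
}
```

The push and pop are paired per launch, so the stack is empty again once the stub has run.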

test/Driver/cuda-simple.cu

	// Verify that we can parse a simple CUDA file with or without -save-temps			// Verify that we can parse a simple CUDA file with or without -save-temps
	// http://llvm.org/PR22936			// http://llvm.org/PR22936
	// RUN: %clang -nocudainc -nocudalib -Werror -fsyntax-only -c %s			// RUN: %clang -nocudainc -nocudalib -Werror -fsyntax-only -c %s
	//			//
	// Verify that we pass -x cuda-cpp-output to compiler after			// Verify that we pass -x cuda-cpp-output to compiler after
	// preprocessing a CUDA file			// preprocessing a CUDA file
	// RUN: %clang -Werror -### -save-temps -c %s 2>&1 | FileCheck %s			// RUN: %clang -Werror -### -save-temps -c %s 2>&1 | FileCheck %s
	// CHECK: "-cc1"			// CHECK: "-cc1"
	// CHECK: "-E"			// CHECK: "-E"
	// CHECK: "-x" "cuda"			// CHECK: "-x" "cuda"
	// CHECK-NEXT: "-cc1"			// CHECK-NEXT: "-cc1"
	// CHECK: "-x" "cuda-cpp-output"			// CHECK: "-x" "cuda-cpp-output"
	//			//
	// Verify that compiler accepts CUDA syntax with "-x cuda-cpp-output".			// Verify that compiler accepts CUDA syntax with "-x cuda-cpp-output".
	// RUN: %clang -Werror -fsyntax-only -x cuda-cpp-output -c %s			// RUN: %clang -Werror -fsyntax-only -x cuda-cpp-output -c %s

	int cudaConfigureCall(int, int);			extern "C" int cudaConfigureCall(int, int);
				extern "C" int __cudaPushCallConfiguration(int, int);

	__attribute__((global)) void kernel() {}			__attribute__((global)) void kernel() {}

	void func() {			void func() {
	kernel<<<1,1>>>();			kernel<<<1,1>>>();
	}			}

test/SemaCUDA/Inputs/cuda.h

	#define __launch_bounds__(...) __attribute__((launch_bounds(__VA_ARGS__)))			#define __launch_bounds__(...) __attribute__((launch_bounds(__VA_ARGS__)))

	struct dim3 {			struct dim3 {
	unsigned x, y, z;			unsigned x, y, z;
	__host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}			__host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}
	};			};

	typedef struct cudaStream *cudaStream_t;			typedef struct cudaStream *cudaStream_t;
				typedef enum cudaError {} cudaError_t;

	int cudaConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,			extern "C" int cudaConfigureCall(dim3 gridSize, dim3 blockSize,
				size_t sharedSize = 0,
	cudaStream_t stream = 0);			cudaStream_t stream = 0);
				extern "C" int __cudaPushCallConfiguration(dim3 gridSize, dim3 blockSize,
				size_t sharedSize = 0,
				cudaStream_t stream = 0);
				extern "C" cudaError_t cudaLaunchKernel(const void *func, dim3 gridDim,
				dim3 blockDim, void **args,
				size_t sharedMem, cudaStream_t stream);

	// Host- and device-side placement new overloads.			// Host- and device-side placement new overloads.
	void *operator new(__SIZE_TYPE__, void *p) { return p; }			void *operator new(__SIZE_TYPE__, void *p) { return p; }
	void *operator new[](__SIZE_TYPE__, void *p) { return p; }			void *operator new[](__SIZE_TYPE__, void *p) { return p; }
	__device__ void *operator new(__SIZE_TYPE__, void *p) { return p; }			__device__ void *operator new(__SIZE_TYPE__, void *p) { return p; }
	__device__ void *operator new[](__SIZE_TYPE__, void *p) { return p; }			__device__ void *operator new[](__SIZE_TYPE__, void *p) { return p; }

	#endif // !__NVCC__			#endif // !__NVCC__

test/SemaCUDA/config-type.cu

	// RUN: %clang_cc1 -fsyntax-only -verify %s			// RUN: %clang_cc1 -target-sdk-version=8.0 -fsyntax-only -verify=legacy-launch %s
				// RUN: %clang_cc1 -target-sdk-version=9.2 -fsyntax-only -verify=new-launch %s

	void cudaConfigureCall(unsigned gridSize, unsigned blockSize); // expected-error {{must have scalar return type}}			// legacy-launch-error@+1 {{must have scalar return type}}
				void cudaConfigureCall(unsigned gridSize, unsigned blockSize);
				// new-launch-error@+1 {{must have scalar return type}}
				void __cudaPushCallConfiguration(unsigned gridSize, unsigned blockSize);

unittests/ASTMatchers/ASTMatchersTest.h

const std::string CudaHeader =		const std::string CudaHeader =
"struct dim3 {"		"struct dim3 {"
" unsigned x, y, z;"		" unsigned x, y, z;"
" __host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1)"		" __host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1)"
" : x(x), y(y), z(z) {}"		" : x(x), y(y), z(z) {}"
"};"		"};"
"typedef struct cudaStream *cudaStream_t;"		"typedef struct cudaStream *cudaStream_t;"
"int cudaConfigureCall(dim3 gridSize, dim3 blockSize,"		"int cudaConfigureCall(dim3 gridSize, dim3 blockSize,"
" size_t sharedSize = 0,"		" size_t sharedSize = 0,"
" cudaStream_t stream = 0);";		" cudaStream_t stream = 0);"
		"extern \"C\" unsigned __cudaPushCallConfiguration("
		" dim3 gridDim, dim3 blockDim, size_t sharedMem = 0, void *stream = 0);";

bool Found = false, DynamicFound = false;		bool Found = false, DynamicFound = false;
MatchFinder Finder;		MatchFinder Finder;
VerifyMatch VerifyFound(nullptr, &Found);		VerifyMatch VerifyFound(nullptr, &Found);
Finder.addMatcher(AMatcher, &VerifyFound);		Finder.addMatcher(AMatcher, &VerifyFound);
VerifyMatch VerifyDynamicFound(nullptr, &DynamicFound);		VerifyMatch VerifyDynamicFound(nullptr, &DynamicFound);
if (!Finder.addDynamicMatcher(AMatcher, &VerifyDynamicFound))		if (!Finder.addDynamicMatcher(AMatcher, &VerifyDynamicFound))
return testing::AssertionFailure() << "Could not add dynamic matcher";		return testing::AssertionFailure() << "Could not add dynamic matcher";