This is an archive of the discontinued LLVM Phabricator instance.

Differential D77743

[HIP] Emit symbols with kernel name in host binary
Needs Revision, Public

Authored by yaxunl on Apr 8 2020, 9:53 AM.

Details

Reviewers

tra
rjmccall
hliao

Summary

HIP provides a host API that allows C/C++ programs to
launch kernels. A C/C++ program can declare a HIP
kernel as an external function and pass it to
the kernel launching API. When the program is linked with
object files built from HIP programs, these external
functions resolve to symbols with the same name in the HIP
programs, so that kernels with the same name can be
found and launched.

This requires clang to emit symbols with the same
name as kernels in object files and to use them to
identify kernels, instead of using device stub
functions, since device stub functions have
different names than kernels.

This patch makes clang emit a void* global
variable for each kernel in host IR, which is
called the kernel handle. The kernel handle has the
same mangled name as the kernel under the host ABI. It is
passed to __hipRegisterFunction and the kernel launching
functions to identify kernels.
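
The registration and lookup described in the summary can be sketched with a small host-only mock. This is an illustration under stated assumptions, not the HIP runtime: `registerFunction`, `launchByHandle`, `kernelTable`, and `vecadd_handle` are hypothetical names standing in for `__hipRegisterFunction`, a kernel launching function, the runtime's internal table, and the emitted handle variable.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the runtime's table that maps a host-side
// identifier (here: the address of a kernel handle) to a device-side name.
static std::unordered_map<const void *, std::string> &kernelTable() {
  static std::unordered_map<const void *, std::string> table;
  return table;
}

// Hypothetical stand-in for __hipRegisterFunction.
void registerFunction(const void *handle, const std::string &deviceName) {
  kernelTable()[handle] = deviceName;
}

// Hypothetical stand-in for a launch function. It returns the device-side
// name it would launch, or an empty string if the handle is unknown.
std::string launchByHandle(const void *handle) {
  auto it = kernelTable().find(handle);
  return it == kernelTable().end() ? std::string() : it->second;
}

// Models the void* global emitted for one kernel. In the patch the variable
// carries the kernel's host-mangled name; an ordinary C++ name is used here
// because this mock is not compiled as HIP.
void *const vecadd_handle = nullptr;
```

Only the address of the handle matters in this model. Any code that can name the symbol can pass `&vecadd_handle` to the launch function, regardless of which compiler produced that code.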

Event Timeline

yaxunl created this revision. Apr 8 2020, 9:53 AM

Would not this scheme create a conflict between the device-side mangled kernel name and the handle which we emit with the same name? I recall that the distinct stub name was introduced specifically to avoid confusion between the device-side kernel and the host-side stub that were visible at the same time (to the debugger only?). Now we seem to re-introduce the same name, only for the host-side handle instead of the host-side stub.

In D77743#1970035, @tra wrote:

Would not this scheme create a conflict between the device-side mangled kernel name and the handle which we emit with the same name? I recall that the distinct stub name was introduced specifically to avoid confusion between the device-side kernel and the host-side stub that were visible at the same time (to the debugger only?). Now we seem to re-introduce the same name, only for the host-side handle instead of the host-side stub.

We need the stub name to be different from the kernel name because otherwise the debugger will break on the stub function when users put a breakpoint on the kernel.

The kernel handle is a variable. Even if it has the same name as the kernel, it is OK for the debugger, since the debugger does not put a breakpoint on a variable.

In D77743#1970163, @yaxunl wrote:

The kernel handle is a variable. Even if it has the same name as the kernel, it is OK for the debugger, since the debugger does not put a breakpoint on a variable.

The patch appears to apply only to generated kernels. What happens when we take the address of the kernel directly?

a.hip: 
__global__ void kernel() {}

auto kernel_ref() {
  return kernel;
}

b.hip:
extern __global__ void kernel(); // access the handle var
something kernel_ref(); // returns the stub pointer?

void f() {
  auto x = kernel_ref();
  auto y = kernel;
  hipLaunchKernel(x,...); // x is the stub pointer. 
  hipLaunchKernel(y,...);
}

Will x and y contain the same value? For CUDA the answer would be yes, as they both would contain the address of the host-side stub with the kernel's name.
In this case the external reference will point to the handle variable, but I'm not sure what kernel_ref() would return.
My guess is that it will be the stub address, which may be a problem. I may be wrong. It would be good to add a test to verify that we always get consistent results when we're referencing the kernel.
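
The concern about x and y can be made concrete with a host-only mock. This is a sketch under assumptions: `device_stub_kernel`, `kernel_handle`, `kernelRefViaStub`, `kernelRefViaHandle`, and `isRegistered` are hypothetical names, and the table only models a runtime that looks kernels up by the pointer that was registered.

```cpp
#include <map>
#include <string>

// Hypothetical host-side stub and handle for one kernel. In the patch the
// stub is named __device_stub__<kernel> and the handle has the kernel's name.
void device_stub_kernel() {}
void *const kernel_handle = nullptr;

// What kernel_ref() would return if taking the kernel's address inside a HIP
// translation unit still produced the stub address (the case being asked about).
const void *kernelRefViaStub() {
  return reinterpret_cast<const void *>(&device_stub_kernel);
}

// What an external declaration resolved by the linker would produce: the
// address of the handle variable.
const void *kernelRefViaHandle() { return &kernel_handle; }

// Hypothetical runtime lookup keyed by whatever pointer was registered.
bool isRegistered(const std::map<const void *, std::string> &table,
                  const void *key) {
  return table.count(key) != 0;
}
```

If only the handle address is registered, a lookup with the stub address fails, so the two references to the same kernel would not be interchangeable in this model.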

This revision is now accepted and ready to land. Apr 8 2020, 1:54 PM

In D77743#1970304, @tra wrote:

The patch appears to apply only to generated kernels. What happens when we take the address of the kernel directly?
Will x and y contain the same value? For CUDA the answer would be yes, as they both would contain the address of the host-side stub with the kernel's name.
In this case the external reference will point to the handle variable, but I'm not sure what kernel_ref() would return.
My guess is that it will be the stub address, which may be a problem. I may be wrong. It would be good to add a test to verify that we always get consistent results when we're referencing the kernel.

That's a good question. That introduces ambiguity about the value of the same symbol (from the programmer's point of view). To avoid that ambiguity, we should always use the *alias* global variable for a __global__ function on the host side, as it will be used in the runtime API to query the device-side function.

This revision now requires changes to proceed. Apr 9 2020, 9:22 AM

In D77743#1972258, @hliao wrote:

That's a good question. That introduces ambiguity about the value of the same symbol (from the programmer's point of view). To avoid that ambiguity, we should always use the *alias* global variable for a __global__ function on the host side, as it will be used in the runtime API to query the device-side function.

I think I need to initialize the kernel handle with the address of the stub function. Any reference to the kernel in host code will use the kernel handle instead of the stub function. When the stub function is called and the callee is known at compile time, it will be called directly. If it is called indirectly, I will load the stub function from the kernel handle and call it.
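
The proposal to initialize the handle with the stub address can be sketched as a host-only mock. All names (`device_stub_kernel`, `kernel_handle`, `kernelRef`, `callThroughHandle`, `launchCount`) are hypothetical, and the stub only counts calls instead of launching anything.

```cpp
// Hypothetical stub. A real stub would set up the arguments and call the
// launch API; this one only counts how often it was invoked.
static int launches = 0;
void device_stub_kernel() { ++launches; }

using StubFn = void (*)();

// Handle initialized with the stub's address, as proposed above.
StubFn const kernel_handle = &device_stub_kernel;

// Any host-side reference to the kernel yields the handle's address.
const void *kernelRef() { return &kernel_handle; }

// An indirect call loads the stub pointer from the handle and calls it.
void callThroughHandle(const void *handle) {
  StubFn stub = *static_cast<const StubFn *>(handle);
  stub();
}

int launchCount() { return launches; }
```

In this model a pointer obtained from `kernelRef()` works both as the key for registration and as the starting point for an indirect call, at the cost of one extra load.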

In addition, we may also need to extend the registration to set up the mapping from that global variable to the host-side stub function. hipKernelLaunch (implemented as a function call instead of the kernel launch syntax) would then call into that stub function to prepare the arguments.

In D77743#1972292, @hliao wrote:

In addition, we may also need to extend the registration to set up the mapping from that global variable to the host-side stub function. hipKernelLaunch (implemented as a function call instead of the kernel launch syntax) would then call into that stub function to prepare the arguments.

hipKernelLaunch does not call the stub function. The stub function calls hipKernelLaunch. Therefore the user/runtime does not need to know about the stub function to launch a kernel.

In D77743#1972298, @yaxunl wrote:

hipKernelLaunch does not call the stub function. The stub function calls hipKernelLaunch. Therefore the user/runtime does not need to know about the stub function to launch a kernel.

Since the code using hipKernelLaunch may be compiled by other compilers, we cannot force a reinterpretation of that symbol's use as a load from the symbol. For code like this

__global__ void foo();

hipKernelLaunch(foo, ...)

If it's compiled by another compiler, foo refers to the value of that symbol, i.e. a constant, instead of the value loaded from that symbol. They are different.

In D77743#1972301, @hliao wrote:

Since the code using hipKernelLaunch may be compiled by other compilers, we cannot force a reinterpretation of that symbol's use as a load from the symbol. For code like this
__global__ void foo();

hipKernelLaunch(foo, ...)
If it's compiled by another compiler, foo refers to the value of that symbol, i.e. a constant, instead of the value loaded from that symbol. They are different.

Right. This will work. We don't need the user to load from foo, because foo will resolve to the kernel handle instead of the stub function.

The ambiguity issue is still there. That __global__ function generates different code depending on whether it's compiled as HIP by clang or as non-HIP code by clang or other compilers. That will break the resolution from the symbol value to its device kernel name.

Is the renaming just being done to avoid breakpoints from triggering in the stub? Can you not disable debugging the stub using whatever mechanism __attribute__((nodebug)) uses?

In D77743#1975822, @rjmccall wrote:

Is the renaming just being done to avoid breakpoints from triggering in the stub? Can you not disable debugging the stub using whatever mechanism __attribute__((nodebug)) uses?

I tried it. The source info and line number are gone, but gdb will still break on the function since the symbol is still there.

Ah, too bad. Is there any way to suppress that in debug info? I'm not sure there's any other way to satisfy the competing requirements here, and if it's not going to be consistent, it'd be better to avoid the complexity of mangling the thunk differently.

t-tye added a subscriber: t-tye. Apr 22 2020, 5:36 PM

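Revision Contents

Path                                          Size
clang/lib/CodeGen/CGCUDANV.cpp                28 lines
clang/test/CodeGenCUDA/Inputs/cuda.h          11 lines
clang/test/CodeGenCUDA/cxx-call-kernel.cpp    19 lines
clang/test/CodeGenCUDA/kernel-stub-name.cu    18 lines
clang/test/CodeGenCUDA/unnamed-types.cu       2 lines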

Diff 256047

clang/lib/CodeGen/CGCUDANV.cpp

@@ 35 lines hidden @@ private:
   llvm::IntegerType *IntTy, *SizeTy;
   llvm::Type *VoidTy;
   llvm::PointerType *CharPtrTy, *VoidPtrTy, *VoidPtrPtrTy;

   /// Convenience reference to LLVM Context
   llvm::LLVMContext &Context;
   /// Convenience reference to the current module
   llvm::Module &TheModule;
-  /// Keeps track of kernel launch stubs emitted in this module
+  /// Keeps track of kernel launch stubs and handles emitted in this module
   struct KernelInfo {
-    llvm::Function *Kernel;
+    llvm::Function *Kernel; // stub function to help launch kernel
     const Decl *D;
   };
   llvm::SmallVector<KernelInfo, 16> EmittedKernels;
+  // Map a device stub function to a symbol for identifying kernel in host code.
+  // For CUDA, the symbol for identifying the kernel is the same as the device
+  // stub function. For HIP, they are different.
+  llvm::DenseMap<llvm::Function *, llvm::GlobalValue *> KernelHandles;
   struct VarInfo {
     llvm::GlobalVariable *Var;
     const VarDecl *D;
     DeviceVarFlags Flags;
   };
   llvm::SmallVector<VarInfo, 16> DeviceVars;
   /// Keeps track of variable containing handle of GPU binary. Populated by
   /// ModuleCtorFunction() and used to create corresponding cleanup calls in
@@ 177 lines hidden @@ std::string CGNVCUDARuntime::getDeviceSideName(const NamedDecl *ND) {
   } else
     DeviceSideName = std::string(ND->getIdentifier()->getName());
   return DeviceSideName;
 }

 void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,
                                      FunctionArgList &Args) {
   EmittedKernels.push_back({CGF.CurFn, CGF.CurFuncDecl});
+  llvm::GlobalValue *KernelHandle = CGF.CurFn;
+  if (CGF.getLangOpts().HIP) {
+    auto Linkage = CGF.CurFn->getLinkage();
+    auto *Var = new llvm::GlobalVariable(
+        TheModule, VoidPtrTy, /*isConstant=*/true, Linkage,
+        /*Initializer=*/llvm::ConstantPointerNull::get(VoidPtrTy),
+        CGM.getMangledName(GlobalDecl(cast<FunctionDecl>(CGF.CurFuncDecl),
+                                      KernelReferenceKind::Kernel)));
+    Var->setAlignment(CGM.getPointerAlign().getAsAlign());
+    KernelHandle = Var;
+  }
+  KernelHandles[CGF.CurFn] = KernelHandle;
   if (CudaFeatureEnabled(CGM.getTarget().getSDKVersion(),
                          CudaFeature::CUDA_USES_NEW_LAUNCH) ||
       CGF.getLangOpts().HIPUseNewLaunchAPI)
     emitDeviceStubBodyNew(CGF, Args);
   else
     emitDeviceStubBodyLegacy(CGF, Args);
 }

@@ 62 lines hidden @@ llvm::FunctionCallee cudaPopConfigFn = CGM.CreateRuntimeFunction(
                               /*isVarArg=*/false),
       addUnderscoredPrefixToName("PopCallConfiguration"));

   CGF.EmitRuntimeCallOrInvoke(cudaPopConfigFn,
                               {GridDim.getPointer(), BlockDim.getPointer(),
                                ShmemSize.getPointer(), Stream.getPointer()});

   // Emit the call to cudaLaunch
-  llvm::Value *Kernel = CGF.Builder.CreatePointerCast(CGF.CurFn, VoidPtrTy);
+  llvm::Value *Kernel =
+      CGF.Builder.CreatePointerCast(KernelHandles[CGF.CurFn], VoidPtrTy);
   CallArgList LaunchKernelArgs;
   LaunchKernelArgs.add(RValue::get(Kernel),
                        cudaLaunchKernelFD->getParamDecl(0)->getType());
   LaunchKernelArgs.add(RValue::getAggregate(GridDim), Dim3Ty);
   LaunchKernelArgs.add(RValue::getAggregate(BlockDim), Dim3Ty);
   LaunchKernelArgs.add(RValue::get(KernelArgs.getPointer()),
                        cudaLaunchKernelFD->getParamDecl(3)->getType());
   LaunchKernelArgs.add(RValue::get(CGF.Builder.CreateLoad(ShmemSize)),
@@ 40 lines hidden @@ for (const VarDecl *A : Args) {
     llvm::BasicBlock *NextBlock = CGF.createBasicBlock("setup.next");
     CGF.Builder.CreateCondBr(CBZero, NextBlock, EndBlock);
     CGF.EmitBlock(NextBlock);
     Offset += TyWidth;
   }

   // Emit the call to cudaLaunch
   llvm::FunctionCallee cudaLaunchFn = getLaunchFn();
-  llvm::Value *Arg = CGF.Builder.CreatePointerCast(CGF.CurFn, CharPtrTy);
+  llvm::Value *Arg =
+      CGF.Builder.CreatePointerCast(KernelHandles[CGF.CurFn], CharPtrTy);
   CGF.EmitRuntimeCallOrInvoke(cudaLaunchFn, Arg);
   CGF.EmitBranch(EndBlock);

   CGF.EmitBlock(EndBlock);
 }

 /// Creates a function that sets up state on the host side for CUDA objects that
 /// have a presence on both the host and device sides. Specifically, registers
@@ 36 lines hidden @@ llvm::Function *CGNVCUDARuntime::makeRegisterGlobalsFn() {
   // each emitted kernel.
   llvm::Argument &GpuBinaryHandlePtr = *RegisterKernelsFunc->arg_begin();
   for (auto &&I : EmittedKernels) {
     llvm::Constant *KernelName =
         makeConstantString(getDeviceSideName(cast<NamedDecl>(I.D)));
     llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
     llvm::Value *Args[] = {
         &GpuBinaryHandlePtr,
-        Builder.CreateBitCast(I.Kernel, VoidPtrTy),
+        Builder.CreateBitCast(KernelHandles[I.Kernel], VoidPtrTy),
         KernelName,
         KernelName,
         llvm::ConstantInt::get(IntTy, -1),
         NullPtr,
         NullPtr,
         NullPtr,
         NullPtr,
         llvm::ConstantPointerNull::get(IntTy->getPointerTo())};
@@ 409 lines hidden @@
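
The bookkeeping this diff adds to CGCUDANV.cpp can be modelled without LLVM. The sketch below is a simplified stand-in, not clang code: `Lang`, `KernelHandleMap`, and its methods are hypothetical, and symbols are represented by their names instead of `llvm::Function` and `llvm::GlobalValue` pointers.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical language switch, mirroring the CGF.getLangOpts().HIP check.
enum class Lang { CUDA, HIP };

// Simplified model of the KernelHandles map: it records, for each stub, the
// symbol that identifies the kernel in host code.
class KernelHandleMap {
public:
  // For CUDA the stub itself identifies the kernel. For HIP a separate
  // symbol with the kernel's mangled name is used instead.
  void emitDeviceStub(Lang lang, const std::string &stubName,
                      const std::string &kernelName) {
    handles_[stubName] = (lang == Lang::HIP) ? kernelName : stubName;
  }

  // Symbol to pass to registration and launch calls for a given stub.
  // Throws std::out_of_range if the stub was never recorded.
  const std::string &handleFor(const std::string &stubName) const {
    return handles_.at(stubName);
  }

private:
  std::unordered_map<std::string, std::string> handles_;
};
```

With the names from kernel-stub-name.cu, recording the stub `_ZN2ns23__device_stub__nskernelEv` under HIP makes `handleFor` return `_ZN2ns8nskernelEv`, which corresponds to the symbol the updated CHECK lines expect in the registration call.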

clang/test/CodeGenCUDA/Inputs/cuda.h

 /* Minimal declarations for CUDA support. Testing purposes only. */

 #include <stddef.h>

+#if __HIP__ || __CUDA__
 #define __constant__ __attribute__((constant))
 #define __device__ __attribute__((device))
 #define __global__ __attribute__((global))
 #define __host__ __attribute__((host))
 #define __shared__ __attribute__((shared))
 #define __launch_bounds__(...) __attribute__((launch_bounds(__VA_ARGS__)))
+#else
+#define __constant__
+#define __device__
+#define __global__
+#define __host__
+#define __shared__
+#define __launch_bounds__(...)
+#endif

 struct dim3 {
   unsigned x, y, z;
   __host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}
 };

-#ifdef __HIP__
+#if __HIP__ || HIP_PLATFORM
 typedef struct hipStream *hipStream_t;
 typedef enum hipError {} hipError_t;
 int hipConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,
                      hipStream_t stream = 0);
 extern "C" hipError_t __hipPushCallConfiguration(dim3 gridSize, dim3 blockSize,
                                                  size_t sharedSize = 0,
                                                  hipStream_t stream = 0);
 extern "C" hipError_t hipLaunchKernel(const void *func, dim3 gridDim,
@@ 18 lines hidden @@

clang/test/CodeGenCUDA/cxx-call-kernel.cpp

This file was added.

+// RUN: %clang_cc1 -x hip -emit-llvm-bc %s -o %t.hip.bc
+// RUN: %clang_cc1 -mlink-builtin-bitcode %t.hip.bc -DHIP_PLATFORM -emit-llvm \
+// RUN:   %s -o - | FileCheck %s
+
+#include "Inputs/cuda.h"
+
+// CHECK: @_Z2g1i = internal constant i8* null
+#if __HIP__
+__global__ void g1(int x) {}
+#else
+extern void g1(int x);
+
+// CHECK: call i32 @hipLaunchKernel{{.*}}@_Z2g1i
+void test() {
+  hipLaunchKernel((void*)g1, 1, 1, nullptr, 0, 0);
+}
+
+// CHECK: __hipRegisterFunction{{.*}}@_Z2g1i
+#endif

clang/test/CodeGenCUDA/kernel-stub-name.cu

 // RUN: echo "GPU binary would be here" > %t

 // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
 // RUN:     -fcuda-include-gpubinary %t -o - -x hip\
 // RUN:   | FileCheck -allow-deprecated-dag-overlap %s --check-prefixes=CHECK

 #include "Inputs/cuda.h"

+// Kernel handles
+
+// CHECK: @[[HCKERN:ckernel]] = constant i8* null
+// CHECK: @[[HNSKERN:_ZN2ns8nskernelEv]] = constant i8* null
+// CHECK: @[[HTKERN:_Z10kernelfuncIiEvv]] = linkonce_odr constant i8* null
+
 extern "C" __global__ void ckernel() {}

 namespace ns {
 __global__ void nskernel() {}
 } // namespace ns

 template<class T>
 __global__ void kernelfunc() {}

 __global__ void kernel_decl();

 // Device side kernel names

 // CHECK: @[[CKERN:[0-9]*]] = {{.*}} c"ckernel\00"
 // CHECK: @[[NSKERN:[0-9]*]] = {{.*}} c"_ZN2ns8nskernelEv\00"
 // CHECK: @[[TKERN:[0-9]*]] = {{.*}} c"_Z10kernelfuncIiEvv\00"

 // Non-template kernel stub functions

 // CHECK: define{{.*}}@[[CSTUB:__device_stub__ckernel]]
-// CHECK: call{{.*}}@hipLaunchByPtr{{.*}}@[[CSTUB]]
+// CHECK: call{{.*}}@hipLaunchByPtr{{.*}}@[[HCKERN]]
 // CHECK: define{{.*}}@[[NSSTUB:_ZN2ns23__device_stub__nskernelEv]]
-// CHECK: call{{.*}}@hipLaunchByPtr{{.*}}@[[NSSTUB]]
+// CHECK: call{{.*}}@hipLaunchByPtr{{.*}}@[[HNSKERN]]

 // CHECK-LABEL: define{{.*}}@_Z8hostfuncv()
 // CHECK: call void @[[CSTUB]]()
 // CHECK: call void @[[NSSTUB]]()
 // CHECK: call void @[[TSTUB:_Z25__device_stub__kernelfuncIiEvv]]()
 // CHECK: call void @[[DSTUB:_Z26__device_stub__kernel_declv]]()
 void hostfunc(void) {
   ckernel<<<1, 1>>>();
   ns::nskernel<<<1, 1>>>();
   kernelfunc<int><<<1, 1>>>();
   kernel_decl<<<1, 1>>>();
 }

 // Template kernel stub functions

 // CHECK: define{{.*}}@[[TSTUB]]
-// CHECK: call{{.*}}@hipLaunchByPtr{{.*}}@[[TSTUB]]
+// CHECK: call{{.*}}@hipLaunchByPtr{{.*}}@[[HTKERN]]

 // CHECK: declare{{.*}}@[[DSTUB]]

 // CHECK-LABEL: define{{.*}}@__hip_register_globals
-// CHECK: call{{.*}}@__hipRegisterFunction{{.*}}@[[CSTUB]]{{.*}}@[[CKERN]]
-// CHECK: call{{.*}}@__hipRegisterFunction{{.*}}@[[NSSTUB]]{{.*}}@[[NSKERN]]
-// CHECK: call{{.*}}@__hipRegisterFunction{{.*}}@[[TSTUB]]{{.*}}@[[TKERN]]
+// CHECK: call{{.*}}@__hipRegisterFunction{{.*}}@[[HCKERN]]{{.*}}@[[CKERN]]
+// CHECK: call{{.*}}@__hipRegisterFunction{{.*}}@[[HNSKERN]]{{.*}}@[[NSKERN]]
+// CHECK: call{{.*}}@__hipRegisterFunction{{.*}}@[[HTKERN]]{{.*}}@[[TKERN]]
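
The relation between stub and handle names in these CHECK lines can be reproduced for the simplest case. The helper below is a sketch that covers only a non-template function at global scope with no parameters. `mangleNullary`, `stubSymbol`, and `handleSymbol` are hypothetical names, and real Itanium mangling handles far more than this.

```cpp
#include <string>

// Itanium-style mangling for one narrow case: a global-scope, non-template
// function with an empty parameter list, encoded as _Z<length><name>v.
std::string mangleNullary(const std::string &name) {
  return "_Z" + std::to_string(name.size()) + name + "v";
}

// Symbol of the host-side stub, using the __device_stub__ prefix that
// appears in the CHECK lines.
std::string stubSymbol(const std::string &kernel) {
  return mangleNullary("__device_stub__" + kernel);
}

// Symbol of the kernel handle: the kernel's own mangled name.
std::string handleSymbol(const std::string &kernel) {
  return mangleNullary(kernel);
}
```

For `kernel_decl`, `stubSymbol` yields `_Z26__device_stub__kernel_declv`, matching the DSTUB pattern in the test, while `handleSymbol` yields `_Z11kernel_declv`, the name a plain C++ declaration of the same function would reference.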

clang/test/CodeGenCUDA/unnamed-types.cu

@@ 30 lines hidden @@
 // HOST: define internal void @_ZZ2f1PfENKUlS_E_clES_(
 // DEVICE: define internal float @_ZZZ2f1PfENKUlS_E_clES_ENKUlfE_clEf(
 void f1(float *p) {
   [](float *p) {
     k0<<<1,1>>>(p, [] __device__ (float x) { return x + 1.f; });
   }(p);
 }
 // HOST: @__hip_register_globals
-// HOST: __hipRegisterFunction{{.*}}@_Z17__device_stub__k0IZZ2f1PfENKUlS0_E_clES0_EUlfE_EvS0_T_{{.*}}@0
+// HOST: __hipRegisterFunction{{.*}}@_Z2k0IZZ2f1PfENKUlS0_E_clES0_EUlfE_EvS0_T_{{.*}}@0