This is an archive of the discontinued LLVM Phabricator instance.

Differential D120566

[OpenCL][AMDGPU]: Do not allow a call to kernel
Needs Review · Public

Authored by cdevadas on Feb 25 2022, 6:47 AM.

Details

Reviewers

rjmccall
Anastasia
yaxunl
arsenm

Summary

In OpenCL, a kernel is allowed to call other kernels as if
they were regular functions. To support this, clang emits the
amdgpu_kernel calling convention for both caller and callee.
A backend pass in our downstream compiler alters such calls
by introducing regular function bodies which are clones of
the callee kernels. This implementation currently limits us
in certain ways. For instance, callee kernels cannot use the
byref attribute.

To avoid such limitations, this patch brings in those
cloned functions early on and prevents clang from generating
amdgpu_kernel call sites. A new function body will be added
for each kernel in the compilation unit, expecting that the
unused clones will get removed at link time.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

cdevadas created this revision. Feb 25 2022, 6:47 AM

Herald added subscribers: Naghasan, ldrumm, kerbowa and 5 others. Feb 25 2022, 6:47 AM

cdevadas requested review of this revision. Feb 25 2022, 6:47 AM

Herald added a project: Restricted Project. Feb 25 2022, 6:47 AM

Herald added subscribers: cfe-commits, wdng.

One of my concerns is that all kernels are duplicated, which may double the code object size.

Do we need to make the clone always_inline and let the kernel call its clone to avoid duplicate function bodies? Or does LLVM have a pass that does that?

Another concern is that the duplicate non-kernel functions have the actual kernel ABI. Not sure if that can cause any issues.

Harbormaster completed remote builds in B151475: Diff 411402. Feb 25 2022, 7:14 AM

In D120566#3345604, @yaxunl wrote:

One of my concerns is that all kernels are duplicated, which may double the code object size.

Not really, the kernel should just be a stub that calls the real implementation function. In the real world this will always be inlined.

Do we need to make the clone always_inline and let the kernel call its clone to avoid duplicate function bodies? Or does LLVM have a pass that does that?

It's not a special case, there's no real need to put always_inline. Nobody uses this feature in the real world anyway, and single-use functions will be inlined.

Another concern is that the duplicate non-kernel functions have the actual kernel ABI. Not sure if that can cause any issues.

My main question is how we make the symbols for the kernel and the function coexist.

arsenm added inline comments. Feb 25 2022, 7:19 AM

clang/lib/CodeGen/TargetInfo.cpp
Line 9238: I don't think we can really start with the function IR. The TargetABIInfo could be different from the kernel and function form (and will due to using byval/byref etc.)
Line 9240: I don't think adding a prefix and suffix is a good strategy for something which in principle should be ABI visible. A period + suffix I think would be a better convention
Lines 9478–9479: This is basically just moving what the current hack does into clang. Can we emit calls to the function version up front?
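
To make the naming comment on line 9240 concrete, here is a minimal standalone sketch comparing the prefix-plus-suffix scheme used in this diff with a period-plus-suffix scheme. The helper names and the literal suffix ".kernel_body" are assumptions made for illustration; the review does not settle on a spelling. One property a period-based name would have is that it cannot be written as an ordinary C identifier, so user source cannot declare a function with that name.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: clone name with a period + suffix.
// The ".kernel_body" spelling is an assumption, not taken from the patch.
std::string cloneNameWithSuffix(const std::string &KernelName) {
  assert(!KernelName.empty() && "kernel must have a name");
  return KernelName + ".kernel_body";
}

// The scheme the diff uses in getKernelClone: "__amdgpu_" + name + "_kernel_body".
std::string cloneNameWithPrefixAndSuffix(const std::string &KernelName) {
  assert(!KernelName.empty() && "kernel must have a name");
  return "__amdgpu_" + KernelName + "_kernel_body";
}

// Returns true if Name consists only of characters that can form a C identifier
// (letters, digits, underscore) and does not start with a digit.
bool isPlainCIdentifier(const std::string &Name) {
  if (Name.empty() || std::isdigit(static_cast<unsigned char>(Name[0])))
    return false;
  for (unsigned char C : Name)
    if (!std::isalnum(C) && C != '_')
      return false;
  return true;
}
```

Under this sketch, the prefixed name for test_kernel_callee is still a spellable identifier, while the period-suffixed one is not.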

Is there something which stops you from taking the address of a kernel and then calling it? If not, are there actually any uses of kernels in the module that shouldn't be rewritten as uses of the clone?

I feel like this would be a lot easier to just fix in your LLVM pass so that you rewrite any uses of a kernel to use a clone instead before you rewrite the kernel.

In D120566#3346506, @rjmccall wrote:

Is there something which stops you from taking the address of a kernel and then calling it? If not, are there actually any uses of kernels in the module that shouldn't be rewritten as uses of the clone?

The actual amdgpu_kernel is uncallable and has a totally different ABI, and is invoked by external driver code. From the user's device code perspective, only the callable function version is meaningful.

I feel like this would be a lot easier to just fix in your LLVM pass so that you rewrite any uses of a kernel to use a clone instead before you rewrite the kernel.

Then we can't ban calls to kernels (and would be pushing some kind of symbol naming conflict decision into the backend) and in principle would have to handle this actual call.

We also don't really want these to have similar/compatible signatures where you can just swap out the call target. For example I want to more drastically change the IR used for aggregates between the two cases.

In D120566#3346533, @arsenm wrote:

In D120566#3346506, @rjmccall wrote:

Is there something which stops you from taking the address of a kernel and then calling it? If not, are there actually any uses of kernels in the module that shouldn't be rewritten as uses of the clone?

The actual amdgpu_kernel is uncallable and has a totally different ABI, and is invoked by external driver code. From the user's device code perspective, only the callable function version is meaningful.

I think you're misunderstanding what I'm asking. I believe that in OpenCL, you can do &someKernelFunction in source code and then call that. The rewrite in this patch does not handle non-call uses of the kernel function and so will continue to miscompile them.

I feel like this would be a lot easier to just fix in your LLVM pass so that you rewrite any uses of a kernel to use a clone instead before you rewrite the kernel.

Then we can't ban calls to kernels (and would be pushing some kind of symbol naming conflict decision into the backend) and in principle would have to handle this actual call.

Okay, this is not an accurate description of what you're trying to do, and this is important to be precise about. You are not "banning calls to kernels", which would be a novel language restriction and make you non-conformant to OpenCL. You still have a language requirement to allow code to directly use kernel functions. That is why this patch is modifying IR generation instead of emitting new errors in Sema.

What's happening here is that your target (very reasonably) requires kernels to have a special kernel entrypoint in order to be called from outside. That entrypoint uses a very different ABI from ordinary functions, one which simplifies being dynamically called by the runtime, and so it is important that ordinary uses of the function don't accidentally resolve against that special entrypoint. You therefore need two different functions for the kernel, one to satisfy standard uses and one to act as the kernel entrypoint.

Your current architecture is to generate code normally, which will produce what's roughly the standard entrypoint, and then have a backend pass break that down and produce a kernel entrypoint. I can understand why you find that frustratingly limited, and I agree that it doesn't seem to handle standard uses correctly. Something needs to change here.

Now, Clang supports many different kernel languages, all of which face very similar language/implementation problems. It is therefore always informative to go check to see how other language implementors have tried to solve the problem you're facing. So if you go and look at how CUDA is implemented in Clang, you will see that they have introduced a "kernel reference kind" to GlobalDecl, which allows them to distinguish between the kernel entrypoint and the standard entrypoint in IRGen. You could very easily build on this in your OpenCL implementation so that Clang emits the standard entrypoint and then either your pass or IRGen itself fills in the kernel entrypoint by marshaling arguments and then calling (presumably in a way that forces inlining) the standard entrypoint. That would also give you total control of how arguments are marshaled in the kernel entrypoint.
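
The two-entrypoint arrangement described above can be sketched in plain C++ as a toy model. Nothing here is Clang or AMDGPU API: the function names and the packed argument buffer are hypothetical stand-ins for "a kernel entrypoint that marshals arguments and then calls the standard entrypoint".

```cpp
#include <cassert>
#include <cstring>

// Standard entrypoint: ordinary calling convention, callable from other code.
void callee_body(int *in) { *in = 10; }

// Hypothetical kernel entrypoint: receives its arguments as one packed buffer
// (an assumed stand-in for how a runtime might hand over kernel arguments),
// unpacks them, and forwards to the standard entrypoint.
void callee_kernel(const unsigned char *ArgBuffer) {
  assert(ArgBuffer != nullptr);
  int *in;
  std::memcpy(&in, ArgBuffer, sizeof(in)); // marshal: buffer -> typed argument
  callee_body(in);                         // the stub only forwards
}

// Another kernel's body calls the standard entrypoint directly; it never
// needs to reference callee_kernel.
void caller_body(int *in) { callee_body(in); }
```

In this model only the runtime-facing stub knows about the packed layout, so the argument marshaling can change without touching any ordinary caller.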

Alternatively, I think cloning the standard entrypoint so that uses of it are rewritten to a clone is reasonable enough. I don't really see why doing the cloning in IRGen is necessary when you already have a module pass that does similar kinds of rewriting. Doing the clone in IRGen also does not seem to move you closer to your goal of actually marshaling arguments differently. Most importantly, though, I believe you do need to rewrite all the uses and not just the direct calls.
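
The difference between rewriting direct calls and rewriting all uses can be shown with a small standalone model. This is a toy data structure, not LLVM IR: the Use record and both rewrite functions are hypothetical and only mirror the shape of the loop in finalizeModule, which skips any user that is not a call instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class UseKind { DirectCall, AddressTaken };

// Toy stand-in for one use of a function symbol.
struct Use {
  UseKind Kind;
  std::string Target; // symbol this use refers to
};

// Counts uses that still refer to Symbol.
std::size_t countUsesOf(const std::vector<Use> &Uses, const std::string &Symbol) {
  std::size_t N = 0;
  for (const Use &U : Uses)
    if (U.Target == Symbol)
      ++N;
  return N;
}

// Mirrors the approach in the diff: only direct calls are redirected.
void rewriteDirectCallsOnly(std::vector<Use> &Uses, const std::string &Kernel,
                            const std::string &Clone) {
  for (Use &U : Uses)
    if (U.Target == Kernel && U.Kind == UseKind::DirectCall)
      U.Target = Clone;
}

// The alternative argued for above: every use is redirected.
void rewriteAllUses(std::vector<Use> &Uses, const std::string &Kernel,
                    const std::string &Clone) {
  for (Use &U : Uses)
    if (U.Target == Kernel)
      U.Target = Clone;
}
```

With one direct call and one address-taken use of the same kernel, the first function leaves one use pointing at the kernel symbol and the second leaves none. In LLVM itself, Value::replaceAllUsesWith is the usual primitive for redirecting every use of a value; whether it applies here depends on the clone having a compatible type, which the thread leaves open.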

Revision Contents

Path                                               Size
clang/lib/CodeGen/CodeGenModule.cpp                1 line
clang/lib/CodeGen/TargetInfo.h                     4 lines
clang/lib/CodeGen/TargetInfo.cpp                   48 lines
clang/test/CodeGenOpenCL/amdgpu-kernel-calls.cl    60 lines
clang/test/CodeGenOpenCL/visibility.cl             35 lines

Diff 411402

clang/lib/CodeGen/CodeGenModule.cpp

[570 lines not shown; context: if (getTriple().isAMDGPU()) {]
     // times 100.
     // ToDo: Enable module flag for all code object version when ROCm device
     // library is ready.
     if (getTarget().getTargetOpts().CodeObjectVersion == TargetOptions::COV_5) {
       getModule().addModuleFlag(llvm::Module::Error,
                                 "amdgpu_code_object_version",
                                 getTarget().getTargetOpts().CodeObjectVersion);
     }
+    getTargetCodeGenInfo().finalizeModule(TheModule);
   }

   emitLLVMUsed();
   if (SanStats)
     SanStats->finish();

   if (CodeGenOpts.Autolink &&
       (Context.getLangOpts().Modules || !LinkerOptionsMetadata.empty())) {
[6,079 lines not shown]

clang/lib/CodeGen/TargetInfo.h

[241 lines not shown; context: virtual void getDependentLibraryOption(llvm::StringRef Lib,]
                                          llvm::SmallString<24> &Opt) const;

   /// Gets the linker options necessary to detect object file mismatches on
   /// this platform.
   virtual void getDetectMismatchOption(llvm::StringRef Name,
                                        llvm::StringRef Value,
                                        llvm::SmallString<32> &Opt) const {}

+  /// Clean up and other special handling at the end when all functions are
+  /// codegenerated.
+  virtual void finalizeModule(llvm::Module &M) const {}
+
   /// Get LLVM calling convention for OpenCL kernel.
   virtual unsigned getOpenCLKernelCallingConv() const;

   /// Get target specific null pointer.
   /// \param T is the LLVM type of the null pointer.
   /// \param QT is the clang QualType of the null pointer.
   /// \return ConstantPointerNull with the given type \p T.
   /// Each target can override it to return its own desired constant value.
[114 lines not shown]

clang/lib/CodeGen/TargetInfo.cpp

[13 lines not shown]
 #include "TargetInfo.h"
 #include "ABIInfo.h"
 #include "CGBlocks.h"
 #include "CGCXXABI.h"
 #include "CGValue.h"
 #include "CodeGenFunction.h"
 #include "clang/AST/Attr.h"
 #include "clang/AST/RecordLayout.h"
+#include "clang/Basic/Builtins.h"
 #include "clang/Basic/CodeGenOptions.h"
 #include "clang/Basic/DiagnosticFrontend.h"
-#include "clang/Basic/Builtins.h"
 #include "clang/CodeGen/CGFunctionInfo.h"
 #include "clang/CodeGen/SwiftCallingConv.h"
 #include "llvm/ADT/SmallBitVector.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringSwitch.h"
 #include "llvm/ADT/Triple.h"
 #include "llvm/ADT/Twine.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/IntrinsicsNVPTX.h"
 #include "llvm/IR/IntrinsicsS390.h"
 #include "llvm/IR/Type.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/Utils/Cloning.h"
 #include <algorithm> // std::sort

 using namespace clang;
 using namespace CodeGen;

 // Helper for coercing an aggregate argument or return value into an integer
 // array of the same size (including padding) and alignment. This alternate
 // coercion happens only for the RenderScript ABI and can be removed after
[9,167 lines not shown; context: llvm::SyncScope::ID getLLVMSyncScopeID(const LangOptions &LangOpts,]
                                          llvm::AtomicOrdering Ordering,
                                          llvm::LLVMContext &Ctx) const override;
   llvm::Function *
   createEnqueuedBlockKernel(CodeGenFunction &CGF,
                             llvm::Function *BlockInvokeFunc,
                             llvm::Value *BlockLiteral) const override;
   bool shouldEmitStaticExternCAliases() const override;
   void setCUDAKernelCallingConvention(const FunctionType *&FT) const override;
+  void finalizeModule(llvm::Module &M) const override;
 };
 }

 static bool requiresAMDGPUProtectedVisibility(const Decl *D,
                                               llvm::GlobalValue *GV) {
   if (GV->getVisibility() != llvm::GlobalValue::HiddenVisibility)
     return false;

   return D->hasAttr<OpenCLKernelAttr>() ||
          (isa<FunctionDecl>(D) && D->hasAttr<CUDAGlobalAttr>()) ||
          (isa<VarDecl>(D) &&
           (D->hasAttr<CUDADeviceAttr>() || D->hasAttr<CUDAConstantAttr>() ||
            cast<VarDecl>(D)->getType()->isCUDADeviceBuiltinSurfaceType() ||
            cast<VarDecl>(D)->getType()->isCUDADeviceBuiltinTextureType()));
 }

+static llvm::Function *getKernelClone(llvm::Function &F) {
[Inline comment, arsenm] I don't think we can really start with the function IR. The TargetABIInfo could be different from the kernel and function form (and will due to using byval/byref etc.)
+  llvm::Module *M = F.getParent();
+  SmallString<128> MangledName("__amdgpu_");
[Inline comment, arsenm] I don't think adding a prefix and suffix is a good strategy for something which in principle should be ABI visible. A period + suffix I think would be a better convention
+  MangledName.append(F.getName());
+  MangledName.append("_kernel_body");
+  llvm::Function *NewF = M->getFunction(MangledName);
+  if (!NewF) {
+    llvm::ValueToValueMapTy ignored;
+    NewF = F.empty()
+               ? llvm::Function::Create(F.getFunctionType(),
+                                        llvm::GlobalVariable::ExternalLinkage,
+                                        "", M)
+               : CloneFunction(&F, ignored);
+    NewF->setCallingConv(llvm::CallingConv::C);
+    NewF->setName(MangledName);
+  }
+
+  return NewF;
+}
+
 void AMDGPUTargetCodeGenInfo::setFunctionDeclAttributes(
     const FunctionDecl *FD, llvm::Function *F, CodeGenModule &M) const {
   const auto *ReqdWGS =
       M.getLangOpts().OpenCL ? FD->getAttr<ReqdWorkGroupSizeAttr>() : nullptr;
   const bool IsOpenCLKernel =
       M.getLangOpts().OpenCL && FD->hasAttr<OpenCLKernelAttr>();
   const bool IsHIPKernel = M.getLangOpts().HIP && FD->hasAttr<CUDAGlobalAttr>();

[186 lines not shown]
 }

 void AMDGPUTargetCodeGenInfo::setCUDAKernelCallingConvention(
     const FunctionType *&FT) const {
   FT = getABIInfo().getContext().adjustFunctionType(
       FT, FT->getExtInfo().withCallingConv(CC_OpenCLKernel));
 }

+void AMDGPUTargetCodeGenInfo::finalizeModule(llvm::Module &M) const {
+  // Insert a cloned function body for each kernel and adjust the kernel
+  // callsite to use its equivalent clone function. For extern kernel calls,
+  // insert a declaration node since the body isn't available.
+  if (!getABIInfo().getContext().getLangOpts().OpenCL)
+    return;
+
+  for (auto &F : M) {
+    if (F.getCallingConv() != llvm::CallingConv::AMDGPU_KERNEL)
+      continue;
+
+    llvm::Function *Clone = getKernelClone(F);
+    for (llvm::Function::user_iterator UI = F.user_begin(), UE = F.user_end();
+         UI != UE;) {
+      auto *CI = dyn_cast<llvm::CallInst>(*UI++);
+      if (!CI)
+        continue;
+
+      CI->setCalledFunction(Clone);
+      CI->setCallingConv(llvm::CallingConv::C);
[Inline comment, arsenm] This is basically just moving what the current hack does into clang. Can we emit calls to the function version up front?
+    }
+  }
+}
+
 //===----------------------------------------------------------------------===//
 // SPARC v8 ABI Implementation.
 // Based on the SPARC Compliance Definition version 2.4.1.
 //
 // Ensures that complex values are passed in registers.
 //
 namespace {
 class SparcV8ABIInfo : public DefaultABIInfo {
[2,116 lines not shown]

clang/test/CodeGenOpenCL/amdgpu-kernel-calls.cl

This file was added.

// REQUIRES: amdgpu-registered-target
// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -S -disable-llvm-passes -emit-llvm -o - %s | FileCheck %s

// AMDGPU disallows kernel callsites from another kernels. For each kernel, clang codegen will introduce
// a cloned function body with a non-kernel calling convention and amdgpu_kernel callsites will get
// transformed to call appropriate clones.

extern kernel void test_extern_kernel_callee(global int *in);

// CHECK: define dso_local amdgpu_kernel void @test_kernel_callee(i32 addrspace(1)* noundef align 4 %in)
// CHECK-NEXT: entry:
// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca i32 addrspace(1)*, align 8, addrspace(5)
// CHECK-NEXT: store i32 addrspace(1)* [[IN:%.*]], i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
// CHECK-NEXT: [[TMP0:%.*]] = load i32 addrspace(1)*, i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
// CHECK-NEXT: store i32 10, i32 addrspace(1)* [[TMP0]], align 4
// CHECK-NEXT: ret void
//
kernel void test_kernel_callee(global int *in) {
  *in = (int)(10);
}

// CHECK: define dso_local amdgpu_kernel void @test_kernel_caller(i32 addrspace(1)* noundef align 4 %in)
// CHECK-NEXT: entry:
// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca i32 addrspace(1)*, align 8, addrspace(5)
// CHECK-NEXT: store i32 addrspace(1)* [[IN:%.*]], i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
// CHECK-NEXT: %{{.*}} = load i32 addrspace(1)*, i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
// CHECK-NEXT: call void @__amdgpu_test_kernel_callee_kernel_body(
// CHECK-NOT: call amdgpu_kernel void @test_kernel_callee(
// CHECK-NEXT: %{{.*}} = load i32 addrspace(1)*, i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
// CHECK-NEXT: call void @__amdgpu_test_extern_kernel_callee_kernel_body(
// CHECK-NOT: call amdgpu_kernel void @test_kernel_callee(
// CHECK-NEXT: ret void
//
kernel void test_kernel_caller(global int *in) {
  test_kernel_callee(in);
  test_extern_kernel_callee(in);
}

// CHECK: declare amdgpu_kernel void @test_extern_kernel_callee(i32 addrspace(1)* noundef align 4)

// CHECK: define dso_local void @__amdgpu_test_kernel_callee_kernel_body(i32 addrspace(1)* noundef align 4 %in)
// CHECK-NEXT: entry:
// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca i32 addrspace(1)*, align 8, addrspace(5)
// CHECK-NEXT: store i32 addrspace(1)* %in, i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
// CHECK-NEXT: [[TMP0:%.*]] = load i32 addrspace(1)*, i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
// CHECK-NEXT: store i32 10, i32 addrspace(1)* [[TMP0]], align 4
// CHECK-NEXT: ret void

// CHECK: define dso_local void @__amdgpu_test_kernel_caller_kernel_body(i32 addrspace(1)* noundef align 4 %in)
// CHECK-NEXT: entry:
// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca i32 addrspace(1)*, align 8, addrspace(5)
// CHECK-NEXT: store i32 addrspace(1)* [[IN:%.*]], i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
// CHECK-NEXT: %{{.*}} = load i32 addrspace(1)*, i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
// CHECK-NEXT: call void @__amdgpu_test_kernel_callee_kernel_body(
// CHECK-NEXT: %{{.*}} = load i32 addrspace(1)*, i32 addrspace(1)* addrspace(5)* [[IN_ADDR]], align 8
// CHECK-NEXT: call void @__amdgpu_test_extern_kernel_callee_kernel_body(
// CHECK-NEXT: ret void
//

// CHECK: declare void @__amdgpu_test_extern_kernel_callee_kernel_body(i32 addrspace(1)*)

clang/test/CodeGenOpenCL/visibility.cl

[88 lines not shown; context: void use() {]
   ext_kern_protected();
   ext_kern_default();
   ext_func();
   ext_func_hidden();
   ext_func_protected();
   ext_func_default();
 }

-// FVIS-DEFAULT: declare amdgpu_kernel void @ext_kern()
-// FVIS-PROTECTED: declare protected amdgpu_kernel void @ext_kern()
-// FVIS-HIDDEN: declare protected amdgpu_kernel void @ext_kern()
-
-// FVIS-DEFAULT: declare protected amdgpu_kernel void @ext_kern_hidden()
-// FVIS-PROTECTED: declare protected amdgpu_kernel void @ext_kern_hidden()
-// FVIS-HIDDEN: declare protected amdgpu_kernel void @ext_kern_hidden()
-
-// FVIS-DEFAULT: declare protected amdgpu_kernel void @ext_kern_protected()
-// FVIS-PROTECTED: declare protected amdgpu_kernel void @ext_kern_protected()
-// FVIS-HIDDEN: declare protected amdgpu_kernel void @ext_kern_protected()
-
-// FVIS-DEFAULT: declare amdgpu_kernel void @ext_kern_default()
-// FVIS-PROTECTED: declare amdgpu_kernel void @ext_kern_default()
-// FVIS-HIDDEN: declare amdgpu_kernel void @ext_kern_default()
-
-
 // FVIS-DEFAULT: declare void @ext_func()
 // FVIS-PROTECTED: declare protected void @ext_func()
 // FVIS-HIDDEN: declare hidden void @ext_func()

 // FVIS-DEFAULT: declare hidden void @ext_func_hidden()
 // FVIS-PROTECTED: declare hidden void @ext_func_hidden()
 // FVIS-HIDDEN: declare hidden void @ext_func_hidden()

 // FVIS-DEFAULT: declare protected void @ext_func_protected()
 // FVIS-PROTECTED: declare protected void @ext_func_protected()
 // FVIS-HIDDEN: declare protected void @ext_func_protected()

 // FVIS-DEFAULT: declare void @ext_func_default()
 // FVIS-PROTECTED: declare void @ext_func_default()
 // FVIS-HIDDEN: declare void @ext_func_default()
+
+// A kernel call will be emitted as a call to its cloned function
+// of non-kernel convention.
+// FVIS-DEFAULT: declare void @__amdgpu_ext_kern_kernel_body()
+// FVIS-PROTECTED: declare void @__amdgpu_ext_kern_kernel_body()
+// FVIS-HIDDEN: declare void @__amdgpu_ext_kern_kernel_body()
+
+// FVIS-DEFAULT: declare void @__amdgpu_ext_kern_hidden_kernel_body()
+// FVIS-PROTECTED: declare void @__amdgpu_ext_kern_hidden_kernel_body()
+// FVIS-HIDDEN: declare void @__amdgpu_ext_kern_hidden_kernel_body()
+
+// FVIS-DEFAULT: declare void @__amdgpu_ext_kern_protected_kernel_body()
+// FVIS-PROTECTED: declare void @__amdgpu_ext_kern_protected_kernel_body()
+// FVIS-HIDDEN: declare void @__amdgpu_ext_kern_protected_kernel_body()
+
+// FVIS-DEFAULT: declare void @__amdgpu_ext_kern_default_kernel_body()
+// FVIS-PROTECTED: declare void @__amdgpu_ext_kern_default_kernel_body()
+// FVIS-HIDDEN: declare void @__amdgpu_ext_kern_default_kernel_body()