This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Set LLVM calling convention for CUDA kernel
ClosedPublic

Authored by yaxunl on Apr 3 2018, 12:32 PM.

Details

Reviewers

rjmccall
tra

Commits

rG4306f2086fe8: [CUDA] Set LLVM calling convention for CUDA kernel
rL330447: [CUDA] Set LLVM calling convention for CUDA kernel
rC330447: [CUDA] Set LLVM calling convention for CUDA kernel

Summary

Some targets need a special LLVM calling convention for CUDA kernels.
This patch sets it through a TargetCodeGenInfo hook.

It only affects the amdgcn target.

Patch by Greg Rodgers.
Revised and lit tests added by Yaxun Liu.
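
The hook pattern described in the summary can be sketched without any LLVM dependency. This is only an illustration under stated assumptions: `CallingConv`, `Function`, and `emitDefinition` below are hypothetical stand-ins for `llvm::CallingConv`, `llvm::Function`, and the call site in CodeGenModule, not the real classes. Only the hook name `setCUDAKernelCallingConvention` and the two class names come from the patch.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::CallingConv (not the real enum).
enum class CallingConv { C, AMDGPUKernel };

// Hypothetical stand-in for llvm::Function (not the real class).
struct Function {
  std::string Name;
  CallingConv CC = CallingConv::C;
  void setCallingConv(CallingConv NewCC) { CC = NewCC; }
};

// Base class: the default hook is a no-op, so other targets are unaffected.
class TargetCodeGenInfo {
public:
  virtual ~TargetCodeGenInfo() = default;
  virtual void setCUDAKernelCallingConvention(Function * /*F*/) const {}
};

// The amdgcn target overrides the hook to mark kernels.
class AMDGPUTargetCodeGenInfo : public TargetCodeGenInfo {
public:
  void setCUDAKernelCallingConvention(Function *F) const override {
    assert(F && "expected a function");
    F->setCallingConv(CallingConv::AMDGPUKernel);
  }
};

// Hypothetical call site: only kernel definitions go through the hook.
void emitDefinition(const TargetCodeGenInfo &TCGI, Function &Fn,
                    bool IsKernel) {
  if (IsKernel)
    TCGI.setCUDAKernelCallingConvention(&Fn);
}
```

With this shape the generic code never names a specific target; it asks the target object, and targets that do not care inherit the empty default.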

Diff Detail

Repository: rL LLVM

Event Timeline

yaxunl created this revision. Apr 3 2018, 12:32 PM

I think the appropriate place to do this is in IsStandardConversion, immediately after the call to ResolveAddressOfOverloadedFunction. You might want to add a general utility for getting the type-of-reference of a function decl.

In D45223#1056187, @rjmccall wrote:

I think the appropriate place to do this is in IsStandardConversion, immediately after the call to ResolveAddressOfOverloadedFunction. You might want to add a general utility for getting the type-of-reference of a function decl.

We may need to resolve overloaded functions with dropped calling conventions, e.g.

__global__ void EmptyKernel(float) {}

__global__ void EmptyKernel(double) {}

struct Dummy {
  /// Type definition of the EmptyKernel kernel entry point
  typedef void (*EmptyKernelPtr)(float);
  EmptyKernelPtr Empty() { return EmptyKernel; } 
};

In this case we have to drop the calling convention during the resolution.
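
The resolution step in question can be seen in plain host C++. The following is a minimal sketch, assuming ordinary functions in place of `__global__` kernels so that it compiles without CUDA; `LastCalled` and `callThroughDummy` are hypothetical helpers added only for illustration. The return type of `Empty()` is what selects the `float` overload, so any calling convention attached to the kernel's function type would have to be ignored for the match to succeed.

```cpp
#include <cassert>

// Host-only stand-ins: plain functions instead of __global__ kernels.
// LastCalled is a hypothetical helper that records which overload ran.
int LastCalled = 0;
void EmptyKernel(float) { LastCalled = 1; }
void EmptyKernel(double) { LastCalled = 2; }

struct Dummy {
  /// Type definition of the EmptyKernel entry point
  typedef void (*EmptyKernelPtr)(float);
  // The target pointer type picks the float overload of EmptyKernel.
  EmptyKernelPtr Empty() { return EmptyKernel; }
};

// Hypothetical driver: calls through the resolved pointer and reports
// which overload was selected.
int callThroughDummy() {
  Dummy D;
  Dummy::EmptyKernelPtr P = D.Empty();
  assert(P != nullptr);
  P(1.0f);
  return LastCalled;
}
```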

Since the calling convention is invisible in the AST, why don't we just not represent it in the AST?

Going back to the original implementation in CodeGen:

if ((getTriple().getArch() == llvm::Triple::amdgcn) &&
    D->hasAttr<CUDAGlobalAttr>())
  Fn->setCallingConv(llvm::CallingConv::AMDGPU_KERNEL);

It is much simpler and more straightforward.

Can we reconsider implementing this in CodeGen instead of Sema?

Yes, I'm sorry, I think you're right. I had misunderstood the language problem when I suggested going down this road.

In D45223#1071358, @rjmccall wrote:

Yes, I'm sorry, I think you're right. I had misunderstood the language problem when I suggested going down this road.

Never mind. I will update the diff for the CodeGen approach.

Use CodeGen approach.

AFAICT this is the replacement for D44747. LGTM.

This revision is now accepted and ready to land. Apr 18 2018, 2:48 PM

In D45223#1071452, @tra wrote:

AFAICT this is the replacement for D44747. LGTM.

Yes. Thanks.

Closed by commit rC330447: [CUDA] Set LLVM calling convention for CUDA kernel (authored by yaxunl). Apr 20 2018, 10:06 AM

Closed by commit rL330447: [CUDA] Set LLVM calling convention for CUDA kernel (authored by yaxunl).

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Apr 20 2018, 10:06 AM

Revision Contents

Path                                           Size
cfe/trunk/lib/CodeGen/CodeGenModule.cpp        3 lines
cfe/trunk/lib/CodeGen/TargetInfo.h             2 lines
cfe/trunk/lib/CodeGen/TargetInfo.cpp           6 lines
cfe/trunk/test/CodeGenCUDA/kernel-amdgcn.cu    41 lines

Diff 143340

cfe/trunk/lib/CodeGen/CodeGenModule.cpp

  void CodeGenModule::EmitGlobalFunctionDefinition(GlobalDecl GD,
  ...
    auto *Fn = cast<llvm::Function>(GV);
    setFunctionLinkage(GD, Fn);

    // FIXME: this is redundant with part of setFunctionDefinitionAttributes
    setGVProperties(Fn, GD);

    MaybeHandleStaticInExternC(D, Fn);

+   if (D->hasAttr<CUDAGlobalAttr>())
+     getTargetCodeGenInfo().setCUDAKernelCallingConvention(Fn);

    maybeSetTrivialComdat(D, Fn);

    CodeGenFunction(*this).GenerateCode(D, Fn, FI);

    setNonAliasAttributes(GD, Fn);
    SetLLVMFunctionAttributesForDefinition(D, Fn);

    if (const ConstructorAttr *CA = D->getAttr<ConstructorAttr>())
  ...

cfe/trunk/lib/CodeGen/TargetInfo.h

  public:
  ...
    createEnqueuedBlockKernel(CodeGenFunction &CGF,
                              llvm::Function *BlockInvokeFunc,
                              llvm::Value *BlockLiteral) const;

    /// \return true if the target supports alias from the unmangled name to the
    /// mangled name of functions declared within an extern "C" region and marked
    /// as 'used', and having internal linkage.
    virtual bool shouldEmitStaticExternCAliases() const { return true; }

+   virtual void setCUDAKernelCallingConvention(llvm::Function *F) const {}
  };

  } // namespace CodeGen
  } // namespace clang

  #endif // LLVM_CLANG_LIB_CODEGEN_TARGETINFO_H

cfe/trunk/lib/CodeGen/TargetInfo.cpp

    LangAS getGlobalVarAddressSpace(CodeGenModule &CGM,
                                    const VarDecl *D) const override;
    llvm::SyncScope::ID getLLVMSyncScopeID(SyncScope S,
                                           llvm::LLVMContext &C) const override;
    llvm::Function *
    createEnqueuedBlockKernel(CodeGenFunction &CGF,
                              llvm::Function *BlockInvokeFunc,
                              llvm::Value *BlockLiteral) const override;
    bool shouldEmitStaticExternCAliases() const override;
+   void setCUDAKernelCallingConvention(llvm::Function *F) const override;
  };
  }

  void AMDGPUTargetCodeGenInfo::setTargetAttributes(
      const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
    if (GV->isDeclaration())
      return;
    const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(D);
  ...
  AMDGPUTargetCodeGenInfo::getLLVMSyncScopeID(SyncScope S,
  ...
    }
    return C.getOrInsertSyncScopeID(Name);
  }

  bool AMDGPUTargetCodeGenInfo::shouldEmitStaticExternCAliases() const {
    return false;
  }

+ void AMDGPUTargetCodeGenInfo::setCUDAKernelCallingConvention(
+     llvm::Function *F) const {
+   F->setCallingConv(llvm::CallingConv::AMDGPU_KERNEL);
+ }

  //===----------------------------------------------------------------------===//
  // SPARC v8 ABI Implementation.
  // Based on the SPARC Compliance Definition version 2.4.1.
  //
  // Ensures that complex values are passed in registers.
  //
  namespace {
  class SparcV8ABIInfo : public DefaultABIInfo {
  ...

cfe/trunk/test/CodeGenCUDA/kernel-amdgcn.cu

// RUN: %clang_cc1 -triple amdgcn -fcuda-is-device -emit-llvm %s -o - | FileCheck %s
#include "Inputs/cuda.h"

// CHECK: define amdgpu_kernel void @_ZN1A6kernelEv
class A {
public:
  static __global__ void kernel(){}
};

// CHECK: define void @_Z10non_kernelv
__device__ void non_kernel(){}

// CHECK: define amdgpu_kernel void @_Z6kerneli
__global__ void kernel(int x) {
  non_kernel();
}

// CHECK: define amdgpu_kernel void @_Z11EmptyKernelIvEvv
template <typename T>
__global__ void EmptyKernel(void) {}

struct Dummy {
  /// Type definition of the EmptyKernel kernel entry point
  typedef void (*EmptyKernelPtr)();
  EmptyKernelPtr Empty() { return EmptyKernel<void>; }
};

// CHECK: define amdgpu_kernel void @_Z15template_kernelI1AEvT_
template<class T>
__global__ void template_kernel(T x) {}

void launch(void *f);

int main() {
  Dummy D;
  launch((void*)A::kernel);
  launch((void*)kernel);
  launch((void*)template_kernel<A>);
  launch((void*)D.Empty());
  return 0;
}
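
The CHECK lines above match Itanium-mangled symbol names. As a reading aid, here is a small sketch that demangles such names, assuming a GCC or Clang toolchain that ships `<cxxabi.h>`; the `demangle` helper is a hypothetical wrapper around `abi::__cxa_demangle` and is not part of the patch or of the lit test.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-mangled
// name, or the input unchanged if demangling fails.
std::string demangle(const char *Mangled) {
  assert(Mangled && "expected a symbol name");
  int Status = 0;
  // __cxa_demangle allocates the result with malloc; the caller frees it.
  char *Raw = abi::__cxa_demangle(Mangled, nullptr, nullptr, &Status);
  if (Status != 0 || Raw == nullptr)
    return Mangled;
  std::string Result(Raw);
  std::free(Raw);
  return Result;
}
```

For example, `demangle("_Z6kerneli")` yields `kernel(int)` and `demangle("_ZN1A6kernelEv")` yields `A::kernel()`, which correspond to the two non-template kernels defined in the test.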