This is an archive of the discontinued LLVM Phabricator instance.

Differential D44984

[HIP] Add hip input kind and codegen for kernel launching
ClosedPublic

Authored by yaxunl on Mar 28 2018, 9:25 AM.

Details

Reviewers

rjmccall
tra

Commits

rG887c569bcb83: [HIP] Add hip input kind and codegen for kernel launching
rC330790: [HIP] Add hip input kind and codegen for kernel launching
rL330790: [HIP] Add hip input kind and codegen for kernel launching

Summary

HIP is a language similar to CUDA (https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md).
The language syntax is very similar, which allows a HIP program to be compiled as a CUDA program by Clang. The main difference
is the host API. HIP has a set of vendor-neutral host APIs which can be implemented on different platforms. Currently there is an open-source
implementation of the HIP runtime on the amdgpu target (https://github.com/ROCm-Developer-Tools/HIP).

This patch adds support for hip as an input kind and language standard.

When a HIP file is compiled, both LangOpts.CUDA and LangOpts.HIP are turned on. This allows a HIP program to be compiled as CUDA
in most cases; LangOpts.HIP is checked only where HIP-specific handling is needed.

This patch also adds support for launching kernels in HIP programs through the HIP host API.

When -x hip is not specified, there is no behaviour change for CUDA.
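
As a rough illustration of the flag arrangement described in the summary, the sketch below models the two language options with a plain struct. `LangFlags`, `InputLang`, `flagsFor` and `launchFunctionName` are hypothetical names invented for this example and are not Clang APIs; only the runtime entry point names `cudaLaunch` and `hipLaunchByPtr` are taken from the diff.

```cpp
#include <string>

// Hypothetical stand-in for the two clang::LangOptions bits discussed above.
struct LangFlags {
  bool CUDA = false;
  bool HIP = false;
};

// Hypothetical input kinds; only the CUDA/HIP distinction matters here.
enum class InputLang { CXX, CUDA, HIP };

// A HIP input turns on both flags, so the shared CUDA code paths still apply.
inline LangFlags flagsFor(InputLang Lang) {
  LangFlags Flags;
  Flags.CUDA = (Lang == InputLang::CUDA || Lang == InputLang::HIP);
  Flags.HIP = (Lang == InputLang::HIP);
  return Flags;
}

// Only code that differs between the two host APIs needs to look at HIP.
inline std::string launchFunctionName(const LangFlags &Flags) {
  return Flags.HIP ? "hipLaunchByPtr" : "cudaLaunch";
}
```

With this shape, a plain CUDA input never sets `HIP`, which matches the statement that nothing changes for CUDA when -x hip is not given.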

Patch by Greg Rodgers.
Revised and lit test added by Yaxun Liu.

Diff Detail

Event Timeline

yaxunl created this revision. Mar 28 2018, 9:25 AM

Herald added a subscriber: tpr. Mar 28 2018, 9:25 AM

yaxunl added subscribers: t-tye, b-sumner. Mar 28 2018, 9:26 AM

tra added a reviewer: tra. Mar 28 2018, 9:32 AM

The changes appear to cover only some of the functionality needed to enable HIP support. Do you have more patches in the queue? Having the complete picture would help to make sense of the overall plan.
I did ask for it in D42800, but I don't think I got the answers. It would help a lot if you or @gregrodgers could write a doc somewhere outlining the overall plan for HIP support in clang, the main issues that need to be dealt with, and at least a general idea of how to handle them.

As far as "add -x hip, and tweak runtime glue codegen" goes, the change looks OK, but it's not very useful all by itself. It leaves a lot of other issues unsolved and no clear plan on whether/when/how you are planning to deal with them.

As things stand right now, with this patch clang will still attempt to include CUDA headers, which, among other things will provide threadIdx/blockIdx and other CUDA-specific features.
Perhaps it would make sense to disable pre-inclusion of CUDA headers and, probably, disable use of CUDA's libdevice bitcode library if we're compiling with -x hip (i.e. -nocudainc -nocudalib).
If you do depend on CUDA headers, then, I suspect, you may need to adjust some wrapper headers we use for CUDA and that change should probably come before this one.

test/CodeGenCUDA/device-stub.cu
2–9	Please wrap the long lines.

In D44984#1050526, @tra wrote:

The changes appear to cover only some of the functionality needed to enable HIP support. Do you have more patches in the queue? Having the complete picture would help to make sense of the overall plan.
I did ask for it in D42800, but I don't think I got the answers. It would help a lot if you or @gregrodgers could write a doc somewhere outlining the overall plan for HIP support in clang, the main issues that need to be dealt with, and at least a general idea of how to handle them.

As far as "add -x hip, and tweak runtime glue codegen" goes, the change looks OK, but it's not very useful all by itself. It leaves a lot of other issues unsolved and no clear plan on whether/when/how you are planning to deal with them.

As things stand right now, with this patch clang will still attempt to include CUDA headers, which, among other things will provide threadIdx/blockIdx and other CUDA-specific features.
Perhaps it would make sense to disable pre-inclusion of CUDA headers and, probably, disable use of CUDA's libdevice bitcode library if we're compiling with -x hip (i.e. -nocudainc -nocudalib).
If you do depend on CUDA headers, then, I suspect, you may need to adjust some wrapper headers we use for CUDA and that change should probably come before this one.

Hi Artem, I am responsible for upstreaming Greg's work and addressing reviewers' comments.

Yes we already have a basic working implementation of HIP compiler due to Greg's work. I will either update D42800 or create a new review about the toolchain changes for compiling and linking HIP programs. Essentially HIP has its own header files and device libraries which are taken care of by the toolchain patch.

Since the header file and library seem not to affect this patch, is it OK to defer their changes to be part of the toolchain patch?

Thanks.

In D44984#1050557, @yaxunl wrote:

Yes we already have a basic working implementation of HIP compiler due to Greg's work.

That is great, but it's not necessarily true that all these changes will make it into clang/llvm as is. LLVM/Clang is a community effort and it helps a lot to get the changes in when the community understands what it is you're planning to do. I personally am very glad to see AMD moving towards making clang a viable compiler for AMD GPUs, but there's only so much I'll be able to do to help you with reviews if all I have is either piecemeal patches with little idea how they all fit together or one humongous patch I would have no time to dive in and really understand. Considering that compilation for GPU is a fairly niche market, my bet is that your progress will be bottlenecked by the code reviews. Whatever you can do to make reviewers' jobs easier by giving more context will help a lot with upstreaming the patches.

I will either update D42800 or create a new review about the toolchain changes for compiling and linking HIP programs. Essentially HIP has its own header files and device libraries which are taken care of by the toolchain patch.

Fair enough. I'll wait for the rest of the patches. If you have multiple pending patches, it helps if you could arrange them as dependent patches in phabricator. It makes it easier to see the big picture.

Since the header file and library seem not to affect this patch, is it OK to defer their changes to be part of the toolchain patch?

I'm not sure I understand. Could you elaborate?

You should send an RFC to cfe-dev about adding this new language mode. I understand that it's very similar to an existing language mode that we already support, and that's definitely something we'll consider, but we shouldn't just agree to add new language modes in patch review.

In D44984#1050672, @rjmccall wrote:

You should send an RFC to cfe-dev about adding this new language mode. I understand that it's very similar to an existing language mode that we already support, and that's definitely something we'll consider, but we shouldn't just agree to add new language modes in patch review.

RFC sent http://lists.llvm.org/pipermail/cfe-dev/2018-March/057426.html

Thanks.

Since the header file and library seem not to affect this patch, is it OK to defer their changes to be part of the toolchain patch?

I'm not sure I understand. Could you elaborate?

clang -cc1 does not include __clang_cuda_runtime_wrapper.h by default when it is called directly to compile CUDA programs. The CUDA toolchain adds -include __clang_cuda_runtime_wrapper.h when compiling a CUDA program as kernel code. Therefore, if clang -cc1 is used to compile a HIP program in a lit test, there is no need to use -nocudainc.

This patch mainly changes the names of the kernel-launching API functions. The implementation and testing of this change do not depend on the CUDA/HIP header files. A minimal header like test/CodeGenCUDA/Inputs/cuda.h is sufficient for testing this patch.

Basically, this patch only concerns -cc1 and is therefore independent of the toolchain changes for header and library files.

mkuron added a subscriber: mkuron. Mar 31 2018, 2:38 AM

Ping. Do any further changes need to be made to this patch? Thanks.

yaxunl added a child revision: D45212: Add HIP toolchain. Apr 10 2018, 9:47 AM

rjmccall added inline comments. Apr 13 2018, 12:06 AM

lib/CodeGen/CGCUDANV.cpp
98	Can you take these as StringRefs or Twines?
104	I think "addUnderscoredPrefixToName" would be better.
134	Please move this comment down into the else clause (and terminate it with a semicolon) and add your own declaration comment in your clause.
lib/Frontend/CompilerInvocation.cpp
2109	Why is this done here? We infer the language mode from the input kind somewhere else.

yaxunl marked 4 inline comments as done. Apr 13 2018, 8:35 AM

yaxunl added inline comments.

lib/Frontend/CompilerInvocation.cpp
2109	It is usually done through CompilerInvocation::setLangDefaults. However, HIP does not have its own input kind nor is it defined as a language standard. Therefore it cannot use CompilerInvocation::setLangDefaults to set Opts.HIP.

Revised per John's comments.

rjmccall added inline comments. Apr 14 2018, 2:59 AM

lib/Frontend/CompilerInvocation.cpp
2109	What are the values of -x if not input kinds or language standards?

yaxunl marked 2 inline comments as done. Apr 17 2018, 12:46 PM

yaxunl added inline comments.

lib/Frontend/CompilerInvocation.cpp
2109	I will add hip as input kind and language standard since it really is both.

Add hip as input kind and language standard.
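
A minimal sketch of what treating hip as an input kind could look like. Clang's own code uses llvm::StringSwitch over InputKind, as the CompilerInvocation.cpp hunk in this diff shows; the `InputLanguage` enum and `parseDashX` function below are hypothetical stand-ins written in plain standard C++, and the list of accepted values is deliberately incomplete.

```cpp
#include <string>

// Hypothetical subset of input kinds; the real InputKind has more members.
enum class InputLanguage { Unknown, C, OpenCL, CUDA, HIP };

// Maps the value given to -x onto an input kind. "hip" becomes a
// first-class case next to "cuda" instead of being special-cased later.
inline InputLanguage parseDashX(const std::string &XValue) {
  if (XValue == "c")
    return InputLanguage::C;
  if (XValue == "cl")
    return InputLanguage::OpenCL;
  if (XValue == "cuda")
    return InputLanguage::CUDA;
  if (XValue == "hip")
    return InputLanguage::HIP;
  return InputLanguage::Unknown;
}
```

Once the input kind is known at this point, the language defaults can be derived from it in one place, which is the flow John asks about in his inline comment.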

yaxunl added a child revision: D45441: [HIP] Add predefined macros __HIPCC__ and __HIP_DEVICE_COMPILE__. Apr 17 2018, 12:51 PM

tra added inline comments. Apr 17 2018, 3:29 PM

lib/CodeGen/CGCUDANV.cpp
51–52	`const CodeGenModule &CGM`
lib/Frontend/InitPreprocessor.cpp
466–469	Is `__CUDA__` supposed to be set during HIP compilation? My guess is that `__HIP__` and `__CUDA__` should be mutually exclusive. You do set LangOpts.CUDA during HIP compilation, so this should be changed to `if (CUDA && ! HIP)`
lib/Sema/SemaDecl.cpp
9054–9055	This would be somewhat easier to read: if (II && II->isStr(getLangOpts().HIP ? "hipConfigureCall" : "cudaConfigureCall") && ...
test/CodeGenCUDA/device-stub.cu
3–10	The changes in this file do not seem to have anything related to the code changes in this patch. Did you intend to add some HIP tests here?
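
The SemaDecl.cpp suggestion above amounts to picking the expected function name first and comparing once. A small sketch of that shape, using std::string in place of IdentifierInfo::isStr; `configureCallName` and `isConfigureCall` are hypothetical helper names, while the two function names come from the inline comment.

```cpp
#include <string>

// Selects the kernel-configuration function name for the active language.
inline const char *configureCallName(bool IsHIP) {
  return IsHIP ? "hipConfigureCall" : "cudaConfigureCall";
}

// Single comparison against the selected name, instead of two branches
// that each repeat the rest of the condition.
inline bool isConfigureCall(const std::string &Name, bool IsHIP) {
  return Name == configureCallName(IsHIP);
}
```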

tra mentioned this in D45489: [HIP] Add input type for HIP. Apr 17 2018, 4:35 PM

yaxunl added a parent revision: D45489: [HIP] Add input type for HIP. Apr 18 2018, 11:07 AM

yaxunl removed a child revision: D45212: Add HIP toolchain.

yaxunl marked 4 inline comments as done. Apr 18 2018, 11:59 AM

yaxunl added inline comments.

lib/Frontend/InitPreprocessor.cpp
466–469	HIP documentation does not require `__CUDA__` to be defined. Will make changes as you suggested.
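
A sketch of the predicate being agreed on here, under the assumption (taken from Artem's comment, not confirmed by the diff shown) that `__HIP__` is defined for HIP compilations. `LangFlags` and `languageMacros` are hypothetical names; the real code lives in InitPreprocessor.cpp and uses Clang's macro builder.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the relevant LangOptions bits.
struct LangFlags {
  bool CUDA;
  bool HIP;
};

// Returns the language macros to predefine. Because HIP compilation also
// sets CUDA, the CUDA macro needs the extra "!HIP" guard to keep the two
// macros mutually exclusive.
inline std::vector<std::string> languageMacros(const LangFlags &Flags) {
  std::vector<std::string> Macros;
  if (Flags.CUDA && !Flags.HIP)
    Macros.push_back("__CUDA__");
  if (Flags.HIP) // assumption: __HIP__ is the macro defined for HIP
    Macros.push_back("__HIP__");
  return Macros;
}
```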

Revised per Artem's comments.

ping

Otherwise LGTM.

lib/CodeGen/CGCUDANV.cpp
51–52	Why doesn't the CGNVCUDARuntime just hold on to a reference to the CGM? That's what we do with all the other separated singletons (like the CGCXXABI), and it would let you avoid some of the redundant fields like Context and TheModule.

tra added inline comments. Apr 24 2018, 10:50 AM

lib/CodeGen/CGCUDANV.cpp
51–52	Actually, CGCUDARuntime already has CGM field, so the CGM argument can be just dropped.

Remove CodeGenModule argument from addPrefix* functions.
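
Roughly, the helpers end up as member functions that read the language flag from state the runtime object already holds, rather than taking a CodeGenModule parameter. The sketch below shows that shape with a hypothetical `RuntimeNames` class and `LangFlags` struct; it uses std::string where the review asks for StringRef or Twine, and adopts the `addUnderscoredPrefixToName` name suggested by John.

```cpp
#include <string>

// Hypothetical stand-in for the language option the runtime consults.
struct LangFlags {
  bool HIP;
};

// Hypothetical stand-in for the CUDA runtime codegen class, which already
// has access to the language options through its CodeGenModule member.
class RuntimeNames {
public:
  explicit RuntimeNames(LangFlags F) : Flags(F) {}

  // "SetupArgument" -> "cudaSetupArgument" or "hipSetupArgument".
  std::string addPrefixToName(const std::string &FuncName) const {
    return (Flags.HIP ? "hip" : "cuda") + FuncName;
  }

  // "RegisterFunction" -> "__cudaRegisterFunction" or "__hipRegisterFunction".
  std::string addUnderscoredPrefixToName(const std::string &FuncName) const {
    return (Flags.HIP ? "__hip" : "__cuda") + FuncName;
  }

private:
  LangFlags Flags;
};
```

Call sites then shrink to `addPrefixToName("SetupArgument")`, with no extra argument to thread through.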

tra added inline comments. Apr 24 2018, 12:04 PM

test/CodeGenCUDA/device-stub.cu
3–10	Do you need these changes?

yaxunl marked 2 inline comments as done. Apr 24 2018, 12:36 PM

yaxunl added inline comments.

test/CodeGenCUDA/device-stub.cu
3–10	Sorry, some HIP-related changes were lost during revision. I will restore them.

Add back HIP related changes to the tests.

tra accepted this revision. Apr 24 2018, 1:37 PM

This revision is now accepted and ready to land. Apr 24 2018, 1:37 PM

Thank you.

Closed by commit rL330790: [HIP] Add hip input kind and codegen for kernel launching (authored by yaxunl). Apr 24 2018, 6:16 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Apr 24 2018, 6:16 PM

Revision Contents

Path	Size
include/clang/Basic/LangOptions.def	1 line
lib/CodeGen/CGCUDANV.cpp	54 lines
lib/Frontend/CompilerInvocation.cpp	6 lines
lib/Frontend/InitPreprocessor.cpp	2 lines
lib/Sema/SemaCUDA.cpp	5 lines
lib/Sema/SemaDecl.cpp	6 lines
test/CodeGenCUDA/Inputs/cuda.h	5 lines
test/CodeGenCUDA/device-stub.cu	79 lines
test/CodeGenCUDA/kernel-call.cu	13 lines

Diff 140090

include/clang/Basic/LangOptions.def

	[187 lines not shown]
	LANGOPT(ShortEnums , 1, 0, "short enum types")			LANGOPT(ShortEnums , 1, 0, "short enum types")

	LANGOPT(OpenCL , 1, 0, "OpenCL")			LANGOPT(OpenCL , 1, 0, "OpenCL")
	LANGOPT(OpenCLVersion , 32, 0, "OpenCL version")			LANGOPT(OpenCLVersion , 32, 0, "OpenCL version")
	LANGOPT(NativeHalfType , 1, 0, "Native half type support")			LANGOPT(NativeHalfType , 1, 0, "Native half type support")
	LANGOPT(NativeHalfArgsAndReturns, 1, 0, "Native half args and returns")			LANGOPT(NativeHalfArgsAndReturns, 1, 0, "Native half args and returns")
	LANGOPT(HalfArgsAndReturns, 1, 0, "half args and returns")			LANGOPT(HalfArgsAndReturns, 1, 0, "half args and returns")
	LANGOPT(CUDA , 1, 0, "CUDA")			LANGOPT(CUDA , 1, 0, "CUDA")
				LANGOPT(HIP , 1, 0, "HIP")
	LANGOPT(OpenMP , 32, 0, "OpenMP support and version of OpenMP (31, 40 or 45)")			LANGOPT(OpenMP , 32, 0, "OpenMP support and version of OpenMP (31, 40 or 45)")
	LANGOPT(OpenMPSimd , 1, 0, "Use SIMD only OpenMP support.")			LANGOPT(OpenMPSimd , 1, 0, "Use SIMD only OpenMP support.")
	LANGOPT(OpenMPUseTLS , 1, 0, "Use TLS for threadprivates or runtime calls")			LANGOPT(OpenMPUseTLS , 1, 0, "Use TLS for threadprivates or runtime calls")
	LANGOPT(OpenMPIsDevice , 1, 0, "Generate code only for OpenMP target device")			LANGOPT(OpenMPIsDevice , 1, 0, "Generate code only for OpenMP target device")
	LANGOPT(OpenMPCUDAMode , 1, 0, "Generate code for OpenMP pragmas in SIMT/SPMD mode")			LANGOPT(OpenMPCUDAMode , 1, 0, "Generate code for OpenMP pragmas in SIMT/SPMD mode")
	LANGOPT(RenderScript , 1, 0, "RenderScript")			LANGOPT(RenderScript , 1, 0, "RenderScript")

	LANGOPT(CUDAIsDevice , 1, 0, "compiling for CUDA device")			LANGOPT(CUDAIsDevice , 1, 0, "compiling for CUDA device")
	[93 lines not shown]

lib/CodeGen/CGCUDANV.cpp

[42 lines not shown]	private:
llvm::SmallVector<std::pair<llvm::GlobalVariable *, unsigned>, 16> DeviceVars;		llvm::SmallVector<std::pair<llvm::GlobalVariable *, unsigned>, 16> DeviceVars;
/// Keeps track of variable containing handle of GPU binary. Populated by		/// Keeps track of variable containing handle of GPU binary. Populated by
/// ModuleCtorFunction() and used to create corresponding cleanup calls in		/// ModuleCtorFunction() and used to create corresponding cleanup calls in
/// ModuleDtorFunction()		/// ModuleDtorFunction()
llvm::GlobalVariable *GpuBinaryHandle = nullptr;		llvm::GlobalVariable *GpuBinaryHandle = nullptr;

llvm::Constant *getSetupArgumentFn() const;		llvm::Constant *getSetupArgumentFn() const;
llvm::Constant *getLaunchFn() const;		llvm::Constant *getLaunchFn() const;
		std::string addPrefixToName(CodeGenModule &CGM, std::string FuncName) const;
		std::string addPrefixToNameBar(CodeGenModule &CGM,
		std::string FuncName) const;
		[Inline comment, tra] `const CodeGenModule &CGM`
		[Inline comment, rjmccall] Why doesn't the CGNVCUDARuntime just hold on to a reference to the CGM? That's what we do with all the other separated singletons (like the CGCXXABI), and it would let you avoid some of the redundant fields like Context and TheModule.
		[Inline comment, tra] Actually, CGCUDARuntime already has CGM field, so the CGM argument can be just dropped.

/// Creates a function to register all kernel stubs generated in this module.		/// Creates a function to register all kernel stubs generated in this module.
llvm::Function *makeRegisterGlobalsFn();		llvm::Function *makeRegisterGlobalsFn();

/// Helper function that generates a constant string and returns a pointer to		/// Helper function that generates a constant string and returns a pointer to
/// the start of the string. The result of this function can be used anywhere		/// the start of the string. The result of this function can be used anywhere
/// where the C code specifies const char*.		/// where the C code specifies const char*.
llvm::Constant *makeConstantString(const std::string &Str,		llvm::Constant *makeConstantString(const std::string &Str,
[27 lines not shown]	public:
/// Creates module constructor function		/// Creates module constructor function
llvm::Function *makeModuleCtorFunction() override;		llvm::Function *makeModuleCtorFunction() override;
/// Creates module destructor function		/// Creates module destructor function
llvm::Function *makeModuleDtorFunction() override;		llvm::Function *makeModuleDtorFunction() override;
};		};

}		}

		std::string CGNVCUDARuntime::addPrefixToName(CodeGenModule &CGM,
		std::string FuncName) const {
		[Inline comment, rjmccall] Can you take these as StringRefs or Twines?
		if (CGM.getLangOpts().HIP)
		return ((Twine("hip") + Twine(FuncName)).str());
		return ((Twine("cuda") + Twine(FuncName)).str());
		}
		std::string CGNVCUDARuntime::addPrefixToNameBar(CodeGenModule &CGM,
		std::string FuncName) const {
		[Inline comment, rjmccall] I think "addUnderscoredPrefixToName" would be better.
		if (CGM.getLangOpts().HIP)
		return ((Twine("__hip") + Twine(FuncName)).str());
		return ((Twine("__cuda") + Twine(FuncName)).str());
		}

CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM)		CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM)
: CGCUDARuntime(CGM), Context(CGM.getLLVMContext()),		: CGCUDARuntime(CGM), Context(CGM.getLLVMContext()),
TheModule(CGM.getModule()) {		TheModule(CGM.getModule()) {
CodeGen::CodeGenTypes &Types = CGM.getTypes();		CodeGen::CodeGenTypes &Types = CGM.getTypes();
ASTContext &Ctx = CGM.getContext();		ASTContext &Ctx = CGM.getContext();

IntTy = CGM.IntTy;		IntTy = CGM.IntTy;
SizeTy = CGM.SizeTy;		SizeTy = CGM.SizeTy;
VoidTy = CGM.VoidTy;		VoidTy = CGM.VoidTy;

CharPtrTy = llvm::PointerType::getUnqual(Types.ConvertType(Ctx.CharTy));		CharPtrTy = llvm::PointerType::getUnqual(Types.ConvertType(Ctx.CharTy));
VoidPtrTy = cast<llvm::PointerType>(Types.ConvertType(Ctx.VoidPtrTy));		VoidPtrTy = cast<llvm::PointerType>(Types.ConvertType(Ctx.VoidPtrTy));
VoidPtrPtrTy = VoidPtrTy->getPointerTo();		VoidPtrPtrTy = VoidPtrTy->getPointerTo();
}		}

llvm::Constant *CGNVCUDARuntime::getSetupArgumentFn() const {		llvm::Constant *CGNVCUDARuntime::getSetupArgumentFn() const {
// cudaError_t cudaSetupArgument(void *, size_t, size_t)		// cudaError_t cudaSetupArgument(void *, size_t, size_t)
llvm::Type *Params[] = {VoidPtrTy, SizeTy, SizeTy};		llvm::Type *Params[] = {VoidPtrTy, SizeTy, SizeTy};
return CGM.CreateRuntimeFunction(llvm::FunctionType::get(IntTy,		return CGM.CreateRuntimeFunction(
Params, false),		llvm::FunctionType::get(IntTy, Params, false),
"cudaSetupArgument");		addPrefixToName(CGM, "SetupArgument"));
}		}

llvm::Constant *CGNVCUDARuntime::getLaunchFn() const {		llvm::Constant *CGNVCUDARuntime::getLaunchFn() const {
// cudaError_t cudaLaunch(char *)		// cudaError_t cudaLaunch(char *)
		[Inline comment, rjmccall] Please move this comment down into the else clause (and terminate it with a semicolon) and add your own declaration comment in your clause.
		if (CGM.getLangOpts().HIP)
		return CGM.CreateRuntimeFunction(
		llvm::FunctionType::get(IntTy, CharPtrTy, false), "hipLaunchByPtr");
		else
return CGM.CreateRuntimeFunction(		return CGM.CreateRuntimeFunction(
llvm::FunctionType::get(IntTy, CharPtrTy, false), "cudaLaunch");		llvm::FunctionType::get(IntTy, CharPtrTy, false), "cudaLaunch");
}		}

void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,		void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,
FunctionArgList &Args) {		FunctionArgList &Args) {
EmittedKernels.push_back(CGF.CurFn);		EmittedKernels.push_back(CGF.CurFn);
emitDeviceStubBody(CGF, Args);		emitDeviceStubBody(CGF, Args);
}		}

[48 lines not shown]
/// \endcode		/// \endcode
llvm::Function *CGNVCUDARuntime::makeRegisterGlobalsFn() {		llvm::Function *CGNVCUDARuntime::makeRegisterGlobalsFn() {
// No need to register anything		// No need to register anything
if (EmittedKernels.empty() && DeviceVars.empty())		if (EmittedKernels.empty() && DeviceVars.empty())
return nullptr;		return nullptr;

llvm::Function *RegisterKernelsFunc = llvm::Function::Create(		llvm::Function *RegisterKernelsFunc = llvm::Function::Create(
llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),		llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),
llvm::GlobalValue::InternalLinkage, "__cuda_register_globals", &TheModule);		llvm::GlobalValue::InternalLinkage,
		addPrefixToNameBar(CGM, "_register_globals"), &TheModule);
llvm::BasicBlock *EntryBB =		llvm::BasicBlock *EntryBB =
llvm::BasicBlock::Create(Context, "entry", RegisterKernelsFunc);		llvm::BasicBlock::Create(Context, "entry", RegisterKernelsFunc);
CGBuilderTy Builder(CGM, Context);		CGBuilderTy Builder(CGM, Context);
Builder.SetInsertPoint(EntryBB);		Builder.SetInsertPoint(EntryBB);

// void __cudaRegisterFunction(void **, const char *, char *, const char *,		// void __cudaRegisterFunction(void **, const char *, char *, const char *,
// int, uint3*, uint3*, dim3*, dim3*, int*)		// int, uint3*, uint3*, dim3*, dim3*, int*)
llvm::Type *RegisterFuncParams[] = {		llvm::Type *RegisterFuncParams[] = {
VoidPtrPtrTy, CharPtrTy, CharPtrTy, CharPtrTy, IntTy,		VoidPtrPtrTy, CharPtrTy, CharPtrTy, CharPtrTy, IntTy,
VoidPtrTy, VoidPtrTy, VoidPtrTy, VoidPtrTy, IntTy->getPointerTo()};		VoidPtrTy, VoidPtrTy, VoidPtrTy, VoidPtrTy, IntTy->getPointerTo()};
llvm::Constant *RegisterFunc = CGM.CreateRuntimeFunction(		llvm::Constant *RegisterFunc = CGM.CreateRuntimeFunction(
llvm::FunctionType::get(IntTy, RegisterFuncParams, false),		llvm::FunctionType::get(IntTy, RegisterFuncParams, false),
"__cudaRegisterFunction");		addPrefixToNameBar(CGM, "RegisterFunction"));

// Extract GpuBinaryHandle passed as the first argument passed to		// Extract GpuBinaryHandle passed as the first argument passed to
// __cuda_register_globals() and generate __cudaRegisterFunction() call for		// __cuda_register_globals() and generate __cudaRegisterFunction() call for
// each emitted kernel.		// each emitted kernel.
llvm::Argument &GpuBinaryHandlePtr = *RegisterKernelsFunc->arg_begin();		llvm::Argument &GpuBinaryHandlePtr = *RegisterKernelsFunc->arg_begin();
for (llvm::Function *Kernel : EmittedKernels) {		for (llvm::Function *Kernel : EmittedKernels) {
llvm::Constant *KernelName = makeConstantString(Kernel->getName());		llvm::Constant *KernelName = makeConstantString(Kernel->getName());
llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);		llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
llvm::Value *Args[] = {		llvm::Value *Args[] = {
&GpuBinaryHandlePtr, Builder.CreateBitCast(Kernel, VoidPtrTy),		&GpuBinaryHandlePtr, Builder.CreateBitCast(Kernel, VoidPtrTy),
KernelName, KernelName, llvm::ConstantInt::get(IntTy, -1), NullPtr,		KernelName, KernelName, llvm::ConstantInt::get(IntTy, -1), NullPtr,
NullPtr, NullPtr, NullPtr,		NullPtr, NullPtr, NullPtr,
llvm::ConstantPointerNull::get(IntTy->getPointerTo())};		llvm::ConstantPointerNull::get(IntTy->getPointerTo())};
Builder.CreateCall(RegisterFunc, Args);		Builder.CreateCall(RegisterFunc, Args);
}		}

// void __cudaRegisterVar(void **, char *, char *, const char *,		// void __cudaRegisterVar(void **, char *, char *, const char *,
// int, int, int, int)		// int, int, int, int)
llvm::Type *RegisterVarParams[] = {VoidPtrPtrTy, CharPtrTy, CharPtrTy,		llvm::Type *RegisterVarParams[] = {VoidPtrPtrTy, CharPtrTy, CharPtrTy,
CharPtrTy, IntTy, IntTy,		CharPtrTy, IntTy, IntTy,
IntTy, IntTy};		IntTy, IntTy};
llvm::Constant *RegisterVar = CGM.CreateRuntimeFunction(		llvm::Constant *RegisterVar = CGM.CreateRuntimeFunction(
llvm::FunctionType::get(IntTy, RegisterVarParams, false),		llvm::FunctionType::get(IntTy, RegisterVarParams, false),
"__cudaRegisterVar");		addPrefixToNameBar(CGM, "RegisterVar"));
for (auto &Pair : DeviceVars) {		for (auto &Pair : DeviceVars) {
llvm::GlobalVariable *Var = Pair.first;		llvm::GlobalVariable *Var = Pair.first;
unsigned Flags = Pair.second;		unsigned Flags = Pair.second;
llvm::Constant *VarName = makeConstantString(Var->getName());		llvm::Constant *VarName = makeConstantString(Var->getName());
uint64_t VarSize =		uint64_t VarSize =
CGM.getDataLayout().getTypeAllocSize(Var->getValueType());		CGM.getDataLayout().getTypeAllocSize(Var->getValueType());
llvm::Value *Args[] = {		llvm::Value *Args[] = {
&GpuBinaryHandlePtr,		&GpuBinaryHandlePtr,
[24 lines not shown]	llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
if (GpuBinaryFileName.empty())		if (GpuBinaryFileName.empty())
return nullptr;		return nullptr;

// void __cuda_register_globals(void* handle);		// void __cuda_register_globals(void* handle);
llvm::Function *RegisterGlobalsFunc = makeRegisterGlobalsFn();		llvm::Function *RegisterGlobalsFunc = makeRegisterGlobalsFn();
// void ** __cudaRegisterFatBinary(void *);		// void ** __cudaRegisterFatBinary(void *);
llvm::Constant *RegisterFatbinFunc = CGM.CreateRuntimeFunction(		llvm::Constant *RegisterFatbinFunc = CGM.CreateRuntimeFunction(
llvm::FunctionType::get(VoidPtrPtrTy, VoidPtrTy, false),		llvm::FunctionType::get(VoidPtrPtrTy, VoidPtrTy, false),
"__cudaRegisterFatBinary");		addPrefixToNameBar(CGM, "RegisterFatBinary"));
// struct { int magic, int version, void * gpu_binary, void * dont_care };		// struct { int magic, int version, void * gpu_binary, void * dont_care };
llvm::StructType *FatbinWrapperTy =		llvm::StructType *FatbinWrapperTy =
llvm::StructType::get(IntTy, IntTy, VoidPtrTy, VoidPtrTy);		llvm::StructType::get(IntTy, IntTy, VoidPtrTy, VoidPtrTy);

// Register GPU binary with the CUDA runtime, store returned handle in a		// Register GPU binary with the CUDA runtime, store returned handle in a
// global variable and save a reference in GpuBinaryHandle to be cleaned up		// global variable and save a reference in GpuBinaryHandle to be cleaned up
// in destructor on exit. Then associate all known kernels with the GPU binary		// in destructor on exit. Then associate all known kernels with the GPU binary
// handle so CUDA runtime can figure out what to call on the GPU side.		// handle so CUDA runtime can figure out what to call on the GPU side.
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> GpuBinaryOrErr =		llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> GpuBinaryOrErr =
llvm::MemoryBuffer::getFileOrSTDIN(GpuBinaryFileName);		llvm::MemoryBuffer::getFileOrSTDIN(GpuBinaryFileName);
if (std::error_code EC = GpuBinaryOrErr.getError()) {		if (std::error_code EC = GpuBinaryOrErr.getError()) {
CGM.getDiags().Report(diag::err_cannot_open_file)		CGM.getDiags().Report(diag::err_cannot_open_file)
<< GpuBinaryFileName << EC.message();		<< GpuBinaryFileName << EC.message();
return nullptr;		return nullptr;
}		}

llvm::Function *ModuleCtorFunc = llvm::Function::Create(		llvm::Function *ModuleCtorFunc = llvm::Function::Create(
llvm::FunctionType::get(VoidTy, VoidPtrTy, false),		llvm::FunctionType::get(VoidTy, VoidPtrTy, false),
llvm::GlobalValue::InternalLinkage, "__cuda_module_ctor", &TheModule);		llvm::GlobalValue::InternalLinkage,
		addPrefixToNameBar(CGM, "_module_ctor"), &TheModule);
llvm::BasicBlock *CtorEntryBB =		llvm::BasicBlock *CtorEntryBB =
llvm::BasicBlock::Create(Context, "entry", ModuleCtorFunc);		llvm::BasicBlock::Create(Context, "entry", ModuleCtorFunc);
CGBuilderTy CtorBuilder(CGM, Context);		CGBuilderTy CtorBuilder(CGM, Context);

CtorBuilder.SetInsertPoint(CtorEntryBB);		CtorBuilder.SetInsertPoint(CtorEntryBB);

const char *FatbinConstantName =		const char *FatbinConstantName =
CGM.getTriple().isMacOSX() ? "__NV_CUDA,__nv_fatbin" : ".nv_fatbin";		CGM.getTriple().isMacOSX() ? "__NV_CUDA,__nv_fatbin" : ".nv_fatbin";
[9 lines not shown]	llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
// Fatbin version.		// Fatbin version.
Values.addInt(IntTy, 1);		Values.addInt(IntTy, 1);
// Data.		// Data.
Values.add(makeConstantString(GpuBinaryOrErr.get()->getBuffer(), "",		Values.add(makeConstantString(GpuBinaryOrErr.get()->getBuffer(), "",
FatbinConstantName, 8));		FatbinConstantName, 8));
// Unused in fatbin v1.		// Unused in fatbin v1.
Values.add(llvm::ConstantPointerNull::get(VoidPtrTy));		Values.add(llvm::ConstantPointerNull::get(VoidPtrTy));
llvm::GlobalVariable *FatbinWrapper = Values.finishAndCreateGlobal(		llvm::GlobalVariable *FatbinWrapper = Values.finishAndCreateGlobal(
"__cuda_fatbin_wrapper", CGM.getPointerAlign(),		addPrefixToNameBar(CGM, "_fatbin_wrapper"), CGM.getPointerAlign(),
/*constant*/ true);		/*constant*/ true);
FatbinWrapper->setSection(FatbinSectionName);		FatbinWrapper->setSection(FatbinSectionName);

// GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper);		// GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper);
llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall(		llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall(
RegisterFatbinFunc, CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy));		RegisterFatbinFunc, CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy));
GpuBinaryHandle = new llvm::GlobalVariable(		GpuBinaryHandle = new llvm::GlobalVariable(
TheModule, VoidPtrPtrTy, false, llvm::GlobalValue::InternalLinkage,		TheModule, VoidPtrPtrTy, false, llvm::GlobalValue::InternalLinkage,
llvm::ConstantPointerNull::get(VoidPtrPtrTy), "__cuda_gpubin_handle");		llvm::ConstantPointerNull::get(VoidPtrPtrTy),
		addPrefixToNameBar(CGM, "_gpubin_handle"));

CtorBuilder.CreateAlignedStore(RegisterFatbinCall, GpuBinaryHandle,		CtorBuilder.CreateAlignedStore(RegisterFatbinCall, GpuBinaryHandle,
CGM.getPointerAlign());		CGM.getPointerAlign());

// Call __cuda_register_globals(GpuBinaryHandle);		// Call __cuda_register_globals(GpuBinaryHandle);
if (RegisterGlobalsFunc)		if (RegisterGlobalsFunc)
CtorBuilder.CreateCall(RegisterGlobalsFunc, RegisterFatbinCall);		CtorBuilder.CreateCall(RegisterGlobalsFunc, RegisterFatbinCall);

CtorBuilder.CreateRetVoid();		CtorBuilder.CreateRetVoid();
[10 lines not shown]
llvm::Function *CGNVCUDARuntime::makeModuleDtorFunction() {		llvm::Function *CGNVCUDARuntime::makeModuleDtorFunction() {
// No need for destructor if we don't have a handle to unregister.		// No need for destructor if we don't have a handle to unregister.
if (!GpuBinaryHandle)		if (!GpuBinaryHandle)
return nullptr;		return nullptr;

// void __cudaUnregisterFatBinary(void ** handle);		// void __cudaUnregisterFatBinary(void ** handle);
llvm::Constant *UnregisterFatbinFunc = CGM.CreateRuntimeFunction(		llvm::Constant *UnregisterFatbinFunc = CGM.CreateRuntimeFunction(
llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),		llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),
"__cudaUnregisterFatBinary");		addPrefixToNameBar(CGM, "UnregisterFatBinary"));

llvm::Function *ModuleDtorFunc = llvm::Function::Create(		llvm::Function *ModuleDtorFunc = llvm::Function::Create(
llvm::FunctionType::get(VoidTy, VoidPtrTy, false),		llvm::FunctionType::get(VoidTy, VoidPtrTy, false),
llvm::GlobalValue::InternalLinkage, "__cuda_module_dtor", &TheModule);		llvm::GlobalValue::InternalLinkage,
		addPrefixToNameBar(CGM, "_module_dtor"), &TheModule);

llvm::BasicBlock *DtorEntryBB =		llvm::BasicBlock *DtorEntryBB =
llvm::BasicBlock::Create(Context, "entry", ModuleDtorFunc);		llvm::BasicBlock::Create(Context, "entry", ModuleDtorFunc);
CGBuilderTy DtorBuilder(CGM, Context);		CGBuilderTy DtorBuilder(CGM, Context);
DtorBuilder.SetInsertPoint(DtorEntryBB);		DtorBuilder.SetInsertPoint(DtorEntryBB);

auto HandleValue =		auto HandleValue =
DtorBuilder.CreateAlignedLoad(GpuBinaryHandle, CGM.getPointerAlign());		DtorBuilder.CreateAlignedLoad(GpuBinaryHandle, CGM.getPointerAlign());
DtorBuilder.CreateCall(UnregisterFatbinFunc, HandleValue);		DtorBuilder.CreateCall(UnregisterFatbinFunc, HandleValue);

DtorBuilder.CreateRetVoid();		DtorBuilder.CreateRetVoid();
return ModuleDtorFunc;		return ModuleDtorFunc;
}		}

CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {		CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {
return new CGNVCUDARuntime(CGM);		return new CGNVCUDARuntime(CGM);
}		}
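
The renamed symbols in this file all go through addPrefixToNameBar, whose definition falls outside the lines shown here. Judging only from the replaced string literals ("__cuda_gpubin_handle", "__cudaUnregisterFatBinary") and the updated test expectations, the helper appears to prepend "__cuda" or "__hip" to the given suffix. A minimal sketch under that assumption follows; the name prefixedRuntimeName and the plain bool parameter are hypothetical stand-ins for the real helper and for CGM.getLangOpts().HIP.

```cpp
#include <string>

// Hypothetical stand-in for addPrefixToNameBar(CGM, Name) from the patch.
// Assumption: the real helper chooses the prefix from CGM.getLangOpts().HIP;
// a plain bool is used here so the sketch compiles without Clang headers.
std::string prefixedRuntimeName(bool isHIP, const std::string &suffix) {
  return std::string(isHIP ? "__hip" : "__cuda") + suffix;
}
```

With this shape, the suffix "_gpubin_handle" yields "__cuda_gpubin_handle" for CUDA input and "__hip_gpubin_handle" for HIP input, which is what the [[PREFIX]] checks in device-stub.cu expect.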

lib/Frontend/CompilerInvocation.cpp

	Show First 20 Lines • Show All 992 Lines • ▼ Show 20 Lines
	IsHeaderFile =			IsHeaderFile =
	!Preprocessed && !ModuleMap && XValue.consume_back("-header");			!Preprocessed && !ModuleMap && XValue.consume_back("-header");

	// Principal languages.			// Principal languages.
	DashX = llvm::StringSwitch<InputKind>(XValue)			DashX = llvm::StringSwitch<InputKind>(XValue)
	.Case("c", InputKind::C)			.Case("c", InputKind::C)
	.Case("cl", InputKind::OpenCL)			.Case("cl", InputKind::OpenCL)
	.Case("cuda", InputKind::CUDA)			.Case("cuda", InputKind::CUDA)
				.Case("hip", InputKind::CUDA)
	.Case("c++", InputKind::CXX)			.Case("c++", InputKind::CXX)
	.Case("objective-c", InputKind::ObjC)			.Case("objective-c", InputKind::ObjC)
	.Case("objective-c++", InputKind::ObjCXX)			.Case("objective-c++", InputKind::ObjCXX)
	.Case("renderscript", InputKind::RenderScript)			.Case("renderscript", InputKind::RenderScript)
	.Default(InputKind::Unknown);			.Default(InputKind::Unknown);

	// "objc[++]-cpp-output" is an acceptable synonym for			// "objc[++]-cpp-output" is an acceptable synonym for
	// "objective-c[++]-cpp-output".			// "objective-c[++]-cpp-output".
	▲ Show 20 Lines • Show All 495 Lines • ▼ Show 20 Lines
	<< A->getAsString(Args) << A->getValue();			<< A->getAsString(Args) << A->getValue();
	}			}
	else			else
	LangStd = OpenCLLangStd;			LangStd = OpenCLLangStd;
	}			}

	Opts.IncludeDefaultHeader = Args.hasArg(OPT_finclude_default_header);			Opts.IncludeDefaultHeader = Args.hasArg(OPT_finclude_default_header);

				if (const Arg *A = Args.getLastArg(OPT_x)) {
				if (!strcmp(A->getValue(), "hip"))
				Opts.HIP = true;
				}
				rjmccall: Why is this done here? We infer the language mode from the input kind somewhere else.
				yaxunl: It is usually done through CompilerInvocation::setLangDefaults. However, HIP does not have its own input kind nor is it defined as a language standard. Therefore it cannot use CompilerInvocation::setLangDefaults to set Opts.HIP.
				rjmccall: What are the values of -x if not input kinds or language standards?
				yaxunl: I will add hip as input kind and language standard since it really is both.

	llvm::Triple T(TargetOpts.Triple);			llvm::Triple T(TargetOpts.Triple);
	CompilerInvocation::setLangDefaults(Opts, IK, T, PPOpts, LangStd);			CompilerInvocation::setLangDefaults(Opts, IK, T, PPOpts, LangStd);

	// -cl-strict-aliasing needs to emit diagnostic in the case where CL > 1.0.			// -cl-strict-aliasing needs to emit diagnostic in the case where CL > 1.0.
	// This option should be deprecated for CL > 1.0 because			// This option should be deprecated for CL > 1.0 because
	// this option was added for compatibility with OpenCL 1.0.			// this option was added for compatibility with OpenCL 1.0.
	if (Args.getLastArg(OPT_cl_strict_aliasing)			if (Args.getLastArg(OPT_cl_strict_aliasing)
	&& Opts.OpenCLVersion > 100) {			&& Opts.OpenCLVersion > 100) {
	▲ Show 20 Lines • Show All 992 Lines • Show Last 20 Lines
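
The review thread above ends with the author agreeing to make hip a proper input kind so that the HIP flag can be derived from it, rather than by re-reading the -x argument with strcmp. A minimal sketch of that direction, assuming a simplified language enum and flag struct: Language, LangFlags, parseDashX and langDefaultsFor are all hypothetical names, not Clang's InputKind, LangOptions or setLangDefaults.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for Clang's InputKind and LangOptions.
enum class Language { Unknown, C, CXX, OpenCL, CUDA, HIP };

struct LangFlags {
  bool CUDA = false;
  bool HIP = false;
};

// Maps the -x value to a language, giving "hip" its own kind.
Language parseDashX(const std::string &x) {
  if (x == "c") return Language::C;
  if (x == "c++") return Language::CXX;
  if (x == "cl") return Language::OpenCL;
  if (x == "cuda") return Language::CUDA;
  if (x == "hip") return Language::HIP;
  return Language::Unknown;
}

// Derives the flags from the language alone. Per the patch summary, a HIP
// input turns on both CUDA and HIP; a CUDA input turns on CUDA only.
LangFlags langDefaultsFor(Language lang) {
  LangFlags flags;
  flags.HIP = (lang == Language::HIP);
  flags.CUDA = (lang == Language::CUDA) || flags.HIP;
  return flags;
}
```

Keeping the decision in one function keyed on the language means no later code has to inspect the command line again to learn whether the input is HIP.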

lib/Frontend/InitPreprocessor.cpp

Show First 20 Lines • Show All 457 Lines • ▼ Show 20 Lines
if (TI.isLittleEndian())		if (TI.isLittleEndian())
Builder.defineMacro("__ENDIAN_LITTLE__");		Builder.defineMacro("__ENDIAN_LITTLE__");

if (LangOpts.FastRelaxedMath)		if (LangOpts.FastRelaxedMath)
Builder.defineMacro("__FAST_RELAXED_MATH__");		Builder.defineMacro("__FAST_RELAXED_MATH__");
}		}
// Not "standard" per se, but available even with the -undef flag.		// Not "standard" per se, but available even with the -undef flag.
if (LangOpts.AsmPreprocessor)		if (LangOpts.AsmPreprocessor)
Builder.defineMacro("__ASSEMBLER__");		Builder.defineMacro("__ASSEMBLER__");
if (LangOpts.CUDA)		if (LangOpts.CUDA)
Builder.defineMacro("__CUDA__");		Builder.defineMacro("__CUDA__");
		if (LangOpts.HIP)
		Builder.defineMacro("__HIP__");
		tra: Is `__CUDA__` supposed to be set during HIP compilation? My guess is that `__HIP__` and `__CUDA__` should be mutually exclusive. You do set LangOpts.CUDA during HIP compilation, so this should be changed to `if (CUDA && ! HIP)`
		yaxunl: HIP documentation does not require `__CUDA__` to be defined. Will make changes as you suggested.
}		}

/// Initialize the predefined C++ language feature test macros defined in		/// Initialize the predefined C++ language feature test macros defined in
/// ISO/IEC JTC1/SC22/WG21 (C++) SD-6: "SG10 Feature Test Recommendations".		/// ISO/IEC JTC1/SC22/WG21 (C++) SD-6: "SG10 Feature Test Recommendations".
static void InitializeCPlusPlusFeatureTestMacros(const LangOptions &LangOpts,		static void InitializeCPlusPlusFeatureTestMacros(const LangOptions &LangOpts,
MacroBuilder &Builder) {		MacroBuilder &Builder) {
// C++98 features.		// C++98 features.
if (LangOpts.RTTI)		if (LangOpts.RTTI)
▲ Show 20 Lines • Show All 681 Lines • Show Last 20 Lines
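
The inline comments in this file ask for `__CUDA__` and `__HIP__` to be mutually exclusive, even though LangOpts.CUDA stays on during HIP compilation. A minimal sketch of that rule, assuming a simplified flag struct: LangFlags and languageMacros are hypothetical names, and the real code would call Builder.defineMacro instead of filling a vector.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the two LangOptions bits involved.
struct LangFlags {
  bool CUDA = false;
  bool HIP = false;
};

// Returns the language macros to predefine. __CUDA__ is only defined when
// compiling CUDA that is not HIP, as suggested in review.
std::vector<std::string> languageMacros(const LangFlags &opts) {
  std::vector<std::string> macros;
  if (opts.CUDA && !opts.HIP)
    macros.push_back("__CUDA__");
  if (opts.HIP)
    macros.push_back("__HIP__");
  return macros;
}
```

Under this rule the `#ifdef __HIP__` switch in test/CodeGenCUDA/Inputs/cuda.h selects exactly one of the two ConfigureCall declarations.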

lib/Sema/SemaCUDA.cpp

Show All 36 Lines
bool Sema::PopForceCUDAHostDevice() {		bool Sema::PopForceCUDAHostDevice() {
return true;		return true;
}		}

ExprResult Sema::ActOnCUDAExecConfigExpr(Scope *S, SourceLocation LLLLoc,		ExprResult Sema::ActOnCUDAExecConfigExpr(Scope *S, SourceLocation LLLLoc,
MultiExprArg ExecConfig,		MultiExprArg ExecConfig,
SourceLocation GGGLoc) {		SourceLocation GGGLoc) {
FunctionDecl *ConfigDecl = Context.getcudaConfigureCallDecl();		FunctionDecl *ConfigDecl = Context.getcudaConfigureCallDecl();
if (!ConfigDecl)		if (!ConfigDecl)
return ExprError(Diag(LLLLoc, diag::err_undeclared_var_use)		return ExprError(
<< "cudaConfigureCall");		Diag(LLLLoc, diag::err_undeclared_var_use)
		<< (getLangOpts().HIP ? "hipConfigureCall" : "cudaConfigureCall"));
QualType ConfigQTy = ConfigDecl->getType();		QualType ConfigQTy = ConfigDecl->getType();

DeclRefExpr *ConfigDR = new (Context)		DeclRefExpr *ConfigDR = new (Context)
DeclRefExpr(ConfigDecl, false, ConfigQTy, VK_LValue, LLLLoc);		DeclRefExpr(ConfigDecl, false, ConfigQTy, VK_LValue, LLLLoc);
MarkFunctionReferenced(LLLLoc, ConfigDecl);		MarkFunctionReferenced(LLLLoc, ConfigDecl);

return ActOnCallExpr(S, ConfigDR, LLLLoc, ExecConfig, GGGLoc, nullptr,		return ActOnCallExpr(S, ConfigDR, LLLLoc, ExecConfig, GGGLoc, nullptr,
/*IsExecConfig=*/true);		/*IsExecConfig=*/true);
▲ Show 20 Lines • Show All 851 Lines • Show Last 20 Lines

lib/Sema/SemaDecl.cpp

	Show First 20 Lines • Show All 992 Lines • ▼ Show 20 Lines
	checkDLLAttributeRedeclaration(*this, Prev, NewFD,			checkDLLAttributeRedeclaration(*this, Prev, NewFD,
	isMemberSpecialization ||			isMemberSpecialization ||
	isFunctionTemplateSpecialization,			isFunctionTemplateSpecialization,
	D.isFunctionDefinition());			D.isFunctionDefinition());
	}			}

	if (getLangOpts().CUDA) {			if (getLangOpts().CUDA) {
	IdentifierInfo *II = NewFD->getIdentifier();			IdentifierInfo *II = NewFD->getIdentifier();
	if (II && II->isStr("cudaConfigureCall") && !NewFD->isInvalidDecl() &&			if (II &&
				((getLangOpts().HIP && II->isStr("hipConfigureCall")) ||
				(!getLangOpts().HIP && II->isStr("cudaConfigureCall"))) &&
				!NewFD->isInvalidDecl() &&
	NewFD->getDeclContext()->getRedeclContext()->isTranslationUnit()) {			NewFD->getDeclContext()->getRedeclContext()->isTranslationUnit()) {
	if (!R->getAs<FunctionType>()->getReturnType()->isScalarType())			if (!R->getAs<FunctionType>()->getReturnType()->isScalarType())
				tra: This would be somewhat easier to read: `if (II && II->isStr(getLangOpts().HIP ? "hipConfigureCall" : "cudaConfigureCall") && ...`
	Diag(NewFD->getLocation(), diag::err_config_scalar_return);			Diag(NewFD->getLocation(), diag::err_config_scalar_return);

	Context.setcudaConfigureCallDecl(NewFD);			Context.setcudaConfigureCallDecl(NewFD);
	}			}

	// Variadic functions, other than a declaration of printf, are not allowed			// Variadic functions, other than a declaration of printf, are not allowed
	// in device-side CUDA code, unless someone passed			// in device-side CUDA code, unless someone passed
	// -fcuda-allow-variadic-functions.			// -fcuda-allow-variadic-functions.
	if (!getLangOpts().CUDAAllowVariadicFunctions && NewFD->isVariadic() &&			if (!getLangOpts().CUDAAllowVariadicFunctions && NewFD->isVariadic() &&
	(NewFD->hasAttr<CUDADeviceAttr>() ||			(NewFD->hasAttr<CUDADeviceAttr>() ||
	▲ Show 20 Lines • Show All 992 Lines • Show Last 20 Lines
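
The reviewer's suggested rewrite of the name check is meant to be behavior-preserving. A small sketch that puts the two forms side by side so their equivalence can be checked; both function names are hypothetical, and std::string comparison stands in for IdentifierInfo::isStr.

```cpp
#include <string>

// The condition as written in this diff: two mutually exclusive branches.
bool isConfigureCallTwoBranch(bool isHIP, const std::string &name) {
  return (isHIP && name == "hipConfigureCall") ||
         (!isHIP && name == "cudaConfigureCall");
}

// The condition as suggested in review: select the expected name first,
// then compare once.
bool isConfigureCallTernary(bool isHIP, const std::string &name) {
  return name == (isHIP ? "hipConfigureCall" : "cudaConfigureCall");
}
```

Both forms reject "cudaConfigureCall" under HIP and "hipConfigureCall" under CUDA, so only the function matching the current language is recorded as the configure-call declaration.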

test/CodeGenCUDA/Inputs/cuda.h

	Show All 10 Lines

	struct dim3 {			struct dim3 {
	unsigned x, y, z;			unsigned x, y, z;
	__host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}			__host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}
	};			};

	typedef struct cudaStream *cudaStream_t;			typedef struct cudaStream *cudaStream_t;

				#ifdef __HIP__
				int hipConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,
				cudaStream_t stream = 0);
				#else
	int cudaConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,			int cudaConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,
	cudaStream_t stream = 0);			cudaStream_t stream = 0);
				#endif

	extern "C" __device__ int printf(const char*, ...);			extern "C" __device__ int printf(const char*, ...);

test/CodeGenCUDA/device-stub.cu

	// RUN: echo "GPU binary would be here" > %t			// RUN: echo "GPU binary would be here" > %t
	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -fcuda-include-gpubinary %t -o - | FileCheck %s			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -fcuda-include-gpubinary %t -o - | FileCheck -check-prefixes=CHECK,CUDA %s
	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -fcuda-include-gpubinary %t -o - -DNOGLOBALS \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -fcuda-include-gpubinary %t -o - -DNOGLOBALS \
	// RUN: | FileCheck %s -check-prefix=NOGLOBALS			// RUN: | FileCheck %s -check-prefix=NOGLOBALS
	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -o - | FileCheck %s -check-prefix=NOGPUBIN			// RUN: %clang_cc1 -triple x86_64-linux-gnu -x hip -emit-llvm %s -o - | FileCheck %s -check-prefix=NOGPUBIN
				// RUN: %clang_cc1 -triple x86_64-linux-gnu -x hip -emit-llvm %s -fcuda-include-gpubinary %t -o - | FileCheck -check-prefixes=CHECK,HIP %s
				// RUN: %clang_cc1 -triple x86_64-linux-gnu -x hip -emit-llvm %s -fcuda-include-gpubinary %t -o - -DNOGLOBALS \
				// RUN: | FileCheck %s -check-prefix=NOGLOBALS
				// RUN: %clang_cc1 -triple x86_64-linux-gnu -x hip -emit-llvm %s -o - | FileCheck %s -check-prefix=NOGPUBIN
				tra: Please wrap the long lines.

				tra: The changes in this file do not seem to have anything related to the code changes in this patch. Did you intend to add some HIP tests here?
				tra: Do you need these changes?
				yaxunl: Sorry, some changes about HIP were lost during revision. I will get back those changes.
	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	#ifndef NOGLOBALS			#ifndef NOGLOBALS
	// CHECK-DAG: @device_var = internal global i32			// CHECK-DAG: @device_var = internal global i32
	__device__ int device_var;			__device__ int device_var;

	// CHECK-DAG: @constant_var = internal global i32			// CHECK-DAG: @constant_var = internal global i32
	__constant__ int constant_var;			__constant__ int constant_var;
	Show All 28 Lines

	// Make sure that all parts of GPU code init/cleanup are there:			// Make sure that all parts of GPU code init/cleanup are there:
	// * constant unnamed string with the kernel name			// * constant unnamed string with the kernel name
	// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00"			// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00"
	// * constant unnamed string with GPU binary			// * constant unnamed string with GPU binary
	// CHECK: private unnamed_addr constant{{.*GPU binary would be here.*}}\00"			// CHECK: private unnamed_addr constant{{.*GPU binary would be here.*}}\00"
	// CHECK-SAME: section ".nv_fatbin", align 8			// CHECK-SAME: section ".nv_fatbin", align 8
	// * constant struct that wraps GPU binary			// * constant struct that wraps GPU binary
	// CHECK: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*, i8* }			// CUDA: @__[[PREFIX:cuda]]_fatbin_wrapper = internal constant { i32, i32, i8*, i8* }
				// HIP: @__[[PREFIX:hip]]_fatbin_wrapper = internal constant { i32, i32, i8*, i8* }
	// CHECK-SAME: { i32 1180844977, i32 1, {{.*}}, i8* null }			// CHECK-SAME: { i32 1180844977, i32 1, {{.*}}, i8* null }
	// CHECK-SAME: section ".nvFatBinSegment"			// CHECK-SAME: section ".nvFatBinSegment"
	// * variable to save GPU binary handle after initialization			// * variable to save GPU binary handle after initialization
	// CHECK: @__cuda_gpubin_handle = internal global i8** null			// CHECK: @__[[PREFIX]]_gpubin_handle = internal global i8** null
	// * Make sure our constructor/destructor was added to global ctor/dtor list.			// * Make sure our constructor/destructor was added to global ctor/dtor list.
	// CHECK: @llvm.global_ctors = appending global {{.*}}@__cuda_module_ctor			// CHECK: @llvm.global_ctors = appending global {{.*}}@__[[PREFIX]]_module_ctor
	// CHECK: @llvm.global_dtors = appending global {{.*}}@__cuda_module_dtor			// CHECK: @llvm.global_dtors = appending global {{.*}}@__[[PREFIX]]_module_dtor

	// Test that we build the correct number of calls to cudaSetupArgument followed			// Test that we build the correct number of calls to cudaSetupArgument followed
	// by a call to cudaLaunch.			// by a call to cudaLaunch.

	// CHECK: define{{.*}}kernelfunc			// CHECK: define{{.*}}kernelfunc
	// CHECK: call{{.*}}cudaSetupArgument			// CHECK: call{{.*}}[[PREFIX]]SetupArgument
	// CHECK: call{{.*}}cudaSetupArgument			// CHECK: call{{.*}}[[PREFIX]]SetupArgument
	// CHECK: call{{.*}}cudaSetupArgument			// CHECK: call{{.*}}[[PREFIX]]SetupArgument
	// CHECK: call{{.*}}cudaLaunch			// CHECK: call{{.*}}[[PREFIX]]Launch
	__global__ void kernelfunc(int i, int j, int k) {}			__global__ void kernelfunc(int i, int j, int k) {}

	// Test that we've built correct kernel launch sequence.			// Test that we've built correct kernel launch sequence.
	// CHECK: define{{.*}}hostfunc			// CHECK: define{{.*}}hostfunc
	// CHECK: call{{.*}}cudaConfigureCall			// CHECK: call{{.*}}[[PREFIX]]ConfigureCall
	// CHECK: call{{.*}}kernelfunc			// CHECK: call{{.*}}kernelfunc
	void hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); }			void hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); }
	#endif			#endif

	// Test that we've built a function to register kernels and global vars.			// Test that we've built a function to register kernels and global vars.
	// CHECK: define internal void @__cuda_register_globals			// CHECK: define internal void @__[[PREFIX]]_register_globals
	// CHECK: call{{.*}}cudaRegisterFunction(i8** %0, {{.*}}kernelfunc			// CHECK: call{{.*}}[[PREFIX]]RegisterFunction(i8** %0, {{.*}}kernelfunc
	// CHECK-DAG: call{{.*}}cudaRegisterVar(i8** %0, {{.*}}device_var{{.*}}i32 0, i32 4, i32 0, i32 0			// CHECK-DAG: call{{.*}}[[PREFIX]]RegisterVar(i8** %0, {{.*}}device_var{{.*}}i32 0, i32 4, i32 0, i32 0
	// CHECK-DAG: call{{.*}}cudaRegisterVar(i8** %0, {{.*}}constant_var{{.*}}i32 0, i32 4, i32 1, i32 0			// CHECK-DAG: call{{.*}}[[PREFIX]]RegisterVar(i8** %0, {{.*}}constant_var{{.*}}i32 0, i32 4, i32 1, i32 0
	// CHECK-DAG: call{{.*}}cudaRegisterVar(i8** %0, {{.*}}ext_device_var{{.*}}i32 1, i32 4, i32 0, i32 0			// CHECK-DAG: call{{.*}}[[PREFIX]]RegisterVar(i8** %0, {{.*}}ext_device_var{{.*}}i32 1, i32 4, i32 0, i32 0
	// CHECK-DAG: call{{.*}}cudaRegisterVar(i8** %0, {{.*}}ext_constant_var{{.*}}i32 1, i32 4, i32 1, i32 0			// CHECK-DAG: call{{.*}}[[PREFIX]]RegisterVar(i8** %0, {{.*}}ext_constant_var{{.*}}i32 1, i32 4, i32 1, i32 0
	// CHECK: ret void			// CHECK: ret void

	// Test that we've built contructor..			// Test that we've built contructor..
	// CHECK: define internal void @__cuda_module_ctor			// CHECK: define internal void @__[[PREFIX]]_module_ctor
	// .. that calls __cudaRegisterFatBinary(&__cuda_fatbin_wrapper)			// .. that calls __[[PREFIX]]RegisterFatBinary(&__[[PREFIX]]_fatbin_wrapper)
	// CHECK: call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper			// CHECK: call{{.*}}[[PREFIX]]RegisterFatBinary{{.*}}__[[PREFIX]]_fatbin_wrapper
	// .. stores return value in __cuda_gpubin_handle			// .. stores return value in __[[PREFIX]]_gpubin_handle
	// CHECK-NEXT: store{{.*}}__cuda_gpubin_handle			// CHECK-NEXT: store{{.*}}__[[PREFIX]]_gpubin_handle
	// .. and then calls __cuda_register_globals			// .. and then calls __[[PREFIX]]_register_globals
	// CHECK-NEXT: call void @__cuda_register_globals			// CHECK-NEXT: call void @__[[PREFIX]]_register_globals

	// Test that we've created destructor.			// Test that we've created destructor.
	// CHECK: define internal void @__cuda_module_dtor			// CHECK: define internal void @__[[PREFIX]]_module_dtor
	// CHECK: load{{.*}}__cuda_gpubin_handle			// CHECK: load{{.*}}__[[PREFIX]]_gpubin_handle
	// CHECK-NEXT: call void @__cudaUnregisterFatBinary			// CHECK-NEXT: call void @__[[PREFIX]]UnregisterFatBinary

	// There should be no __cuda_register_globals if we have no			// There should be no __[[PREFIX]]_register_globals if we have no
	// device-side globals, but we still need to register GPU binary.			// device-side globals, but we still need to register GPU binary.
	// Skip GPU binary string first.			// Skip GPU binary string first.
	// NOGLOBALS: @0 = private unnamed_addr constant{{.*}}			// NOGLOBALS: @0 = private unnamed_addr constant{{.*}}
	// NOGLOBALS-NOT: define internal void @__cuda_register_globals			// NOGLOBALS-NOT: define internal void @__{{.*}}_register_globals
	// NOGLOBALS: define internal void @__cuda_module_ctor			// NOGLOBALS: define internal void @__[[PREFIX:.*]]_module_ctor
	// NOGLOBALS: call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper			// NOGLOBALS: call{{.*}}[[PREFIX]]RegisterFatBinary{{.*}}__[[PREFIX]]_fatbin_wrapper
	// NOGLOBALS-NOT: call void @__cuda_register_globals			// NOGLOBALS-NOT: call void @__[[PREFIX]]_register_globals
	// NOGLOBALS: define internal void @__cuda_module_dtor			// NOGLOBALS: define internal void @__[[PREFIX]]_module_dtor
	// NOGLOBALS: call void @__cudaUnregisterFatBinary			// NOGLOBALS: call void @__[[PREFIX]]UnregisterFatBinary

	// There should be no constructors/destructors if we have no GPU binary.			// There should be no constructors/destructors if we have no GPU binary.
	// NOGPUBIN-NOT: define internal void @__cuda_register_globals			// NOGPUBIN-NOT: define internal void @__[[PREFIX]]_register_globals
	// NOGPUBIN-NOT: define internal void @__cuda_module_ctor			// NOGPUBIN-NOT: define internal void @__[[PREFIX]]_module_ctor
	// NOGPUBIN-NOT: define internal void @__cuda_module_dtor			// NOGPUBIN-NOT: define internal void @__[[PREFIX]]_module_dtor
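
The updated checks rely on FileCheck pattern variables: `[[PREFIX:cuda]]` or `[[PREFIX:hip]]` binds PREFIX at the fatbin wrapper, and every later `[[PREFIX]]` must match the same text. A rough simulation of that capture-then-reuse idea with std::regex follows. It is only an illustration, not how FileCheck is implemented, and it ignores line ordering; capturePrefix and reusesPrefix are hypothetical names.

```cpp
#include <regex>
#include <string>

// Captures the prefix from "@__<prefix>_fatbin_wrapper". `allowed` plays the
// role of the regex after the colon in [[PREFIX:cuda]] or [[PREFIX:hip]].
// Returns an empty string when the wrapper is not found.
std::string capturePrefix(const std::string &ir, const std::string &allowed) {
  std::smatch m;
  const std::regex wrapper("@__(" + allowed + ")_fatbin_wrapper");
  return std::regex_search(ir, m, wrapper) ? m[1].str() : std::string();
}

// Mirrors the later [[PREFIX]] uses: the captured text must reappear
// verbatim in the handle and constructor names.
bool reusesPrefix(const std::string &ir, const std::string &prefix) {
  return !prefix.empty() &&
         ir.find("@__" + prefix + "_gpubin_handle") != std::string::npos &&
         ir.find("@__" + prefix + "_module_ctor") != std::string::npos;
}
```

Binding the prefix once lets the shared CHECK lines serve both the CUDA and the HIP RUN lines without duplicating every expectation.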

test/CodeGenCUDA/kernel-call.cu

	// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s			// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s --check-prefixes=CUDA,CHECK
				// RUN: %clang_cc1 -x hip -emit-llvm %s -o - | FileCheck %s --check-prefixes=HIP,CHECK


	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

				// CHECK-LABEL: define void @_Z2g1i(i32 %x)
				// HIP: call{{.*}}hipSetupArgument
				// HIP: call{{.*}}hipLaunchByPtr
				// CUDA: call{{.*}}cudaSetupArgument
				// CUDA: call{{.*}}cudaLaunch
	__global__ void g1(int x) {}			__global__ void g1(int x) {}

				// CHECK-LABEL: define i32 @main
	int main(void) {			int main(void) {
	// CHECK: call{{.*}}cudaConfigureCall			// HIP: call{{.*}}hipConfigureCall
				// CUDA: call{{.*}}cudaConfigureCall
	// CHECK: icmp			// CHECK: icmp
	// CHECK: br			// CHECK: br
	// CHECK: call{{.*}}g1			// CHECK: call{{.*}}g1
	g1<<<1, 1>>>(42);			g1<<<1, 1>>>(42);
	}			}


Revision Contents

Diff 140090
