This is an archive of the discontinued LLVM Phabricator instance.

Differential D44984

[HIP] Add hip input kind and codegen for kernel launching
ClosedPublic

Authored by yaxunl on Mar 28 2018, 9:25 AM.


Details

Reviewers

rjmccall
tra

Commits

rG887c569bcb83: [HIP] Add hip input kind and codegen for kernel launching
rC330790: [HIP] Add hip input kind and codegen for kernel launching
rL330790: [HIP] Add hip input kind and codegen for kernel launching

Summary

HIP is a language similar to CUDA (https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md).
The syntax is close enough that Clang can compile a HIP program as a CUDA program. The main difference
is the host API: HIP defines a vendor-neutral host API that can be implemented on different platforms. Currently there is an open-source
implementation of the HIP runtime for the amdgpu target (https://github.com/ROCm-Developer-Tools/HIP).

This patch adds hip as an input kind and as a language standard.

When a HIP file is compiled, both LangOpts.CUDA and LangOpts.HIP are turned on. This allows a HIP program to be compiled as CUDA
in most cases; LangOpts.HIP is checked only where HIP needs special handling.

This patch also adds support for kernel launching in HIP programs using the HIP host API.

When -x hip is not specified, there is no behaviour change for CUDA.
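
To illustrate the flag scheme described in the summary, here is a minimal standalone sketch. It does not use Clang's real classes: `InputLanguage`, `LangFlags`, `flagsForInput`, and `launchFunctionName` are hypothetical stand-ins. Only the launch function names `hipLaunchByPtr` and `cudaLaunch` are taken from the patch itself.

```cpp
#include <string>

// Hypothetical stand-in for the frontend's input language kind.
enum class InputLanguage { CXX, CUDA, HIP };

// Hypothetical stand-in for the two language option bits discussed above.
struct LangFlags {
  bool CUDA = false;
  bool HIP = false;
};

// HIP input turns on both flags, so CUDA code paths are reused;
// CUDA input leaves HIP off, so CUDA behaviour is unchanged.
LangFlags flagsForInput(InputLanguage Lang) {
  LangFlags Flags;
  Flags.CUDA = Lang == InputLanguage::CUDA || Lang == InputLanguage::HIP;
  Flags.HIP = Lang == InputLanguage::HIP;
  return Flags;
}

// Only the places that really differ consult the HIP flag.
std::string launchFunctionName(const LangFlags &Flags) {
  return Flags.HIP ? "hipLaunchByPtr" : "cudaLaunch";
}
```

With this shape, any check written as `if (Flags.CUDA)` keeps working for HIP input, and HIP-specific behaviour is confined to the few sites that test `Flags.HIP`.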

Patch by Greg Rodgers.
Revised and lit test added by Yaxun Liu.

Diff Detail

Repository: rL LLVM

Event Timeline

yaxunl created this revision. Mar 28 2018, 9:25 AM

Herald added a subscriber: tpr. Mar 28 2018, 9:25 AM

yaxunl added subscribers: t-tye, b-sumner. Mar 28 2018, 9:26 AM

tra added a reviewer: tra. Mar 28 2018, 9:32 AM

The changes appear to cover only some of the functionality needed to enable HIP support. Do you have more patches in the queue? Having a complete picture would help to make sense of the overall plan.
I did ask for it in D42800, but I don't think I've got the answers. It would help a lot if you or @gregrodgers could write a doc somewhere outlining the overall plan for HIP support in clang, the main issues that need to be dealt with, and at least a general idea of how to handle them.

As far as "add -x hip, and tweak runtime glue codegen" goes, the change looks OK, but it's not very useful all by itself. It leaves a lot of other issues unsolved and no clear plan on whether/when/how you are planning to deal with them.

As things stand right now, with this patch clang will still attempt to include CUDA headers, which, among other things will provide threadIdx/blockIdx and other CUDA-specific features.
Perhaps it would make sense to disable pre-inclusion of CUDA headers and, probably, disable use of CUDA's libdevice bitcode library if we're compiling with -x hip (i.e. -nocudainc -nocudalib).
If you do depend on CUDA headers, then, I suspect, you may need to adjust some wrapper headers we use for CUDA and that change should probably come before this one.

test/CodeGenCUDA/device-stub.cu
2–9 ↗	(On Diff #140090)	Please wrap the long lines.

In D44984#1050526, @tra wrote:

The changes appear to cover only some of the functionality needed to enable HIP support. Do you have more patches in the queue? Having a complete picture would help to make sense of the overall plan.
I did ask for it in D42800, but I don't think I've got the answers. It would help a lot if you or @gregrodgers could write a doc somewhere outlining the overall plan for HIP support in clang, the main issues that need to be dealt with, and at least a general idea of how to handle them.

As far as "add -x hip, and tweak runtime glue codegen" goes, the change looks OK, but it's not very useful all by itself. It leaves a lot of other issues unsolved and no clear plan on whether/when/how you are planning to deal with them.

As things stand right now, with this patch clang will still attempt to include CUDA headers, which, among other things will provide threadIdx/blockIdx and other CUDA-specific features.
Perhaps it would make sense to disable pre-inclusion of CUDA headers and, probably, disable use of CUDA's libdevice bitcode library if we're compiling with -x hip (i.e. -nocudainc -nocudalib).
If you do depend on CUDA headers, then, I suspect, you may need to adjust some wrapper headers we use for CUDA and that change should probably come before this one.

Hi Artem, I am responsible for upstreaming Greg's work and addressing reviewers' comments.

Yes we already have a basic working implementation of HIP compiler due to Greg's work. I will either update D42800 or create a new review about the toolchain changes for compiling and linking HIP programs. Essentially HIP has its own header files and device libraries which are taken care of by the toolchain patch.

Since the header file and library seem not to affect this patch, is it OK to defer their changes to be part of the toolchain patch?

Thanks.

In D44984#1050557, @yaxunl wrote:

Yes we already have a basic working implementation of HIP compiler due to Greg's work.

That is great, but it's not necessarily true that all these changes will make it into clang/llvm as is. LLVM/Clang is a community effort and it helps a lot to get the changes in when the community understands what it is you're planning to do. I personally am very glad to see AMD moving towards making clang a viable compiler for AMD GPUs, but there's only so much I'll be able to do to help you with reviews if all I have is either piecemeal patches with little idea how they all fit together or one humongous patch I would have no time to dive into and really understand. Considering that compilation for GPU is a fairly niche market, my bet is that your progress will be bottlenecked by the code reviews. Whatever you can do to make reviewers' jobs easier by giving more context will help a lot with upstreaming the patches.

I will either update D42800 or create a new review about the toolchain changes for compiling and linking HIP programs. Essentially HIP has its own header files and device libraries which are taken care of by the toolchain patch.

Fair enough. I'll wait for the rest of the patches. If you have multiple pending patches, it helps if you could arrange them as dependent patches in phabricator. It makes it easier to see the big picture.

Since the header file and library seem not to affect this patch, is it OK to defer their changes to be part of the toolchain patch?

I'm not sure I understand. Could you elaborate?

You should send an RFC to cfe-dev about adding this new language mode. I understand that it's very similar to an existing language mode that we already support, and that's definitely something we'll consider, but we shouldn't just agree to add new language modes in patch review.

In D44984#1050672, @rjmccall wrote:

You should send an RFC to cfe-dev about adding this new language mode. I understand that it's very similar to an existing language mode that we already support, and that's definitely something we'll consider, but we shouldn't just agree to add new language modes in patch review.

RFC sent http://lists.llvm.org/pipermail/cfe-dev/2018-March/057426.html

Thanks.

Since the header file and library seem not to affect this patch, is it OK to defer their changes to be part of the toolchain patch?

I'm not sure I understand. Could you elaborate?

clang -cc1 does not include __clang_cuda_runtime_wrapper.h by default when it is called directly to compile CUDA programs. The CUDA toolchain adds -include __clang_cuda_runtime_wrapper.h when compiling a CUDA program as kernel code. Therefore, if clang -cc1 is used to compile a HIP program in a lit test, there is no need to use -nocudainc.

This patch mainly changes the kernel launching API function names. The implementation and testing of this change do not depend on the CUDA/HIP header files. A minimal header like test/CodeGenCUDA/Inputs/cuda.h is sufficient for testing this patch.

Basically, this patch only concerns -cc1 and is therefore independent of the toolchain changes to header and library files.

mkuron added a subscriber: mkuron. Mar 31 2018, 2:38 AM

Ping. Do any further changes need to be made to this patch? Thanks.

yaxunl added a child revision: D45212: Add HIP toolchain. Apr 10 2018, 9:47 AM

rjmccall added inline comments. Apr 13 2018, 12:06 AM

lib/CodeGen/CGCUDANV.cpp
98 ↗	(On Diff #140090)	Can you take these as StringRefs or Twines?
104 ↗	(On Diff #140090)	I think "addUnderscoredPrefixToName" would be better.
134 ↗	(On Diff #140090)	Please move this comment down into the else clause (and terminate it with a semicolon) and add your own declaration comment in your clause.
lib/Frontend/CompilerInvocation.cpp
2109 ↗	(On Diff #140090)	Why is this done here? We infer the language mode from the input kind somewhere else.

yaxunl marked 4 inline comments as done. Apr 13 2018, 8:35 AM

yaxunl added inline comments.

lib/Frontend/CompilerInvocation.cpp
2109 ↗	(On Diff #140090)	It is usually done through CompilerInvocation::setLangDefaults. However, HIP does not have its own input kind nor is it defined as a language standard. Therefore it cannot use CompilerInvocation::setLangDefaults to set Opts.HIP.

Revised per John's comments.

rjmccall added inline comments. Apr 14 2018, 2:59 AM

lib/Frontend/CompilerInvocation.cpp
2109 ↗	(On Diff #140090)	What are the values of -x if not input kinds or language standards?

yaxunl marked 2 inline comments as done. Apr 17 2018, 12:46 PM

yaxunl added inline comments.

lib/Frontend/CompilerInvocation.cpp
2109 ↗	(On Diff #140090)	I will add hip as input kind and language standard since it really is both.

Add hip as input kind and language standard.

yaxunl added a child revision: D45441: [HIP] Add predefined macros __HIPCC__ and __HIP_DEVICE_COMPILE__. Apr 17 2018, 12:51 PM

tra added inline comments. Apr 17 2018, 3:29 PM

lib/CodeGen/CGCUDANV.cpp
51–52 ↗	(On Diff #142818)	`const CodeGenModule &CGM`
lib/Frontend/InitPreprocessor.cpp
466–467 ↗	(On Diff #142818)	Is `__CUDA__` supposed to be set during HIP compilation? My guess is that `__HIP__` and `__CUDA__` should be mutually exclusive. You do set LangOpts.CUDA during HIP compilation, so this should be changed to `if (CUDA && ! HIP)`
lib/Sema/SemaDecl.cpp
9051–9054 ↗	(On Diff #142818)	This would be somewhat easier to read: if (II && II->isStr(getLangOpts().HIP ? "hipConfigureCall" : "cudaConfigureCall") && ...
test/CodeGenCUDA/device-stub.cu
2–8 ↗	(On Diff #142818)	The changes in this file do not seem to have anything related to the code changes in this patch. Did you intend to add some HIP tests here?
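
The name check suggested for SemaDecl.cpp can be sketched in isolation as follows. This is a simplified illustration using `std::string` rather than Clang's `IdentifierInfo`; `isConfigureCall` is a hypothetical helper, and only the two function names come from the review. Note that the conditional expression yields a `const char *`, not an array of known length, which is presumably why the patch also adds an `isStr(llvm::StringRef)` overload next to the existing string-literal template.

```cpp
#include <string>

// Hypothetical helper: choose the expected configure-call name from the
// language mode, then do a single comparison.
bool isConfigureCall(const std::string &Name, bool IsHIP) {
  // Both literals decay to const char*, so the result of ?: is a pointer.
  return Name == (IsHIP ? "hipConfigureCall" : "cudaConfigureCall");
}
```

For example, `isConfigureCall("hipConfigureCall", true)` holds, while the same name is rejected when `IsHIP` is false.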

tra mentioned this in D45489: [HIP] Add input type for HIP. Apr 17 2018, 4:35 PM

yaxunl added a parent revision: D45489: [HIP] Add input type for HIP. Apr 18 2018, 11:07 AM

yaxunl removed a child revision: D45212: Add HIP toolchain.

yaxunl marked 4 inline comments as done. Apr 18 2018, 11:59 AM

yaxunl added inline comments.

lib/Frontend/InitPreprocessor.cpp
466–467 ↗	(On Diff #142818)	HIP documentation does not require `__CUDA__` to be defined. Will make changes as you suggested.
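
The agreed macro behaviour can be sketched as a small standalone function. This is an assumption-laden illustration, not the code from InitPreprocessor.cpp: `languageMacros` is a hypothetical helper, and defining `__HIP__` when the HIP flag is set follows the reviewer's "mutually exclusive" suggestion rather than a line shown in this review.

```cpp
#include <string>
#include <vector>

// Hypothetical helper returning the language macros that would be predefined.
// CUDA is also set during HIP compilation, so __CUDA__ must be guarded.
std::vector<std::string> languageMacros(bool CUDA, bool HIP) {
  std::vector<std::string> Macros;
  if (CUDA && !HIP)
    Macros.push_back("__CUDA__"); // plain CUDA compilation only
  if (HIP)
    Macros.push_back("__HIP__");  // assumed counterpart for HIP compilation
  return Macros;
}
```

Under these assumptions a HIP compilation (both flags set) sees only `__HIP__`, and a CUDA compilation sees only `__CUDA__`.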

Revised per Artem's comments.

ping

Otherwise LGTM.

lib/CodeGen/CGCUDANV.cpp
51–52 ↗	(On Diff #142818)	Why doesn't the CGNVCUDARuntime just hold on to a reference to the CGM? That's what we do with all the other separated singletons (like the CGCXXABI), and it would let you avoid some of the redundant fields like Context and TheModule.

tra added inline comments. Apr 24 2018, 10:50 AM

lib/CodeGen/CGCUDANV.cpp
51–52 ↗	(On Diff #142818)	Actually, CGCUDARuntime already has CGM field, so the CGM argument can be just dropped.

Remove CodeGenModule argument from addPrefix* functions.
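
The resulting shape of the prefix helpers can be sketched without LLVM as follows. The class `RuntimeNames` and the struct `ModeFlags` are hypothetical stand-ins for the runtime class and its stored `CGM` reference; the helper names and prefixes mirror the ones in the diff, but `std::string` replaces `StringRef` and `Twine`.

```cpp
#include <string>

// Hypothetical stand-in for the language options reachable through CGM.
struct ModeFlags {
  bool HIP = false;
};

// The helpers read the mode from a stored reference, so callers pass only
// the function name, as in the revised patch.
class RuntimeNames {
  const ModeFlags &Opts;

public:
  explicit RuntimeNames(const ModeFlags &Opts) : Opts(Opts) {}

  // "SetupArgument" -> "hipSetupArgument" or "cudaSetupArgument"
  std::string addPrefixToName(const std::string &FuncName) const {
    return (Opts.HIP ? "hip" : "cuda") + FuncName;
  }

  // "RegisterFunction" -> "__hipRegisterFunction" or "__cudaRegisterFunction"
  std::string addUnderscoredPrefixToName(const std::string &FuncName) const {
    return (Opts.HIP ? "__hip" : "__cuda") + FuncName;
  }
};
```

Keeping the mode behind a member reference means every call site stays a one-argument call, which matches the reviewers' point that the runtime object already has access to the module.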

tra added inline comments. Apr 24 2018, 12:04 PM

test/CodeGenCUDA/device-stub.cu
2–8 ↗	(On Diff #142818)	Do you need these changes?

yaxunl marked 2 inline comments as done. Apr 24 2018, 12:36 PM

yaxunl added inline comments.

test/CodeGenCUDA/device-stub.cu
2–8 ↗	(On Diff #142818)	Sorry, some HIP changes were lost during revision. I will restore those changes.

Add back HIP related changes to the tests.

tra accepted this revision. Apr 24 2018, 1:37 PM

This revision is now accepted and ready to land. Apr 24 2018, 1:37 PM

Thank you.

Closed by commit rL330790: [HIP] Add hip input kind and codegen for kernel launching (authored by yaxunl). Apr 24 2018, 6:16 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Apr 24 2018, 6:16 PM

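Revision Contents

Path	Size
cfe/trunk/include/clang/Basic/IdentifierTable.h	6 lines
cfe/trunk/include/clang/Basic/LangOptions.def	1 line
cfe/trunk/include/clang/Frontend/FrontendOptions.h	1 line
cfe/trunk/include/clang/Frontend/LangStandards.def	4 lines
cfe/trunk/lib/CodeGen/CGCUDANV.cpp	58 lines
cfe/trunk/lib/Frontend/CompilerInvocation.cpp	13 lines
cfe/trunk/lib/Frontend/FrontendActions.cpp	1 line
cfe/trunk/lib/Frontend/InitPreprocessor.cpp	4 lines
cfe/trunk/lib/Sema/SemaCUDA.cpp	5 lines
cfe/trunk/lib/Sema/SemaDecl.cpp	6 lines
cfe/trunk/test/CodeGenCUDA/Inputs/cuda.h	5 lines
cfe/trunk/test/CodeGenCUDA/device-stub.cu	99 lines
cfe/trunk/test/CodeGenCUDA/kernel-call.cu	13 lines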

Diff 143849

cfe/trunk/include/clang/Basic/IdentifierTable.h

[92 lines not shown]	public:
///		///
/// This is intended to be used for string literals only: II->isStr("foo").		/// This is intended to be used for string literals only: II->isStr("foo").
template <std::size_t StrLen>		template <std::size_t StrLen>
bool isStr(const char (&Str)[StrLen]) const {		bool isStr(const char (&Str)[StrLen]) const {
return getLength() == StrLen-1 &&		return getLength() == StrLen-1 &&
memcmp(getNameStart(), Str, StrLen-1) == 0;		memcmp(getNameStart(), Str, StrLen-1) == 0;
}		}

		/// \brief Return true if this is the identifier for the specified StringRef.
		bool isStr(llvm::StringRef Str) const {
		llvm::StringRef ThisStr(getNameStart(), getLength());
		return ThisStr == Str;
		}

/// \brief Return the beginning of the actual null-terminated string for this		/// \brief Return the beginning of the actual null-terminated string for this
/// identifier.		/// identifier.
const char *getNameStart() const {		const char *getNameStart() const {
if (Entry) return Entry->getKeyData();		if (Entry) return Entry->getKeyData();
// FIXME: This is gross. It would be best not to embed specific details		// FIXME: This is gross. It would be best not to embed specific details
// of the PTH file format here.		// of the PTH file format here.
// The 'this' pointer really points to a		// The 'this' pointer really points to a
// std::pair<IdentifierInfo, const char*>, where internal pointer		// std::pair<IdentifierInfo, const char*>, where internal pointer
[818 lines not shown]

cfe/trunk/include/clang/Basic/LangOptions.def

	[189 lines not shown]
	LANGOPT(OpenCL , 1, 0, "OpenCL")			LANGOPT(OpenCL , 1, 0, "OpenCL")
	LANGOPT(OpenCLVersion , 32, 0, "OpenCL C version")			LANGOPT(OpenCLVersion , 32, 0, "OpenCL C version")
	LANGOPT(OpenCLCPlusPlus , 1, 0, "OpenCL C++")			LANGOPT(OpenCLCPlusPlus , 1, 0, "OpenCL C++")
	LANGOPT(OpenCLCPlusPlusVersion , 32, 0, "OpenCL C++ version")			LANGOPT(OpenCLCPlusPlusVersion , 32, 0, "OpenCL C++ version")
	LANGOPT(NativeHalfType , 1, 0, "Native half type support")			LANGOPT(NativeHalfType , 1, 0, "Native half type support")
	LANGOPT(NativeHalfArgsAndReturns, 1, 0, "Native half args and returns")			LANGOPT(NativeHalfArgsAndReturns, 1, 0, "Native half args and returns")
	LANGOPT(HalfArgsAndReturns, 1, 0, "half args and returns")			LANGOPT(HalfArgsAndReturns, 1, 0, "half args and returns")
	LANGOPT(CUDA , 1, 0, "CUDA")			LANGOPT(CUDA , 1, 0, "CUDA")
				LANGOPT(HIP , 1, 0, "HIP")
	LANGOPT(OpenMP , 32, 0, "OpenMP support and version of OpenMP (31, 40 or 45)")			LANGOPT(OpenMP , 32, 0, "OpenMP support and version of OpenMP (31, 40 or 45)")
	LANGOPT(OpenMPSimd , 1, 0, "Use SIMD only OpenMP support.")			LANGOPT(OpenMPSimd , 1, 0, "Use SIMD only OpenMP support.")
	LANGOPT(OpenMPUseTLS , 1, 0, "Use TLS for threadprivates or runtime calls")			LANGOPT(OpenMPUseTLS , 1, 0, "Use TLS for threadprivates or runtime calls")
	LANGOPT(OpenMPIsDevice , 1, 0, "Generate code only for OpenMP target device")			LANGOPT(OpenMPIsDevice , 1, 0, "Generate code only for OpenMP target device")
	LANGOPT(OpenMPCUDAMode , 1, 0, "Generate code for OpenMP pragmas in SIMT/SPMD mode")			LANGOPT(OpenMPCUDAMode , 1, 0, "Generate code for OpenMP pragmas in SIMT/SPMD mode")
	LANGOPT(RenderScript , 1, 0, "RenderScript")			LANGOPT(RenderScript , 1, 0, "RenderScript")

	LANGOPT(CUDAIsDevice , 1, 0, "compiling for CUDA device")			LANGOPT(CUDAIsDevice , 1, 0, "compiling for CUDA device")
	[101 lines not shown]

cfe/trunk/include/clang/Frontend/FrontendOptions.h

[155 lines not shown]	enum Language {
///@{ Languages that the frontend can parse and compile.		///@{ Languages that the frontend can parse and compile.
C,		C,
CXX,		CXX,
ObjC,		ObjC,
ObjCXX,		ObjCXX,
OpenCL,		OpenCL,
CUDA,		CUDA,
RenderScript,		RenderScript,
		HIP,
///@}		///@}
};		};

/// The input file format.		/// The input file format.
enum Format {		enum Format {
Source,		Source,
ModuleMap,		ModuleMap,
Precompiled		Precompiled
[291 lines not shown]

cfe/trunk/include/clang/Frontend/LangStandards.def

	[162 lines not shown]
	LANGSTANDARD_ALIAS_DEPR(opencl11, "CL1.1")			LANGSTANDARD_ALIAS_DEPR(opencl11, "CL1.1")
	LANGSTANDARD_ALIAS_DEPR(opencl12, "CL1.2")			LANGSTANDARD_ALIAS_DEPR(opencl12, "CL1.2")
	LANGSTANDARD_ALIAS_DEPR(opencl20, "CL2.0")			LANGSTANDARD_ALIAS_DEPR(opencl20, "CL2.0")

	// CUDA			// CUDA
	LANGSTANDARD(cuda, "cuda", CUDA, "NVIDIA CUDA(tm)",			LANGSTANDARD(cuda, "cuda", CUDA, "NVIDIA CUDA(tm)",
	LineComment | CPlusPlus | Digraphs)			LineComment | CPlusPlus | Digraphs)

				// HIP
				LANGSTANDARD(hip, "hip", HIP, "HIP",
				LineComment | CPlusPlus | Digraphs)

	#undef LANGSTANDARD			#undef LANGSTANDARD
	#undef LANGSTANDARD_ALIAS			#undef LANGSTANDARD_ALIAS
	#undef LANGSTANDARD_ALIAS_DEPR			#undef LANGSTANDARD_ALIAS_DEPR

cfe/trunk/lib/CodeGen/CGCUDANV.cpp

[49 lines not shown]	private:
bool RelocatableDeviceCode;		bool RelocatableDeviceCode;

llvm::Constant *getSetupArgumentFn() const;		llvm::Constant *getSetupArgumentFn() const;
llvm::Constant *getLaunchFn() const;		llvm::Constant *getLaunchFn() const;

llvm::FunctionType *getRegisterGlobalsFnTy() const;		llvm::FunctionType *getRegisterGlobalsFnTy() const;
llvm::FunctionType *getCallbackFnTy() const;		llvm::FunctionType *getCallbackFnTy() const;
llvm::FunctionType *getRegisterLinkedBinaryFnTy() const;		llvm::FunctionType *getRegisterLinkedBinaryFnTy() const;
		std::string addPrefixToName(StringRef FuncName) const;
		std::string addUnderscoredPrefixToName(StringRef FuncName) const;

/// Creates a function to register all kernel stubs generated in this module.		/// Creates a function to register all kernel stubs generated in this module.
llvm::Function *makeRegisterGlobalsFn();		llvm::Function *makeRegisterGlobalsFn();

/// Helper function that generates a constant string and returns a pointer to		/// Helper function that generates a constant string and returns a pointer to
/// the start of the string. The result of this function can be used anywhere		/// the start of the string. The result of this function can be used anywhere
/// where the C code specifies const char*.		/// where the C code specifies const char*.
llvm::Constant *makeConstantString(const std::string &Str,		llvm::Constant *makeConstantString(const std::string &Str,
[43 lines not shown]	public:
/// Creates module constructor function		/// Creates module constructor function
llvm::Function *makeModuleCtorFunction() override;		llvm::Function *makeModuleCtorFunction() override;
/// Creates module destructor function		/// Creates module destructor function
llvm::Function *makeModuleDtorFunction() override;		llvm::Function *makeModuleDtorFunction() override;
};		};

}		}

		std::string CGNVCUDARuntime::addPrefixToName(StringRef FuncName) const {
		if (CGM.getLangOpts().HIP)
		return ((Twine("hip") + Twine(FuncName)).str());
		return ((Twine("cuda") + Twine(FuncName)).str());
		}
		std::string
		CGNVCUDARuntime::addUnderscoredPrefixToName(StringRef FuncName) const {
		if (CGM.getLangOpts().HIP)
		return ((Twine("__hip") + Twine(FuncName)).str());
		return ((Twine("__cuda") + Twine(FuncName)).str());
		}

CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM)		CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM)
: CGCUDARuntime(CGM), Context(CGM.getLLVMContext()),		: CGCUDARuntime(CGM), Context(CGM.getLLVMContext()),
TheModule(CGM.getModule()),		TheModule(CGM.getModule()),
RelocatableDeviceCode(CGM.getLangOpts().CUDARelocatableDeviceCode) {		RelocatableDeviceCode(CGM.getLangOpts().CUDARelocatableDeviceCode) {
CodeGen::CodeGenTypes &Types = CGM.getTypes();		CodeGen::CodeGenTypes &Types = CGM.getTypes();
ASTContext &Ctx = CGM.getContext();		ASTContext &Ctx = CGM.getContext();

IntTy = CGM.IntTy;		IntTy = CGM.IntTy;
SizeTy = CGM.SizeTy;		SizeTy = CGM.SizeTy;
VoidTy = CGM.VoidTy;		VoidTy = CGM.VoidTy;

CharPtrTy = llvm::PointerType::getUnqual(Types.ConvertType(Ctx.CharTy));		CharPtrTy = llvm::PointerType::getUnqual(Types.ConvertType(Ctx.CharTy));
VoidPtrTy = cast<llvm::PointerType>(Types.ConvertType(Ctx.VoidPtrTy));		VoidPtrTy = cast<llvm::PointerType>(Types.ConvertType(Ctx.VoidPtrTy));
VoidPtrPtrTy = VoidPtrTy->getPointerTo();		VoidPtrPtrTy = VoidPtrTy->getPointerTo();
}		}

llvm::Constant *CGNVCUDARuntime::getSetupArgumentFn() const {		llvm::Constant *CGNVCUDARuntime::getSetupArgumentFn() const {
// cudaError_t cudaSetupArgument(void *, size_t, size_t)		// cudaError_t cudaSetupArgument(void *, size_t, size_t)
llvm::Type *Params[] = {VoidPtrTy, SizeTy, SizeTy};		llvm::Type *Params[] = {VoidPtrTy, SizeTy, SizeTy};
return CGM.CreateRuntimeFunction(llvm::FunctionType::get(IntTy,		return CGM.CreateRuntimeFunction(
Params, false),		llvm::FunctionType::get(IntTy, Params, false),
"cudaSetupArgument");		addPrefixToName("SetupArgument"));
}		}

llvm::Constant *CGNVCUDARuntime::getLaunchFn() const {		llvm::Constant *CGNVCUDARuntime::getLaunchFn() const {
// cudaError_t cudaLaunch(char *)		if (CGM.getLangOpts().HIP) {
		// hipError_t hipLaunchByPtr(char *);
		return CGM.CreateRuntimeFunction(
		llvm::FunctionType::get(IntTy, CharPtrTy, false), "hipLaunchByPtr");
		} else {
		// cudaError_t cudaLaunch(char *);
return CGM.CreateRuntimeFunction(		return CGM.CreateRuntimeFunction(
llvm::FunctionType::get(IntTy, CharPtrTy, false), "cudaLaunch");		llvm::FunctionType::get(IntTy, CharPtrTy, false), "cudaLaunch");
}		}
		}

llvm::FunctionType *CGNVCUDARuntime::getRegisterGlobalsFnTy() const {		llvm::FunctionType *CGNVCUDARuntime::getRegisterGlobalsFnTy() const {
return llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false);		return llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false);
}		}

llvm::FunctionType *CGNVCUDARuntime::getCallbackFnTy() const {		llvm::FunctionType *CGNVCUDARuntime::getCallbackFnTy() const {
return llvm::FunctionType::get(VoidTy, VoidPtrTy, false);		return llvm::FunctionType::get(VoidTy, VoidPtrTy, false);
}		}
[63 lines not shown]
/// \endcode		/// \endcode
llvm::Function *CGNVCUDARuntime::makeRegisterGlobalsFn() {		llvm::Function *CGNVCUDARuntime::makeRegisterGlobalsFn() {
// No need to register anything		// No need to register anything
if (EmittedKernels.empty() && DeviceVars.empty())		if (EmittedKernels.empty() && DeviceVars.empty())
return nullptr;		return nullptr;

llvm::Function *RegisterKernelsFunc = llvm::Function::Create(		llvm::Function *RegisterKernelsFunc = llvm::Function::Create(
getRegisterGlobalsFnTy(), llvm::GlobalValue::InternalLinkage,		getRegisterGlobalsFnTy(), llvm::GlobalValue::InternalLinkage,
"__cuda_register_globals", &TheModule);		addUnderscoredPrefixToName("_register_globals"), &TheModule);
llvm::BasicBlock *EntryBB =		llvm::BasicBlock *EntryBB =
llvm::BasicBlock::Create(Context, "entry", RegisterKernelsFunc);		llvm::BasicBlock::Create(Context, "entry", RegisterKernelsFunc);
CGBuilderTy Builder(CGM, Context);		CGBuilderTy Builder(CGM, Context);
Builder.SetInsertPoint(EntryBB);		Builder.SetInsertPoint(EntryBB);

// void __cudaRegisterFunction(void **, const char *, char *, const char *,		// void __cudaRegisterFunction(void **, const char *, char *, const char *,
// int, uint3*, uint3*, dim3*, dim3*, int*)		// int, uint3*, uint3*, dim3*, dim3*, int*)
llvm::Type *RegisterFuncParams[] = {		llvm::Type *RegisterFuncParams[] = {
VoidPtrPtrTy, CharPtrTy, CharPtrTy, CharPtrTy, IntTy,		VoidPtrPtrTy, CharPtrTy, CharPtrTy, CharPtrTy, IntTy,
VoidPtrTy, VoidPtrTy, VoidPtrTy, VoidPtrTy, IntTy->getPointerTo()};		VoidPtrTy, VoidPtrTy, VoidPtrTy, VoidPtrTy, IntTy->getPointerTo()};
llvm::Constant *RegisterFunc = CGM.CreateRuntimeFunction(		llvm::Constant *RegisterFunc = CGM.CreateRuntimeFunction(
llvm::FunctionType::get(IntTy, RegisterFuncParams, false),		llvm::FunctionType::get(IntTy, RegisterFuncParams, false),
"__cudaRegisterFunction");		addUnderscoredPrefixToName("RegisterFunction"));

// Extract GpuBinaryHandle passed as the first argument passed to		// Extract GpuBinaryHandle passed as the first argument passed to
// __cuda_register_globals() and generate __cudaRegisterFunction() call for		// __cuda_register_globals() and generate __cudaRegisterFunction() call for
// each emitted kernel.		// each emitted kernel.
llvm::Argument &GpuBinaryHandlePtr = *RegisterKernelsFunc->arg_begin();		llvm::Argument &GpuBinaryHandlePtr = *RegisterKernelsFunc->arg_begin();
for (llvm::Function *Kernel : EmittedKernels) {		for (llvm::Function *Kernel : EmittedKernels) {
llvm::Constant *KernelName = makeConstantString(Kernel->getName());		llvm::Constant *KernelName = makeConstantString(Kernel->getName());
llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);		llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy);
llvm::Value *Args[] = {		llvm::Value *Args[] = {
&GpuBinaryHandlePtr, Builder.CreateBitCast(Kernel, VoidPtrTy),		&GpuBinaryHandlePtr, Builder.CreateBitCast(Kernel, VoidPtrTy),
KernelName, KernelName, llvm::ConstantInt::get(IntTy, -1), NullPtr,		KernelName, KernelName, llvm::ConstantInt::get(IntTy, -1), NullPtr,
NullPtr, NullPtr, NullPtr,		NullPtr, NullPtr, NullPtr,
llvm::ConstantPointerNull::get(IntTy->getPointerTo())};		llvm::ConstantPointerNull::get(IntTy->getPointerTo())};
Builder.CreateCall(RegisterFunc, Args);		Builder.CreateCall(RegisterFunc, Args);
}		}

// void __cudaRegisterVar(void **, char *, char *, const char *,		// void __cudaRegisterVar(void **, char *, char *, const char *,
// int, int, int, int)		// int, int, int, int)
llvm::Type *RegisterVarParams[] = {VoidPtrPtrTy, CharPtrTy, CharPtrTy,		llvm::Type *RegisterVarParams[] = {VoidPtrPtrTy, CharPtrTy, CharPtrTy,
CharPtrTy, IntTy, IntTy,		CharPtrTy, IntTy, IntTy,
IntTy, IntTy};		IntTy, IntTy};
llvm::Constant *RegisterVar = CGM.CreateRuntimeFunction(		llvm::Constant *RegisterVar = CGM.CreateRuntimeFunction(
llvm::FunctionType::get(IntTy, RegisterVarParams, false),		llvm::FunctionType::get(IntTy, RegisterVarParams, false),
"__cudaRegisterVar");		addUnderscoredPrefixToName("RegisterVar"));
for (auto &Pair : DeviceVars) {		for (auto &Pair : DeviceVars) {
llvm::GlobalVariable *Var = Pair.first;		llvm::GlobalVariable *Var = Pair.first;
unsigned Flags = Pair.second;		unsigned Flags = Pair.second;
llvm::Constant *VarName = makeConstantString(Var->getName());		llvm::Constant *VarName = makeConstantString(Var->getName());
uint64_t VarSize =		uint64_t VarSize =
CGM.getDataLayout().getTypeAllocSize(Var->getValueType());		CGM.getDataLayout().getTypeAllocSize(Var->getValueType());
llvm::Value *Args[] = {		llvm::Value *Args[] = {
&GpuBinaryHandlePtr,		&GpuBinaryHandlePtr,
[29 lines not shown]	llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
// We always need a function to pass in as callback. Create a dummy		// We always need a function to pass in as callback. Create a dummy
// implementation if we don't need to register anything.		// implementation if we don't need to register anything.
if (RelocatableDeviceCode && !RegisterGlobalsFunc)		if (RelocatableDeviceCode && !RegisterGlobalsFunc)
RegisterGlobalsFunc = makeDummyFunction(getRegisterGlobalsFnTy());		RegisterGlobalsFunc = makeDummyFunction(getRegisterGlobalsFnTy());

// void ** __cudaRegisterFatBinary(void *);		// void ** __cudaRegisterFatBinary(void *);
llvm::Constant *RegisterFatbinFunc = CGM.CreateRuntimeFunction(		llvm::Constant *RegisterFatbinFunc = CGM.CreateRuntimeFunction(
llvm::FunctionType::get(VoidPtrPtrTy, VoidPtrTy, false),		llvm::FunctionType::get(VoidPtrPtrTy, VoidPtrTy, false),
"__cudaRegisterFatBinary");		addUnderscoredPrefixToName("RegisterFatBinary"));
// struct { int magic, int version, void * gpu_binary, void * dont_care };		// struct { int magic, int version, void * gpu_binary, void * dont_care };
llvm::StructType *FatbinWrapperTy =		llvm::StructType *FatbinWrapperTy =
llvm::StructType::get(IntTy, IntTy, VoidPtrTy, VoidPtrTy);		llvm::StructType::get(IntTy, IntTy, VoidPtrTy, VoidPtrTy);

// Register GPU binary with the CUDA runtime, store returned handle in a		// Register GPU binary with the CUDA runtime, store returned handle in a
// global variable and save a reference in GpuBinaryHandle to be cleaned up		// global variable and save a reference in GpuBinaryHandle to be cleaned up
// in destructor on exit. Then associate all known kernels with the GPU binary		// in destructor on exit. Then associate all known kernels with the GPU binary
// handle so CUDA runtime can figure out what to call on the GPU side.		// handle so CUDA runtime can figure out what to call on the GPU side.
llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> GpuBinaryOrErr =		llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> GpuBinaryOrErr =
llvm::MemoryBuffer::getFileOrSTDIN(GpuBinaryFileName);		llvm::MemoryBuffer::getFileOrSTDIN(GpuBinaryFileName);
if (std::error_code EC = GpuBinaryOrErr.getError()) {		if (std::error_code EC = GpuBinaryOrErr.getError()) {
CGM.getDiags().Report(diag::err_cannot_open_file)		CGM.getDiags().Report(diag::err_cannot_open_file)
<< GpuBinaryFileName << EC.message();		<< GpuBinaryFileName << EC.message();
return nullptr;		return nullptr;
}		}

llvm::Function *ModuleCtorFunc = llvm::Function::Create(		llvm::Function *ModuleCtorFunc = llvm::Function::Create(
llvm::FunctionType::get(VoidTy, VoidPtrTy, false),		llvm::FunctionType::get(VoidTy, VoidPtrTy, false),
llvm::GlobalValue::InternalLinkage, "__cuda_module_ctor", &TheModule);		llvm::GlobalValue::InternalLinkage,
		addUnderscoredPrefixToName("_module_ctor"), &TheModule);
llvm::BasicBlock *CtorEntryBB =		llvm::BasicBlock *CtorEntryBB =
llvm::BasicBlock::Create(Context, "entry", ModuleCtorFunc);		llvm::BasicBlock::Create(Context, "entry", ModuleCtorFunc);
CGBuilderTy CtorBuilder(CGM, Context);		CGBuilderTy CtorBuilder(CGM, Context);

CtorBuilder.SetInsertPoint(CtorEntryBB);		CtorBuilder.SetInsertPoint(CtorEntryBB);

const char *FatbinConstantName;		const char *FatbinConstantName;
if (RelocatableDeviceCode)		if (RelocatableDeviceCode)
[16 lines not shown]	llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() {
// Fatbin version.		// Fatbin version.
Values.addInt(IntTy, 1);		Values.addInt(IntTy, 1);
// Data.		// Data.
Values.add(makeConstantString(GpuBinaryOrErr.get()->getBuffer(), "",		Values.add(makeConstantString(GpuBinaryOrErr.get()->getBuffer(), "",
FatbinConstantName, 8));		FatbinConstantName, 8));
// Unused in fatbin v1.		// Unused in fatbin v1.
Values.add(llvm::ConstantPointerNull::get(VoidPtrTy));		Values.add(llvm::ConstantPointerNull::get(VoidPtrTy));
llvm::GlobalVariable *FatbinWrapper = Values.finishAndCreateGlobal(		llvm::GlobalVariable *FatbinWrapper = Values.finishAndCreateGlobal(
"__cuda_fatbin_wrapper", CGM.getPointerAlign(),		addUnderscoredPrefixToName("_fatbin_wrapper"), CGM.getPointerAlign(),
/*constant*/ true);		/*constant*/ true);
FatbinWrapper->setSection(FatbinSectionName);		FatbinWrapper->setSection(FatbinSectionName);

// Register binary with CUDA runtime. This is substantially different in		// Register binary with CUDA runtime. This is substantially different in
// default mode vs. separate compilation!		// default mode vs. separate compilation!
if (!RelocatableDeviceCode) {		if (!RelocatableDeviceCode) {
// GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper);		// GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper);
llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall(		llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall(
RegisterFatbinFunc,		RegisterFatbinFunc,
CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy));		CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy));
GpuBinaryHandle = new llvm::GlobalVariable(		GpuBinaryHandle = new llvm::GlobalVariable(
TheModule, VoidPtrPtrTy, false, llvm::GlobalValue::InternalLinkage,		TheModule, VoidPtrPtrTy, false, llvm::GlobalValue::InternalLinkage,
llvm::ConstantPointerNull::get(VoidPtrPtrTy), "__cuda_gpubin_handle");		llvm::ConstantPointerNull::get(VoidPtrPtrTy),
		addUnderscoredPrefixToName("_gpubin_handle"));

CtorBuilder.CreateAlignedStore(RegisterFatbinCall, GpuBinaryHandle,		CtorBuilder.CreateAlignedStore(RegisterFatbinCall, GpuBinaryHandle,
CGM.getPointerAlign());		CGM.getPointerAlign());

// Call __cuda_register_globals(GpuBinaryHandle);		// Call __cuda_register_globals(GpuBinaryHandle);
if (RegisterGlobalsFunc)		if (RegisterGlobalsFunc)
CtorBuilder.CreateCall(RegisterGlobalsFunc, RegisterFatbinCall);		CtorBuilder.CreateCall(RegisterGlobalsFunc, RegisterFatbinCall);
} else {		} else {
// Generate a unique module ID.		// Generate a unique module ID.
SmallString<64> NVModuleID;		SmallString<64> NVModuleID;
llvm::raw_svector_ostream OS(NVModuleID);		llvm::raw_svector_ostream OS(NVModuleID);
OS << "__nv_" << llvm::format("%x", FatbinWrapper->getGUID());		OS << "__nv_" << llvm::format("%x", FatbinWrapper->getGUID());
llvm::Constant *NVModuleIDConstant =		llvm::Constant *NVModuleIDConstant =
makeConstantString(NVModuleID.str(), "", NVModuleIDSectionName, 32);		makeConstantString(NVModuleID.str(), "", NVModuleIDSectionName, 32);

// Create an alias for the FatbinWrapper that nvcc will look for.		// Create an alias for the FatbinWrapper that nvcc will look for.
llvm::GlobalAlias::create(llvm::GlobalValue::ExternalLinkage,		llvm::GlobalAlias::create(llvm::GlobalValue::ExternalLinkage,
Twine("__fatbinwrap") + NVModuleID,		Twine("__fatbinwrap") + NVModuleID,
FatbinWrapper);		FatbinWrapper);

// void __cudaRegisterLinkedBinary%NVModuleID%(void (*)(void *), void *,		// void __cudaRegisterLinkedBinary%NVModuleID%(void (*)(void *), void *,
// void *, void (*)(void **))		// void *, void (*)(void **))
SmallString<128> RegisterLinkedBinaryName("__cudaRegisterLinkedBinary");		SmallString<128> RegisterLinkedBinaryName(
		addUnderscoredPrefixToName("RegisterLinkedBinary"));
RegisterLinkedBinaryName += NVModuleID;		RegisterLinkedBinaryName += NVModuleID;
llvm::Constant *RegisterLinkedBinaryFunc = CGM.CreateRuntimeFunction(		llvm::Constant *RegisterLinkedBinaryFunc = CGM.CreateRuntimeFunction(
getRegisterLinkedBinaryFnTy(), RegisterLinkedBinaryName);		getRegisterLinkedBinaryFnTy(), RegisterLinkedBinaryName);

assert(RegisterGlobalsFunc && "Expecting at least dummy function!");		assert(RegisterGlobalsFunc && "Expecting at least dummy function!");
llvm::Value *Args[] = {RegisterGlobalsFunc,		llvm::Value *Args[] = {RegisterGlobalsFunc,
CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy),		CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy),
NVModuleIDConstant,		NVModuleIDConstant,
Show All 15 Lines
llvm::Function *CGNVCUDARuntime::makeModuleDtorFunction() {		llvm::Function *CGNVCUDARuntime::makeModuleDtorFunction() {
// No need for destructor if we don't have a handle to unregister.		// No need for destructor if we don't have a handle to unregister.
if (!GpuBinaryHandle)		if (!GpuBinaryHandle)
return nullptr;		return nullptr;

// void __cudaUnregisterFatBinary(void ** handle);		// void __cudaUnregisterFatBinary(void ** handle);
llvm::Constant *UnregisterFatbinFunc = CGM.CreateRuntimeFunction(		llvm::Constant *UnregisterFatbinFunc = CGM.CreateRuntimeFunction(
llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),		llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false),
"__cudaUnregisterFatBinary");		addUnderscoredPrefixToName("UnregisterFatBinary"));

llvm::Function *ModuleDtorFunc = llvm::Function::Create(		llvm::Function *ModuleDtorFunc = llvm::Function::Create(
llvm::FunctionType::get(VoidTy, VoidPtrTy, false),		llvm::FunctionType::get(VoidTy, VoidPtrTy, false),
llvm::GlobalValue::InternalLinkage, "__cuda_module_dtor", &TheModule);		llvm::GlobalValue::InternalLinkage,
		addUnderscoredPrefixToName("_module_dtor"), &TheModule);

llvm::BasicBlock *DtorEntryBB =		llvm::BasicBlock *DtorEntryBB =
llvm::BasicBlock::Create(Context, "entry", ModuleDtorFunc);		llvm::BasicBlock::Create(Context, "entry", ModuleDtorFunc);
CGBuilderTy DtorBuilder(CGM, Context);		CGBuilderTy DtorBuilder(CGM, Context);
DtorBuilder.SetInsertPoint(DtorEntryBB);		DtorBuilder.SetInsertPoint(DtorEntryBB);

auto HandleValue =		auto HandleValue =
DtorBuilder.CreateAlignedLoad(GpuBinaryHandle, CGM.getPointerAlign());		DtorBuilder.CreateAlignedLoad(GpuBinaryHandle, CGM.getPointerAlign());
DtorBuilder.CreateCall(UnregisterFatbinFunc, HandleValue);		DtorBuilder.CreateCall(UnregisterFatbinFunc, HandleValue);

DtorBuilder.CreateRetVoid();		DtorBuilder.CreateRetVoid();
return ModuleDtorFunc;		return ModuleDtorFunc;
}		}

CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {		CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) {
return new CGNVCUDARuntime(CGM);		return new CGNVCUDARuntime(CGM);
}		}
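
The hunks above replace hard-coded `"__cuda..."` literals with calls to `addUnderscoredPrefixToName`, whose definition is outside the lines shown here. Judging only from the replaced literals (for example `"__cuda_module_ctor"` becoming `addUnderscoredPrefixToName("_module_ctor")`) and from the `__hip_*` names the updated tests expect, the helper appears to prepend `__hip` or `__cuda` depending on the language mode. A minimal standalone sketch under that assumption follows; the free-function form and the `IsHIP` parameter are hypothetical stand-ins for the member function and its check of `LangOpts.HIP`.

```cpp
#include <string>

// Hypothetical stand-in for CGNVCUDARuntime::addUnderscoredPrefixToName.
// Assumption: the real helper reads the HIP language option from the
// CodeGenModule; here the mode is passed in explicitly.
std::string addUnderscoredPrefixToName(bool IsHIP, const std::string &Name) {
  // "__hip" in HIP mode, "__cuda" otherwise, followed by the given suffix.
  return (IsHIP ? "__hip" : "__cuda") + Name;
}
```

With this shape, a suffix such as `"RegisterLinkedBinary"` yields a runtime entry point name, while a suffix with a leading underscore such as `"_module_ctor"` yields an internal symbol name.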

cfe/trunk/lib/Frontend/CompilerInvocation.cpp

Show First 20 Lines • Show All 1,602 Lines • ▼ Show 20 Lines	if (const Arg *A = Args.getLastArg(OPT_x)) {
IsHeaderFile =		IsHeaderFile =
!Preprocessed && !ModuleMap && XValue.consume_back("-header");		!Preprocessed && !ModuleMap && XValue.consume_back("-header");

// Principal languages.		// Principal languages.
DashX = llvm::StringSwitch<InputKind>(XValue)		DashX = llvm::StringSwitch<InputKind>(XValue)
.Case("c", InputKind::C)		.Case("c", InputKind::C)
.Case("cl", InputKind::OpenCL)		.Case("cl", InputKind::OpenCL)
.Case("cuda", InputKind::CUDA)		.Case("cuda", InputKind::CUDA)
		.Case("hip", InputKind::HIP)
.Case("c++", InputKind::CXX)		.Case("c++", InputKind::CXX)
.Case("objective-c", InputKind::ObjC)		.Case("objective-c", InputKind::ObjC)
.Case("objective-c++", InputKind::ObjCXX)		.Case("objective-c++", InputKind::ObjCXX)
.Case("renderscript", InputKind::RenderScript)		.Case("renderscript", InputKind::RenderScript)
.Default(InputKind::Unknown);		.Default(InputKind::Unknown);

// "objc[++]-cpp-output" is an acceptable synonym for		// "objc[++]-cpp-output" is an acceptable synonym for
// "objective-c[++]-cpp-output".		// "objective-c[++]-cpp-output".
▲ Show 20 Lines • Show All 263 Lines • ▼ Show 20 Lines	#if defined(CLANG_DEFAULT_STD_CXX)
LangStd = CLANG_DEFAULT_STD_CXX;		LangStd = CLANG_DEFAULT_STD_CXX;
#else		#else
LangStd = LangStandard::lang_gnucxx14;		LangStd = LangStandard::lang_gnucxx14;
#endif		#endif
break;		break;
case InputKind::RenderScript:		case InputKind::RenderScript:
LangStd = LangStandard::lang_c99;		LangStd = LangStandard::lang_c99;
break;		break;
		case InputKind::HIP:
		LangStd = LangStandard::lang_hip;
		break;
}		}
}		}

const LangStandard &Std = LangStandard::getLangStandardForKind(LangStd);		const LangStandard &Std = LangStandard::getLangStandardForKind(LangStd);
Opts.LineComment = Std.hasLineComments();		Opts.LineComment = Std.hasLineComments();
Opts.C99 = Std.isC99();		Opts.C99 = Std.isC99();
Opts.C11 = Std.isC11();		Opts.C11 = Std.isC11();
Opts.C17 = Std.isC17();		Opts.C17 = Std.isC17();
Show All 31 Lines	if (Opts.OpenCL) {
Opts.NativeHalfArgsAndReturns = 1;		Opts.NativeHalfArgsAndReturns = 1;
Opts.OpenCLCPlusPlus = Opts.CPlusPlus;		Opts.OpenCLCPlusPlus = Opts.CPlusPlus;
// Include default header file for OpenCL.		// Include default header file for OpenCL.
if (Opts.IncludeDefaultHeader) {		if (Opts.IncludeDefaultHeader) {
PPOpts.Includes.push_back("opencl-c.h");		PPOpts.Includes.push_back("opencl-c.h");
}		}
}		}

Opts.CUDA = IK.getLanguage() == InputKind::CUDA;		Opts.HIP = IK.getLanguage() == InputKind::HIP;
		Opts.CUDA = IK.getLanguage() == InputKind::CUDA || Opts.HIP;
if (Opts.CUDA)		if (Opts.CUDA)
// Set default FP_CONTRACT to FAST.		// Set default FP_CONTRACT to FAST.
Opts.setDefaultFPContractMode(LangOptions::FPC_Fast);		Opts.setDefaultFPContractMode(LangOptions::FPC_Fast);

Opts.RenderScript = IK.getLanguage() == InputKind::RenderScript;		Opts.RenderScript = IK.getLanguage() == InputKind::RenderScript;
if (Opts.RenderScript) {		if (Opts.RenderScript) {
Opts.NativeHalfType = 1;		Opts.NativeHalfType = 1;
Opts.NativeHalfArgsAndReturns = 1;		Opts.NativeHalfArgsAndReturns = 1;
▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	static bool IsInputCompatibleWithStandard(InputKind IK,
case InputKind::ObjCXX:		case InputKind::ObjCXX:
return S.getLanguage() == InputKind::CXX;		return S.getLanguage() == InputKind::CXX;

case InputKind::CUDA:		case InputKind::CUDA:
// FIXME: What -std= values should be permitted for CUDA compilations?		// FIXME: What -std= values should be permitted for CUDA compilations?
return S.getLanguage() == InputKind::CUDA ||		return S.getLanguage() == InputKind::CUDA ||
S.getLanguage() == InputKind::CXX;		S.getLanguage() == InputKind::CXX;

		case InputKind::HIP:
		return S.getLanguage() == InputKind::CXX ||
		S.getLanguage() == InputKind::HIP;

case InputKind::Asm:		case InputKind::Asm:
// Accept (and ignore) all -std= values.		// Accept (and ignore) all -std= values.
// FIXME: The -std= value is not ignored; it affects the tokenization		// FIXME: The -std= value is not ignored; it affects the tokenization
// and preprocessing rules if we're preprocessing this asm input.		// and preprocessing rules if we're preprocessing this asm input.
return true;		return true;
}		}

llvm_unreachable("unexpected input language");		llvm_unreachable("unexpected input language");
Show All 11 Lines	static const StringRef GetInputKindName(InputKind IK) {
case InputKind::ObjCXX:		case InputKind::ObjCXX:
return "Objective-C++";		return "Objective-C++";
case InputKind::OpenCL:		case InputKind::OpenCL:
return "OpenCL";		return "OpenCL";
case InputKind::CUDA:		case InputKind::CUDA:
return "CUDA";		return "CUDA";
case InputKind::RenderScript:		case InputKind::RenderScript:
return "RenderScript";		return "RenderScript";
		case InputKind::HIP:
		return "HIP";

case InputKind::Asm:		case InputKind::Asm:
return "Asm";		return "Asm";
case InputKind::LLVM_IR:		case InputKind::LLVM_IR:
return "LLVM IR";		return "LLVM IR";

case InputKind::Unknown:		case InputKind::Unknown:
break;		break;
▲ Show 20 Lines • Show All 1,131 Lines • Show Last 20 Lines
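
The `Opts.HIP` / `Opts.CUDA` change above means a HIP input enables both language flags, so existing `LangOpts.CUDA` checks keep working and only HIP-specific paths need to test `LangOpts.HIP`. A minimal sketch of that derivation, using a hypothetical `Language` enum and `LangFlags` struct in place of Clang's `InputKind` and `LangOptions`:

```cpp
// Hypothetical miniature of the input-kind to language-flag mapping.
// These types are illustrative only and are not Clang's own.
enum class Language { C, CXX, CUDA, HIP };

struct LangFlags {
  bool CUDA = false;
  bool HIP = false;
};

LangFlags deriveFlags(Language L) {
  LangFlags F;
  // HIP is set only for HIP input.
  F.HIP = (L == Language::HIP);
  // CUDA is set for CUDA input and, by implication, for HIP input.
  F.CUDA = (L == Language::CUDA) || F.HIP;
  return F;
}
```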

cfe/trunk/lib/Frontend/FrontendActions.cpp

	Show First 20 Lines • Show All 727 Lines • ▼ Show 20 Lines
	void PrintPreambleAction::ExecuteAction() {			void PrintPreambleAction::ExecuteAction() {
	switch (getCurrentFileKind().getLanguage()) {			switch (getCurrentFileKind().getLanguage()) {
	case InputKind::C:			case InputKind::C:
	case InputKind::CXX:			case InputKind::CXX:
	case InputKind::ObjC:			case InputKind::ObjC:
	case InputKind::ObjCXX:			case InputKind::ObjCXX:
	case InputKind::OpenCL:			case InputKind::OpenCL:
	case InputKind::CUDA:			case InputKind::CUDA:
				case InputKind::HIP:
	break;			break;

	case InputKind::Unknown:			case InputKind::Unknown:
	case InputKind::Asm:			case InputKind::Asm:
	case InputKind::LLVM_IR:			case InputKind::LLVM_IR:
	case InputKind::RenderScript:			case InputKind::RenderScript:
	// We can't do anything with these.			// We can't do anything with these.
	return;			return;
	Show All 14 Lines

cfe/trunk/lib/Frontend/InitPreprocessor.cpp

Show First 20 Lines • Show All 465 Lines • ▼ Show 20 Lines	if (LangOpts.CPlusPlus) {

if (LangOpts.FastRelaxedMath)		if (LangOpts.FastRelaxedMath)
Builder.defineMacro("__FAST_RELAXED_MATH__");		Builder.defineMacro("__FAST_RELAXED_MATH__");
}		}
}		}
// Not "standard" per se, but available even with the -undef flag.		// Not "standard" per se, but available even with the -undef flag.
if (LangOpts.AsmPreprocessor)		if (LangOpts.AsmPreprocessor)
Builder.defineMacro("__ASSEMBLER__");		Builder.defineMacro("__ASSEMBLER__");
if (LangOpts.CUDA)		if (LangOpts.CUDA && !LangOpts.HIP)
Builder.defineMacro("__CUDA__");		Builder.defineMacro("__CUDA__");
		if (LangOpts.HIP)
		Builder.defineMacro("__HIP__");
}		}

/// Initialize the predefined C++ language feature test macros defined in		/// Initialize the predefined C++ language feature test macros defined in
/// ISO/IEC JTC1/SC22/WG21 (C++) SD-6: "SG10 Feature Test Recommendations".		/// ISO/IEC JTC1/SC22/WG21 (C++) SD-6: "SG10 Feature Test Recommendations".
static void InitializeCPlusPlusFeatureTestMacros(const LangOptions &LangOpts,		static void InitializeCPlusPlusFeatureTestMacros(const LangOptions &LangOpts,
MacroBuilder &Builder) {		MacroBuilder &Builder) {
// C++98 features.		// C++98 features.
if (LangOpts.RTTI)		if (LangOpts.RTTI)
▲ Show 20 Lines • Show All 681 Lines • Show Last 20 Lines
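
Because HIP input also sets `LangOpts.CUDA`, the hunk above guards `__CUDA__` with `!LangOpts.HIP` so that exactly one of the two language macros is defined. A small sketch of the resulting behaviour; `languageMacros` is a hypothetical helper for illustration and does not exist in Clang:

```cpp
#include <string>
#include <vector>

// Hypothetical helper mirroring the two conditions in the hunk above.
std::vector<std::string> languageMacros(bool CUDA, bool HIP) {
  std::vector<std::string> Macros;
  if (CUDA && !HIP)
    Macros.push_back("__CUDA__"); // plain CUDA compilation only
  if (HIP)
    Macros.push_back("__HIP__"); // HIP compilation
  return Macros;
}
```

This is what lets the test header `Inputs/cuda.h` select `hipConfigureCall` or `cudaConfigureCall` with `#ifdef __HIP__`.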

cfe/trunk/lib/Sema/SemaCUDA.cpp

Show All 36 Lines	bool Sema::PopForceCUDAHostDevice() {
return true;		return true;
}		}

ExprResult Sema::ActOnCUDAExecConfigExpr(Scope *S, SourceLocation LLLLoc,		ExprResult Sema::ActOnCUDAExecConfigExpr(Scope *S, SourceLocation LLLLoc,
MultiExprArg ExecConfig,		MultiExprArg ExecConfig,
SourceLocation GGGLoc) {		SourceLocation GGGLoc) {
FunctionDecl *ConfigDecl = Context.getcudaConfigureCallDecl();		FunctionDecl *ConfigDecl = Context.getcudaConfigureCallDecl();
if (!ConfigDecl)		if (!ConfigDecl)
return ExprError(Diag(LLLLoc, diag::err_undeclared_var_use)		return ExprError(
<< "cudaConfigureCall");		Diag(LLLLoc, diag::err_undeclared_var_use)
		<< (getLangOpts().HIP ? "hipConfigureCall" : "cudaConfigureCall"));
QualType ConfigQTy = ConfigDecl->getType();		QualType ConfigQTy = ConfigDecl->getType();

DeclRefExpr *ConfigDR = new (Context)		DeclRefExpr *ConfigDR = new (Context)
DeclRefExpr(ConfigDecl, false, ConfigQTy, VK_LValue, LLLLoc);		DeclRefExpr(ConfigDecl, false, ConfigQTy, VK_LValue, LLLLoc);
MarkFunctionReferenced(LLLLoc, ConfigDecl);		MarkFunctionReferenced(LLLLoc, ConfigDecl);

return ActOnCallExpr(S, ConfigDR, LLLLoc, ExecConfig, GGGLoc, nullptr,		return ActOnCallExpr(S, ConfigDR, LLLLoc, ExecConfig, GGGLoc, nullptr,
/*IsExecConfig=*/true);		/*IsExecConfig=*/true);
▲ Show 20 Lines • Show All 851 Lines • Show Last 20 Lines

cfe/trunk/lib/Sema/SemaDecl.cpp

Show First 20 Lines • Show All 9,050 Lines • ▼ Show 20 Lines	if (D.isRedeclaration() && !Previous.empty()) {
checkDLLAttributeRedeclaration(*this, Prev, NewFD,		checkDLLAttributeRedeclaration(*this, Prev, NewFD,
isMemberSpecialization ||		isMemberSpecialization ||
isFunctionTemplateSpecialization,		isFunctionTemplateSpecialization,
D.isFunctionDefinition());		D.isFunctionDefinition());
}		}

if (getLangOpts().CUDA) {		if (getLangOpts().CUDA) {
IdentifierInfo *II = NewFD->getIdentifier();		IdentifierInfo *II = NewFD->getIdentifier();
if (II && II->isStr("cudaConfigureCall") && !NewFD->isInvalidDecl() &&		if (II &&
		II->isStr(getLangOpts().HIP ? "hipConfigureCall"
		: "cudaConfigureCall") &&
		!NewFD->isInvalidDecl() &&
NewFD->getDeclContext()->getRedeclContext()->isTranslationUnit()) {		NewFD->getDeclContext()->getRedeclContext()->isTranslationUnit()) {
if (!R->getAs<FunctionType>()->getReturnType()->isScalarType())		if (!R->getAs<FunctionType>()->getReturnType()->isScalarType())
Diag(NewFD->getLocation(), diag::err_config_scalar_return);		Diag(NewFD->getLocation(), diag::err_config_scalar_return);

Context.setcudaConfigureCallDecl(NewFD);		Context.setcudaConfigureCallDecl(NewFD);
}		}

// Variadic functions, other than a declaration of printf, are not allowed		// Variadic functions, other than a declaration of printf, are not allowed
// in device-side CUDA code, unless someone passed		// in device-side CUDA code, unless someone passed
// -fcuda-allow-variadic-functions.		// -fcuda-allow-variadic-functions.
if (!getLangOpts().CUDAAllowVariadicFunctions && NewFD->isVariadic() &&		if (!getLangOpts().CUDAAllowVariadicFunctions && NewFD->isVariadic() &&
(NewFD->hasAttr<CUDADeviceAttr>() ||		(NewFD->hasAttr<CUDADeviceAttr>() ||
▲ Show 20 Lines • Show All 7,802 Lines • Show Last 20 Lines
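
Both Sema hunks (the diagnostic in `SemaCUDA.cpp` and the declaration check in `SemaDecl.cpp`) pick the launch-configuration function name from the language mode, while still storing the declaration through the existing `setcudaConfigureCallDecl` slot. A sketch of the name selection alone; `configureCallName` and `isConfigureCallDecl` are hypothetical names, and the real check also requires a valid declaration at translation-unit scope:

```cpp
#include <string>

// Hypothetical helper: the name Sema looks for when a kernel launch
// configuration function is declared or diagnosed as missing.
const char *configureCallName(bool IsHIP) {
  return IsHIP ? "hipConfigureCall" : "cudaConfigureCall";
}

// Hypothetical reduction of the SemaDecl.cpp condition to its name test.
bool isConfigureCallDecl(const std::string &DeclName, bool IsHIP) {
  return DeclName == configureCallName(IsHIP);
}
```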

cfe/trunk/test/CodeGenCUDA/Inputs/cuda.h

	Show All 10 Lines

	struct dim3 {			struct dim3 {
	unsigned x, y, z;			unsigned x, y, z;
	__host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}			__host__ __device__ dim3(unsigned x, unsigned y = 1, unsigned z = 1) : x(x), y(y), z(z) {}
	};			};

	typedef struct cudaStream *cudaStream_t;			typedef struct cudaStream *cudaStream_t;

				#ifdef __HIP__
				int hipConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,
				cudaStream_t stream = 0);
				#else
	int cudaConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,			int cudaConfigureCall(dim3 gridSize, dim3 blockSize, size_t sharedSize = 0,
	cudaStream_t stream = 0);			cudaStream_t stream = 0);
				#endif

	extern "C" __device__ int printf(const char*, ...);			extern "C" __device__ int printf(const char*, ...);

cfe/trunk/test/CodeGenCUDA/device-stub.cu

	// RUN: echo "GPU binary would be here" > %t			// RUN: echo "GPU binary would be here" > %t
	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
	// RUN: -fcuda-include-gpubinary %t -o - \			// RUN: -fcuda-include-gpubinary %t -o - \
	// RUN: | FileCheck %s --check-prefixes=ALL,NORDC			// RUN: | FileCheck %s --check-prefixes=ALL,NORDC,CUDA
	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
	// RUN: -fcuda-include-gpubinary %t -o - -DNOGLOBALS \			// RUN: -fcuda-include-gpubinary %t -o - -DNOGLOBALS \
	// RUN: | FileCheck %s -check-prefix=NOGLOBALS			// RUN: | FileCheck %s -check-prefix=NOGLOBALS
	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
	// RUN: -fcuda-rdc -fcuda-include-gpubinary %t -o - \			// RUN: -fcuda-rdc -fcuda-include-gpubinary %t -o - \
	// RUN: | FileCheck %s --check-prefixes=ALL,RDC			// RUN: | FileCheck %s --check-prefixes=ALL,RDC,CUDA
	// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -o - \			// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -o - \
	// RUN: | FileCheck %s -check-prefix=NOGPUBIN			// RUN: | FileCheck %s -check-prefix=NOGPUBIN

				// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
				// RUN: -fcuda-include-gpubinary %t -o - -x hip\
				// RUN: | FileCheck %s --check-prefixes=ALL,NORDC,HIP
				// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
				// RUN: -fcuda-include-gpubinary %t -o - -DNOGLOBALS -x hip \
				// RUN: | FileCheck %s -check-prefix=NOGLOBALS
				// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s \
				// RUN: -fcuda-rdc -fcuda-include-gpubinary %t -o - -x hip \
				// RUN: | FileCheck %s --check-prefixes=ALL,RDC,HIP
				// RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -o - -x hip\
				// RUN: | FileCheck %s -check-prefix=NOGPUBIN

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	#ifndef NOGLOBALS			#ifndef NOGLOBALS
	// ALL-DAG: @device_var = internal global i32			// ALL-DAG: @device_var = internal global i32
	__device__ int device_var;			__device__ int device_var;

	// ALL-DAG: @constant_var = internal global i32			// ALL-DAG: @constant_var = internal global i32
	__constant__ int constant_var;			__constant__ int constant_var;
	Show All 29 Lines
	// Make sure that all parts of GPU code init/cleanup are there:			// Make sure that all parts of GPU code init/cleanup are there:
	// * constant unnamed string with the kernel name			// * constant unnamed string with the kernel name
	// ALL: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00"			// ALL: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00"
	// * constant unnamed string with GPU binary			// * constant unnamed string with GPU binary
	// ALL: private unnamed_addr constant{{.*GPU binary would be here.*}}\00"			// ALL: private unnamed_addr constant{{.*GPU binary would be here.*}}\00"
	// NORDC-SAME: section ".nv_fatbin", align 8			// NORDC-SAME: section ".nv_fatbin", align 8
	// RDC-SAME: section "__nv_relfatbin", align 8			// RDC-SAME: section "__nv_relfatbin", align 8
	// * constant struct that wraps GPU binary			// * constant struct that wraps GPU binary
	// ALL: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*, i8* }			// CUDA: @__[[PREFIX:cuda]]_fatbin_wrapper = internal constant
				// CUDA-SAME: { i32, i32, i8*, i8* }
				// HIP: @__[[PREFIX:hip]]_fatbin_wrapper = internal constant
				// HIP-SAME: { i32, i32, i8*, i8* }
	// ALL-SAME: { i32 1180844977, i32 1, {{.*}}, i8* null }			// ALL-SAME: { i32 1180844977, i32 1, {{.*}}, i8* null }
	// ALL-SAME: section ".nvFatBinSegment"			// ALL-SAME: section ".nvFatBinSegment"
	// * variable to save GPU binary handle after initialization			// * variable to save GPU binary handle after initialization
	// NORDC: @__cuda_gpubin_handle = internal global i8** null			// NORDC: @__[[PREFIX]]_gpubin_handle = internal global i8** null
	// * constant unnamed string with NVModuleID			// * constant unnamed string with NVModuleID
	// RDC: [[MODULE_ID_GLOBAL:@.*]] = private unnamed_addr constant			// RDC: [[MODULE_ID_GLOBAL:@.*]] = private unnamed_addr constant
	// RDC-SAME: c"[[MODULE_ID:.+]]\00", section "__nv_module_id", align 32			// RDC-SAME: c"[[MODULE_ID:.+]]\00", section "__nv_module_id", align 32
	// * Make sure our constructor was added to global ctor list.			// * Make sure our constructor was added to global ctor list.
	// ALL: @llvm.global_ctors = appending global {{.*}}@__cuda_module_ctor			// ALL: @llvm.global_ctors = appending global {{.*}}@__[[PREFIX]]_module_ctor
	// * In separate mode we also register a destructor.			// * In separate mode we also register a destructor.
	// NORDC: @llvm.global_dtors = appending global {{.*}}@__cuda_module_dtor			// NORDC: @llvm.global_dtors = appending global {{.*}}@__[[PREFIX]]_module_dtor
	// * Alias to global symbol containing the NVModuleID.			// * Alias to global symbol containing the NVModuleID.
	// RDC: @__fatbinwrap[[MODULE_ID]] = alias { i32, i32, i8*, i8* }			// RDC: @__fatbinwrap[[MODULE_ID]] = alias { i32, i32, i8*, i8* }
	// RDC-SAME: { i32, i32, i8*, i8* }* @__cuda_fatbin_wrapper			// RDC-SAME: { i32, i32, i8*, i8* }* @__[[PREFIX]]_fatbin_wrapper

	// Test that we build the correct number of calls to cudaSetupArgument followed			// Test that we build the correct number of calls to cudaSetupArgument followed
	// by a call to cudaLaunch.			// by a call to cudaLaunch.

	// ALL: define{{.*}}kernelfunc			// ALL: define{{.*}}kernelfunc
	// ALL: call{{.*}}cudaSetupArgument			// ALL: call{{.*}}[[PREFIX]]SetupArgument
	// ALL: call{{.*}}cudaSetupArgument			// ALL: call{{.*}}[[PREFIX]]SetupArgument
	// ALL: call{{.*}}cudaSetupArgument			// ALL: call{{.*}}[[PREFIX]]SetupArgument
	// ALL: call{{.*}}cudaLaunch			// ALL: call{{.*}}[[PREFIX]]Launch
	__global__ void kernelfunc(int i, int j, int k) {}			__global__ void kernelfunc(int i, int j, int k) {}

	// Test that we've built correct kernel launch sequence.			// Test that we've built correct kernel launch sequence.
	// ALL: define{{.*}}hostfunc			// ALL: define{{.*}}hostfunc
	// ALL: call{{.*}}cudaConfigureCall			// ALL: call{{.*}}[[PREFIX]]ConfigureCall
	// ALL: call{{.*}}kernelfunc			// ALL: call{{.*}}kernelfunc
	void hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); }			void hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); }
	#endif			#endif

	// Test that we've built a function to register kernels and global vars.			// Test that we've built a function to register kernels and global vars.
	// ALL: define internal void @__cuda_register_globals			// ALL: define internal void @__[[PREFIX]]_register_globals
	// ALL: call{{.*}}cudaRegisterFunction(i8** %0, {{.*}}kernelfunc			// ALL: call{{.*}}[[PREFIX]]RegisterFunction(i8** %0, {{.*}}kernelfunc
	// ALL-DAG: call{{.*}}cudaRegisterVar(i8** %0, {{.*}}device_var{{.*}}i32 0, i32 4, i32 0, i32 0			// ALL-DAG: call{{.*}}[[PREFIX]]RegisterVar(i8** %0, {{.*}}device_var{{.*}}i32 0, i32 4, i32 0, i32 0
	// ALL-DAG: call{{.*}}cudaRegisterVar(i8** %0, {{.*}}constant_var{{.*}}i32 0, i32 4, i32 1, i32 0			// ALL-DAG: call{{.*}}[[PREFIX]]RegisterVar(i8** %0, {{.*}}constant_var{{.*}}i32 0, i32 4, i32 1, i32 0
	// ALL-DAG: call{{.*}}cudaRegisterVar(i8** %0, {{.*}}ext_device_var{{.*}}i32 1, i32 4, i32 0, i32 0			// ALL-DAG: call{{.*}}[[PREFIX]]RegisterVar(i8** %0, {{.*}}ext_device_var{{.*}}i32 1, i32 4, i32 0, i32 0
	// ALL-DAG: call{{.*}}cudaRegisterVar(i8** %0, {{.*}}ext_constant_var{{.*}}i32 1, i32 4, i32 1, i32 0			// ALL-DAG: call{{.*}}[[PREFIX]]RegisterVar(i8** %0, {{.*}}ext_constant_var{{.*}}i32 1, i32 4, i32 1, i32 0
	// ALL: ret void			// ALL: ret void

	// Test that we've built a constructor.			// Test that we've built a constructor.
	// ALL: define internal void @__cuda_module_ctor			// ALL: define internal void @__[[PREFIX]]_module_ctor

	// In separate mode it calls __cudaRegisterFatBinary(&__cuda_fatbin_wrapper)			// In separate mode it calls __[[PREFIX]]RegisterFatBinary(&__[[PREFIX]]_fatbin_wrapper)
	// NORDC: call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper			// NORDC: call{{.*}}[[PREFIX]]RegisterFatBinary{{.*}}__[[PREFIX]]_fatbin_wrapper
	// .. stores return value in __cuda_gpubin_handle			// .. stores return value in __[[PREFIX]]_gpubin_handle
	// NORDC-NEXT: store{{.*}}__cuda_gpubin_handle			// NORDC-NEXT: store{{.*}}__[[PREFIX]]_gpubin_handle
	// .. and then calls __cuda_register_globals			// .. and then calls __[[PREFIX]]_register_globals
	// NORDC-NEXT: call void @__cuda_register_globals			// NORDC-NEXT: call void @__[[PREFIX]]_register_globals

	// With relocatable device code we call __cudaRegisterLinkedBinary%NVModuleID%			// With relocatable device code we call __[[PREFIX]]RegisterLinkedBinary%NVModuleID%
	// RDC: call{{.*}}__cudaRegisterLinkedBinary[[MODULE_ID]](			// RDC: call{{.*}}__[[PREFIX]]RegisterLinkedBinary[[MODULE_ID]](
	// RDC-SAME: __cuda_register_globals, {{.*}}__cuda_fatbin_wrapper			// RDC-SAME: __[[PREFIX]]_register_globals, {{.*}}__[[PREFIX]]_fatbin_wrapper
	// RDC-SAME: [[MODULE_ID_GLOBAL]]			// RDC-SAME: [[MODULE_ID_GLOBAL]]

	// Test that we've created destructor.			// Test that we've created destructor.
	// NORDC: define internal void @__cuda_module_dtor			// NORDC: define internal void @__[[PREFIX]]_module_dtor
	// NORDC: load{{.*}}__cuda_gpubin_handle			// NORDC: load{{.*}}__[[PREFIX]]_gpubin_handle
	// NORDC-NEXT: call void @__cudaUnregisterFatBinary			// NORDC-NEXT: call void @__[[PREFIX]]UnregisterFatBinary

	// There should be no __cuda_register_globals if we have no			// There should be no __[[PREFIX]]_register_globals if we have no
	// device-side globals, but we still need to register GPU binary.			// device-side globals, but we still need to register GPU binary.
	// Skip GPU binary string first.			// Skip GPU binary string first.
	// NOGLOBALS: @0 = private unnamed_addr constant{{.*}}			// NOGLOBALS: @0 = private unnamed_addr constant{{.*}}
	// NOGLOBALS-NOT: define internal void @__cuda_register_globals			// NOGLOBALS-NOT: define internal void @__{{.*}}_register_globals
	// NOGLOBALS: define internal void @__cuda_module_ctor			// NOGLOBALS: define internal void @__[[PREFIX:.*]]_module_ctor
	// NOGLOBALS: call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper			// NOGLOBALS: call{{.*}}[[PREFIX]]RegisterFatBinary{{.*}}__[[PREFIX]]_fatbin_wrapper
	// NOGLOBALS-NOT: call void @__cuda_register_globals			// NOGLOBALS-NOT: call void @__[[PREFIX]]_register_globals
	// NOGLOBALS: define internal void @__cuda_module_dtor			// NOGLOBALS: define internal void @__[[PREFIX]]_module_dtor
	// NOGLOBALS: call void @__cudaUnregisterFatBinary			// NOGLOBALS: call void @__[[PREFIX]]UnregisterFatBinary

	// There should be no constructors/destructors if we have no GPU binary.			// There should be no constructors/destructors if we have no GPU binary.
	// NOGPUBIN-NOT: define internal void @__cuda_register_globals			// NOGPUBIN-NOT: define internal void @__[[PREFIX]]_register_globals
	// NOGPUBIN-NOT: define internal void @__cuda_module_ctor			// NOGPUBIN-NOT: define internal void @__[[PREFIX]]_module_ctor
	// NOGPUBIN-NOT: define internal void @__cuda_module_dtor			// NOGPUBIN-NOT: define internal void @__[[PREFIX]]_module_dtor

cfe/trunk/test/CodeGenCUDA/kernel-call.cu

	// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s			// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s --check-prefixes=CUDA,CHECK
				// RUN: %clang_cc1 -x hip -emit-llvm %s -o - | FileCheck %s --check-prefixes=HIP,CHECK


	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

				// CHECK-LABEL: define void @_Z2g1i(i32 %x)
				// HIP: call{{.*}}hipSetupArgument
				// HIP: call{{.*}}hipLaunchByPtr
				// CUDA: call{{.*}}cudaSetupArgument
				// CUDA: call{{.*}}cudaLaunch
	__global__ void g1(int x) {}			__global__ void g1(int x) {}

				// CHECK-LABEL: define i32 @main
	int main(void) {			int main(void) {
	// CHECK: call{{.*}}cudaConfigureCall			// HIP: call{{.*}}hipConfigureCall
				// CUDA: call{{.*}}cudaConfigureCall
	// CHECK: icmp			// CHECK: icmp
	// CHECK: br			// CHECK: br
	// CHECK: call{{.*}}g1			// CHECK: call{{.*}}g1
	g1<<<1, 1>>>(42);			g1<<<1, 1>>>(42);
	}			}


Diff 143849
