This is an archive of the discontinued LLVM Phabricator instance.


Differential D6457

CUDA host device code with two code paths
ClosedPublic

Authored by jpienaar on Nov 30 2014, 9:45 AM.


Details

Reviewers

eliben
pcc
rnk

Commits

rGbbc017851815: CUDA host device code with two code paths
rC223271: CUDA host device code with two code paths
rL223271: CUDA host device code with two code paths

Summary

Allow CUDA host device functions with two code paths, using the __CUDA_ARCH__ macro to distinguish which code path is being compiled.

For example,

__host__ __device__ void host_device_function(void) {
#ifdef __CUDA_ARCH__
  device_only_function();
#else
  host_only_function();
#endif
}
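The same idiom can be tried outside a CUDA toolchain. The sketch below is only an illustration, not part of the patch: `HD`, `compiled_path`, and `describe_path` are hypothetical names, and it assumes the CUDA compiler defines `__CUDACC__` so that the qualifiers are dropped when the file is built as ordinary C++.

```cpp
#include <string>

// HD is a hypothetical helper macro (not from the patch). It expands to the
// CUDA qualifiers under a CUDA compiler and to nothing otherwise, so this
// file also builds as plain C++. The __CUDACC__ check is an assumption about
// the compiler in use.
#if defined(__CUDACC__)
#define HD __host__ __device__
#else
#define HD
#endif

// One function body with two code paths. __CUDA_ARCH__ is expected to be
// defined only while the device side is being compiled.
HD inline const char *compiled_path() {
#ifdef __CUDA_ARCH__
  return "device";
#else
  return "host";
#endif
}

// Host-side helper that reports which branch was selected.
inline std::string describe_path() {
  return std::string("compiled for ") + compiled_path();
}
```

Built with a regular host compiler, `__CUDA_ARCH__` is undefined, so only the `#else` branch is compiled and `describe_path()` reports the host path.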

Diff Detail

Repository: rL LLVM

Event Timeline

jpienaar updated this revision to Diff 16752. Nov 30 2014, 9:45 AM

jpienaar retitled this revision to "CUDA host device code with two code paths".

jpienaar updated this object.

jpienaar edited the test plan for this revision.

jpienaar added reviewers: pcc, eliben.

jpienaar added a subscriber: Unknown Object (MLST).

Can you remind me what the CUDA compilation model is currently? My memory was that the clang driver was eventually going to launch two -cc1 actions, one for device and one for host, presumably with different flags. I would expect that lib/Frontend/InitPreprocessor.cpp would define this macro when targeting the device.

If we're doing a single compilation with a fat object approach, we may need to do something weird to get this right. =/

I think your memory is correct (at least that's what I thought too). And
yes, the macro would be defined externally when targeting the device. At
that point we can remove this check for macro definition as we'd then be
able to check the flags directly. So I do see this as a temporary solution
which disrupts as little as possible.

Is the concern here with how the test is written? (i.e., the test
explicitly sets this macro which will in future be set by the compiler
itself). In which case it could be changed to

__host__ __device__ void hd1(void) {
#ifdef __CUDA_ARCH__
  hd1d();
  hd1h(); // expected-error {{no matching function}}
#else
  hd1d(); // expected-error {{no matching function}}
  hd1h();
#endif
  hd1hd();
  hd1g<<<1, 1>>>(); // expected-error {{reference to __global__ function 'hd1g' in __host__ __device__ function}}
}

There is already a flag for this, -fcuda-is-device. We should make that flag a LangOption and use it for this check.

In D6457#7, @pcc wrote:

There is already a flag for this, -fcuda-is-device. We should make that flag a LangOption and use it for this check.

Right, this seems like the correct approach. Is there concern that it would be too disruptive at this stage to define __CUDA_ARCH__?

Created CUDAIsDevice as a language option (set by the -fcuda-is-device frontend option). This flag is now used, instead of checking whether __CUDA_ARCH__ is defined, to determine whether host or device compilation is occurring.
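To make the effect of the flag concrete, here is a minimal standalone sketch of the caller/callee check, modeled on the CheckCUDATarget change in lib/Sema/SemaCUDA.cpp from this revision. The enum is a simplified stand-in for the one in Sema, `checkCUDATarget` is a hypothetical free function, and the explicit `InDeviceMode` parameter stands in for `getLangOpts().CUDAIsDevice`.

```cpp
// Simplified stand-in for Sema's CUDAFunctionTarget; enumerator names mirror
// the ones used in the patch.
enum CUDAFunctionTarget {
  CFT_Device,
  CFT_Global,
  CFT_Host,
  CFT_HostDevice,
  CFT_InvalidTarget
};

// Returns true when the call must be rejected (the same convention as the
// patched function). InDeviceMode is an explicit parameter here; in the patch
// it is read from the CUDAIsDevice language option.
inline bool checkCUDATarget(CUDAFunctionTarget Caller,
                            CUDAFunctionTarget Callee, bool InDeviceMode) {
  // An invalid target on either side always fails the check.
  if (Caller == CFT_InvalidTarget || Callee == CFT_InvalidTarget)
    return true;

  // __device__ functions are not callable from __host__ functions.
  if (Caller == CFT_Host && Callee == CFT_Device)
    return true;

  // __host__ and __global__ functions are not callable from device code.
  if ((Caller == CFT_Device || Caller == CFT_Global) &&
      (Callee == CFT_Host || Callee == CFT_Global))
    return true;

  // A __host__ __device__ caller may call a one-sided callee only if that
  // side matches the compilation currently in progress.
  if (Caller == CFT_HostDevice && Callee != CFT_HostDevice) {
    if ((InDeviceMode && Callee != CFT_Device) ||
        (!InDeviceMode && Callee != CFT_Host))
      return true;
  }

  return false;
}
```

With this shape, a `__host__ __device__` function calling a `__device__` function is accepted in device mode and rejected in host mode, which is why the updated test uses `0-1` counts for diagnostics that appear in only one of the two RUN lines.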

Thanks, that's a good idea. Is this close to what you had in mind?

rnk added inline comments. Dec 3 2014, 10:59 AM

include/clang/Basic/LangOptions.def
Line 160 (On Diff #16862): I think you just want regular LANGOPT, given the description of BENIGN_LANGOPT:
  // BENIGN_LANGOPT: for options that don't affect the construction of the AST in
  // any way (that is, the value can be different between an implicit module
  // and the user of that module).

lib/Basic/Targets.cpp
Line 1381 (On Diff #16862): "... the NVPTX backend." maybe?

lib/Frontend/InitPreprocessor.cpp
Lines 873–878 (On Diff #16862): I guess this definition is intended to satisfy targeting hypothetical non-NVPTX targets from CUDA. OK.

lib/Sema/SemaCUDA.cpp
Line 91 (On Diff #16862): The predominant style in clang for eliding text from a quotation is to use "text [...] text". I'm not trying to be pedantic, this actually threw me off when I read it. :) I guess this applies equally to the CUDA reference quotes above.
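The LANGOPT versus BENIGN_LANGOPT remark on LangOptions.def is easier to follow with the X-macro pattern in view. The sketch below is a rough, self-contained imitation of that pattern and not Clang's actual code: `SKETCH_LANGOPTS`, `SketchLangOptions`, and `countAstAffectingOptions` are hypothetical names, and the option list is inlined in a macro instead of living in a separate .def file.

```cpp
// Hypothetical in-file stand-in for a .def file. Each entry has the shape
// (Name, Bits, Default, Description), as in LangOptions.def.
#define SKETCH_LANGOPTS(LANGOPT, BENIGN_LANGOPT)                              \
  LANGOPT(CUDA, 1, 0, "CUDA")                                                 \
  LANGOPT(CUDAIsDevice, 1, 0, "Compiling for CUDA device")                    \
  BENIGN_LANGOPT(ElideConstructors, 1, 1, "C++ copy constructor elision")

// Every entry becomes a bit-field initialised to its default value.
struct SketchLangOptions {
#define FIELD(Name, Bits, Default, Description) unsigned Name : Bits;
  SKETCH_LANGOPTS(FIELD, FIELD)
#undef FIELD

  SketchLangOptions() {
#define INIT(Name, Bits, Default, Description) Name = Default;
    SKETCH_LANGOPTS(INIT, INIT)
#undef INIT
  }
};

// A consumer that only cares about options affecting AST construction can
// expand BENIGN_LANGOPT entries to nothing. Here that leaves CUDA and
// CUDAIsDevice, which is why CUDAIsDevice should not be declared benign.
inline int countAstAffectingOptions() {
  int Count = 0;
#define COUNT(Name, Bits, Default, Description) ++Count;
#define SKIP(Name, Bits, Default, Description)
  SKETCH_LANGOPTS(COUNT, SKIP)
#undef COUNT
#undef SKIP
  return Count;
}
```

Because CUDAIsDevice changes which calls Sema accepts, treating it as a regular LANGOPT keeps it among the options that must agree between a module and its user.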

Made recommended changes.

lgtm

This revision is now accepted and ready to land. Dec 3 2014, 11:23 AM

Thanks (and the comment about text elision was helpful :) ). I don't have commit access, could you assist me in committing this? Thanks

Closed by commit rL223271 (authored by @rnk).

Revision Contents

Path                                             Size
cfe/trunk/include/clang/Basic/LangOptions.def    1 line
cfe/trunk/lib/Basic/Targets.cpp                  47 lines
cfe/trunk/lib/Frontend/CompilerInvocation.cpp    3 lines
cfe/trunk/lib/Frontend/InitPreprocessor.cpp      7 lines
cfe/trunk/lib/Sema/SemaCUDA.cpp                  19 lines
cfe/trunk/test/SemaCUDA/function-target.cu       35 lines

Diff 16890

cfe/trunk/include/clang/Basic/LangOptions.def

@@ [151 lines not shown] @@
 LANGOPT(ShortEnums , 1, 0, "short enum types")

 LANGOPT(OpenCL , 1, 0, "OpenCL")
 LANGOPT(OpenCLVersion , 32, 0, "OpenCL version")
 LANGOPT(NativeHalfType , 1, 0, "Native half type support")
 LANGOPT(HalfArgsAndReturns, 1, 0, "half args and returns")
 LANGOPT(CUDA , 1, 0, "CUDA")
 LANGOPT(OpenMP , 1, 0, "OpenMP support")
+LANGOPT(CUDAIsDevice , 1, 0, "Compiling for CUDA device")

 LANGOPT(AssumeSaneOperatorNew , 1, 1, "implicit __attribute__((malloc)) for C++'s new operators")
 LANGOPT(SizedDeallocation , 1, 0, "enable sized deallocation functions")
 BENIGN_LANGOPT(ElideConstructors , 1, 1, "C++ copy constructor elision")
 BENIGN_LANGOPT(DumpRecordLayouts , 1, 0, "dumping the layout of IRgen'd records")
 BENIGN_LANGOPT(DumpRecordLayoutsSimple , 1, 0, "dumping the layout of IRgen'd records in a simple form")
 BENIGN_LANGOPT(DumpVTableLayouts , 1, 0, "dumping the layouts of emitted vtables")
 LANGOPT(NoConstantCFStrings , 1, 0, "no constant CoreFoundation strings")
@@ [63 lines not shown] @@

cfe/trunk/lib/Basic/Targets.cpp

@@ [1,371 lines not shown] @@ static const unsigned NVPTXAddrSpaceMap[] = {
     0, // opencl_generic
     1, // cuda_device
     4, // cuda_constant
     3, // cuda_shared
   };
   class NVPTXTargetInfo : public TargetInfo {
     static const char * const GCCRegNames[];
     static const Builtin::Info BuiltinInfo[];
+
+    // The GPU profiles supported by the NVPTX backend
+    enum GPUKind {
+      GK_NONE,
+      GK_SM20,
+      GK_SM21,
+      GK_SM30,
+      GK_SM35,
+    } GPU;
+
   public:
     NVPTXTargetInfo(const llvm::Triple &Triple) : TargetInfo(Triple) {
       BigEndian = false;
       TLSSupported = false;
       LongWidth = LongAlign = 64;
       AddrSpaceMap = &NVPTXAddrSpaceMap;
       UseAddrSpaceMapMangling = true;
       // Define available target features
       // These must be defined in sorted order!
       NoAsmVariants = true;
+      // Set the default GPU to sm20
+      GPU = GK_SM20;
     }
     void getTargetDefines(const LangOptions &Opts,
                           MacroBuilder &Builder) const override {
       Builder.defineMacro("__PTX__");
       Builder.defineMacro("__NVPTX__");
+      if (Opts.CUDAIsDevice) {
+        // Set __CUDA_ARCH__ for the GPU specified.
+        std::string CUDAArchCode;
+        switch (GPU) {
+        case GK_SM20:
+          CUDAArchCode = "200";
+          break;
+        case GK_SM21:
+          CUDAArchCode = "210";
+          break;
+        case GK_SM30:
+          CUDAArchCode = "300";
+          break;
+        case GK_SM35:
+          CUDAArchCode = "350";
+          break;
+        default:
+          llvm_unreachable("Unhandled target CPU");
+        }
+        Builder.defineMacro("__CUDA_ARCH__", CUDAArchCode);
+      }
     }
     void getTargetBuiltins(const Builtin::Info *&Records,
                            unsigned &NumRecords) const override {
       Records = BuiltinInfo;
       NumRecords = clang::NVPTX::LastTSBuiltin-Builtin::FirstTSBuiltin;
     }
     bool hasFeature(StringRef Feature) const override {
       return Feature == "ptx" || Feature == "nvptx";
@@ [26 lines not shown] @@ const char *getClobbers() const override {
       // FIXME: Is this really right?
       return "";
     }
     BuiltinVaListKind getBuiltinVaListKind() const override {
       // FIXME: implement
       return TargetInfo::CharPtrBuiltinVaList;
     }
     bool setCPU(const std::string &Name) override {
-      bool Valid = llvm::StringSwitch<bool>(Name)
-        .Case("sm_20", true)
-        .Case("sm_21", true)
-        .Case("sm_30", true)
-        .Case("sm_35", true)
-        .Default(false);
+      GPU = llvm::StringSwitch<GPUKind>(Name)
+        .Case("sm_20", GK_SM20)
+        .Case("sm_21", GK_SM21)
+        .Case("sm_30", GK_SM30)
+        .Case("sm_35", GK_SM35)
+        .Default(GK_NONE);

-      return Valid;
+      return GPU != GK_NONE;
     }
   };

   const Builtin::Info NVPTXTargetInfo::BuiltinInfo[] = {
 #define BUILTIN(ID, TYPE, ATTRS) { #ID, TYPE, ATTRS, 0, ALL_LANGUAGES },
 #define LIBBUILTIN(ID, TYPE, ATTRS, HEADER) { #ID, TYPE, ATTRS, HEADER,\
                                               ALL_LANGUAGES },
 #include "clang/Basic/BuiltinsNVPTX.def"
@@ [5,235 lines not shown] @@
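The Targets.cpp change amounts to a two-step mapping: the CPU name selects a GPUKind, and the GPUKind selects the value given to `__CUDA_ARCH__`. A standalone sketch of that mapping follows, under stated assumptions: `parseGPU` and `cudaArchCode` are hypothetical free functions, plain string comparisons replace `llvm::StringSwitch`, and an empty string replaces the `llvm_unreachable` call for GK_NONE.

```cpp
#include <string>

// Same enumerators as the GPUKind enum added to NVPTXTargetInfo.
enum GPUKind { GK_NONE, GK_SM20, GK_SM21, GK_SM30, GK_SM35 };

// Plain-C++ replacement for the llvm::StringSwitch used in setCPU.
// Unknown names map to GK_NONE, which setCPU reports as failure.
inline GPUKind parseGPU(const std::string &Name) {
  if (Name == "sm_20") return GK_SM20;
  if (Name == "sm_21") return GK_SM21;
  if (Name == "sm_30") return GK_SM30;
  if (Name == "sm_35") return GK_SM35;
  return GK_NONE;
}

// Value that getTargetDefines assigns to __CUDA_ARCH__ for each GPU.
// Returns "" for GK_NONE, where the patch calls llvm_unreachable instead.
inline std::string cudaArchCode(GPUKind GPU) {
  switch (GPU) {
  case GK_SM20: return "200";
  case GK_SM21: return "210";
  case GK_SM30: return "300";
  case GK_SM35: return "350";
  case GK_NONE: break;
  }
  return "";
}
```

Keeping the parsed GPUKind in the target object, as the patch does, is what lets getTargetDefines pick the macro value later without re-parsing the CPU name.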

cfe/trunk/lib/Frontend/CompilerInvocation.cpp

@@ [1,343 lines not shown] @@ #include "clang/Frontend/LangStandards.def"
   // '-fgnu-keywords'. Clang conflates the two for simplicity under the single
   // name, as it doesn't seem a useful distinction.
   Opts.GNUKeywords = Args.hasFlag(OPT_fgnu_keywords, OPT_fno_gnu_keywords,
                                   Opts.GNUKeywords);

   if (Args.hasArg(OPT_fno_operator_names))
     Opts.CXXOperatorNames = 0;

+  if (Args.hasArg(OPT_fcuda_is_device))
+    Opts.CUDAIsDevice = 1;
+
   if (Opts.ObjC1) {
     if (Arg *arg = Args.getLastArg(OPT_fobjc_runtime_EQ)) {
       StringRef value = arg->getValue();
       if (Opts.ObjCRuntime.tryParse(value))
         Diags.Report(diag::err_drv_unknown_objc_runtime) << value;
     }

     if (Args.hasArg(OPT_fobjc_gc_only))
@@ [701 lines not shown] @@

cfe/trunk/lib/Frontend/InitPreprocessor.cpp

@@ [864 lines not shown] @@ if (LangOpts.OpenMP) {
     // OpenMP 2.2:
     // In implementations that support a preprocessor, the _OPENMP
     // macro name is defined to have the decimal value yyyymm where
     // yyyy and mm are the year and the month designations of the
     // version of the OpenMP API that the implementation support.
     Builder.defineMacro("_OPENMP", "201307");
   }

+  // CUDA device path compilaton
+  if (LangOpts.CUDAIsDevice) {
+    // The CUDA_ARCH value is set for the GPU target specified in the NVPTX
+    // backend's target defines.
+    Builder.defineMacro("__CUDA_ARCH__");
+  }
+
   // Get other target #defines.
   TI.getTargetDefines(LangOpts, Builder);
 }

 /// InitializePreprocessor - Initialize the preprocessor getting it and the
 /// environment ready to process a single file. This returns true on error.
 ///
 void clang::InitializePreprocessor(Preprocessor &PP,
@@ [81 lines not shown] @@

cfe/trunk/lib/Sema/SemaCUDA.cpp

 //===--- SemaCUDA.cpp - Semantic Analysis for CUDA constructs -------------===//
 //
 // The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
 /// \file
 /// \brief This file implements semantic analysis for CUDA constructs.
 ///
 //===----------------------------------------------------------------------===//

 #include "clang/Sema/Sema.h"
 #include "clang/AST/ASTContext.h"
 #include "clang/AST/Decl.h"
+#include "clang/Lex/Preprocessor.h"
 #include "clang/Sema/SemaDiagnostic.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/SmallVector.h"
 using namespace clang;

 ExprResult Sema::ActOnCUDAExecConfigExpr(Scope *S, SourceLocation LLLLoc,
                                          MultiExprArg ExecConfig,
                                          SourceLocation GGGLoc) {
@@ [42 lines not shown] @@

 bool Sema::CheckCUDATarget(CUDAFunctionTarget CallerTarget,
                            CUDAFunctionTarget CalleeTarget) {
   // If one of the targets is invalid, the check always fails, no matter what
   // the other target is.
   if (CallerTarget == CFT_InvalidTarget || CalleeTarget == CFT_InvalidTarget)
     return true;

-  // CUDA B.1.1 "The __device__ qualifier declares a function that is...
+  // CUDA B.1.1 "The __device__ qualifier declares a function that is [...]
   // Callable from the device only."
   if (CallerTarget == CFT_Host && CalleeTarget == CFT_Device)
     return true;

-  // CUDA B.1.2 "The __global__ qualifier declares a function that is...
+  // CUDA B.1.2 "The __global__ qualifier declares a function that is [...]
   // Callable from the host only."
-  // CUDA B.1.3 "The __host__ qualifier declares a function that is...
+  // CUDA B.1.3 "The __host__ qualifier declares a function that is [...]
   // Callable from the host only."
   if ((CallerTarget == CFT_Device || CallerTarget == CFT_Global) &&
       (CalleeTarget == CFT_Host || CalleeTarget == CFT_Global))
     return true;

-  if (CallerTarget == CFT_HostDevice && CalleeTarget != CFT_HostDevice)
-    return true;
+  // CUDA B.1.3 "The __device__ and __host__ qualifiers can be used together
+  // however, in which case the function is compiled for both the host and the
+  // device. The __CUDA_ARCH__ macro [...] can be used to differentiate code
+  // paths between host and device."
+  bool InDeviceMode = getLangOpts().CUDAIsDevice;
+  if (CallerTarget == CFT_HostDevice && CalleeTarget != CFT_HostDevice) {
+    if ((InDeviceMode && CalleeTarget != CFT_Device) ||
+        (!InDeviceMode && CalleeTarget != CFT_Host))
+      return true;
+  }

   return false;
 }

 /// When an implicitly-declared special member has to invoke more than one
 /// base/field special member, conflicts may occur in the targets of these
 /// members. For example, if one base's member __host__ and another's is
 /// __device__, it's a conflict.
@@ [157 lines not shown] @@

cfe/trunk/test/SemaCUDA/function-target.cu

 // RUN: %clang_cc1 -fsyntax-only -verify %s
+// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s

 #include "Inputs/cuda.h"

 __host__ void h1h(void);
 __device__ void h1d(void); // expected-note {{candidate function not viable: call to __device__ function from __host__ function}}
 __host__ __device__ void h1hd(void);
 __global__ void h1g(void);

@@ [16 lines not shown] @@

 __device__ void d1(void) {
   d1h(); // expected-error {{no matching function}}
   d1d();
   d1hd();
   d1g<<<1, 1>>>(); // expected-error {{reference to __global__ function 'd1g' in __device__ function}}
 }

-__host__ void hd1h(void); // expected-note {{candidate function not viable: call to __host__ function from __host__ __device__ function}}
-__device__ void hd1d(void); // expected-note {{candidate function not viable: call to __device__ function from __host__ __device__ function}}
+// Expected 0-1 as in one of host/device side compilation it is an error, while
+// not in the other
+__host__ void hd1h(void); // expected-note 0-1 {{candidate function not viable: call to __host__ function from __host__ __device__ function}}
+__device__ void hd1d(void); // expected-note 0-1 {{candidate function not viable: call to __device__ function from __host__ __device__ function}}
+__host__ void hd1hg(void);
+__device__ void hd1dg(void);
+#ifdef __CUDA_ARCH__
+__host__ void hd1hig(void); // expected-note {{candidate function not viable: call to __host__ function from __host__ __device__ function}}
+#else
+__device__ void hd1dig(void); // expected-note {{candidate function not viable: call to __device__ function from __host__ __device__ function}}
+#endif
 __host__ __device__ void hd1hd(void);
 __global__ void hd1g(void); // expected-note {{'hd1g' declared here}}

 __host__ __device__ void hd1(void) {
-  hd1h(); // expected-error {{no matching function}}
-  hd1d(); // expected-error {{no matching function}}
+  // Expected 0-1 as in one of host/device side compilation it is an error,
+  // while not in the other
+  hd1d(); // expected-error 0-1 {{no matching function}}
+  hd1h(); // expected-error 0-1 {{no matching function}}
+
+  // No errors as guarded
+#ifdef __CUDA_ARCH__
+  hd1d();
+#else
+  hd1h();
+#endif
+
+  // Errors as incorrectly guarded
+#ifndef __CUDA_ARCH__
+  hd1dig(); // expected-error {{no matching function}}
+#else
+  hd1hig(); // expected-error {{no matching function}}
+#endif
+
   hd1hd();
   hd1g<<<1, 1>>>(); // expected-error {{reference to __global__ function 'hd1g' in __host__ __device__ function}}
 }