This is an archive of the discontinued LLVM Phabricator instance.

CUDA: Add option to allow host device functions to call host functions
ClosedPublic

Authored by jpienaar on Feb 23 2015, 3:06 PM.

Details

Reviewers

eliben
rnk

Summary

nvcc allows host device functions to call host functions with only a warning being produced (host device functions calling device functions is an error in nvcc). This nvcc feature (calling host functions from host device functions) is used by some existing GPU code. Add an option to clang to allow similar behavior. This does not affect code generation and trying to call a host function from the GPU is still an error. We are investigating a more complete solution that would avoid this but this is a first step to allow tools analyzing GPU code to accept the same code as nvcc does.

Diff Detail

Event Timeline

jpienaar updated this revision to Diff 20545.Feb 23 2015, 3:06 PM

jpienaar retitled this revision to CUDA: Add option to allow host device functions to call host functions.

jpienaar updated this object.

jpienaar edited the test plan for this revision.

jpienaar added a reviewer: rnk.

jpienaar added subscribers: Unknown Object (MLST), eliben.

The use case here is more about getting a useful AST out even when the source-program contains errors, right? Can we handle this by intelligent error recovery instead? Nothing stops us from continuing the parse when we encounter this error and producing a partially invalid AST. Would that be sufficient?

That is sort of part of a disagreement I've been having with someone who uses this feature. He feels this is supported behavior and not a program with errors. So if there is code like:

void bar() {}
__host__ __device__ void foo() { bar(); }

If foo is never called from the device then the program "makes sense": you never attempt to have host code executed on the GPU, and the compiled program runs as expected. This is a silly example; he is doing some template metaprogramming to generate kernels for both host and device, which makes his use case understandable. With this patch the code we generate also runs correctly, so it isn't just for analysis: it is useful in our code generation too.

Normally I would think this could be fixed by ifdef-guarding on __CUDA_ARCH__, but if bar were to perform a templated kernel launch, which happens in this client's code, then that would not be allowed usage under nvcc.
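For reference, a minimal sketch of the `__CUDA_ARCH__` guard mentioned above. It is written so that it also builds as plain C++: the empty `__host__`/`__device__` fallbacks, the `bar_calls` counter and `guard_demo` are assumptions added for illustration and are not part of the patch or of the client's code.

```cpp
#include <cassert>

// Assumption: outside a CUDA compilation, stub out the attributes so the
// sketch still builds with a plain C++ compiler.
#if !defined(__CUDACC__) && !defined(__CUDA__)
#define __host__
#define __device__
#endif

static int bar_calls = 0;  // hypothetical counter, only for illustration

// Host-only function.
void bar() { ++bar_calls; }

// Host device function: the call to bar() is only compiled in the host
// pass, because __CUDA_ARCH__ is defined only during device compilation.
__host__ __device__ void foo() {
#ifndef __CUDA_ARCH__
  bar();
#endif
}

// Usage sketch for a host build: foo() reaches bar() exactly once.
void guard_demo() {
  int before = bar_calls;
  foo();
  assert(bar_calls == before + 1);
}
```

With the guard in place, the device pass never sees the call to bar(), so no target check fires. The patch under review addresses the case where such a guard cannot be used.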

Adding Art to the CC so he sees this....

eliben added inline comments.Feb 24 2015, 8:45 AM

include/clang/Driver/CC1Options.td
612	I think the word "allow" should be in the flag somewhere. How about: "fcuda-allow-host-calls-from-host-device" ? Has the word "allow" AND is shorter ;-)
test/SemaCUDA/function-target.cu
1	Are DTEST_HOST and DTEST_DEVICE really different from the reliance on __CUDA_ARCH__ that was there before? I think this test is getting too complex; maybe it's worthwhile splitting the HD parts into a separate test file.

I'm surprised this change doesn't break the cuda codegen pipeline, because there aren't any changes to CodeGen in this patch. This is specifically relaxing the case of a host+device function calling host while in device mode. It's not actually possible to codegen this function, right? Is codegen already set up to compile this case to runtime error?

include/clang/Driver/CC1Options.td
612	+1 for the suggested name.

Changed option name, split test into two and added a codegen test.

rnk added inline comments.Feb 24 2015, 10:33 AM

test/CodeGenCUDA/host-device-calls-host.cu
22	I think this is a more interesting test case:

extern "C" {
void host_function() {}
__host__ __device__ void hd_function(bool b) { if (b) host_function(); }
__device__ void device_function() { hd_function(false); }
}

It actually tests emission of the bogus call, even though it can never occur in practice. What should clang do for that?

In D7841#129039, @rnk wrote:

I'm surprised this change doesn't break the cuda codegen pipeline, because there aren't any changes to CodeGen in this patch. This is specifically relaxing the case of a host+device function calling host while in device mode. It's not actually possible to codegen this function, right? Is codegen already set up to compile this case to runtime error?

Yes, what happens is that the host function becomes a declaration in the generated LLVM IR and is treated as an extern function in the generated PTX. If a call on the device from a host device function to a host function were possible, it would result in a compilation error when the PTX gets compiled at runtime.

eliben added inline comments.Feb 24 2015, 11:07 AM

include/clang/Basic/DiagnosticSemaKinds.td
6070	Update this name to the new one for consistency

eliben added inline comments.Feb 24 2015, 11:13 AM

test/SemaCUDA/function-target-hd.cu
4	Much better, thanks. Just a small nit: add a comment that explains the various permutations of __CUDA_ARCH__ and TEST_WARN_HD to make this test file more readable a year from now.

jpienaar added inline comments.Feb 24 2015, 12:40 PM

test/CodeGenCUDA/host-device-calls-host.cu
22	Both nvcc and clang (with this patch) accept this, and the resulting code executes without errors. clang should still warn that this can cause a runtime failure, as there is no call-site analysis performed.
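The control flow of that test case can be modeled on the host as a rough sketch. This only illustrates why the call to host_function is never reached at runtime; it does not exercise device code generation. The empty attribute macros, the `host_calls` counter and `bogus_call_demo` are assumptions added for illustration.

```cpp
#include <cassert>

// Assumption: stub out the CUDA attributes for a plain C++ build.
#if !defined(__CUDACC__) && !defined(__CUDA__)
#define __host__
#define __device__
#endif

static int host_calls = 0;  // hypothetical counter, only for illustration

extern "C" {
void host_function() { ++host_calls; }

// The call to host_function() is emitted, but only taken when b is true.
__host__ __device__ void hd_function(bool b) {
  if (b) host_function();
}

// The device-side caller always passes false, so the call is dead at runtime.
__device__ void device_function() { hd_function(false); }
}

// Usage sketch: the device-style path never reaches host_function().
void bogus_call_demo() {
  int before = host_calls;
  device_function();
  assert(host_calls == before);
}
```

Because the compiler performs no call-site analysis, it cannot tell that `b` is always false here, which is why the warning is still emitted.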

lgtm

One other mechanism you can consider is a default-error warning. We mostly added this mechanism so that we could emit errors by default, but suppress them in system headers or other places. However, I think this mostly confuses end users who have to pass things like -Wno-host-calls-from-host-device in order to silence something that looks like an error, not a warning.

test/CodeGenCUDA/host-device-calls-host.cu
22	Exciting. =D I think throwing this example into the IRgen test suite is nice because it's a good representative edge case.

This revision is now accepted and ready to land.Feb 24 2015, 1:04 PM

Updated the warning message's name and added a description for the permutations in the host device Sema test.

jpienaar added inline comments.Feb 24 2015, 1:31 PM

include/clang/Basic/DiagnosticSemaKinds.td
6070	Done.
test/CodeGenCUDA/host-device-calls-host.cu
22	Done.
test/SemaCUDA/function-target-hd.cu
4	Done.

lgtm

jpienaar closed this revision.Feb 24 2015, 1:49 PM

Revision Contents

Path                                          Size
include/clang/Basic/DiagnosticSemaKinds.td    3 lines
include/clang/Basic/LangOptions.def           1 line
include/clang/Driver/CC1Options.td            3 lines
lib/Frontend/CompilerInvocation.cpp           5 lines
lib/Sema/SemaCUDA.cpp                         16 lines
test/CodeGenCUDA/host-device-calls-host.cu    32 lines
test/SemaCUDA/function-target-hd.cu           71 lines
test/SemaCUDA/function-target.cu              38 lines

Diff 20620

include/clang/Basic/DiagnosticSemaKinds.td

Context not available.
	def err_ref_bad_target : Error<	def err_ref_bad_target : Error<
	"reference to %select{__device__|__global__|__host__|__host__ __device__}0 "	"reference to %select{__device__|__global__|__host__|__host__ __device__}0 "
	"function %1 in %select{__device__|__global__|__host__|__host__ __device__}2 function">;	"function %1 in %select{__device__|__global__|__host__|__host__ __device__}2 function">;
		def warn_host_calls_from_host_device : Warning<
		"calling __host__ function %0 from __host__ __device__ function %1 can lead to runtime errors">,
		InGroup<CudaCompat>;

	def warn_non_pod_vararg_with_format_string : Warning<	def warn_non_pod_vararg_with_format_string : Warning<
	"cannot pass %select{non-POD|non-trivial}0 object of type %1 to variadic "	"cannot pass %select{non-POD|non-trivial}0 object of type %1 to variadic "
Context not available.

include/clang/Basic/LangOptions.def

Context not available.
	LANGOPT(CUDA , 1, 0, "CUDA")	LANGOPT(CUDA , 1, 0, "CUDA")
	LANGOPT(OpenMP , 1, 0, "OpenMP support")	LANGOPT(OpenMP , 1, 0, "OpenMP support")
	LANGOPT(CUDAIsDevice , 1, 0, "Compiling for CUDA device")	LANGOPT(CUDAIsDevice , 1, 0, "Compiling for CUDA device")
		LANGOPT(CUDAAllowHostCallsFromHostDevice, 1, 0, "Allow host device functions to call host functions")

	LANGOPT(AssumeSaneOperatorNew , 1, 1, "implicit __attribute__((malloc)) for C++'s new operators")	LANGOPT(AssumeSaneOperatorNew , 1, 1, "implicit __attribute__((malloc)) for C++'s new operators")
	LANGOPT(SizedDeallocation , 1, 0, "enable sized deallocation functions")	LANGOPT(SizedDeallocation , 1, 0, "enable sized deallocation functions")
Context not available.

include/clang/Driver/CC1Options.td

Context not available.

	def fcuda_is_device : Flag<["-"], "fcuda-is-device">,	def fcuda_is_device : Flag<["-"], "fcuda-is-device">,
	HelpText<"Generate code for CUDA device">;	HelpText<"Generate code for CUDA device">;
		def fcuda_allow_host_calls_from_host_device : Flag<["-"],
		"fcuda-allow-host-calls-from-host-device">,
		HelpText<"Allow host device functions to call host functions">;

	} // let Flags = [CC1Option]	} // let Flags = [CC1Option]

Context not available.

lib/Frontend/CompilerInvocation.cpp

Context not available.
	for (unsigned i = 0, e = checkers.size(); i != e; ++i)	for (unsigned i = 0, e = checkers.size(); i != e; ++i)
	Opts.CheckersControlList.push_back(std::make_pair(checkers[i], enable));	Opts.CheckersControlList.push_back(std::make_pair(checkers[i], enable));
	}	}

	// Go through the analyzer configuration options.	// Go through the analyzer configuration options.
	for (arg_iterator it = Args.filtered_begin(OPT_analyzer_config),	for (arg_iterator it = Args.filtered_begin(OPT_analyzer_config),
	ie = Args.filtered_end(); it != ie; ++it) {	ie = Args.filtered_end(); it != ie; ++it) {
Context not available.
	if (Args.hasArg(OPT_fcuda_is_device))	if (Args.hasArg(OPT_fcuda_is_device))
	Opts.CUDAIsDevice = 1;	Opts.CUDAIsDevice = 1;

		if (Args.hasArg(OPT_fcuda_allow_host_calls_from_host_device))
		Opts.CUDAAllowHostCallsFromHostDevice = 1;

	if (Opts.ObjC1) {	if (Opts.ObjC1) {
	if (Arg *arg = Args.getLastArg(OPT_fobjc_runtime_EQ)) {	if (Arg *arg = Args.getLastArg(OPT_fobjc_runtime_EQ)) {
	StringRef value = arg->getValue();	StringRef value = arg->getValue();
Context not available.

lib/Sema/SemaCUDA.cpp

Context not available.
	if (Caller->isImplicit()) return false;	if (Caller->isImplicit()) return false;

	bool InDeviceMode = getLangOpts().CUDAIsDevice;	bool InDeviceMode = getLangOpts().CUDAIsDevice;
	if ((InDeviceMode && CalleeTarget != CFT_Device) ||	if (!InDeviceMode && CalleeTarget != CFT_Host)
	(!InDeviceMode && CalleeTarget != CFT_Host))	return true;
		if (InDeviceMode && CalleeTarget != CFT_Device) {
		// Allow host device functions to call host functions if explicitly
		// requested.
		if (CalleeTarget == CFT_Host &&
		getLangOpts().CUDAAllowHostCallsFromHostDevice) {
		Diag(Caller->getLocation(),
		diag::warn_host_calls_from_host_device)
		<< Callee->getNameAsString() << Caller->getNameAsString();
		return false;
		}

	return true;	return true;
		}
	}	}

	return false;	return false;
Context not available.
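The relaxed check in the right-hand column of the hunk above can be modeled as a small standalone function. This is a sketch under stated assumptions, not the clang implementation: it assumes the caller is a host device function and the callee is not itself host device (the enclosing condition is not visible in the hunk). The `CallCheck` result type and the names `checkCallFromHostDevice` and `check_demo` are hypothetical; the real code returns a bool and emits the warning through Diag().

```cpp
#include <cassert>

// Re-declared here only so the sketch is self-contained; the enumerator
// names mirror the ones used in the hunk.
enum CUDAFunctionTarget { CFT_Device, CFT_Global, CFT_Host, CFT_HostDevice };

// Hypothetical result type standing in for "return true/false plus Diag()".
enum class CallCheck { Allowed, AllowedWithWarning, Rejected };

// Models the host device caller branch from the hunk.
CallCheck checkCallFromHostDevice(bool InDeviceMode,
                                  CUDAFunctionTarget CalleeTarget,
                                  bool AllowHostCallsFromHostDevice) {
  // Host compilation: only host callees are acceptable.
  if (!InDeviceMode && CalleeTarget != CFT_Host)
    return CallCheck::Rejected;

  // Device compilation: non-device callees are rejected, unless the callee
  // is a host function and the new option was passed.
  if (InDeviceMode && CalleeTarget != CFT_Device) {
    if (CalleeTarget == CFT_Host && AllowHostCallsFromHostDevice)
      return CallCheck::AllowedWithWarning;
    return CallCheck::Rejected;
  }

  return CallCheck::Allowed;
}

// Usage sketch: the option only changes the device-mode, host-callee case.
void check_demo() {
  assert(checkCallFromHostDevice(true, CFT_Host, false) == CallCheck::Rejected);
  assert(checkCallFromHostDevice(true, CFT_Host, true) ==
         CallCheck::AllowedWithWarning);
}
```

Note that the option has no effect in host mode: a host device function calling a device function is still rejected there, which matches the expectations in test/SemaCUDA/function-target-hd.cu.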

test/CodeGenCUDA/host-device-calls-host.cu

				// RUN: %clang_cc1 %s -triple nvptx-unknown-unknown -fcuda-allow-host-calls-from-host-device -fcuda-is-device -Wno-cuda-compat -emit-llvm -o - | FileCheck %s

				#include "Inputs/cuda.h"

				extern "C"
				void host_function() {}

				// CHECK-LABEL: define void @hd_function_a
				extern "C"
				__host__ __device__ void hd_function_a() {
				// CHECK: call void @host_function
				host_function();
				}

				// CHECK: declare void @host_function

				// CHECK-LABEL: define void @hd_function_b
				extern "C"
				__host__ __device__ void hd_function_b(bool b) { if (b) host_function(); }

				// CHECK-LABEL: define void @device_function_b
				extern "C"
				__device__ void device_function_b() { hd_function_b(false); }

				// CHECK-LABEL: define void @global_function
				extern "C"
				__global__ void global_function() {
				// CHECK: call void @device_function_b
				device_function_b();
				}

				// CHECK: !{{[0-9]+}} = !{void ()* @global_function, !"kernel", i32 1}

test/SemaCUDA/function-target-hd.cu

				// Test the Sema analysis of caller-callee relationships of host device
				// functions when compiling CUDA code. There are 4 permutations of this test as
				// host and device compilation are separate compilation passes, and clang has
				// an option to allow host calls from host device functions. __CUDA_ARCH__ is
				// defined when compiling for the device and TEST_WARN_HD when host calls are
				// allowed from host device functions. So for example, if __CUDA_ARCH__ is
				// defined and TEST_WARN_HD is not then device compilation is happening but
				// host device functions are not allowed to call host functions.

				// RUN: %clang_cc1 -fsyntax-only -verify %s
				// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s
				// RUN: %clang_cc1 -fsyntax-only -fcuda-allow-host-calls-from-host-device -verify %s -DTEST_WARN_HD
				// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -fcuda-allow-host-calls-from-host-device -verify %s -DTEST_WARN_HD

				#include "Inputs/cuda.h"

				__host__ void hd1h(void);
				#if defined(__CUDA_ARCH__) && !defined(TEST_WARN_HD)
				// expected-note@-2 {{candidate function not viable: call to __host__ function from __host__ __device__ function}}
				#endif
				__device__ void hd1d(void);
				#ifndef __CUDA_ARCH__
				// expected-note@-2 {{candidate function not viable: call to __device__ function from __host__ __device__ function}}
				#endif
				__host__ void hd1hg(void);
				__device__ void hd1dg(void);
				#ifdef __CUDA_ARCH__
				__host__ void hd1hig(void);
				#if !defined(TEST_WARN_HD)
				// expected-note@-2 {{candidate function not viable: call to __host__ function from __host__ __device__ function}}
				#endif
				#else
				__device__ void hd1dig(void); // expected-note {{candidate function not viable: call to __device__ function from __host__ __device__ function}}
				#endif
				__host__ __device__ void hd1hd(void);
				__global__ void hd1g(void); // expected-note {{'hd1g' declared here}}

				__host__ __device__ void hd1(void) {
				#if defined(TEST_WARN_HD) && defined(__CUDA_ARCH__)
				// expected-warning@-2 {{calling __host__ function hd1h from __host__ __device__ function hd1}}
				// expected-warning@-3 {{calling __host__ function hd1hig from __host__ __device__ function hd1}}
				#endif
				hd1d();
				#ifndef __CUDA_ARCH__
				// expected-error@-2 {{no matching function}}
				#endif
				hd1h();
				#if defined(__CUDA_ARCH__) && !defined(TEST_WARN_HD)
				// expected-error@-2 {{no matching function}}
				#endif

				// No errors as guarded
				#ifdef __CUDA_ARCH__
				hd1d();
				#else
				hd1h();
				#endif

				// Errors as incorrectly guarded
				#ifndef __CUDA_ARCH__
				hd1dig(); // expected-error {{no matching function}}
				#else
				hd1hig();
				#ifndef TEST_WARN_HD
				// expected-error@-2 {{no matching function}}
				#endif
				#endif

				hd1hd();
				hd1g<<<1, 1>>>(); // expected-error {{reference to __global__ function 'hd1g' in __host__ __device__ function}}
				}

test/SemaCUDA/function-target.cu

Context not available.
	d1hd();	d1hd();
	d1g<<<1, 1>>>(); // expected-error {{reference to __global__ function 'd1g' in __device__ function}}	d1g<<<1, 1>>>(); // expected-error {{reference to __global__ function 'd1g' in __device__ function}}
	}	}

	// Expected 0-1 as in one of host/device side compilation it is an error, while
	// not in the other
	__host__ void hd1h(void); // expected-note 0-1 {{candidate function not viable: call to __host__ function from __host__ __device__ function}}
	__device__ void hd1d(void); // expected-note 0-1 {{candidate function not viable: call to __device__ function from __host__ __device__ function}}
	__host__ void hd1hg(void);
	__device__ void hd1dg(void);
	#ifdef __CUDA_ARCH__
	__host__ void hd1hig(void); // expected-note {{candidate function not viable: call to __host__ function from __host__ __device__ function}}
	#else
	__device__ void hd1dig(void); // expected-note {{candidate function not viable: call to __device__ function from __host__ __device__ function}}
	#endif
	__host__ __device__ void hd1hd(void);
	__global__ void hd1g(void); // expected-note {{'hd1g' declared here}}

	__host__ __device__ void hd1(void) {
	// Expected 0-1 as in one of host/device side compilation it is an error,
	// while not in the other
	hd1d(); // expected-error 0-1 {{no matching function}}
	hd1h(); // expected-error 0-1 {{no matching function}}

	// No errors as guarded
	#ifdef __CUDA_ARCH__
	hd1d();
	#else
	hd1h();
	#endif

	// Errors as incorrectly guarded
	#ifndef __CUDA_ARCH__
	hd1dig(); // expected-error {{no matching function}}
	#else
	hd1hig(); // expected-error {{no matching function}}
	#endif

	hd1hd();
	hd1g<<<1, 1>>>(); // expected-error {{reference to __global__ function 'hd1g' in __host__ __device__ function}}
	}
Context not available.