This is an archive of the discontinued LLVM Phabricator instance.

Create a frontend flag to disable CUDA cross-target call checks
ClosedPublic

Authored by eliben on Apr 15 2015, 12:25 PM.


Details

Reviewers

tra
jpienaar
rnk

Commits

rG4bdc50eccb1d: Create a frontend flag to disable CUDA cross-target call checks
rC235049: Create a frontend flag to disable CUDA cross-target call checks
rL235049: Create a frontend flag to disable CUDA cross-target call checks

Summary

For CUDA source, Sema checks that the targets of call expressions make sense (e.g. a host function can't call a device function).

Adding a flag that lets us skip this check. Motivation: for source-to-source translation tools that have to accept code that's not strictly kosher CUDA but is still accepted by nvcc. The source-to-source translation tool can then fix the code and leave calls that are semantically valid for the actual compilation stage.


Event Timeline

eliben updated this revision to Diff 23794. Apr 15 2015, 12:25 PM

eliben retitled this revision from to Create a frontend flag to disable CUDA cross-target call checks.

eliben updated this object.

eliben edited the test plan for this revision.

eliben added reviewers: jpienaar, tra, rnk.

eliben added a subscriber: Unknown Object (MLST).

tra added inline comments. Apr 15 2015, 12:47 PM

lib/Sema/SemaCUDA.cpp
63–69	Do we really need to disable the check completely? Can it be limited to calls from host-device functions only? What's expected to happen if we do let host->device or device->host call through? Do we expect an error further down in the pipeline if/when we get to generate the code for the call? Or will we generate valid code which would cause runtime error if the call were to happen? In either case this seems to need a test case to make sure we do a sensible thing for the call that is not expected to work.

eliben added inline comments. Apr 15 2015, 1:08 PM

lib/Sema/SemaCUDA.cpp
63–69	> Do we expect an error further down in the pipeline if/when we get to generate the code for the call?

Yes, that sums it up, pretty much. This flag, as far as I'm concerned, exists for the purpose of source-rewriting tools. There's no codegen involved, and no IR being emitted. In our current CUDA pipeline, real problems in user code will fail in subsequent steps of the pipeline.

LGTM.
On a side note, do you still need fcuda_allow_host_calls_from_host_device ?

This revision is now accepted and ready to land. Apr 15 2015, 1:13 PM

In D9036#156680, @tra wrote:

LGTM.
On a side note, do you still need fcuda_allow_host_calls_from_host_device ?

Yes, unfortunately. This one is used by the compilation step itself, where Clang then emits a warning (instead of an error). This mimics nvcc's behavior. Some host device functions are only really called on the host side, but the compiler has no way of proving it statically.
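
The behavior described here (a warning instead of an error when a host device function calls a host function) can be illustrated with a minimal sketch. The types and the function name `diagnoseHostCallFromHostDevice` are hypothetical stand-ins chosen for illustration, not Clang's actual API; only the option name `CUDAAllowHostCallsFromHostDevice` comes from the diff.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not Clang's real types.
enum class Target { Host, Device, Global, HostDevice };
enum class Severity { None, Warning, Error };

struct LangOpts {
  bool CUDAAllowHostCallsFromHostDevice = false;
};

// Models only the host-device -> host case discussed in the comment above.
// Every other caller/callee pair is out of scope and reports Severity::None.
Severity diagnoseHostCallFromHostDevice(const LangOpts &Opts, Target Caller,
                                        Target Callee) {
  if (Caller != Target::HostDevice || Callee != Target::Host)
    return Severity::None;
  // With the flag set, the call is downgraded to a warning (assumed here to
  // mirror the nvcc-like behavior the comment describes).
  return Opts.CUDAAllowHostCallsFromHostDevice ? Severity::Warning
                                               : Severity::Error;
}
```

In this sketch the flag changes only the severity of the diagnostic; the call is still reported, which matches the point that the compiler cannot prove statically that the function is only ever called on the host.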

The normal way we do this is to create a default-error warning, and then you can say -Wno-my-warning and disable the error. This also suppresses the error in system headers, which might make sense if you can't parse lots of CUDA headers yet, and just want to compile such impossible calls down to 'unreachable'. See for example -W[no-]c++11-narrowing.

I don't think it's very intuitive that -W flags impact errors, but this is the way we've done things for one-off diagnostics like this and I'd rather stay consistent with that until we come up with a better interface later. What do you think?
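
The "default-error warning" convention mentioned above can be sketched as a small severity table: a diagnostic defaults to an error, and a matching `-Wno-<name>` flag switches it off. The class `DiagnosticSettings` and its methods are hypothetical names invented for this illustration; Clang's real diagnostic engine is considerably more involved.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class Severity { Ignored, Warning, Error };

// Hypothetical toy model of per-diagnostic settings; not Clang's API.
class DiagnosticSettings {
public:
  // Handles only the "-Wno-<name>" spelling; any other flag is ignored here.
  void applyFlag(const std::string &Flag) {
    const std::string Prefix = "-Wno-";
    if (Flag.compare(0, Prefix.size(), Prefix) == 0)
      Disabled.insert(Flag.substr(Prefix.size()));
  }

  // A diagnostic keeps its default severity unless it was disabled by name.
  Severity severityFor(const std::string &Name, Severity Default) const {
    return Disabled.count(Name) ? Severity::Ignored : Default;
  }

private:
  std::set<std::string> Disabled;
};
```

Under this model, a diagnostic such as `c++11-narrowing` would be registered with a default of `Severity::Error`, and passing `-Wno-c++11-narrowing` would suppress it, which is the somewhat unintuitive "-W flags impact errors" effect the comment refers to.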

As discussed, this flag actually affects overload resolution results, so it should probably be a language option as it is now. Maybe we shouldn't be having this target check affect overload results, but that's another big issue that we don't need to run down right this instant.

lgtm

In D9036#156719, @rnk wrote:

As discussed, this flag actually affects overload resolution results, so it should probably be a language option as it is now. Maybe we shouldn't be having this target check affect overload results, but that's another big issue that we don't need to run down right this instant.

lgtm

Right. In the long run, I think that letting it affect overload resolution is actually beneficial if we want to consider proper overloading based on target attributes.
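
To make the overload-resolution point concrete, here is a deliberately simplified sketch in which a target mismatch removes a candidate from the viable set, and disabling the checks keeps every candidate. It assumes only two targets and an exact-match rule; `Candidate` and `viableCandidates` are hypothetical names, and this is not how Clang's overload resolution is implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model with two targets only; real CUDA has more (global, host device).
enum class Target { Host, Device };

struct Candidate {
  std::string Name;
  Target T;
};

// Returns the names of candidates that survive the (simplified) target check.
// With DisableTargetCallChecks set, no candidate is filtered out.
std::vector<std::string> viableCandidates(Target Caller,
                                          const std::vector<Candidate> &Cands,
                                          bool DisableTargetCallChecks) {
  std::vector<std::string> Viable;
  for (const Candidate &C : Cands)
    if (DisableTargetCallChecks || C.T == Caller)
      Viable.push_back(C.Name);
  return Viable;
}
```

Because the set of viable candidates differs depending on the flag, the flag can change which function a call resolves to, which is why it is treated as a language option and not as a plain diagnostic toggle.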

Closed by commit rL235049: Create a frontend flag to disable CUDA cross-target call checks (authored by eliben). Apr 15 2015, 3:30 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                               Size
include/clang/Basic/LangOptions.def                1 line
include/clang/Driver/CC1Options.td                 3 lines
lib/Frontend/CompilerInvocation.cpp                3 lines
lib/Sema/SemaCUDA.cpp                              5 lines
test/SemaCUDA/function-target-disabled-check.cu    26 lines

Diff 23794

include/clang/Basic/LangOptions.def

  LANGOPT(OpenCL , 1, 0, "OpenCL")
  LANGOPT(OpenCLVersion , 32, 0, "OpenCL version")
  LANGOPT(NativeHalfType , 1, 0, "Native half type support")
  LANGOPT(HalfArgsAndReturns, 1, 0, "half args and returns")
  LANGOPT(CUDA , 1, 0, "CUDA")
  LANGOPT(OpenMP , 1, 0, "OpenMP support")
  LANGOPT(CUDAIsDevice , 1, 0, "Compiling for CUDA device")
  LANGOPT(CUDAAllowHostCallsFromHostDevice, 1, 0, "Allow host device functions to call host functions")
+ LANGOPT(CUDADisableTargetCallChecks, 1, 0, "Disable checks for call targets (host, device, etc.)")

  LANGOPT(AssumeSaneOperatorNew , 1, 1, "implicit __attribute__((malloc)) for C++'s new operators")
  LANGOPT(SizedDeallocation , 1, 0, "enable sized deallocation functions")
  BENIGN_LANGOPT(ElideConstructors , 1, 1, "C++ copy constructor elision")
  BENIGN_LANGOPT(DumpRecordLayouts , 1, 0, "dumping the layout of IRgen'd records")
  BENIGN_LANGOPT(DumpRecordLayoutsSimple , 1, 0, "dumping the layout of IRgen'd records in a simple form")
  BENIGN_LANGOPT(DumpVTableLayouts , 1, 0, "dumping the layouts of emitted vtables")
  LANGOPT(NoConstantCFStrings , 1, 0, "no constant CoreFoundation strings")

include/clang/Driver/CC1Options.td

  // CUDA Options
  //===----------------------------------------------------------------------===//

  def fcuda_is_device : Flag<["-"], "fcuda-is-device">,
    HelpText<"Generate code for CUDA device">;
  def fcuda_allow_host_calls_from_host_device : Flag<["-"],
      "fcuda-allow-host-calls-from-host-device">,
    HelpText<"Allow host device functions to call host functions">;
+ def fcuda_disable_target_call_checks : Flag<["-"],
+     "fcuda-disable-target-call-checks">,
+   HelpText<"Disable all cross-target (host, device, etc.) call checks in CUDA">;

  } // let Flags = [CC1Option]


  //===----------------------------------------------------------------------===//
  // cc1as-only Options
  //===----------------------------------------------------------------------===//

lib/Frontend/CompilerInvocation.cpp

    Opts.CXXOperatorNames = 0;

    if (Args.hasArg(OPT_fcuda_is_device))
      Opts.CUDAIsDevice = 1;

    if (Args.hasArg(OPT_fcuda_allow_host_calls_from_host_device))
      Opts.CUDAAllowHostCallsFromHostDevice = 1;

+   if (Args.hasArg(OPT_fcuda_disable_target_call_checks))
+     Opts.CUDADisableTargetCallChecks = 1;
+
    if (Opts.ObjC1) {
      if (Arg *arg = Args.getLastArg(OPT_fobjc_runtime_EQ)) {
        StringRef value = arg->getValue();
        if (Opts.ObjCRuntime.tryParse(value))
          Diags.Report(diag::err_drv_unknown_objc_runtime) << value;
      }

      if (Args.hasArg(OPT_fobjc_gc_only))
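
The flag-to-option wiring in this hunk follows a simple pattern: if the flag is present on the command line, set the corresponding language option. A self-contained sketch of that pattern is shown below. The `hasArg` helper and `parseCudaArgs` function are hypothetical simplifications written for this illustration and do not reproduce Clang's option-parsing library; only the flag spellings and option names are taken from the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, reduced version of the language options touched by the diff.
struct LangOpts {
  bool CUDAIsDevice = false;
  bool CUDAAllowHostCallsFromHostDevice = false;
  bool CUDADisableTargetCallChecks = false;
};

// Simplified stand-in for an "is this flag present?" query.
static bool hasArg(const std::vector<std::string> &Args,
                   const std::string &Flag) {
  return std::find(Args.begin(), Args.end(), Flag) != Args.end();
}

// Each flag independently switches on one option; absent flags leave the
// default (false) in place.
LangOpts parseCudaArgs(const std::vector<std::string> &Args) {
  LangOpts Opts;
  if (hasArg(Args, "-fcuda-is-device"))
    Opts.CUDAIsDevice = true;
  if (hasArg(Args, "-fcuda-allow-host-calls-from-host-device"))
    Opts.CUDAAllowHostCallsFromHostDevice = true;
  if (hasArg(Args, "-fcuda-disable-target-call-checks"))
    Opts.CUDADisableTargetCallChecks = true;
  return Opts;
}
```

The three options are independent in this sketch, which is consistent with the test in this revision passing `-fcuda-is-device` together with `-fcuda-disable-target-call-checks`.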

lib/Sema/SemaCUDA.cpp

    if (D->hasAttr<CUDADeviceAttr>()) {
    [...]
      // Some implicit declarations (like intrinsic functions) are not marked.
      // Set the most lenient target on them for maximal flexibility.
      return CFT_HostDevice;
    }

    return CFT_Host;
  }

  bool Sema::CheckCUDATarget(const FunctionDecl *Caller,
                             const FunctionDecl *Callee) {
+   // The CUDADisableTargetCallChecks short-circuits this check: we assume all
+   // cross-target calls are valid.
+   if (getLangOpts().CUDADisableTargetCallChecks)
+     return false;
+
    CUDAFunctionTarget CallerTarget = IdentifyCUDATarget(Caller),
                       CalleeTarget = IdentifyCUDATarget(Callee);

    // If one of the targets is invalid, the check always fails, no matter what
    // the other target is.
    if (CallerTarget == CFT_InvalidTarget || CalleeTarget == CFT_InvalidTarget)
      return true;
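
The short-circuit added to `CheckCUDATarget` can be modeled outside of Clang with a small sketch. The rules below are a deliberate simplification based only on the cases this review mentions (host calling device, device calling host, global calling host); the real function handles more combinations. All type and function names here are hypothetical, and the "true means the call is rejected" convention is assumed from the `return true` on an invalid target in the hunk.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are not Clang's CUDAFunctionTarget values.
enum class Target { Host, Device, Global, HostDevice };

struct LangOpts {
  bool CUDADisableTargetCallChecks = false;
};

// Simplifying assumption: host and host-device code may run on the host.
static bool runsOnHost(Target T) {
  return T == Target::Host || T == Target::HostDevice;
}

// Simplifying assumption: everything except host-only code may run on device.
static bool runsOnDevice(Target T) { return T != Target::Host; }

// Returns true when the call should be rejected.
bool isInvalidCrossTargetCall(const LangOpts &Opts, Target Caller,
                              Target Callee) {
  // The new flag short-circuits the check: every call is assumed valid.
  if (Opts.CUDADisableTargetCallChecks)
    return false;
  if (runsOnDevice(Caller) && Callee == Target::Host)
    return true; // device-side code calling a host-only function
  if (runsOnHost(Caller) && Callee == Target::Device)
    return true; // host-side code calling a device-only function
  return false;
}
```

With the flag off, this model rejects the three calls exercised by the new test file; with the flag on, all of them pass, which corresponds to the `expected-no-diagnostics` line in that test.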

test/SemaCUDA/function-target-disabled-check.cu

This file was added.

// Test that we can disable cross-target call checks in Sema with the
// -fcuda-disable-target-call-checks flag. Without this flag we'd get a bunch
// of errors here, since there are invalid cross-target calls present.

// RUN: %clang_cc1 -fsyntax-only -verify %s -fcuda-disable-target-call-checks
// RUN: %clang_cc1 -fsyntax-only -fcuda-is-device -verify %s -fcuda-disable-target-call-checks

// expected-no-diagnostics

#define __device__ __attribute__((device))
#define __global__ __attribute__((global))
#define __host__ __attribute__((host))

__attribute__((host)) void h1();

__attribute__((device)) void d1() {
  h1();
}

__attribute__((host)) void h2() {
  d1();
}

__attribute__((global)) void g1() {
  h2();
}