This is an archive of the discontinued LLVM Phabricator instance.

[hip] Relax CUDA call restriction within `decltype` context.
Needs Review · Public

Authored by hliao on May 2 2019, 1:03 PM.

Details

Summary
  • Within decltype, expressions are only type-inspected. The restriction on CUDA calls should be relaxed.

Event Timeline

hliao created this revision. May 2 2019, 1:03 PM
Herald added a project: Restricted Project. May 2 2019, 1:03 PM
Herald added a subscriber: cfe-commits.
tra added a reviewer: jlebar. May 2 2019, 1:35 PM

Perhaps we should allow this in all unevaluated contexts?
I.e. int s = sizeof(foo(x)); should also work.

clang/include/clang/Sema/Sema.h
10411

I think you want return llvm::any_of(ExprEvalContexts, ...) here and you can fold it directly into if() below.
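(For illustration, a rough sketch of what that could look like inside the Sema member in question; the record field and enumerator names below are assumptions, not the actual patch:)

// Hypothetical sketch only: treat the call as allowed when any enclosing
// expression-evaluation context is a decltype (type-only) context.
bool InTypeOnlyContext = llvm::any_of(
    ExprEvalContexts, [](const ExpressionEvaluationContextRecord &Rec) {
      // Assumed field/enumerator names, for illustration only.
      return Rec.ExprContext == ExpressionEvaluationContextRecord::EK_Decltype;
    });
if (InTypeOnlyContext)
  return true; // Skip the CUDA target check; nothing is emitted here.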

hliao marked an inline comment as done. May 2 2019, 1:47 PM
In D61458#1488523, @tra wrote:

Perhaps we should allow this in all unevaluated contexts?
I.e. int s = sizeof(foo(x)); should also work.

Good point. Do we have a dedicated context for sizeof? That would make the checking easier.

clang/include/clang/Sema/Sema.h
10411

Yeah, that's much simpler; I will make the change.

hliao updated this revision to Diff 197860. May 2 2019, 1:57 PM

Simplify the logic using llvm::any_of.

tra added a comment. May 2 2019, 2:02 PM
In D61458#1488523, @tra wrote:

Perhaps we should allow this in all unevaluated contexts?
I.e. int s = sizeof(foo(x)); should also work.

Good point. Do we have a dedicated context for sizeof? That would make the checking easier.

Sema::isUnevaluatedContext() may be able to do the job.
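(If the unevaluated-context route is taken, the check itself could be as small as this sketch; illustrative only, and where exactly to short-circuit is still the open question:)

// Illustrative sketch: skip the CUDA target check when the call appears in
// an unevaluated operand (sizeof, decltype, noexcept, ...), since no code
// is generated for it.
if (isUnevaluatedContext())
  return true;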

tra added inline comments. May 2 2019, 2:27 PM
clang/include/clang/Sema/Sema.h
10407–10409

One more thing. The idea of this function is that we're checking if the Caller is allowed to call the Callee.
However here, you're checking the current context, which may not necessarily be the same as the caller's. I.e. someone could potentially call it way after the context is gone.

Currently all uses of this function obtain the caller from CurContext, but if we start relying on properties of the current context other than the caller function, then we may need to pass the context explicitly, or only pass the Callee and check whether it's callable from the current context.
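(To make the alternatives concrete, two hypothetical signatures; neither is the existing API:)

// Either pass the extra piece of context in explicitly ...
bool IsAllowedCUDACall(const FunctionDecl *Caller, const FunctionDecl *Callee,
                       bool InTypeOnlyContext);
// ... or take only the callee and let the function consult the current
// context (CurContext, evaluation context) itself.
bool IsAllowedCUDACallFromCurrentContext(const FunctionDecl *Callee);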

jlebar added a subscriber: rsmith. May 2 2019, 6:59 PM

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

I'd be happy if we said this was an error, so long as it's well-defined what exactly we're disallowing. But I bet @rsmith can come up with substantially more evil testcases than this.

hfinkel added a subscriber: hfinkel. May 2 2019, 7:04 PM

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

So, actually, I wonder if that's not the right answer. We generally allow different overloads to have different return types. What if, for example, the return type on the host is __float128 and on the device it's MyLongFloatTy?

I'd be happy if we said this was an error, so long as it's well-defined what exactly we're disallowing. But I bet @rsmith can come up with substantially more evil testcases than this.

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

So, actually, I wonder if that's not the right answer. We generally allow different overloads to have different return types.

Only if they also differ in some other way. C++ does not (generally) have return-type-based overloading. The two functions described would even mangle the same way if CUDA didn't include host/device in the mangling.

(Function templates can differ only by return type, but if both return types successfully instantiate for a given set of (possibly inferred) template arguments then the templates can only be distinguished when taking their address, not when calling.)
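(A plain, non-CUDA C++ illustration of that parenthetical, with made-up names:)

template <typename T> int f(T);   // two function templates that differ
template <typename T> float f(T); // only in return type may coexist

int   (*pi)(int) = f; // OK: the target type picks the int-returning template
float (*pf)(int) = f; // OK: likewise for the float-returning one
// auto x = f(42);    // error: a call is ambiguous between the two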

I think I've said before that adding this kind of overloading is not a good idea, but since it's apparently already there, you should consult the specification (or at least existing practice) to figure out what you're supposed to do.

Only if they also differ in some other way. C++ does not (generally) have return-type-based overloading. The two functions described would even mangle the same way if CUDA didn't include host/device in the mangling.

Certainly. I didn't mean to imply otherwise.

hliao added a comment. May 3 2019, 5:25 AM

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

I'd be happy if we said this was an error, so long as it's well-defined what exactly we're disallowing. But I bet @rsmith can come up with substantially more evil testcases than this.

This patch is introduced to allow functions or function templates from the standard library to be used in device functions. By allowing different-side candidates in a context that only inspects types, we get a new issue that goes beyond the regular C++ overload-resolution rules: we need an extra policy to figure out which candidate is the best one, taking CUDA attributes into account. For the case you proposed, we could consider the following order when choosing a candidate, e.g.

SAME-SIDE (with the same CUDA attribute)
NATIVE (without any CUDA attribute)
WRONG-SIDE (with the opposite CUDA attribute)

or just

SAME-SIDE
NATIVE

Is that a reasonable change?
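(Purely as an illustration of that ranking; the names below are made up and do not refer to an existing clang enum:)

// Illustrative only: rank candidates by how their target attribute relates
// to the side currently being compiled, highest preference first.
enum class ProposedCandidateRank {
  SameSide,  // same CUDA attribute as the current compilation side
  Native,    // no CUDA attribute at all
  WrongSide, // opposite CUDA attribute (lowest preference, or excluded)
};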

hliao added a comment. May 3 2019, 5:36 AM

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

So, actually, I wonder if that's not the right answer. We generally allow different overloads to have different return types.

Only if they also differ in some other way. C++ does not (generally) have return-type-based overloading. The two functions described would even mangle the same way if CUDA didn't include host/device in the mangling.

(Function templates can differ only by return type, but if both return types successfully instantiate for a given set of (possibly inferred) template arguments then the templates can only be distinguished when taking their address, not when calling.)

I think I've said before that adding this kind of overloading is not a good idea, but since it's apparently already there, you should consult the specification (or at least existing practice) to figure out what you're supposed to do.

BTW, I just checked similar cases with nvcc. With more than one candidate, it accepts the following code:

float bar(); // This declaration could also be marked `__host__` or `__device__`; all variants are accepted.
__host__ __device__ auto foo() -> decltype(bar()) {}

However, if there is more than one candidate and the candidates differ in return type (with or without a CUDA attribute difference), it raises the error:

foo.cu(4): error: cannot overload functions distinguished by return type alone

It seems to me that this is also an acceptable policy for handling the issue once we allow different-side candidates in type-only contexts.

hliao added a comment. May 3 2019, 5:43 AM

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

I'd be happy if we said this was an error, so long as it's well-defined what exactly we're disallowing. But I bet @rsmith can come up with substantially more evil testcases than this.

As of CUDA 10, that's not accepted, as we would be declaring two functions that differ only in return type. It seems CUDA attributes do not contribute to the function signature; clang is quite different here.

hliao marked an inline comment as done. May 3 2019, 5:50 AM
hliao added inline comments.
clang/include/clang/Sema/Sema.h
10407–10409

As the expression within decltype may be quite complicated, the idea here is to relax the rule within the whole decltype context, not only for a particular caller/callee pair.

jlebar added a comment (edited). May 3 2019, 8:41 AM

As of CUDA 10, that's not accepted [by nvcc], as we would be declaring two functions that differ only in return type. It seems CUDA attributes do not contribute to the function signature; clang is quite different here.

Yes, this is an intentional and more relaxed semantics in clang. It's also sort of the linchpin of our mixed-mode compilation strategy, which is very different from nvcc's source-to-source splitting strategy.

Back in the day you could trick nvcc into allowing host/device overloading on same-signature functions by slapping a template on one or both of them. Checking just now it seems they fixed this, but I suspect there are still dark corners where nvcc relies on effectively the same behavior as we get in clang via true overloading.
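(For reference, the trick looked roughly like this; illustrative only, and reportedly no longer accepted by current nvcc:)

__host__ float bar(int);
template <typename T = int>  // wrapping one declaration in a template used to
__device__ float bar(T);     // make nvcc accept the same-signature "overload"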

tra added inline comments. May 3 2019, 9:23 AM
clang/include/clang/Sema/Sema.h
10407–10409

I understand the idea, but in this case the argument was more about the code style.

Currently the contract is that the function's decision is derived from its arguments (and could, perhaps, be a static method). With this patch you start relying on the context, but it's not obvious from the function signature. Replacing Caller with context, or removing the caller altogether would bring the function signature closer to what the function does.

hliao updated this revision to Diff 228924. Nov 12 2019, 11:25 AM

This patch is revived with more changes addressing the previous concerns.

Back to Justin's example:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) { return bar(); }

Even without this patch, that example already compiles without errors or warnings, e.g. with

clang -std=c++11 -x cuda -nocudainc -nocudalib --cuda-gpu-arch=sm_60 --cuda-device-only -S -emit-llvm -O3 foo.cu

In C++14, that example can be simplified even further to drop decltype while keeping the same ambiguity:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() { return bar(); }

Without any change, clang compiles this code as well and uses different return types for host-side and device-side compilation.[^1]

[^1]: The first example has the same return type for host-side and device-side compilation, but that seems incorrect or unreasonable to me.

The ambiguity issue is in fact not introduced by relaxing decltype; it is inherent in allowing overloading on target attributes. Issuing warnings instead of errors seems more reasonable to me for such cases.

In this patch, besides relaxing the CUDA call rule under decltype, we also generate a warning during function overload resolution if there is more than one candidate and the candidates have different return types.

hliao marked an inline comment as done. Nov 12 2019, 11:26 AM