This is an archive of the discontinued LLVM Phabricator instance.

[hip] Relax CUDA call restriction within `decltype` context.
Needs Review · Public

Authored by hliao on May 2 2019, 1:03 PM.

Details

Summary
  • Within decltype, expressions are only type-inspected. The restriction on CUDA calls should be relaxed.

Event Timeline

hliao created this revision. May 2 2019, 1:03 PM
Herald added a project: Restricted Project. May 2 2019, 1:03 PM
Herald added a subscriber: cfe-commits.
tra added a reviewer: jlebar. May 2 2019, 1:35 PM

Perhaps we should allow this in all unevaluated contexts?
I.e. int s = sizeof(foo(x)); should also work.

clang/include/clang/Sema/Sema.h
10972

I think you want return llvm::any_of(ExprEvalContexts, ...) here and you can fold it directly into if() below.
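A minimal sketch of that suggestion, assuming it sits inside the Sema helper under review; the early-return value and its exact placement are assumptions, not the actual patch:

if (llvm::any_of(ExprEvalContexts,
                 [](const ExpressionEvaluationContextRecord &Record) {
                   // Only the plain Unevaluated kind is checked here, for illustration.
                   return Record.Context ==
                          ExpressionEvaluationContext::Unevaluated;
                 }))
  return true; // calls are only type-checked here, so allow them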

hliao marked an inline comment as done. May 2 2019, 1:47 PM
In D61458#1488523, @tra wrote:

Perhaps we should allow this in all unevaluated contexts?
I.e. int s = sizeof(foo(x)); should also work.

Good point. Do we have a dedicated context for sizeof? That would make the checking easier.

clang/include/clang/Sema/Sema.h
10972

Yeah, that's much simpler; I will make the change.

hliao updated this revision to Diff 197860. May 2 2019, 1:57 PM

simplify the logic using llvm::any_of.

tra added a comment. May 2 2019, 2:02 PM
In D61458#1488523, @tra wrote:

Perhaps we should allow this in all unevaluated contexts?
I.e. int s = sizeof(foo(x)); should also work.

Good point. Do we have a dedicated context for sizeof? That would make the checking easier.

Sema::isUnevaluatedContext() may be able to do the job.
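For reference, a sketch of how that simplification could look inside the same helper; the early-out and its return value are assumptions, not the actual patch:

// Inside decltype, sizeof, and other unevaluated operands, calls are only
// type-checked, so skip the host/device call restriction.
if (isUnevaluatedContext())
  return true;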

tra added inline comments. May 2 2019, 2:27 PM
clang/include/clang/Sema/Sema.h
10969–10975

One more thing. The idea of this function is that we're checking if the Caller is allowed to call the Callee.
However here, you're checking the current context, which may not necessarily be the same as the caller's. I.e. someone could potentially call it way after the context is gone.

Currently all uses of this function obtain the caller from CurContext, but if we start relying on properties of the current context other than the caller function, then we may need to pass the context explicitly, or only pass the Callee and check whether it's callable from the current context.

jlebar added a subscriber: rsmith. May 2 2019, 6:59 PM

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

I'd be happy if we said this was an error, so long as it's well-defined what exactly we're disallowing. But I bet @rsmith can come up with substantially more evil testcases than this.

hfinkel added a subscriber: hfinkel. May 2 2019, 7:04 PM

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

So, actually, I wonder if that's not the right answer. We generally allow different overloads to have different return types. What if, for example, the return type on the host is __float128 and on the device it's MyLongFloatTy?

I'd be happy if we said this was an error, so long as it's well-defined what exactly we're disallowing. But I bet @rsmith can come up with substantially more evil testcases than this.
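To make that hypothetical concrete, a sketch; MyLongFloatTy is the made-up type from the comment above, and __float128 is assumed to be host-only here:

struct MyLongFloatTy { /* some device-side wide-float representation */ };

__host__ __float128 norm();
__device__ MyLongFloatTy norm();
// With per-side resolution, the deduced return type would differ between
// host and device compilation:
__host__ __device__ auto use() -> decltype(norm());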

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

So, actually, I wonder if that's not the right answer. We generally allow different overloads to have different return types.

Only if they also differ in some other way. C++ does not (generally) have return-type-based overloading. The two functions described would even mangle the same way if CUDA didn't include host/device in the mangling.

(Function templates can differ only by return type, but if both return types successfully instantiate for a given set of (possibly inferred) template arguments then the templates can only be distinguished when taking their address, not when calling.)
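A plain C++ illustration of that parenthetical, with hypothetical names and no CUDA attributes involved:

template <class T> int pick();
template <class T> float pick(); // differs from the first template only in return type

int (*p)() = pick<void>; // OK: the target type selects the int-returning template
// auto x = pick<void>(); // error: ambiguous call; the return type cannot disambiguate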

I think I've said before that adding this kind of overloading is not a good idea, but since it's apparently already there, you should consult the specification (or at least existing practice) to figure out what you're supposed to do.

Only if they also differ in some other way. C++ does not (generally) have return-type-based overloading. The two functions described would even mangle the same way if CUDA didn't include host/device in the mangling.

Certainly. I didn't mean to imply otherwise.

hliao added a comment. May 3 2019, 5:25 AM

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

I'd be happy if we said this was an error, so long as it's well-defined what exactly we're disallowing. But I bet @rsmith can come up with substantially more evil testcases than this.

This patch is introduced to allow functions or function templates from the standard library to be used in device functions. By allowing different-side candidates in a context that only inspects types, we get a new issue beyond the regular C++ overload-resolution rules: we need an extra policy to decide which candidate is the best one by considering CUDA attributes. For the case you proposed, we may consider the following order when choosing an overload candidate, e.g.

SAME-SIDE (with the same CUDA attribute)
NATIVE (without any CUDA attribute)
WRONG-SIDE (with the opposite CUDA attribute)

or just

SAME-SIDE
NATIVE

Is that a reasonable change?
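A sketch of the proposed ranking; the names are hypothetical and deliberately distinct from Clang's existing CUDAFunctionPreference enum:

// Candidates are compared by the usual C++ rules first; this CUDA-side rank
// would only break ties between otherwise equally good candidates.
enum class SideRank { WrongSide = 0, Native = 1, SameSide = 2 };

inline bool betterInTypeOnlyContext(SideRank A, SideRank B) {
  return static_cast<int>(A) > static_cast<int>(B); // SAME-SIDE > NATIVE > WRONG-SIDE
}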

hliao added a comment. May 3 2019, 5:36 AM

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

So, actually, I wonder if that's not the right answer. We generally allow different overloads to have different return types.

Only if they also differ in some other way. C++ does not (generally) have return-type-based overloading. The two functions described would even mangle the same way if CUDA didn't include host/device in the mangling.

(Function templates can differ only by return type, but if both return types successfully instantiate for a given set of (possibly inferred) template arguments then the templates can only be distinguished when taking their address, not when calling.)

I think I've said before that adding this kind of overloading is not a good idea, but since it's apparently already there, you should consult the specification (or at least existing practice) to figure out what you're supposed to do.

BTW, I just checked similar cases with nvcc; it accepts the following code

float bar(); // This line could also be prefixed with `__host__` or `__device__`; all variants are accepted.
__host__ __device__ auto foo() -> decltype(bar()) {}

However, if there is more than one candidate differing in return type (with or without a CUDA attribute difference), it raises the following error:

foo.cu(4): error: cannot overload functions distinguished by return type alone

It seems to me that this is also an acceptable policy for handling the issue once we allow different-side candidates in type-only contexts.

hliao added a comment. May 3 2019, 5:43 AM

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

I'd be happy if we said this was an error, so long as it's well-defined what exactly we're disallowing. But I bet @rsmith can come up with substantially more evil testcases than this.

As of CUDA 10, that's not acceptable, as we are declaring two functions that differ only in return type. It seems CUDA attributes do not contribute to the function signature; clang is quite different here.

hliao marked an inline comment as done. May 3 2019, 5:50 AM
hliao added inline comments.
clang/include/clang/Sema/Sema.h
10969–10975

As the expression within decltype may be quite complicated, the idea here is to relax that rule within the decltype context, not only for a particular caller/callee pair.

jlebar added a comment (edited). May 3 2019, 8:41 AM

As of CUDA 10, that's not acceptable [in nvcc], as we are declaring two functions that differ only in return type. It seems CUDA attributes do not contribute to the function signature; clang is quite different here.

Yes, this is intentional: clang's semantics here are more relaxed. It's also sort of the linchpin of our mixed-mode compilation strategy, which is very different from nvcc's source-to-source splitting strategy.

Back in the day you could trick nvcc into allowing host/device overloading on same-signature functions by slapping a template on one or both of them. Checking just now it seems they fixed this, but I suspect there are still dark corners where nvcc relies on effectively the same behavior as we get in clang via true overloading.
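An illustrative sketch of the workaround being described; whether a given nvcc version accepts it is not something this example can promise:

__host__ float bar(float);
// Wrapping the device variant in a dummy template made it a distinct entity in
// C++ terms, which older nvcc reportedly accepted despite the matching signature:
template <int Dummy = 0>
__device__ float bar(float);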

tra added inline comments. May 3 2019, 9:23 AM
clang/include/clang/Sema/Sema.h
10969–10975

I understand the idea, but in this case the argument was more about the code style.

Currently the contract is that the function's decision is derived from its arguments (and could, perhaps, be a static method). With this patch you start relying on the context, but it's not obvious from the function signature. Replacing Caller with context, or removing the caller altogether would bring the function signature closer to what the function does.
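Hypothetical shapes for the signature change being suggested; the names are illustrative, not existing Clang API:

// (a) Make the dependence on the surrounding context explicit:
bool IsAllowedCUDACall(const DeclContext *CallerContext, const FunctionDecl *Callee);

// (b) Drop the caller parameter and always answer "is Callee callable from the
//     current context?", so the signature matches what the function actually does:
bool IsCalleeAllowedFromCurrentContext(const FunctionDecl *Callee);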

hliao updated this revision to Diff 228924. Nov 12 2019, 11:25 AM

This patch is revived with more changes addressing the previous concerns.

Back to Justin's example:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) { return bar(); }

Even without this patch, that example already compiles without errors or warnings, e.g. with

clang -std=c++11 -x cuda -nocudainc -nocudalib --cuda-gpu-arch=sm_60 --cuda-device-only -S -emit-llvm -O3 foo.cu

In C++14, that example could be simplified further, dropping decltype but keeping the same ambiguity.

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() { return bar(); }

Without any change, clang compiles this code as well and uses different return types for host-side and device-side compilation.[^1]

[^1]: The first example has the same return type for host-side and device-side compilation, but that seems incorrect or unreasonable to me.

The ambiguity issue is in fact not introduced by relaxing decltype; it is inherent once we allow overloading on target attributes. Issuing warnings instead of errors seems more reasonable to me for such cases.

Besides relaxing the CUDA call rule under decltype, this patch also generates a warning during overload resolution if there is more than one candidate and the candidates have different return types.

hliao marked an inline comment as done. Nov 12 2019, 11:26 AM