- Within decltype, expressions are only type-inspected, never evaluated, so the restriction on CUDA cross-target calls should be relaxed there.
Event Timeline
Perhaps we should allow this in all unevaluated contexts?
I.e. int s = sizeof(foo(x)); should also work.
clang/include/clang/Sema/Sema.h:10972
> I think you want return llvm::any_of(ExprEvalContexts, ...) here and you can fold it directly into the if() below.
Good point. Do we have a dedicated context for sizeof? That would make the checking easier.
clang/include/clang/Sema/Sema.h:10972
> yeah, that's much simpler, I will make the change.
clang/include/clang/Sema/Sema.h:10969–10975
> One more thing. The idea of this function is that we're checking whether the Caller is allowed to call the Callee. Currently all uses of this function obtain the caller from CurContext, but if we start relying on properties of the current context other than the caller function, then we may need to pass the context explicitly, or only pass the Callee and check whether it's callable from the current context.
Here's one for you:
__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}
What is the return type of foo? :)
I don't believe the right answer is, "float when compiling for host, int when compiling for device."
I'd be happy if we said this was an error, so long as it's well-defined what exactly we're disallowing. But I bet @rsmith can come up with substantially more evil testcases than this.
So, actually, I wonder if that's not the right answer. We generally allow different overloads to have different return types. What if, for example, the return type on the host is __float128 and on the device it's MyLongFloatTy?
> I'd be happy if we said this was an error, so long as it's well-defined what exactly we're disallowing. But I bet @rsmith can come up with substantially more evil testcases than this.
Only if they also differ in some other way. C++ does not (generally) have return-type-based overloading. The two functions described would even mangle the same way if CUDA didn't include host/device in the mangling.
(Function templates can differ only by return type, but if both return types successfully instantiate for a given set of (possibly inferred) template arguments then the templates can only be distinguished when taking their address, not when calling.)
I think I've said before that adding this kind of overloading is not a good idea, but since it's apparently already there, you should consult the specification (or at least existing practice) to figure out what you're supposed to do.
> Only if they also differ in some other way. C++ does not (generally) have return-type-based overloading. The two functions described would even mangle the same way if CUDA didn't include host/device in the mangling.
Certainly. I didn't mean to imply otherwise.
This patch is introduced to allow functions and function templates from the standard library to be used in device code. By allowing different-side candidates in a context that only performs type inspection, we get a new issue that goes beyond the regular C++ overload-resolution rules: we need an additional policy to figure out which candidate is the best one by considering CUDA attributes. For the case you proposed, we may consider the following order when choosing an overload candidate, e.g.
- SAME-SIDE (with the same CUDA attributes as the caller)
- NATIVE (without any CUDA attributes)
- WRONG-SIDE (with the opposite CUDA attributes)

or just

- SAME-SIDE
- NATIVE
Is that a reasonable change?
BTW, I just checked similar cases with nvcc: with more than one candidate, it accepts the following code
float bar(); // This line could also be prefixed with `__host__` or `__device__`; all variants are accepted.
__host__ __device__ auto foo() -> decltype(bar()) {}
However, if there is more than one candidate differing in return type (with or without a CUDA attribute difference), it raises the error
foo.cu(4): error: cannot overload functions distinguished by return type alone
It seems to me that that's also an acceptable policy for handling the issue once we allow different-side candidates in type-only contexts.
As of CUDA 10, that's not acceptable to nvcc, as we are declaring two functions that differ only in return type. It seems CUDA attributes do not contribute to the function signature there; clang is quite different here.
clang/include/clang/Sema/Sema.h:10969–10975
> As the expression within decltype may be quite complicated, the idea here is to relax that rule for the decltype context as a whole, not only for a particular caller/callee pair.
> As of CUDA 10, that's not acceptable to nvcc, as we are declaring two functions that differ only in return type. It seems CUDA attributes do not contribute to the function signature there; clang is quite different here.
Yes, this is an intentional and more relaxed semantics in clang. It's also sort of the linchpin of our mixed-mode compilation strategy, which is very different from nvcc's source-to-source splitting strategy.
Back in the day you could trick nvcc into allowing host/device overloading on same-signature functions by slapping a template on one or both of them. Checking just now it seems they fixed this, but I suspect there are still dark corners where nvcc relies on effectively the same behavior as we get in clang via true overloading.
clang/include/clang/Sema/Sema.h:10969–10975
> I understand the idea, but in this case the argument was more about code style. Currently the contract is that the function's decision is derived from its arguments (and it could, perhaps, be a static method). With this patch you start relying on the context, but that's not obvious from the function signature. Replacing Caller with the context, or removing the caller altogether, would bring the function signature closer to what the function does.
This patch is revived with more changes addressing the previous concerns.
Back to Justin's example:
__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) { return bar(); }
Even without this patch, that example already compiles without errors or warnings, e.g. with
clang -std=c++11 -x cuda -nocudainc -nocudalib --cuda-gpu-arch=sm_60 --cuda-device-only -S -emit-llvm -O3 foo.cu
In C++14, that example can be simplified even further: drop the decltype, and the same ambiguity remains.
__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() { return bar(); }
Without any change, clang compiles this code as well, and deduces different return types in host-side and device-side compilation.[^1]
[^1]: The first example has the same return type in host-side and device-side compilation, but that seems incorrect or unreasonable to me.
The ambiguity issue is in fact not introduced by relaxing decltype; it is inherent in allowing overloading on target attributes. Issuing warnings instead of errors seems more reasonable to me for such cases.
In this patch, besides relaxing the CUDA call rule under decltype, we also emit a warning during overload resolution if there is more than one candidate and the candidates have different return types.