Currently clang fails to compile the following CUDA program in device compilation:
__host__ int foo(int x) {
  return 1;
}
template<class T>
__device__ __host__ int foo(T x) {
  return 2;
}
__device__ __host__ int bar() {
  return foo(1);
}
__global__ void test(int *a) {
  *a = bar();
}

This is because foo is resolved to the __host__ foo instead of the __device__ __host__ foo.
This appears to be a bug: the __device__ __host__ foo is a viable callee for the call to foo
in bar, yet clang does not choose it.
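
A likely explanation for why the __host__ overload wins is the ordinary C++ tie-breaker: when two candidates have equally good conversion sequences, a non-template function is preferred over a function template specialization. The host-only sketch below illustrates just that rule. It is an assumption-laden stand-in, not the patch itself: the CUDA attributes are dropped so it builds with any C++ compiler, and bar_template is a hypothetical helper added only for contrast.

```cpp
// Non-template overload; stands in for the __host__ foo above.
int foo(int) { return 1; }

// Function template; stands in for the __device__ __host__ foo above.
template <class T>
int foo(T) { return 2; }

// Both candidates are exact matches for an int argument, so standard
// overload resolution prefers the non-template overload.
int bar() { return foo(1); }

// Hypothetical helper: explicit template arguments limit the candidate
// set to the template, so the templated overload is called instead.
int bar_template() { return foo<int>(1); }
```

In plain C++ bar() picks the non-template overload. In CUDA device compilation that choice corresponds to the __host__ function, which cannot be called from device code, which would explain the failure described above.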
nvcc has a similar issue:
https://cuda.godbolt.org/z/bGijLc
However, nvcc only emits a warning and still compiles the program. It emits a trap in the
generated code, so the program fails at run time instead.
This patch fixes the issue in clang.
This is neglecting the case where they're both invalid.