Currently clang fails to compile the following CUDA program in device compilation:

  __host__ int foo(int x) {
    return 1;
  }

  template<class T>
  __device__ __host__ int foo(T x) {
    return 2;
  }

  __device__ __host__ int bar() {
    return foo(1);
  }

  __global__ void test(int *a) {
    *a = bar();
  }

This happens because the call to foo is resolved to the __host__ foo instead of
the __device__ __host__ foo. This seems to be a bug, since the __device__ __host__
foo is a viable callee for the call, yet clang is unable to choose it.
nvcc has a similar issue:
https://cuda.godbolt.org/z/bGijLc
However, nvcc only emits a warning and does not fail to compile. Instead, it emits
a trap in the generated code, so the program fails at run time.
This patch fixes the clang behavior so that the __device__ __host__ foo can be chosen.
This is neglecting the case where they're both invalid.
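
The intended ordering can be sketched with a small host-only C++ model. This is a toy illustration of the idea as described above, not clang's implementation: the names Side, Candidate, emittable and pick are hypothetical, it assumes both candidates have equally good argument conversions (true for foo(1) here), and it does not treat the case where every candidate is invalid in any special way.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names come from clang.
enum class Side { Host, Device, HostDevice };

struct Candidate {
    std::string name;
    Side side;        // target attributes of the candidate callee
    bool isTemplate;  // true if the candidate comes from a function template
};

// Whether a callee with these attributes can be emitted in the current
// compilation (device-side or host-side).
bool emittable(Side callee, bool deviceCompilation) {
    if (callee == Side::HostDevice) return true;
    return deviceCompilation ? callee == Side::Device : callee == Side::Host;
}

// Pick a callee for a call made from a __device__ __host__ function.
// Emittability is compared first; the usual "non-template beats template"
// tie-breaker applies only between equally emittable candidates.
const Candidate* pick(const std::vector<Candidate>& cands,
                      bool deviceCompilation) {
    assert(!cands.empty());
    const Candidate* best = nullptr;
    for (const Candidate& c : cands) {
        if (!best) { best = &c; continue; }
        bool cOk = emittable(c.side, deviceCompilation);
        bool bestOk = emittable(best->side, deviceCompilation);
        if (cOk != bestOk) {
            if (cOk) best = &c;  // an emittable candidate wins outright
            continue;
        }
        if (!c.isTemplate && best->isTemplate) best = &c;
    }
    return best;
}
```

With the two foo candidates from the example, this model picks the __device__ __host__ template in device compilation and the __host__ non-template in host compilation.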