Currently, clang does not resolve certain overloaded functions correctly in the initializer
of global variables, e.g.

template<typename T1, typename U>
T1 mypow(T1, U);

__attribute__((device)) double mypow(double, int);

double t_extent = mypow(1.0, 2);

In the above example, mypow is supposed to resolve to the host version
(the template, which is implicitly host), but clang resolves it to the
device version instead and emits an error
(https://godbolt.org/z/17xxzaa67).
However, if the variable is assigned in a host function, there is no error.
The discrepancy in overload resolution inside and outside of
a function is due to clang not accounting for the host/device target
when resolving functions called in the initializer of a global variable.
This patch introduces a global host/device target context for CUDA/HIP
that applies to function calls outside of any function body. For global
variable initialization, the context is determined by the host/device
attribute of the variable. In other situations, a default value of
host_device is sufficient.
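
To make the rule concrete, here is a minimal C++ sketch of how the context target could be chosen and then used to filter candidates. It is a simplified model, not clang's implementation: Target, GlobalVar, contextFor, sameSide, Candidate and pick are hypothetical names, and real overload resolution ranks candidates rather than taking the first compatible one.

```cpp
// Simplified model of the idea in this patch. All names are hypothetical and
// do not correspond to clang's internal API.
enum class Target { Host, Device, HostDevice };

// A global variable, reduced to the one property that matters here.
struct GlobalVar {
  bool hasDeviceAttr; // true if declared with __attribute__((device))
};

// Context target for calls that appear in the variable's initializer.
constexpr Target contextFor(GlobalVar v) {
  return v.hasDeviceAttr ? Target::Device : Target::Host;
}

// Default context for calls outside a function body that are not part of a
// global variable initializer.
constexpr Target kDefaultContext = Target::HostDevice;

// A callee is compatible if it is on the same side as the context, or if
// either side is host_device (an assumption made to keep the model small).
constexpr bool sameSide(Target ctx, Target callee) {
  return ctx == callee || ctx == Target::HostDevice ||
         callee == Target::HostDevice;
}

struct Candidate {
  const char *name;
  Target target;
};

// The two mypow overloads from the example above.
constexpr Candidate kMypowCandidates[] = {
    {"__attribute__((device)) double mypow(double, int)", Target::Device},
    {"template T1 mypow(T1, U)", Target::Host},
};

// Index of the first candidate compatible with the context, or -1 if none.
constexpr int pick(Target ctx, const Candidate *cands, int n) {
  for (int i = 0; i < n; ++i)
    if (sameSide(ctx, cands[i].target))
      return i;
  return -1;
}

// t_extent has no device attribute, so its initializer is a host context and
// the host template (index 1) is selected instead of the device overload.
static_assert(pick(contextFor(GlobalVar{false}), kMypowCandidates, 2) == 1,
              "host overload expected");
```

Without a context target, the model has nothing to filter on, which mirrors the situation before this patch where the device overload could be chosen for a host variable.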
It appears that Declarator D here is used only as an attribute carrier to identify the CUDA calling target.
Should we pass CudaTarget ContextTarget instead and let the caller figure out how to find it?
My concern is that we're hardcoding one specific way to find the target, while there may potentially be more.
The current way is OK, as we have just one use case at the moment.
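
For illustration only, the alternative could look roughly like the C++ sketch below: a scoped guard that takes the target directly, so each caller decides how to derive it. SemaState, TargetContextGuard and checkGlobalInitializer are hypothetical stand-ins, not names from the patch or from clang.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these names come from clang.
enum class Target { Host, Device, HostDevice };

// Minimal substitute for the semantic-analysis state that holds the current
// context target. host_device is the default outside of any function.
struct SemaState {
  Target contextTarget = Target::HostDevice;
};

// Scoped guard: sets the context target on construction and restores the
// previous value on destruction. It takes the target itself, not a Declarator.
class TargetContextGuard {
public:
  TargetContextGuard(SemaState &s, Target t)
      : state(s), saved(s.contextTarget) {
    s.contextTarget = t;
  }
  ~TargetContextGuard() { state.contextTarget = saved; }

  TargetContextGuard(const TargetContextGuard &) = delete;
  TargetContextGuard &operator=(const TargetContextGuard &) = delete;

private:
  SemaState &state;
  Target saved;
};

// Caller-side usage: the caller derives the target (here from a variable's
// device attribute) and passes it in. Other callers could derive it differently.
inline void checkGlobalInitializer(SemaState &s, bool varHasDeviceAttr) {
  Target t = varHasDeviceAttr ? Target::Device : Target::Host;
  TargetContextGuard guard(s, t);
  // Calls in the initializer would be resolved under target t here.
  assert(s.contextTarget == t);
}
```

The trade-off is the one noted above: with a single use case, deriving the target from the Declarator inside the helper is simpler, while passing the target keeps the helper independent of how it was found.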