This simply follows the scheme we have for other wrappers. It resolves
the current link problem, e.g., __muldc3 not found, when std::complex
operations are used on a device.
In "CUDA mode" this should allow simple complex operations to work in
target regions. Normal mode doesn't work because the globalization in
the std::complex operators is somehow broken. This will most likely not
allow complex math function calls to work properly, e.g., sin, but
that is more complex (pun intended) anyway.
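
For illustration only, here is a minimal sketch of the kind of code the description refers to: a std::complex multiplication inside a target region. The helper name multiply_on_device is hypothetical and not part of the patch. Built without OpenMP offloading the pragma is ignored and the body runs on the host, so the sketch does not reproduce the link error by itself.

```cpp
#include <complex>

// Hypothetical helper (not from the patch): multiplies two complex numbers
// inside an OpenMP target region. The operands are passed as plain doubles
// so that only trivially mappable scalars cross the host/device boundary.
std::complex<double> multiply_on_device(std::complex<double> a,
                                        std::complex<double> b) {
  double ar = a.real(), ai = a.imag();
  double br = b.real(), bi = b.imag();
  double re = 0.0, im = 0.0;

#pragma omp target map(to : ar, ai, br, bi) map(from : re, im)
  {
    std::complex<double> x(ar, ai), y(br, bi);
    // Depending on the standard library and compiler flags, this product
    // may be lowered to a call to the runtime routine __muldc3.
    std::complex<double> z = x * y;
    re = z.real();
    im = z.imag();
  }
  return std::complex<double>(re, im);
}
```

If __muldc3 has no device definition, a build of this kind for an offload target would be expected to fail at link time, which is the problem the wrapper is meant to resolve.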
Nit: this creates the impression that we fall back on the double variant of the function, while in reality we'll end up using std::isnan<float>.
Perhaps it would be better to use the fully specialized function template name in all these macros. It would also avoid potential issues if someone adds other overloads somewhere. E.g., we may end up facing std::complex<half>, which may make overload resolution ambiguous in some cases.
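
A rough sketch of the concern, using made-up names rather than the macros in the patch: the namespace sketch, the type Half, and the functions is_nan, is_nan_t and half_is_nan are all assumptions for illustration. With plain overloads, a type that converts to both float and double makes the call ambiguous, whereas naming the specialization explicitly pins down the float variant.

```cpp
#include <limits>

namespace sketch {

// Overload set: which function is called depends on overload resolution.
inline bool is_nan(float x) { return x != x; }
inline bool is_nan(double x) { return x != x; }

// Template variant: the caller can name the specialization explicitly.
template <typename T> bool is_nan_t(T x) { return x != x; }

// Hypothetical half-like type that converts implicitly to float and double.
struct Half {
  float v;
  operator float() const { return v; }
  operator double() const { return v; }
};

} // namespace sketch

// sketch::is_nan(h) would not compile here: both overloads need a
// user-defined conversion, and neither one is better than the other.
// Naming the specialization removes the ambiguity and documents the intent.
inline bool half_is_nan(sketch::Half h) {
  return sketch::is_nan_t<float>(h);
}
```

The same idea would apply to the macros under review: spelling out the specialization makes it obvious which variant is used and keeps later overloads from changing the result.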