The __CUDA__ macro is already defined for openmp/nvptx and is not used by
__clang_cuda_complex_builtins.h, so dropping that macro slightly simplifies
nvptx and avoids defining it on amdgcn (where it is likely to be harmful).
Also dropped a cplusplus test from a C++ header as compilation will have
failed on cmath earlier if it was included from C.
bit weird that these are weak, but not changing that here