The __CUDA__ macro is already defined for openmp/nvptx and is not used by
__clang_cuda_complex_builtins.h, so dropping that macro slightly simplifies
nvptx and avoids defining it on amdgcn (where it is likely to be harmful).
Also dropped a __cplusplus test from a C++ header, since compilation would
already have failed on cmath if the header had been included from C.
^ This header does not test for a __CUDA__ macro or include any other headers, so I believe dropping the macro cannot change the behavior of that header.
It might affect other code that happens to be included after this header, but IIUC CUDA and openmp-nvptx both define __CUDA__ anyway, so it could only break amdgpu applications that were erroneously checking for a CUDA macro.